various lookup key and UV stage problem

ppalka · Post by **ppalka** » Mon Aug 22, 2005 12:40 pm

Hi,
I have to read data from a seq file to a stage which will allow me to make a lookup. This input data has a complex key of two first columns:
name nr col1 col2 ... col3
primary key: name + nr
So I would like to create Hashed file. But in later job I need to make two different lookups using different columns:
1. by name = sth and nr = 1
2. by name = sth and date1 > col1 and date1 < col2
So I tried to use UV stage(I set as key: name + nr), cause it allows to change key columns when I am making lookup in transformer stage(for each lookup I set a different key columns). Am I right? If not please correct me.
To set a key expression for a lookup table I can only make it for key columns? Can I make a key expression that accept every value for "nr" column and some value for "name"( I want not to have passing any value in key expression for "nr" column)?
And the second issue is when I load 1.5 milion records to that UV stage and use it as lookup my job stacked when trying to start. When I have not so many records everything is ok. Has UV stage any constraints on the number of records?
Please help.
Thanks.

Regards,
Piotrek

Sainath.Srinivasan · Post by **Sainath.Srinivasan** » Mon Aug 22, 2005 3:56 pm

Your assumptions are correct.

Check the space in the tmp and project dir.

kduke · Post by **kduke** » Mon Aug 22, 2005 5:02 pm

my job stacked

I am not sure what that means. You can write to a hash file and then read it with a UV stage. Some of your assuptions are correct. You can do a range lookup on a UV stage. You have to do it with User Defined SQL. Loading hash files takes a while. It is faster to load it as a hash file than as a UV stage.

A hash file needs a unique key. If name + nr is not unique then your range lookup will not work as expected because you removed some records. If it is unique then write it that way. Next just change the metadata on the UV lookup to key "Yes" on name, col1 and col2. Next modify the SQL generated from = to < and >. Easy. Try it.

There are several tips posted to improve performance of this kind of lookup and load. You need to cache your hash file. You also need to load the minimum amount of columns needed for the transform. As far as this lookup you could also use multivalued fields. This a lot more complex so make this work first. If you want to improve the lookup performance then ask again after you get this much working. We will walk you through multivalues. You might want to search for CREATE.INDEX as well.

ray.wurlod · Post by **ray.wurlod** » Mon Aug 22, 2005 6:11 pm

Think of it like this:

What you specify as "key" in the Transformer stage means "search key" rather than "primary key".

If you're using a Hashed File stage, you can only search on the entire primary key because that is the way that stage type works.

But if you're using a UniVerse stage, you end up using SQL to query the hashed file as a database table, so search key values end up bound to markers in the WHERE clause.

ppalka · Post by **ppalka** » Tue Aug 23, 2005 4:56 am

kduke wrote: I am not sure what that means.

When I start my job without UV lookup it ends after 8min. When I start my job with UV lookup it ended after 15 hours <wow>
Any ideas about what cause the problem?

Sainath.Srinivasan · Post by **Sainath.Srinivasan** » Tue Aug 23, 2005 6:26 am

Did you try using any other db? You may be able to analyze the performance with help of the DBA.

What is your disk and memory performance like?

You can do a selective load into the uv by joining the source for uv with the source for your job.

More information on the data may assist others in giving better answers.

chulett · Post by **chulett** » Tue Aug 23, 2005 6:59 am

ppalka wrote:When I start my job without UV lookup it ends after 8min. When I start my job with UV lookup it ended after 15 hours <wow>
Any ideas about what cause the problem?

I've seen the same behaviour. UV lookups can be horribly slow out of the box if you drop it over a hashed file. As Kim noted, you need to create an index on the field you are using as your key field, that will work wonders.

Search the forum, it has been discussed many (many) times and the syntax / procedure is out there. Off the top of my head you'll need two commands:

CREATE.INDEX
BUILD.INDEX

Note that with a hashed ( still takes a moment to remember to put the 'ed' at the end, Ray

) file accessed via a UV stage you can only create an index over a single field. If you need to build a composite key, it will need to start off life as a UV table, not a hashed file. You find more information about this when you search for the above commands.

ppalka · Post by **ppalka** » Tue Aug 23, 2005 8:31 am

Sainath.Srinivasan wrote:Did you try using any other db? You may be able to analyze the performance with help of the DBA.

Not yet, but I will try Oracle DB.

What is your disk and memory performance like?

I don't have such informations, but I think the server is running on very fast machine.

ppalka · Post by **ppalka** » Tue Aug 23, 2005 8:52 am

chulett wrote: I've seen the same behaviour. UV lookups can be horribly slow out of the box if you drop it over a hashed file.

But I'm not using it over hashed file. I create UV table and use it as UV lookup.

As Kim noted, you need to create an index on the field you are using as your key field, that will work wonders.

So I execute CREATE.INDEX PD_PAYMENT_DUE_HIS_UV ID CURR.NO and then BUILD.INDEX PD_PAYMENT_DUE_HIS_UV ID CURR.NO commands (the second one takes few minutes). Now I will test :D

chulett · Post by **chulett** » Tue Aug 23, 2005 8:58 am

Actually, that syntax was specifically for hashed files under a UV stage. For a UV table, the other posts on the subject would have told you to use 'normal' create index syntax...

ppalka · Post by **ppalka** » Tue Aug 23, 2005 9:02 am

chulett wrote:Actually, that syntax was specifically for hashed files under a UV stage. For a UV table, the other posts on the subject would have told you to use 'normal' create index syntax...

hmm, but this command works fine and it makes a litlle faster to execute my job (but it is still only 4-5 rows/s)
PS. Is this possible to create index in designer without using administrator?

chulett · Post by **chulett** » Tue Aug 23, 2005 9:07 am

ppalka wrote:PS. Is this possible to create index in designer without using administrator?

Nope.

ray.wurlod · Post by **ray.wurlod** » Tue Aug 23, 2005 5:14 pm

You can execute CREATE.INDEX and BUILD.INDEX via the before/after subroutine ExecTCL; coordination of these after the hashed file is created will be problematic.

Or you can create your own before/after subroutine to do it all - delete and create the hashed file and its indexes. If you do this you can use hashed file syntax (DELETE.FILE, CREATE.FILE, CREATE.INDEX) or UV SQL syntax (DROP TABLE, CREATE TABLE, CREATE INDEX).

You do not require BUILD.INDEX if there are no data in the hashed file.

Kryt0n · Post by **Kryt0n** » Wed Aug 24, 2005 7:14 pm

I have a similar issue. I have an existing hash file that doesn't actually contain the field I need (as a key) but is related (they come from the same source).

Would it be more efficient for me to add this new column to the existing hash file and when I want to do a lookup with this new column as a key, load it through a UV stage. Or should I just create a separate hash file with this column as the key?

If the UV stage is not going to cost me performance then this would be my preferred choice. However, would I be able to create the index AND build it via the Before-job subroutine?

Thanks

ray.wurlod · Post by **ray.wurlod** » Wed Aug 24, 2005 8:18 pm

Can you formulate the relationship as an expression? If so, all you need to do is to use that expression as the reference key expression in your Transformer stage.

If you are trying to say that the value you want to look up exists as a non-key column in the hashed file, your best solution is to create a separate hashed file in which this column is the key value, and access this via a Hashed File stage.

Otherwise, and less desirable, create an index on the non-key column in the hashed file and access it via a UV stage.

DSXchange

various lookup key and UV stage problem

various lookup key and UV stage problem

It's HASHED file, dammit, not hash file!