Page 1 of 1

Multiple lookups , same file

Posted: Mon Feb 14, 2005 4:03 am
by datastagepraba
In my project we have one big classifiction table where we store all the lookups. Is it possible to export it to a sequential file and use it as the only lookup(for all the classification codes) in all the jobs? Each record has to access the file number of times to lookup and get the corresponding codes. Or is it advisable to create multiple lookup files( one for gender , one for role etc.,) and use these as inputs to the lookup stage?

Posted: Mon Feb 14, 2005 5:27 am
by Sainath.Srinivasan
If you have the information in the same location (i.e. db, connection etc) as that of your source data, you can use SQL to combine them.

Alternatively, if the size of separate lookup (obtained as breakdown of classification types) are much smaller to be loaded in memory and accessed from there from thereon, you can download them and use as lookups.

But it is better to avoid a large lookup.

Posted: Tue Feb 15, 2005 5:56 pm
by vmcburney
I would load it into a single lookup fileset and place it on all partitions. I assume when you use it you have two key fields, one being the lookup id and the other being the lookup code. A single large lookup file is going to be easier to manage then lots of small ones.

In your lookup stage you should be able to type a code into the lookup id derivation field and link the lookup code field to the input value.

Posted: Tue Feb 15, 2005 6:28 pm
by vmcburney
Did some additional testing and discovered my recommendation wouldn't work! While server job transformers let you type in a constant into a lookup derivation field the parallel lookup stage does not. So if you have one extended lookup file your problem is going to be identifying which type of lookup code you want to get. Most methods of passing the lookup type to the lookup stage are quite messy, such as adding a new input data field with a constant in it, or concatenating the lookup type with the lookup code.

I think given the restrictions of the lookup stage you might be better off going for lots of different lookup files.

Posted: Tue Feb 15, 2005 10:21 pm
by nkreddy
Is there a possibility to add that lookup column in the transformer (before the lookup stage) and then do the lookup? We can add all the key columns from all the database lookups in addition to the columns present in the Transformer stage...

For eg:

source -----> Transformer -------> LookUp Stage ------->DataSet or SeqFile

|
|
|
LookUp FileSet
or
Database LookUp

Thanks,
NK

p.s: The LookUp FileSet or Database lookup should be linked to the LookUp Stage....I could never get the drawing straightened out...

Posted: Tue Feb 15, 2005 10:44 pm
by vmcburney
Yes, that's one of the messy workarounds! I would use the Modify stage instead of the Transformer using a new column specifications for each lookup field that are hard coded to the lookup id (PRODCODE, DEPTCODE etc). Modify may be a bit faster then a transformer but a transformer tends to be easier to use.

Posted: Wed Feb 16, 2005 9:18 am
by nkreddy
Vincent,

Would you please clarify what you meant: I wanted to use Modify Stage rather than the Transformer...just to test it.

Let us say, I have 10 columns coming from the source and none of them can be used as a lookup columns.

In that situation, I have to add the lookup column from the database lookup (say APPL_CODE -- VarChar50) to the columns that are going out of the Modify Stage.

So, I would have 11 columns coming into the LookUp Stage now...Is that right?


In the specification part of the Modify Stage, I would add

APPL_CODE:VarChar50 = APPL_CODE
(new_name = old_name)

and then enable the column propagation..


Regards,
NK

Posted: Wed Feb 16, 2005 2:17 pm
by vmcburney
In the Modify stage I would set the new field, APPL_CODE, to a constant:
APPL_CODE:VarChar50 = "PRODCODE"

This will create a new field called APPL_CODE.

Posted: Wed Feb 16, 2005 5:33 pm
by nkreddy
Vincent,

I tried and tried...but couldn't solve this problem. It works with the transformer..

The error messages are as follows:

Error when checking operator: Could not find input field "APPL_CODE"
When checking operator: Modify operator keeps field "APPL_CODE" not present in output interface

Specification: KEEP 'Column1', 'Column2'
Specification: APPL_CODE:VarChar50 = #SRC_APPL_N#

#SRC_APPL_N# is also in the lookup table....

Am I missing something here??

Thanks,
NK

Posted: Thu Feb 17, 2005 4:39 am
by vmcburney
Is APPL_CODE listed on the output columns for your Modify stage? Check on the output tab.