Lookups in parallel jobs

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

sudheepkv
Participant
Posts: 12
Joined: Tue Jan 03, 2006 3:03 am

Lookups in parallel jobs

Post by sudheepkv »

Could you help me with the following issue in a parallel job?

I have:
File A contains 1 million records.
File B contains 1,000 records.
Oracle table C contains 1,000 records.
I want to select some columns from A, B and C based on the condition
A.id = B.id AND B.id = C.id.

1. Could you suggest the best way of choosing the lookups?
2. Are there any performance bottlenecks when we choose heterogeneous lookups (one in a flat file and one in Oracle)?
3. Is there any advantage in choosing a Lookup File Set as the lookup(s)?
4. There is some chance of new records being added to the Oracle table. How can I make those changes visible in the lookup automatically?
sudhi
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Hi,
You can use File B and the Oracle stage as reference links for the Lookup. The lower-volume data should be chosen as the reference data, because it is loaded into memory during the lookup.
Yes, you can choose a Lookup File Set for File B, which can improve performance. Since the Oracle data changes slowly, note that if a Lookup File Set is used it must be rebuilt whenever the data changes; otherwise the Oracle table can be used directly as a reference link.


-Kumar
keshav0307
Premium Member
Posts: 783
Joined: Mon Jan 16, 2006 10:17 pm
Location: Sydney, Australia

Post by keshav0307 »

If the source data is large and the reference data is small, then a normal lookup is faster.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

Anything under 10,000 lookup reference rows is quite small and will fit into memory, so there is no need to go to the effort of preloading lookup file sets or datasets; just turn your 1,000-row sources into straight lookup sources. The Lookup stage should be faster than the Join or Merge options.
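To see why small reference data belongs on the lookup link, here is a minimal sketch in plain Python (not DataStage code; the function and field names are made up for illustration) of what a normal lookup does: it builds an in-memory hash table from the small reference data once, then probes it for each source row.

```python
# Hypothetical illustration of a "normal lookup": the reference data is
# loaded into an in-memory hash table once, at job startup, and each
# source row does a cheap O(1) probe against it.

def normal_lookup(source_rows, reference_rows, key):
    # Build the hash table once from the (small) reference data.
    ref = {row[key]: row for row in reference_rows}
    for row in source_rows:
        match = ref.get(row[key])  # one probe per source row
        if match is not None:
            # Merge the reference columns onto the source row.
            yield {**row, **match}

# A million source rows can probe a 1,000-row in-memory table cheaply;
# here a toy 10-row source stands in for file A, 5 rows for file B.
a = [{"id": i, "val": i * 2} for i in range(10)]
b = [{"id": i, "desc": f"d{i}"} for i in range(0, 10, 2)]
print(len(list(normal_lookup(a, b, "id"))))  # -> 5 matched rows
```

This is also why changes to the Oracle table are invisible mid-run: the hash table is a snapshot taken when the job starts.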

Very tricky to modify your Oracle lookup while you use it. With a normal lookup the entire 1,000 rows are loaded into memory at job startup, so any changes to the Oracle table during the run will be ignored. Even with sparse lookups it's tricky, because of the speed at which the nodes add rows.

In Hawk there is something called an updatable in-memory lookup that will help you, but in the current release it's just too difficult. You are better off putting all your Oracle lookup inserts into a dataset (via a Remove Duplicates stage) and then loading it into the Oracle table after your job has finished.
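The "collect inserts, de-duplicate, load afterwards" pattern above can be sketched in plain Python (again, not DataStage code; the function and field names are illustrative assumptions): rows that miss the lookup are kept aside, de-duplicated on the key, and applied to the table in one batch after the main pass.

```python
# Hypothetical sketch of the post-job load pattern: misses are collected
# and de-duplicated on the key (the "Remove Duplicates stage"), producing
# a batch of inserts to load into Oracle after the job finishes.

def run_job(source_rows, reference, key):
    matched, pending = [], {}
    for row in source_rows:
        ref_row = reference.get(row[key])
        if ref_row is not None:
            matched.append({**row, **ref_row})
        else:
            # Keep only the first row seen per key, like Remove Duplicates.
            pending.setdefault(row[key], row)
    return matched, list(pending.values())

ref = {1: {"id": 1, "desc": "one"}}
src = [{"id": 1}, {"id": 2}, {"id": 2}]
matched, inserts = run_job(src, ref, "id")
# matched holds the one row found in the table; inserts holds a single
# de-duplicated row for id 2, ready to bulk-load after the job.
```

The lookup itself stays a fast, read-only in-memory table for the whole run; the table only changes between runs.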