Lookups in parallel jobs

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

sudheepkv
Participant
Posts: 12
Joined: Tue Jan 03, 2006 3:03 am

Lookups in parallel jobs

Post by sudheepkv »

Could you help me with the following issue in a parallel job?

I have:
File A contains 1 million records.
File B contains 1,000 records.
Oracle table C contains 1,000 records.
I want to select some columns from A, B and C based on the condition
A.id = B.id AND B.id = C.id.

1. Could you suggest the best way of choosing the lookups?
2. Are there any performance bottlenecks when we choose heterogeneous lookups (one in a flat file and one in Oracle)?
3. Is there any advantage in choosing a Lookup File Set as the lookup(s)?
4. There is some chance of new records being added to the Oracle table. How can I make those changes visible in the lookup automatically?
sudhi
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Hi,
You can use File B and the Oracle stage as reference links for the Lookup. The lower-volume data should be chosen as the reference data, because it is loaded into memory during the lookup.
Yes, you can choose a Lookup File Set for File B, which can improve performance. Since the Oracle data changes slowly, note that if a Lookup File Set is used it must be rebuilt whenever the data changes; otherwise the Oracle table can be used directly as a reference link.


-Kumar
keshav0307
Premium Member
Posts: 783
Joined: Mon Jan 16, 2006 10:17 pm
Location: Sydney, Australia

Post by keshav0307 »

If the source data is large and the reference data is small, then a normal lookup is faster.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

Anything under 10,000 lookup reference rows is quite small and will fit into memory, so there is no need to go to the effort of preloading lookup file sets or datasets; just turn your 1,000-row sources into straight lookup sources. The Lookup stage should be faster than the Join or Merge options.
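To see why small reference data belongs on the lookup link, here is a minimal sketch in plain Python (not DataStage code; the function and field names are made up for illustration) of what a normal lookup does: it builds an in-memory hash table from the small reference data once, then probes it for each source row.

```python
# Hypothetical illustration of a "normal lookup": the reference data is
# loaded into an in-memory hash table once, at job startup, and each
# source row does a cheap O(1) probe against it.

def normal_lookup(source_rows, reference_rows, key):
    # Build the hash table once from the (small) reference data.
    ref = {row[key]: row for row in reference_rows}
    for row in source_rows:
        match = ref.get(row[key])  # one probe per source row
        if match is not None:
            # Merge the reference columns onto the source row.
            yield {**row, **match}

# A million source rows can probe a 1,000-row in-memory table cheaply;
# here a toy 10-row source stands in for file A, 5 rows for file B.
a = [{"id": i, "val": i * 2} for i in range(10)]
b = [{"id": i, "desc": f"d{i}"} for i in range(0, 10, 2)]
print(len(list(normal_lookup(a, b, "id"))))  # -> 5 matched rows
```

This is also why changes to the Oracle table are invisible mid-run: the hash table is a snapshot taken when the job starts.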

Very tricky to modify your Oracle lookup while you use it. With a normal lookup the entire 1,000 rows are loaded into memory at job startup, so any changes to the Oracle table during the run will be ignored. Even with sparse lookups it's tricky, because of the speed at which the nodes add rows.

In Hawk there is something called an updatable in-memory lookup that will help you, but in the current release it's just too difficult. You are better off putting all your Oracle lookup inserts into a dataset (via a Remove Duplicates stage) and then loading it into the Oracle table after your job has finished.
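The "collect inserts, de-duplicate, load afterwards" pattern above can be sketched in plain Python (again, not DataStage code; the function and field names are illustrative assumptions): rows that miss the lookup are kept aside, de-duplicated on the key, and applied to the table in one batch after the main pass.

```python
# Hypothetical sketch of the post-job load pattern: misses are collected
# and de-duplicated on the key (the "Remove Duplicates stage"), producing
# a batch of inserts to load into Oracle after the job finishes.

def run_job(source_rows, reference, key):
    matched, pending = [], {}
    for row in source_rows:
        ref_row = reference.get(row[key])
        if ref_row is not None:
            matched.append({**row, **ref_row})
        else:
            # Keep only the first row seen per key, like Remove Duplicates.
            pending.setdefault(row[key], row)
    return matched, list(pending.values())

ref = {1: {"id": 1, "desc": "one"}}
src = [{"id": 1}, {"id": 2}, {"id": 2}]
matched, inserts = run_job(src, ref, "id")
# matched holds the one row found in the table; inserts holds a single
# de-duplicated row for id 2, ready to bulk-load after the job.
```

The lookup itself stays a fast, read-only in-memory table for the whole run; the table only changes between runs.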