LOOKUP HELP

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
kumar444
Premium Member
Premium Member
Posts: 62
Joined: Wed Jan 12, 2011 10:01 am

LOOKUP HELP

Post by kumar444 »

I have 26 million rows in oracle refrence table and around 2 lakh rows in a source (in my case ouput from a join). I need to implement a left outer join with driving table as a source(in my case ouput from a join). Is it better to go for a join stage with left outer condition or a lookup stage? If lookup stage it should be a sparse lookup or normal lookup?

Appreciate any help to sovle this issue.
ThilSe
Participant
Posts: 80
Joined: Thu Jun 09, 2005 7:45 am

Post by ThilSe »

I would suggest you to go thorugh the DS manual for these stages.

To answer your question Lookup stage is used only when reference data is small (less than 20000 records in my dictionary) and can fit in memory.

Thanks,
Senthil
kumar444
Premium Member
Premium Member
Posts: 62
Joined: Wed Jan 12, 2011 10:01 am

Post by kumar444 »

Thanks Senthil. But how abt sparse lookup?
Is it possible to make reference table as a source and vice versa and do a lookup.
mavrick21
Premium Member
Premium Member
Posts: 335
Joined: Sun Apr 23, 2006 11:25 pm

Post by mavrick21 »

kumar444,

I would use sparse lookup.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Join is likely the better approach. Sparse lookup is slow, particularly when the database server is not on the same machine as the DataStage server.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
mavrick21
Premium Member
Premium Member
Posts: 335
Joined: Sun Apr 23, 2006 11:25 pm

Post by mavrick21 »

Ray,

The reference table has 26 million rows. Based on your reply I suggested Sparse lookup
kumar444
Premium Member
Premium Member
Posts: 62
Joined: Wed Jan 12, 2011 10:01 am

Post by kumar444 »

Thanks Guys.
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

200000 sparse lookups (2 lakh) is quite a lot. If you follow that route, you should ensure that the reference table is properly indexed and that statistics are kept up to date.

What would the size of one of your lookup records be (average size of it contains varchars)? Size of key columns + size of data columns. You may be able to support a standard lookup instead of a sparse and possibly see better overall performance. Only way to know for certain is to try...it'll basically be down to the time it takes to extract the table compared to performing 200000 sparse lookups.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Join (preceded by "don't sort, already sorted" Sort stage and relying on sorted data from the database only grabs one key value at a time from each link. Indeed, that's why it mandates sorted inputs; it would be horribly memory hungry otherwise, needing to load the entire right input into memory and probably build an index on the join keys.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kumar444
Premium Member
Premium Member
Posts: 62
Joined: Wed Jan 12, 2011 10:01 am

Post by kumar444 »

I have tested this case with join(left outer) and with lookup stage
(condition not met=continue,lookup failure=continue) .
In both of these cases the resulting records is not equal.

Suppose if have 1000 records in driving table and 2000 records in the right table. Join with left outer condition returns more than 1000 records (this is possible) but with lookup its returning only 1000 records. Do we have to change any lookup property to get desired result as in the case of left outer join.
mavrick21
Premium Member
Premium Member
Posts: 335
Joined: Sun Apr 23, 2006 11:25 pm

Post by mavrick21 »

Yes. Sorry I don't have access to DataStage Parallel jobs but here is what I remember - It's in the link properties where you can specify if lookup fails.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Lookup with Continue will only return one row from the reference input unless you enable "multiple rows returned" on that input. This can only be enabled on one reference input per Lookup stage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kumar444
Premium Member
Premium Member
Posts: 62
Joined: Wed Jan 12, 2011 10:01 am

Post by kumar444 »

Thanks Ray.
ThilSe
Participant
Posts: 80
Joined: Thu Jun 09, 2005 7:45 am

Post by ThilSe »

Hi Kumar,

Can you please let us know if you found any performance differences between the lookup and join solutions?

Thanks,
Senthil
kumar444
Premium Member
Premium Member
Posts: 62
Joined: Wed Jan 12, 2011 10:01 am

Post by kumar444 »

Senthil,
In my case join won the race. Lookup consumed more time then join.
As per the volume of data mentioned above in my topic, join proved to be efficient.
Post Reply