Complex lookup against huge data

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
kiranredz
Premium Member
Premium Member
Posts: 21
Joined: Wed Aug 06, 2008 8:34 am
Location: United States

Complex lookup against huge data

Post by kiranredz »

Hi,

We have a scenario where the source would have around 2 million rows, which needs to be looked up against 3 tables based on a composite key (3 key fields)

These lookups against the tables are pretty complex, I mean its not a direct lookup against the tables. There's a huge join query that return rows for the lookup. The number of rows returned by each query are around 5 million rows.

So the source data of say 2 Million rows, should be looked up against 3 references (the tables I have mentioned above - with complex join queries) where each reference would have around 5 million rows.

Currently, there's a lookup stage with 3 reference links to the Oracle Ent stage which returns data from the join queries. This is not a feasable solution, since it can run into problems of performance, insufficient memory etc.

The options I am thinking of is
- to write reference data from tables into a lookup fileset and pass it as a reference to the lookup stage in the next job (Not sure about the drawbacks, since the data is huge)
- Or use join stage.

What would be the best way to perform this functionality? Please share your thoughts
Kiran
nitkuar
Participant
Posts: 46
Joined: Mon Jun 23, 2008 3:09 am

Post by nitkuar »

I would go for join option and also would first extract the reference data in datasets in separate jobs.
Vinothbaskaran
Participant
Posts: 22
Joined: Mon Aug 25, 2008 4:01 am

Post by Vinothbaskaran »

nitkuar wrote:I would go for join option and also would first extract the reference data in datasets in separate jobs.
I would strongly recommend join option if the volume is high.
Look ups should be carefully used in the design, otherwise it could be a nightmare to support when the volume goes up fro reason in the future.

More of a personal experience though :cry:
Vinothbaskaran
keshav0307
Premium Member
Premium Member
Posts: 783
Joined: Mon Jan 16, 2006 10:17 pm
Location: Sydney, Australia

Post by keshav0307 »

how about sparse lookup.
kiranredz
Premium Member
Premium Member
Posts: 21
Joined: Wed Aug 06, 2008 8:34 am
Location: United States

Thanks!

Post by kiranredz »

Hi all, thanks for your help.
We loaded the lookup data into datasets and used the join to perform the lookup operation.

Thanks!
Kiran
Post Reply