Hi,
We have a scenario where the source has around 2 million rows, which need to be looked up against 3 tables on a composite key (3 key fields).
These lookups are fairly complex; they are not direct lookups against the tables. A large join query returns the rows for each lookup, and each query returns around 5 million rows.
So the source data of roughly 2 million rows must be looked up against 3 references (the tables mentioned above, populated by the complex join queries), each holding around 5 million rows.
Currently there is a Lookup stage with 3 reference links fed by Oracle Enterprise stages that run the join queries. This is not a feasible solution, since it can run into performance problems, insufficient memory, etc.
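Conceptually, each reference link on a Lookup stage builds an in-memory table keyed on the composite key and probes it once per source row; with three ~5-million-row references, all three tables must fit in memory at once. A minimal Python sketch of that idea (illustrative only, not DataStage code; the key and payload values are made up):

```python
# Stand-in for the ~5M rows a reference join query would return.
reference_rows = [
    ("K1", "K2a", "K3x", "val1"),
    ("K1", "K2b", "K3y", "val2"),
]

# The Lookup stage effectively hashes the reference on the
# 3-field composite key and keeps the whole table in memory.
ref_index = {(k1, k2, k3): payload for k1, k2, k3, payload in reference_rows}

# Each source row is then a single hash probe.
source_key = ("K1", "K2a", "K3x")
match = ref_index.get(source_key)  # None when the key is absent
```

The probe itself is cheap; the cost is holding every reference row in memory, which is exactly what breaks at these volumes.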
The options I am thinking of are:
- write the reference data from the tables into a Lookup File Set in one job and pass it as a reference to the Lookup stage in the next job (I am not sure about the drawbacks, since the data volume is huge)
- or use a Join stage.
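For context on why a Join stage tends to scale better: it merges two inputs that are both sorted on the composite key, streaming row by row, so neither side has to fit in memory. A rough Python sketch of that sort-merge idea (illustrative only, not DataStage code; `merge_join` is a hypothetical helper, and reference keys are assumed unique):

```python
def merge_join(source, reference):
    """Inner-join two streams of (key, payload) tuples, both
    pre-sorted ascending on key; reference keys assumed unique."""
    src, ref = iter(source), iter(reference)
    s, r = next(src, None), next(ref, None)
    while s is not None and r is not None:
        if s[0] < r[0]:
            s = next(src, None)    # source row with no reference match
        elif s[0] > r[0]:
            r = next(ref, None)    # reference row with no source match
        else:
            yield (*s, r[1])       # keys match: emit the joined row
            s = next(src, None)    # keep r: next source row may share the key
```

Only one row from each input is held at a time, which is why the engine trades the Lookup stage's memory pressure for the cost of sorting both inputs on the key.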
What would be the best way to implement this? Please share your thoughts.
Complex lookup against huge data
nitkuar wrote: "I would go for the join option, and would also first extract the reference data into datasets in separate jobs."
I would strongly recommend the join option if the volume is high.
Lookups should be used carefully in the design; otherwise they can be a nightmare to support when the volume goes up for some reason in the future.
More of a personal experience, though.
Vinothbaskaran