Complex lookup against huge data
Posted: Fri Oct 08, 2010 2:17 am
Hi,
We have a scenario where the source would have around 2 million rows, which needs to be looked up against 3 tables based on a composite key (3 key fields)
These lookups against the tables are pretty complex, I mean its not a direct lookup against the tables. There's a huge join query that return rows for the lookup. The number of rows returned by each query are around 5 million rows.
So the source data of say 2 Million rows, should be looked up against 3 references (the tables I have mentioned above - with complex join queries) where each reference would have around 5 million rows.
Currently, there's a lookup stage with 3 reference links to the Oracle Ent stage which returns data from the join queries. This is not a feasable solution, since it can run into problems of performance, insufficient memory etc.
The options I am thinking of is
- to write reference data from tables into a lookup fileset and pass it as a reference to the lookup stage in the next job (Not sure about the drawbacks, since the data is huge)
- Or use join stage.
What would be the best way to perform this functionality? Please share your thoughts
We have a scenario where the source would have around 2 million rows, which needs to be looked up against 3 tables based on a composite key (3 key fields)
These lookups against the tables are pretty complex, I mean its not a direct lookup against the tables. There's a huge join query that return rows for the lookup. The number of rows returned by each query are around 5 million rows.
So the source data of say 2 Million rows, should be looked up against 3 references (the tables I have mentioned above - with complex join queries) where each reference would have around 5 million rows.
Currently, there's a lookup stage with 3 reference links to the Oracle Ent stage which returns data from the join queries. This is not a feasable solution, since it can run into problems of performance, insufficient memory etc.
The options I am thinking of is
- to write reference data from tables into a lookup fileset and pass it as a reference to the lookup stage in the next job (Not sure about the drawbacks, since the data is huge)
- Or use join stage.
What would be the best way to perform this functionality? Please share your thoughts