Best Ideas to solve a specific performance bottleneck needed
Posted: Tue Oct 02, 2007 8:25 am
Here is my performance bottleneck.
Input = 2 billion records.
Reference Data = 10 million records.
We currently are using a lookup stage as opposed to a join stage to avoid having to sort the input 2 billion records from the mainframe file.
The Lookup Stage is set to sequential to not load 10 million records into memory for each partition we are running on. The fact that this is sequential is killing me. The job is currently looking up to the the Oracle Stage, which is doing a table read (aka Normal Lookup instead of Sparse)
What ideas do you have in order to perform this an alternative way. The only idea I can have left to try is to try using a lookupfileset. Should this help. Any other ideas?
Input = 2 billion records.
Reference Data = 10 million records.
We currently are using a lookup stage as opposed to a join stage to avoid having to sort the input 2 billion records from the mainframe file.
The Lookup Stage is set to sequential to not load 10 million records into memory for each partition we are running on. The fact that this is sequential is killing me. The job is currently looking up to the the Oracle Stage, which is doing a table read (aka Normal Lookup instead of Sparse)
What ideas do you have in order to perform this an alternative way. The only idea I can have left to try is to try using a lookupfileset. Should this help. Any other ideas?