Hash file between read and write improved performance. Why?
Posted: Tue Aug 27, 2013 8:03 am
Hi Everyone,
Just wanted to see what people thought of this. I have a DataStage 8.1 Server job. The source is Oracle and the target is Oracle, on two different DB servers. The job reads from a DRS stage and then looks up against a hash file to determine inserts and updates.
Based on what it finds it will either insert via a DRS stage or update via a different DRS stage. Here is the funny part: the job reads roughly 380,000 rows from the input and looks them up against the hash file, which contains 4 million entries. If there are no inserts or updates to be done, the job finishes in seconds. Woohoo! If there is even one insert, the job runs for 30 minutes. Boo!
So what I did was add a hash file stage just after the lookup is done and before the split for inserts and updates. I figured a "hard stop" there would prevent both target connections from being open at the same time. Now there is no problem: even with 5,000 inserts the job takes only seconds, which is what I expected for such a low volume of data.
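In case it helps to picture the two shapes, here is a rough Python sketch of what I mean (the names are made up purely for illustration; the real job does all of this with stages, not code). The original design streams each looked-up row straight to the insert or update connection, while the changed design lands the lookup result in an intermediate store first and only then does the database writes.

# Hypothetical sketch only. existing_keys stands in for the 4-million-row
# hash file used for the insert/update lookup; insert_conn and update_conn
# stand in for the two target DRS stages.

def route_streaming(source_rows, existing_keys, insert_conn, update_conn):
    """Original pattern: both target connections are fed while the source is still being read."""
    for row in source_rows:
        if row["key"] in existing_keys:
            update_conn.append(row)   # update link (one DRS stage)
        else:
            insert_conn.append(row)   # insert link (the other DRS stage)

def route_buffered(source_rows, existing_keys, insert_conn, update_conn):
    """Changed pattern: land rows in an intermediate store first (the "hard stop"),
    then do the database writes as a separate pass."""
    staged = []                       # stands in for the intermediate hash file
    for row in source_rows:
        action = "U" if row["key"] in existing_keys else "I"
        staged.append((action, row))
    # Second pass: only now are the target connections touched.
    for action, row in staged:
        (update_conn if action == "U" else insert_conn).append(row)

if __name__ == "__main__":
    existing = {1, 2, 3}
    rows = [{"key": 1, "val": "a"}, {"key": 9, "val": "b"}]
    ins, upd = [], []
    route_buffered(rows, existing, ins, upd)
    print(len(ins), "insert,", len(upd), "update")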
Does anyone have any thoughts on why this would be? As it stands I'm happy that the job runs very fast, but we have hundreds of jobs that all use this same pattern, and I do not want to have to change them all. The pattern comes from the DataStage maps that are part of the PeopleSoft EPM package.
I look forward to seeing your replies.