HASH file between read and write improved performance - why?

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

epsidude
Premium Member
Posts: 17
Joined: Fri Feb 27, 2009 10:14 am

HASH file between read and write improved performance - why?

Post by epsidude »

Hi Everyone,

Just wanted to see what people thought of this. I have a DataStage 8.1 Server job. The source is Oracle and the target is Oracle, on two different database servers. The job reads from a DRS stage, then looks up against a hashed file to determine inserts and updates.

Based on what it finds, it will either insert via one DRS stage or update via a different DRS stage. Here is the funny part: the job reads roughly 380,000 rows from the input and looks up against the hashed file, which contains 4 million entries. If there are no inserts or updates to be done, the job finishes in seconds. Woohoo! If there is even one insert, the job runs for 30 minutes. Boo!
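To picture the routing logic, here is a minimal sketch in plain Python rather than DataStage; the dict stands in for the hashed file lookup and the two lists stand in for the insert and update links (all names and values are made up for illustration):

```python
# Minimal sketch of the change-detection pattern described above.
# The dict plays the role of the hashed file; the lists play the role
# of the insert and update DRS links. Illustrative names only.

source_rows = [
    {"id": 1, "name": "alpha"},   # key already in the target -> update
    {"id": 99, "name": "omega"},  # key not in the target     -> insert
]

# Hashed-file equivalent: key -> existing row loaded from the target table.
hashed_lookup = {1: {"id": 1, "name": "old_alpha"}}

inserts, updates = [], []
for row in source_rows:
    if row["id"] in hashed_lookup:
        updates.append(row)   # key found   -> update link
    else:
        inserts.append(row)   # key missing -> insert link

print("inserts:", inserts)
print("updates:", updates)
```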

So what I did was add a hashed file stage just after the lookup is done and before the split for inserts and updates. I thought a "hard stop" would prevent both target connections from being open at the same time. Now there is no problem: even with 5,000 inserts the job takes only seconds, which is what I expected for such a low volume of data.
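Conceptually, the intermediate hashed file turns one streaming pass with two write targets open at once into a stage-then-write sequence. A rough sketch of that difference, again in Python with made-up names and assuming the rows are small enough to buffer:

```python
# Rough sketch of the "hard stop": buffer every routed row first, then run the
# insert pass and the update pass one after the other, so the two target
# connections are never writing at the same time. Illustrative names only.

def route(rows, existing_keys):
    """Split incoming rows into (inserts, updates) against the lookup keys."""
    inserts = [r for r in rows if r["id"] not in existing_keys]
    updates = [r for r in rows if r["id"] in existing_keys]
    return inserts, updates

def write_phase(label, rows):
    # Stand-in for opening one target connection, writing, committing, closing.
    print(f"{label}: wrote {len(rows)} rows, committed, connection closed")

source_rows = [{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}]
existing_keys = {1}

# "Hard stop": everything is staged before any target write begins.
staged_inserts, staged_updates = route(source_rows, existing_keys)

write_phase("insert pass", staged_inserts)   # first connection
write_phase("update pass", staged_updates)   # second connection, only afterwards
```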

Does anyone have any thoughts on why this would be? As it stands I am happy that the job runs very fast, but we have hundreds of jobs that all use this same pattern, and I do not want to have to change them all. The pattern comes from the DataStage maps that are part of the PeopleSoft EPM package.

I look forward to seeing your replies.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

What is your DRS setting for Transaction Isolation Level? What about the array and commit sizes - does your original job still take 30 minutes for the first row if you set 1 & 1?

The behaviour is not normal - something is odd and you shouldn't need a hashed file for temporary storage.
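For readers following the 1 & 1 suggestion: the array size controls how many rows are sent to the database per call, and the transaction (commit) size controls how many rows are committed together, with 0 meaning commit only at the end of the job, as confirmed later in this thread. A toy Python model of just the commit-size half (made-up numbers) shows how much uncommitted work, and therefore how many row locks, a connection can be holding at any moment:

```python
# Toy model of how the commit interval changes the amount of work sitting
# in one open transaction. trans_size=0 follows the convention described
# in this thread: commit only once at the end of the job.

def max_uncommitted(total_rows, trans_size):
    """Largest number of rows held in a single open transaction."""
    if trans_size == 0:
        return total_rows          # nothing committed until the end
    return min(trans_size, total_rows)

for trans_size in (0, 1, 1000):
    print(f"trans_size={trans_size:>5}: up to "
          f"{max_uncommitted(380_000, trans_size)} rows held uncommitted")
```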
epsidude
Premium Member
Posts: 17
Joined: Fri Feb 27, 2009 10:14 am

Post by epsidude »

Thank you for the reply! The settings you asked about are:
Array_Size = 32,767
Trans_Size = 0

Update action = Insert without clearing (on the insert link)
or
Update action = Update existing rows only (on the update link)

I reran the job using 1 & 1 and no hashed file, and it took only seconds to run!

Could you help me to understand why this occurred? I did not think this would have any effect.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

You had a deadlock situation between your two output stages. Since your transaction size was 0 (meaning nothing is committed until all the data has been processed), if the same row appears on both output links they block each other. You probably have a database error message somewhere in which the database reported that one of the transactions was chosen as the survivor when it performed the deadlock resolution for you (after 30 minutes).
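The blocking effect can be reproduced outside DataStage. The sketch below uses Python with SQLite, which locks the whole database file rather than individual rows, so it is only an analogy for the row-level case described here: one connection holds an uncommitted write, the second connection's write times out, and committing the first transaction lets the second one through. All names are illustrative.

```python
import os
import sqlite3
import tempfile

# One shared database file, two independent connections standing in for the
# insert link and the update link. SQLite locking is coarser than Oracle's,
# but the "open transaction blocks the other writer" idea is the same.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

setup = sqlite3.connect(db_path)
setup.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT)")
setup.execute("INSERT INTO target VALUES (1, 'alpha')")
setup.commit()
setup.close()

inserter = sqlite3.connect(db_path, timeout=1)  # "insert" connection
updater = sqlite3.connect(db_path, timeout=1)   # "update" connection

# Transaction size 0 behaviour: the insert connection writes but does not commit ...
inserter.execute("INSERT INTO target VALUES (2, 'beta')")

# ... so the update connection cannot acquire its write lock and times out.
try:
    updater.execute("UPDATE target SET name = 'alpha2' WHERE id = 1")
except sqlite3.OperationalError as exc:
    print("second writer blocked while the first transaction is open:", exc)

# Roll back the blocked writer's attempt (mirrors the database rolling back
# the losing transaction) so it starts clean.
updater.rollback()

# Transaction size 1 behaviour: commit after the row, and the other writer proceeds.
inserter.commit()
updater.execute("UPDATE target SET name = 'alpha2' WHERE id = 1")
updater.commit()
print("update succeeded once the insert was committed")
```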