Hi all,
I am using a hashed file in my jobs but I am getting very poor performance. I am running jobs that handle both the initial load and the delta load.
DB2 ===> Tx ===> DB2
          ^
          ||
          ||
DB2 ===> Hashed File
I am fetching about 600K records from the mainframe and checking whether each one already exists in my DB2 target (based on key columns). If it exists, it goes for an update; if not, it goes for an insert. I have to do this every time I fetch the delta, so I end up comparing all 600K records against the hashed file even though only about 4K records are in the delta. The hashed file gets loaded with 600K records every time, and I can see this hurting performance. For other users, even compiling a job is taking ages.
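The existence check described above amounts to routing each source row on a key lookup, which is the role the hashed file plays in the job. A minimal sketch in Python (the column names and key set here are hypothetical illustrations, not the actual job metadata):

```python
# Route source rows to inserts or updates based on whether their key
# already exists in the target -- the check the hashed file performs.

def route_rows(source_rows, existing_keys):
    """Split rows into (inserts, updates) by key membership."""
    inserts, updates = [], []
    for row in source_rows:
        key = (row["cust_id"], row["acct_no"])  # hypothetical key columns
        if key in existing_keys:
            updates.append(row)
        else:
            inserts.append(row)
    return inserts, updates

# Example: three source rows checked against two existing keys.
existing = {(1, "A"), (2, "B")}
rows = [
    {"cust_id": 1, "acct_no": "A", "bal": 10},  # exists -> update
    {"cust_id": 2, "acct_no": "B", "bal": 20},  # exists -> update
    {"cust_id": 3, "acct_no": "C", "bal": 30},  # new    -> insert
]
ins, upd = route_rows(rows, existing)
print(len(ins), len(upd))  # -> 1 2
```

The point of the delta complaint is visible here: the cost is driven by how many rows are streamed through the check, not by how many end up changing.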
Is there anything I can do to improve performance here while keeping the same design?
Thanks folks.
Hash File Performance
DSbox61, I don't see a design difference between your original post and the correction; it seems like you aren't using the hashed file for reference, just loading it.
Why are you re-loading the hashed file completely each run? It doesn't look like performance of hashed files is the issue, but that the job design precludes efficient running and short runtimes.
I didn't see your third post until I replied. I'm still a bit confused. If you remove the hashed file lookup from your job and run it, you will see the speed at which you can read from DB2. How many rows per second is that? If you then add the lookup, what does the rows/second figure change to? Are you pre-loading your hashed file to memory (you should be, if possible)?
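The comparison being asked for is a simple before/after throughput measurement: baseline read rate versus read rate with the lookup added. A rough sketch of that measurement (an in-memory set standing in for a pre-loaded hashed file; the figures it produces are machine-dependent, not a claim about DataStage itself):

```python
import time

def throughput(rows, step):
    """Rows/second through a given per-row step."""
    start = time.perf_counter()
    for row in rows:
        step(row)
    elapsed = time.perf_counter() - start
    return len(rows) / elapsed if elapsed > 0 else float("inf")

rows = list(range(100_000))
lookup = set(range(0, 100_000, 2))  # stand-in for a memory-loaded lookup

base = throughput(rows, lambda r: r)                    # read-only baseline
with_lookup = throughput(rows, lambda r: r in lookup)   # baseline + key lookup
print(base > 0 and with_lookup > 0)
```

Comparing the two rates shows how much of the elapsed time the lookup itself accounts for, which is exactly the question being put to the original poster.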
I'm able to fetch more than 1000 records/second from DB2, and the same while loading the hashed file. More than 1000 records/second is fine with me. But if I try to compile another job while the hashed file is being loaded, it takes a long time. Other activities, such as looking at the logs, are also taking a long time.
So the Hashed file lookup isn't your bottleneck.
Sounds like your machine is being overloaded when this job is running. Have you or your system administrator checked to see what is causing the machine to slow down? If you are running both DB2 and DataStage on the same server, I can guess why you are getting slowdowns. Is the system's CPU at capacity, or is the box swapping?
Use Code tags to make your design clear.
You are populating, then using, the hashed file in the same job. The rows/sec figure on the hashed file output link is totally misleading, because its clock has been running since the job started and because it reports only the rows sent in that direction (lookup succeeded), further invalidating rows/sec because the clock is running even though no rows are being processed. Do not use rows/sec as a measure of performance in this case.
What is the elapsed time of the job without and with the hashed file? What is the elapsed time if you populate the hashed file in a separate job? (In your present design, the DB2-to-DB2 stream has to wait until the hashed file loading stream finishes.)
DB2 --------> HashedFile
:
:
V
DB2 -----> TransformerStage -----> DB2
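The split design above can be sketched as two sequential phases: one job populates the lookup, and only then does the streaming job run against it, so the stream never waits on the load. A minimal stand-in in Python (dict for the hashed file, illustrative row shapes):

```python
# Phase 1 ("Job 1"): populate the lookup once, before streaming begins.
def job1_build_lookup(reference_rows):
    return {row["key"]: row for row in reference_rows}

# Phase 2 ("Job 2"): stream source rows against the already-built lookup,
# tagging each as an update (key found) or an insert (key missing).
def job2_stream(source_rows, lookup):
    out = []
    for row in source_rows:
        action = "update" if row["key"] in lookup else "insert"
        out.append((row["key"], action))
    return out

lookup = job1_build_lookup([{"key": 1}, {"key": 2}])
print(job2_stream([{"key": 2}, {"key": 9}], lookup))
# -> [(2, 'update'), (9, 'insert')]
```

Because the two phases are separate, each one's elapsed time can be measured on its own, which is what makes the timing questions above answerable.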
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.