Hash File Performance

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
DSbox61
Premium Member
Premium Member
Posts: 62
Joined: Mon May 22, 2006 10:18 am

Hash File Performance

Post by DSbox61 »

Hi ppl,

I am using hash file in my jobs but i got a very poor performance. I am running jobs that would work for initial as well as delta load.

DB2===>Tx==>DB2
^
||
||
DB2==>Hash FIle

I am fetching about 600K records from Mainframe and doing a check if it is existing in my DB2(based on key columns). If yes then they go for updates and if not they go for inserts. This i have to do everytime to fetch the delta. I end up comparing 600K records on hash File even though there are on 4K records for delta. The hash file gets loaded with 600K records evrytime and I also see that this has an effect on performance. For other users even compiling a job is taking ages.

Is ther anything i can do to improve performance here keeping the same design.

Thanx folks....
DSbox61
Premium Member
Premium Member
Posts: 62
Joined: Mon May 22, 2006 10:18 am

Post by DSbox61 »

sorry folks the design is wrong

here is the corrected one.

DB2===>Tx===>DB2
^
||
||
DB2==>Hash File
DSbox61
Premium Member
Premium Member
Posts: 62
Joined: Mon May 22, 2006 10:18 am

Post by DSbox61 »

:? ...i'm getting it wrong when i submit the post.......i better put it in words....i'm doing a look up with hash file in transformer stage and not in DB2...........
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

DSbox61, I don't see a design difference between your original post and the correction; it seems like you aren't using the hashed file for reference, just loading it.

Why are you re-loading the hashed file completely each run? It doesn't look like performance of hashed files is the issue, but that the job design precludes efficient running and short runtimes.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

I didn't see your 3rd post until I replied. I'm still a bit confused. If you remove the hashed file lookup from you job and ran it you would see the speed with which you can read from DB2. How many rows a second is that? If you then add the lookup, what rows/second does that change to? Are you pre-loading your hashed file to memory (you should be, if possible)?
DSbox61
Premium Member
Premium Member
Posts: 62
Joined: Mon May 22, 2006 10:18 am

Post by DSbox61 »

I'm able to fetch at more than 1000 records/second from DB2 and the same while loading the hash file. More than 1000 records/second is fine with me. But, if i try to compile another job while the hash file is getting loaded, then it takes a long time. Any other activities such as taking a look at logs and all is also taking a long time.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

So the Hashed file lookup isn't your bottleneck.

Sounds like your machine is being overloaded when this job is running. Have you or your system administrator checked to see what is causing the machine to slow down? If you are runnin both DB2 and DataStage on the same server I can guess why you are getting slowdowns. Is the system's CPU at capacity or is the box swapping?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Use Code tags to make your design clear.

Code: Select all

     DB2  -------->  HashedFile
                            :
                            : 
                            V
     DB2  -----> TransformerStage  ----->  DB2

You are populating, then using, the hashed file in the same job. The rows/sec figure on the hashed file output link is totally misleading, because its clock has been running since the job started and because it reports only the rows sent in that direction (lookup succeeded), further invalidating rows/sec because the clock is running even though no rows are being processed. Do not use rows/sec as a measure of performance in this case.

What is the elapsed time of the job without and with the hashed file? What is the elapsed time if you populate the hashed file in a separate job? (In your present design, the DB2-to-DB2 stream has to wait until the hashed file loading stream finishes.)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply