Hi all,
I am using a hashed file in my jobs but I am getting very poor performance. I am running jobs that handle both the initial load and the delta load.
DB2 ===> Tx ===> DB2
          ^
          ||
          ||
DB2 ===> Hashed File
I am fetching about 600K records from the mainframe and checking whether each one already exists in my DB2 target (based on key columns). If it exists, it goes for an update; if not, it goes for an insert. I have to do this every time I fetch the delta, so I end up comparing all 600K records against the hashed file even though only about 4K records are in the delta. The hashed file gets loaded with 600K records every time, and I can see this hurting performance. For other users, even compiling a job is taking ages.
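The existence check described above amounts to routing each source row on a key lookup, which is the role the hashed file plays in the job. A minimal sketch in Python (the column names and key set here are hypothetical illustrations, not the actual job metadata):

```python
# Route source rows to inserts or updates based on whether their key
# already exists in the target -- the check the hashed file performs.

def route_rows(source_rows, existing_keys):
    """Split rows into (inserts, updates) by key membership."""
    inserts, updates = [], []
    for row in source_rows:
        key = (row["cust_id"], row["acct_no"])  # hypothetical key columns
        if key in existing_keys:
            updates.append(row)
        else:
            inserts.append(row)
    return inserts, updates

# Example: three source rows checked against two existing keys.
existing = {(1, "A"), (2, "B")}
rows = [
    {"cust_id": 1, "acct_no": "A", "bal": 10},  # exists -> update
    {"cust_id": 2, "acct_no": "B", "bal": 20},  # exists -> update
    {"cust_id": 3, "acct_no": "C", "bal": 30},  # new    -> insert
]
ins, upd = route_rows(rows, existing)
print(len(ins), len(upd))  # -> 1 2
```

The point of the delta complaint is visible here: the cost is driven by how many rows are streamed through the check, not by how many end up changing.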
Is there anything I can do to improve performance here while keeping the same design?
Thanks folks.
Hash File Performance
DSbox61, I don't see a design difference between your original post and the correction; it seems like you aren't using the hashed file for reference, just loading it.
Why are you re-loading the hashed file completely each run? It doesn't look like performance of hashed files is the issue, but that the job design precludes efficient running and short runtimes.
I didn't see your third post until I replied. I'm still a bit confused. If you remove the hashed file lookup from your job and run it, you will see the speed at which you can read from DB2. How many rows per second is that? If you then add the lookup, what does the rows/second figure change to? Are you pre-loading your hashed file to memory (you should be, if possible)?
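The comparison being asked for is a simple before/after throughput measurement: baseline read rate versus read rate with the lookup added. A rough sketch of that measurement (an in-memory set standing in for a pre-loaded hashed file; the figures it produces are machine-dependent, not a claim about DataStage itself):

```python
import time

def throughput(rows, step):
    """Rows/second through a given per-row step."""
    start = time.perf_counter()
    for row in rows:
        step(row)
    elapsed = time.perf_counter() - start
    return len(rows) / elapsed if elapsed > 0 else float("inf")

rows = list(range(100_000))
lookup = set(range(0, 100_000, 2))  # stand-in for a memory-loaded lookup

base = throughput(rows, lambda r: r)                    # read-only baseline
with_lookup = throughput(rows, lambda r: r in lookup)   # baseline + key lookup
print(base > 0 and with_lookup > 0)
```

Comparing the two rates shows how much of the elapsed time the lookup itself accounts for, which is exactly the question being put to the original poster.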
I'm able to fetch more than 1000 records/second from DB2, and the same while loading the hashed file. More than 1000 records/second is fine with me. But if I try to compile another job while the hashed file is being loaded, it takes a long time. Other activities, such as looking at the logs, are also taking a long time.
So the Hashed file lookup isn't your bottleneck.
Sounds like your machine is being overloaded when this job is running. Have you or your system administrator checked to see what is causing the machine to slow down? If you are running both DB2 and DataStage on the same server, I can guess why you are getting slowdowns. Is the system's CPU at capacity, or is the box swapping?
Use Code tags to make your design clear.
You are populating, then using, the hashed file in the same job. The rows/sec figure on the hashed file output link is totally misleading, because its clock has been running since the job started and because it reports only the rows sent in that direction (lookup succeeded), further invalidating rows/sec because the clock is running even though no rows are being processed. Do not use rows/sec as a measure of performance in this case.
What is the elapsed time of the job without and with the hashed file? What is the elapsed time if you populate the hashed file in a separate job? (In your present design, the DB2-to-DB2 stream has to wait until the hashed file loading stream finishes.)
DB2 --------> HashedFile
:
:
V
DB2 -----> TransformerStage -----> DB2
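The split design above can be sketched as two sequential phases: one job populates the lookup, and only then does the streaming job run against it, so the stream never waits on the load. A minimal stand-in in Python (dict for the hashed file, illustrative row shapes):

```python
# Phase 1 ("Job 1"): populate the lookup once, before streaming begins.
def job1_build_lookup(reference_rows):
    return {row["key"]: row for row in reference_rows}

# Phase 2 ("Job 2"): stream source rows against the already-built lookup,
# tagging each as an update (key found) or an insert (key missing).
def job2_stream(source_rows, lookup):
    out = []
    for row in source_rows:
        action = "update" if row["key"] in lookup else "insert"
        out.append((row["key"], action))
    return out

lookup = job1_build_lookup([{"key": 1}, {"key": 2}])
print(job2_stream([{"key": 2}, {"key": 9}], lookup))
# -> [(2, 'update'), (9, 'insert')]
```

Because the two phases are separate, each one's elapsed time can be measured on its own, which is what makes the timing questions above answerable.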
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.