Hash File Build

Post questions here relating to DataStage Server Edition, in such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

marc_brown98
Premium Member
Posts: 67
Joined: Wed Apr 14, 2004 11:33 am

Hash File Build

Post by marc_brown98 »

I have a server job that builds a hashed file for lookup purposes, and it seems to slow down immensely as it progresses. The source file has approximately 1.2 million records. I am using a type 30 dynamic hashed file, and the records being written are 3 fields: 2 keys and the lookup value. The job starts out very fast, but after 400k records it begins to slow, progressively dropping to fewer than 700 rows per second. Any suggestions?

Thanks
Marc
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Up the minimum modulus from 1. What you're probably seeing is the resize-by-doubling effect degrading performance. Check out this post as well:
viewtopic.php?t=85364
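
For example, something like this at the TCL prompt (a sketch from memory; the file name and modulus figure are placeholders, not calculated for your data):

   CREATE.FILE LKP.MYLOOKUP DYNAMIC MINIMUM.MODULUS 40000

That allocates the groups up front instead of splitting over and over during the load. If I recall correctly, the same setting is also exposed in the Hashed File stage's create-file options.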
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
marc_brown98
Premium Member
Posts: 67
Joined: Wed Apr 14, 2004 11:33 am

Post by marc_brown98 »

Thanks Kenneth!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Dynamic hashed files don't resize by doubling; they resize by adding one group (logically - physically, probably about eight group buffers at a time). This actually gives you more pain, as you're taking the hit of restructuring every N records loaded, where N is the number of records per group.

As Ken says, if you create the hashed file with its minimum modulus set to approximately what you'll need at the end, you take the hit of allocating this disk space up front, so that the load proceeds more quickly and at a non-diminishing rate.
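
As a back-of-envelope illustration (the record size here is an assumption for the sake of the arithmetic, not measured from your data): 1.2 million records at roughly 50 bytes each is about 60 MB. With the default 2 KB group size and about 80% packing, that suggests

   60 MB / (2 KB x 0.8) = roughly 38,000 groups

as a starting minimum modulus. If memory serves, the Hashed File Calculator (HFC) utility in the unsupported utilities folder of the DataStage installation CD will do this arithmetic more rigorously.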

Do you use write caching? This, too, can help load performance.

Another possibility, if your hashed file is large, is to use a static hashed file. This is one where the disk space is necessarily pre-allocated, and you get more control over the size of groups and the hashing algorithm used. Empirical evidence suggests that these perform slightly better than the equivalent dynamic hashed file at larger sizes; the downside is that they require more calculation and more maintenance.
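
For reference, a static hashed file is created at TCL with an explicit type, modulus and separation, along these lines (illustrative figures only: type 18 is the general-purpose hashing algorithm, 38011 is a prime near the modulus estimated above, and separation 4 gives 2 KB groups):

   CREATE.FILE LKP.MYLOOKUP 18 38011 4

A prime modulus helps spread the keys evenly across groups; the file type should be chosen to match the shape of your keys.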
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
marc_brown98
Premium Member
Posts: 67
Joined: Wed Apr 14, 2004 11:33 am

Post by marc_brown98 »

Ray,
Thanks for your input. I do not consider this hashed file to be very large: around 65 MB, approximately 1.2 million records. I will try the write caching; right now it takes around 30 minutes of wall-clock time to build.
marc_brown98
Premium Member
Posts: 67
Joined: Wed Apr 14, 2004 11:33 am

Post by marc_brown98 »

Ray & Kenneth,
Thanks much for the help. The suggestions pointed me in the right direction; build time is now less than 2 minutes.

Cheers
8)