Maximum Hash file size

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

gateleys
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Post by gateleys »

1. What is the maximum size of a hash file that still gives good performance? Let's assume we have 2 to 3 fields, each of length < 10 and of integer/varchar type.

2. To put it from the other perspective: when would a relational stage be on a par with a hash file for reference lookups? Assume the relational reference has about 10-15 fields with the other properties as above.

3. What should the relationship be between hash file size and the physical memory of the processing system when preloading the file to memory?

4. Any ways by which I can leverage a degraded hash file performance?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

It's hashed file, not "hash" file. A hash file is a tool for reducing the size of a block of hash.

1. What is the maximum size of a hash file that still gives good performance? Let's assume we have 2 to 3 fields, each of length < 10 and of integer/varchar type.
It depends on the operating system: up to 19 million TB with 64-bit addressing (a default 32-bit hashed file tops out at 2GB). Define "performance". For lookups I presume that means getting as close as possible to one logical I/O operation per lookup, and that is achievable irrespective of hashed file size. The fastest hashed files are those small enough to fit into the memory cache; you set the upper limit on the cache size, and it can be up to 999MB.
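To illustrate why "one logical I/O per lookup" does not depend on file size, here is a toy model in Python, not DataStage code. It assumes static-style hashing: a key is hashed to one of `modulus` groups, each group buffer holds a fixed number of records (the hypothetical `GROUP_CAPACITY`), and records beyond that spill into overflow buffers that each cost one more read. MD5 is only a stand-in for the engine's real hashing algorithm, and all names here are hypothetical.

```python
import hashlib
from typing import List, Optional, Tuple

# Hypothetical: number of records that fit in one group buffer.
GROUP_CAPACITY = 4


def group_of(key: str, modulus: int) -> int:
    """Map a key to a group number (MD5 stands in for the real hashing algorithm)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % modulus


class ToyHashedFile:
    """A fixed number of groups; records past GROUP_CAPACITY go to overflow buffers."""

    def __init__(self, modulus: int) -> None:
        self.modulus = modulus
        self.groups: List[List[Tuple[str, str]]] = [[] for _ in range(modulus)]

    def write(self, key: str, value: str) -> None:
        self.groups[group_of(key, self.modulus)].append((key, value))

    def lookup(self, key: str) -> Tuple[Optional[str], int]:
        """Return (value or None, buffer reads); each buffer read is one logical I/O."""
        records = self.groups[group_of(key, self.modulus)]
        for position, (k, v) in enumerate(records):
            if k == key:
                # Primary buffer costs 1 read; each overflow buffer passed adds 1.
                return v, position // GROUP_CAPACITY + 1
        # A miss reads the primary buffer plus every overflow buffer in the chain.
        return None, max(1, -(-len(records) // GROUP_CAPACITY))


def average_reads(record_count: int, modulus: int) -> float:
    """Load synthetic keys, then average the buffer reads over one lookup per key."""
    f = ToyHashedFile(modulus)
    keys = [f"K{i:07d}" for i in range(record_count)]
    for k in keys:
        f.write(k, "x")
    return sum(f.lookup(k)[1] for k in keys) / record_count


if __name__ == "__main__":
    # Same load per group (2 records), file 100 times larger: both stay close to 1.
    print(average_reads(1_000, 500))
    print(average_reads(100_000, 50_000))
    # Everything crammed into one group: lookups walk long overflow chains.
    print(average_reads(1_000, 1))  # 125.5
```

The point of the comparison is that the cost per lookup tracks how full the groups are, not how many records the file holds.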

2. To put it from the other perspective: when would a relational stage be on a par with a hash file for reference lookups? Assume the relational reference has about 10-15 fields with the other properties as above.
If you can cache the hashed file (which depends on your settings), never.

3. What should the relationship be between hash file size and the physical memory of the processing system when preloading the file to memory?
Hashed file cache is one of many competing demands for memory. You need to balance the competing demands. This is a "how long is a piece of string?" question.

4. Any ways by which I can leverage a degraded hash file performance?
"Leverage" as in achieve?!! Hashed files can be tuned for optimum performance, primarily by reducing or eliminating overflowed groups and oversized records. However, there are usually other factors that far outweigh any slowing effect an imperfectly tuned hashed file may have. Those are what you ought to concentrate on, along with managing expectations so that they're reasonable.
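The first tuning lever mentioned above, reducing overflowed groups, can be sketched with the same kind of toy model. This is an illustration under stated assumptions, not DataStage's sizing method: MD5 stands in for the real hash, group capacity is counted in records rather than bytes, oversized records are not modelled, and the function names are hypothetical. It counts how many groups overflow for a given modulus and picks the first candidate modulus that leaves none overflowed.

```python
import hashlib
from collections import Counter
from typing import Iterable, List, Optional


def group_of(key: str, modulus: int) -> int:
    """Map a key to a group number (MD5 stands in for the real hashing algorithm)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % modulus


def overflowed_groups(keys: List[str], modulus: int, capacity: int) -> int:
    """Count groups holding more records than one group buffer (capacity) can take."""
    load = Counter(group_of(k, modulus) for k in keys)
    return sum(1 for n in load.values() if n > capacity)


def smallest_clean_modulus(
    keys: List[str], capacity: int, candidates: Iterable[int]
) -> Optional[int]:
    """Return the smallest candidate modulus that leaves no group overflowed, or None."""
    for modulus in sorted(candidates):
        if overflowed_groups(keys, modulus, capacity) == 0:
            return modulus
    return None


if __name__ == "__main__":
    keys = [f"K{i:07d}" for i in range(1_000)]
    # Overflow shrinks as the same keys are spread over more groups.
    for modulus in (10, 100, 1_000, 10_000):
        print(modulus, overflowed_groups(keys, modulus, capacity=4))
    print(smallest_clean_modulus(keys, 4, [10, 100, 1_000, 10_000, 100_000]))
```

The trade-off is that a larger modulus means more sparsely filled groups and a bigger file, so the aim is the smallest modulus that keeps overflow near zero, not the largest one available.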
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
gateleys
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Post by gateleys »

Hey Ray,
Thanks again for your prompt and precise response on my 'hashed' file queries.