Having problems proving latest claims in "Tips: Hash fi

ggarze
Premium Member
Posts: 78
Joined: Tue Oct 11, 2005 9:37 am

Post by ggarze »

In the article "Tech Tips: Hash Files in DataStage" the author claims that one way to tune your hash files and reduce the data written to the overflow file is to increase the 'Minimum modulus' of the hash file. I was really happy to see this at first, but now I'm having a hard time getting it to happen. I've tried modulus values from 1 to 100 with no effect on the data distribution between data.30 (34,136 KB) and over.30 (11,872 KB). He seemed to be talking about a UNIX platform, whereas I'm running on Windows. I don't know if that matters. Am I missing something?

What I did find is that if I decrease my 'Split load', it does start to reduce the size of over.30, regardless of what the modulus is. However, am I wasting space by telling the system to keep adding space when the split level is so low? My understanding of split level is that more space is added once the current space reaches the 'split load' percentage.

Thanks :?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

ggarze,

There is no difference between the methodologies used in hashed files on Windows and UNIX. The minimum modulus is used only in type 30 hashed files, and it starts the file off with that modulus instead of the default value of 1. If your file contains enough records that the load at the minimum modulus is more than 80% (the default split load), then DataStage will perform an automatic split. It sounds like this is happening in your case.

Basically using the MINIMUM.MODULUS helps with faster file loading when you start with an empty file or do a CLEAR on it. It will not change the number of overflows encountered in a type 30 dynamic file.
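
To illustrate the point, here is a toy linear-hashing model in Python. It is only a sketch and not how DataStage is actually implemented: it counts records instead of bytes, and `capacity`, `load_file`, and `group_for` are made-up names for this example. It does show why a minimum modulus that the data outgrows leaves the final layout unchanged, while a minimum modulus above the natural size reduces overflow.

```python
import zlib


def group_for(key_hash, modulus):
    """Linear-hashing address: the group a hash lands in at this modulus."""
    p = 1 << (modulus - 1).bit_length()  # smallest power of two >= modulus
    g = key_hash % p
    return g if g < modulus else g - p // 2


def load_file(keys, min_modulus=1, capacity=4, split_load=0.8):
    """Insert keys, splitting one group each time overall load passes split_load.

    Returns (final_modulus, overflow_records). `capacity` (records per group)
    is a hypothetical stand-in for the real byte-based group size.
    """
    modulus = min_modulus
    groups = [[] for _ in range(modulus)]
    count = 0
    for key in keys:
        h = zlib.crc32(str(key).encode())
        groups[group_for(h, modulus)].append(h)
        count += 1
        while count / (modulus * capacity) > split_load:
            p = 1 << modulus.bit_length()  # power of two for modulus + 1
            source = modulus - p // 2      # next group in line to split
            modulus += 1
            groups.append([])
            old, groups[source] = groups[source], []
            for r in old:                  # redistribute only the split group
                groups[group_for(r, modulus)].append(r)
    overflow = sum(max(len(g) - capacity, 0) for g in groups)
    return modulus, overflow


if __name__ == "__main__":
    for m in (1, 100, 5000):
        print(m, load_file(range(10000), min_modulus=m))
```

In this model, starting at 1 or at 100 ends in exactly the same state, because the file splits its way past both. Only a starting modulus larger than the size the file would reach on its own changes the overflow count.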
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Up your modulus higher than 100. Your file is currently too small and is dynamically growing with the data. The point of a minimum modulus is to set the watermark high enough that the file (1) doesn't grow and (2) doesn't need to put much data into the overflow.

You will still get data in the overflow when particular groups are full. The goal is a smaller overflow, ideally zero, although zero isn't required to get faster loading and lookups.
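
As a rough check on why 100 is too small, the size of data.30 gives an idea of the modulus the file has already grown to. This is a back-of-the-envelope sketch: it assumes a 2048-byte group (GROUP.SIZE 1) and ignores the file header, and `implied_modulus` is just a name made up for the example.

```python
def implied_modulus(data_file_kb, group_bytes=2048):
    """Approximate current modulus from the size of data.30.

    Assumes data.30 is roughly modulus * group size; the header is ignored.
    """
    return (data_file_kb * 1024) // group_bytes


if __name__ == "__main__":
    # data.30 size quoted in the original post: 34,136 KB
    print(implied_modulus(34136))
```

Under those assumptions the file is already in the tens of thousands of groups, so any minimum modulus between 1 and 100 is passed almost immediately during the load.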
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
sun rays
Charter Member
Posts: 57
Joined: Wed Jun 08, 2005 3:35 pm
Location: Denver, CO

Re: Having problems proving latest claims in "Tips: Has

Post by sun rays »

ggarze wrote: My understanding of split level is to add more space once the current space reaches the percent of the 'split load'.

Thanks :?
I guess 'split' doesn't mean adding more space. It means that when a bucket reaches its threshold level, it splits in two and distributes its data between the two buckets, instead of forming a long chain of overflow pages.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

And that is why, when you can reliably predict the size the hash file will reach every time the job runs, you should set the minimum modulus high enough that you don't incur constant dynamic resizing as the hash file grows. Set the minimum high enough to improve performance, but not so high that creating and allocating the file at the minimum modulus size unnecessarily delays the job at startup.
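
One rule-of-thumb way to pick that number, sketched in Python. This is an estimate and not a documented formula: it assumes a 2048-byte group, an 80% split load, and an average record size that already includes the key and any per-record overhead. The function names are hypothetical.

```python
import math


def estimate_minimum_modulus(rows, avg_record_bytes,
                             group_bytes=2048, split_load=0.8):
    """Smallest modulus that keeps the predicted data at or below split_load."""
    data_bytes = rows * avg_record_bytes
    return math.ceil(data_bytes / (group_bytes * split_load))


def preallocated_bytes(modulus, group_bytes=2048):
    """Space created up front when the file starts at this modulus."""
    return modulus * group_bytes


if __name__ == "__main__":
    # Hypothetical volume: one million rows averaging 100 bytes each
    m = estimate_minimum_modulus(1_000_000, 100)
    print(m, preallocated_bytes(m))
```

The second function is the startup cost mentioned above: the larger the minimum modulus, the more space has to be created before the first row is written.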
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Still "Having problems proving latest claims in "Tips: Hash..." :?: