You only have a 55% load with a modulus of 678494; try using a MINIMUM.MODULUS of 284969. Also, what does your key look like? Are the rightmost couple of characters/digits evenly spread? If so, you might try using the SEQ.NUM hashing algorithm.
What does "evenly spread key" mean? My key columns are an ID (like an SSN) and a coverage type. I have 2.4 million records of this combination; the ID is a 22-byte number and the coverage type is a 2-byte number. Also, the minimum modulus that I put in was computed from one of the posts I read in the forum. What does "actual" mean? If you look at the DATA.30 and OVER.30 there has been a considerable amount of overflow. Please tell me, is there a way to calculate the modulus apart from using the HFC, as I do not have the install CD?
Look at the rightmost characters of your key: are they very similar on all keys, or spread out? A sequential series of numbers would be well spread out, but if the last character is always 'X' or 'Y' that would not be well distributed. I've often found that using SEQ.NUM even on a string key gives good distribution and, particularly with static hashed files, the algorithm is much more efficient.
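To make "well-spread" concrete, here is a toy sketch (my own illustration, not from the thread) of why a constant rightmost character starves most groups under a SEQ.NUM-style hash, which treats the key as a number and takes it modulo the file's modulus:

```python
# Toy illustration: SEQ.NUM-style hashing treats the key as a number
# and takes it modulo the file's modulus to choose a group.
MODULUS = 10  # deliberately tiny, illustrative only

sequential = [f"{n:06d}" for n in range(100, 200)]  # rightmost digits vary
clustered = [f"{n:04d}00" for n in range(1, 101)]   # always end in "00"

def groups_used(keys, modulus):
    """Count how many distinct groups a set of keys lands in."""
    return len({int(k) % modulus for k in keys})

print(groups_used(sequential, MODULUS))  # all 10 groups hit
print(groups_used(clustered, MODULUS))   # only 1 group hit
```

With a trailing constant every record hashes into the same few groups, which is exactly the overflow pattern described later in the thread.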
There are so many well-thought out and descriptive posts on DSXchange regarding file sizing that I am not going to attempt to go into detail.
For dynamic files the modulus is computed dynamically according to your SPLIT and MERGE settings; using a large initial MINIMUM.MODULUS saves the time spent doing SPLITs during the data load, and also pre-allocates much of the disk space used for storing the hashed file so that doesn't need to be done at runtime.
Setting the MINIMUM.MODULUS too high is not fatal, but can impact performance.
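Since the HFC isn't available, a rough sizing formula can stand in for it. The usual rule of thumb is modulus ≈ total data bytes / (group size × target load). The 4 KB group size and 80% target load below are my assumptions (as is the 180-byte average row), so check the result against your file's actual statistics:

```python
# Hedged sketch of hashed-file sizing without the HFC tool.
# Assumptions (mine, not from the thread): 4 KB groups
# (SEPARATION 8 => 8 * 512 bytes) and an ~80% target load.
import math

def suggest_minimum_modulus(rows, avg_row_bytes,
                            group_bytes=8 * 512, target_load=0.8):
    """Estimate a MINIMUM.MODULUS from row count and average row size."""
    total_bytes = rows * avg_row_bytes
    return math.ceil(total_bytes / (group_bytes * target_load))

# e.g. 2.4 million rows at an assumed ~180 bytes each
print(suggest_minimum_modulus(2_400_000, 180))
```

The average row size dominates the result, so measure it from real data rather than guessing.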
Now I get what you are saying. My combination of ID and coverage type is the key, but the last 6 digits of the ID (out of 22) are always 0, and the coverage type is between 01 and 06, so a key looks like
.....000001, where 01 is the coverage type. I will try decreasing the modulus and see if I can get any gain, but, Arnd, I have been working on this for 2 days: my hashed file takes 3 hours to build, and when I use it as a lookup it takes 7 hours for the job to complete. I have also reduced the columns from 67 to 55. Is there any more tuning that I can do? I have also enabled the stage write cache. I am not sure what else I need to do.
Vinodanand wrote: ...but the last 6 digits of the ID (out of 22) are always 0
If you strip this dummy data out of your key you would save roughly 14 MB of key space alone (6 bytes × 2.4 million rows).
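The arithmetic behind that saving is straightforward (row count taken from the earlier post):

```python
# Key-space saving from dropping the 6 constant trailing zeros.
rows = 2_400_000
saved_bytes_per_row = 6
saved_mb = rows * saved_bytes_per_row / 1_000_000
print(f"{saved_mb:.1f} MB")  # 14.4 MB of keys alone, before column savings
```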
So what I need to do is ignore these 6 bytes when writing out the ID, change the modulus and the algorithm to SEQ.NUM, and test with the same data set. I will do that to see if it helps my performance in any way. I also feel that, because of the overflow, looking up data is very slow. Thanks Arnd, let me try this first.
Not exactly what I was thinking, but not really wrong, either.
First of all, if you have any 'unused' characters in the key or columns then remove them in order to reduce the file size. Delete any columns you aren't using in your lookup from the hashed file.
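As a sketch of that trimming (in a DataStage derivation this would be something like `ID[1,16] : CVG`; the Python below is just the same idea, with illustrative values of my own):

```python
def lookup_key(raw_id: str, covg: str) -> str:
    """Build the hashed-file key without the 6 dead trailing zeros.

    Assumes, per the thread, a 22-byte ID whose last 6 digits are
    always 0, plus a 2-byte coverage type (01-06).
    """
    assert raw_id.endswith("000000"), "unexpected ID format"
    return raw_id[:-6] + covg

print(lookup_key("1234567890123456000000", "01"))  # -> 123456789012345601
```

The same trimmed expression must be used on both the write side and the lookup side, or the keys will never match.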
Changing between the GENERAL and SEQ.NUM hashing algorithms isn't going to make as much of a difference as removing even a couple of bytes per row.
The MINIMUM.MODULUS setting might help your write times but won't affect your read times.
Your write speed is about 200 KB per second - I don't know your system layout, so I cannot comment on whether that is a good speed or not.
Reading should be faster than writing.
How are you reading this file, i.e. a Hashed file stage doing a read? If so, do a test job with just the read stage and an output sequential file stage that is directed to /dev/null and see what the speed is. If it is high (which I suspect it will be) then your bottleneck is not the actual reading but some other stage in the job.
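Outside DataStage, the same isolation idea can be approximated by timing a raw sequential read of a file of comparable size. The dummy file, path, and sizes below are mine and purely illustrative; on the real system you would point the read at the hashed file's data portion:

```python
# Time a plain sequential read to see what the disk layer alone delivers.
# Note: a freshly written file may be served from the OS page cache,
# so treat this as an upper bound rather than a true disk measurement.
import os
import time

PATH = "/tmp/hf_read_test"  # illustrative path only

# self-contained: create a 64 MB dummy file to read back
with open(PATH, "wb") as f:
    f.write(b"\0" * (64 * 1024 * 1024))

start = time.perf_counter()
total = 0
with open(PATH, "rb") as f:
    while chunk := f.read(1024 * 1024):  # 1 MB reads
        total += len(chunk)
elapsed = time.perf_counter() - start

print(f"read {total / 1e6:.1f} MB in {elapsed:.3f} s")
os.remove(PATH)
```

If raw reads are fast but the job is slow, the bottleneck is elsewhere in the job, exactly as suggested above.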
If this read speed remains only 1/2 of the write speed then I'd check out the hardware layer (i.e. is it on a SAN with some funky layout?).