Hashed Files and Hard Drive Fragmentation

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

epsidude
Premium Member
Posts: 17
Joined: Fri Feb 27, 2009 10:14 am

Hashed Files and Hard Drive Fragmentation

Post by epsidude »

Hi All,

Got a question regarding hashed files. I have been reading a lot of posts here on creating and tuning them, but sadly most of that does not work too well in my case because of the architecture of these PeopleSoft-provided jobs.

However, here is what I do not get. I have several hashed files that are around 1.6 GB in size. Originally they were created with a modulus of 1 and a group size of 1; they were badly overflowed and got re-created each day. When I checked the hard drive for fragmentation it was high, so I defragged and ran the update. Once the update was done the fragmentation was back up to about 15%?!

So I have since changed the code to pre-allocate the hashed file using a modulus of 400003 and a group size of 2, still keeping it as 32-bit. I unchecked the create-file option so the allocated disk space would remain between runs.
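
For reference, the one-time pre-sizing can be driven from a before-job subroutine; a rough sketch follows, with PS_MY_HASH and the routine name standing in as placeholders (the exact CREATE.FILE options are worth double-checking against the engine docs):

Code:

* Sketch of a one-time pre-sizing step (names are placeholders).
* CREATE.FILE is run once so the dynamic hashed file starts out at the
* larger minimum modulus; after that the job's create-file option stays
* unchecked and the file is only cleared and reloaded each day.
Cmd = "CREATE.FILE PS_MY_HASH DYNAMIC MINIMUM.MODULUS 400003 GROUP.SIZE 2"
Call DSExecute("UV", Cmd, Output, RetCode)
If RetCode <> 0 Then
   Call DSLogWarn("CREATE.FILE failed: " : Output, "PreSizeHashedFile")
End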

So I defragged again and reran the update, but to my surprise it was back at 15% fragmentation.

What am I missing here? I thought pre-allocating would more or less lock the space in place and prevent it from fragmenting. What actually causes this fragmentation?

The DATA.30 is 1.6 GB and the OVER.30 is 347 MB. The file has 35 fields, of which 20 are part of the key.

Any insight would be appreciated.
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

It is good that you allocated more space (via the minimum modulus) to the hashed file. That change should improve performance significantly. With most hashed files you can never get rid of all fragmentation; if I remember correctly, most of it comes from the overflow area that is used whenever a group exceeds its allocated space (in your case 4096 bytes).
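
As a rough back-of-envelope using the figures you posted: with a group size of 2 each group holds 4096 bytes, and 1.6 GB of DATA.30 spread across 400003 groups works out to roughly 4 KB per group on average. In other words, the average group is already sitting right at its 4096-byte limit, so even with a fairly even hash distribution a good number of groups are bound to spill into OVER.30. Pulling that overflow back in would take a considerably larger minimum modulus, which is why the OVER.30 tends not to shrink on its own.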

If you are still having performance issues, Ken Bland wrote a really nice overview of Hash Files in an old DSXchange newsletter that has some good tips:

learningcenter/newsletters/200510/Content/techtips.php
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
epsidude
Premium Member
Posts: 17
Joined: Fri Feb 27, 2009 10:14 am

Post by epsidude »

Thank you Andy for the reply. Looks like I will need to time the defrag to coincide with the full re-create of these hashed files.

I did read that article and it was a huge help to my understanding. It drives me crazy that I cannot get the OVER.30 any smaller. When I allocated an ungodly amount of space I was able to get it down, but then it took hours to load.
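
In case it helps anyone who finds this thread later, the overflow can be tracked from run to run by dumping the engine's file analysis to the job log after the load. A rough sketch, again with a placeholder file name:

Code:

* Hypothetical after-job fragment: write the engine's file analysis
* (modulus, group size, overflow figures) to the DataStage job log.
Call DSExecute("UV", "ANALYZE.FILE PS_MY_HASH", Output, RetCode)
Call DSLogInfo(Output, "CheckHashedFileStats")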