Hash File

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Rajendran
Participant
Posts: 16
Joined: Wed Jul 28, 2004 7:56 am
Location: Dubai

Hash File

Post by Rajendran »

Hi,

I have some questions about static and dynamic hashed files.

- What is the difference between the two types?
- When should I use static, and when should I use dynamic?

by,
Rajendran.
rasi
Participant
Posts: 464
Joined: Fri Oct 25, 2002 1:33 am
Location: Australia, Sydney

Post by rasi »

Search this forum; there have been many topics discussing this.

Rasi
bibhudc
Charter Member
Posts: 20
Joined: Thu Jun 19, 2003 12:26 pm

Re: Hash File

Post by bibhudc »

A hash file has 2 basic building parameters:
1) modulus [how many groups/bins to create - 1 ? 100 ? N (unknown)?] and
2) separation (how large is a group/bin- 2000 bytes ? 4000 bytes ?).

- What is the difference between the two types?
In dynamic hash files, you let the engine decide how many bins (modulus) are needed to accommodate your records. You decide only on the separation.

In static hash files, you decide on both parameters. The engine will create the number of bins you specify even if there is no data in them. If the bins you specify cannot accommodate all the rows you insert, the extra rows go to an overflow space.
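To make the modulus/separation idea concrete, here is a toy Python model of a static hash file. It is only a sketch under stated assumptions: `build_static_file` and `overflowed_fraction` are made-up names, `zlib.crc32` stands in for whatever hashing algorithm the engine really uses, and record size is taken as simply key length plus value length.

```python
# Toy model of a static hashed file: a fixed number of groups (modulus),
# each with a fixed byte capacity (separation). Not the engine's real algorithm.
import zlib


def build_static_file(records, modulus, group_bytes):
    """Place (key, value) records into `modulus` groups, spilling to overflow."""
    # All groups exist up front, even if no record ever lands in them.
    groups = [{"used": 0, "primary": [], "overflow": []} for _ in range(modulus)]
    for key, value in records:
        size = len(key) + len(value)
        # crc32 is an assumed stand-in for the engine's hashing algorithm.
        group = groups[zlib.crc32(key.encode()) % modulus]
        if group["used"] + size <= group_bytes:
            group["primary"].append((key, value))
            group["used"] += size
        else:
            # The group is full, so the record goes to overflow space.
            group["overflow"].append((key, value))
    return groups


def overflowed_fraction(groups):
    """Fraction of groups holding at least one record in overflow."""
    return sum(1 for g in groups if g["overflow"]) / len(groups)


records = [(f"CUST{i:06d}", "x" * 90) for i in range(1000)]
too_small = build_static_file(records, modulus=7, group_bytes=2048)
roomy = build_static_file(records, modulus=101, group_bytes=2048)
print(overflowed_fraction(too_small), overflowed_fraction(roomy))
```

With too few bins, rows spill into overflow; with enough bins, most rows stay in their primary group, which is the whole point of pre-sizing.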

- When should I use static, and when should I use dynamic?
Use dynamic when the number of records is unpredictable or relatively small. (You should also consider how much will fit in a bin; I have seen people load very fat rows into hash files.)
Use static when you can predict that there will be a large number of records in the hash file. If you specify dynamic in this case, it will just take longer to build the hash file, and you get overflows when the number of bins is not enough.

I may be all wrong, but that's how I normally decide. There was an excellent post by Ken Bland on this site, but I am not able to find it now.

- Bibhu
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Keeping it simple

Post by ray.wurlod »

There are two criteria: number of rows and maintenance.

Number of Rows
  • Use a static hashed file if the total number of rows is known in advance and unlikely to change very much.
  • Use a dynamic hashed file if the number of rows is not known in advance or it is known that the number will vary markedly between jobs.

Maintenance
  • Static hashed files need to be correctly pre-sized, and monitored to ensure that this sizing remains appropriate, resizing if necessary.
  • Dynamic hashed files implement "automatic table space management" and so require much less frequent monitoring and maintenance.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
nag0143
Premium Member
Posts: 159
Joined: Fri Nov 14, 2003 1:05 am

Post by nag0143 »

Ray,
When you say number of rows: in my case I am running 4 jobs in parallel, and all of these jobs write to the same hash file. I know each job writes around 690000 records into the hash file, so the total number of records will be about 4*690000, but this is a daily job and the numbers always change (not by much). I am using a static hash file. How does the number of records help me choose between a static and a dynamic hash file?

And when you mention monitoring, what do you mean by it? What should I monitor, and how? Is it something related to hash file size?

I am confused. Can you please clarify?

Thanks
Nag.
ketfos
Participant
Posts: 562
Joined: Mon May 03, 2004 8:58 pm
Location: san francisco

Post by ketfos »

Hi,
Overall, a static hashed file can perform up to 80% faster than an equivalently configured dynamic hashed file.



Ketfos
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Records within hashed files (either type) are stored in "groups" (pages), the size of which is determined when the hashed file is created.

With static hashed files (and for the MINIMUM.MODULUS setting for dynamic hashed files) you need to ensure that you have enough groups to store all the records.
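As a rough illustration of "enough groups", the arithmetic can be sketched in Python. Everything here is an assumption for the sake of the example: the function names, the 100-byte average record size, the 2048-byte group, and the 80% target fill are all hypothetical, and rounding up to a prime is just a commonly quoted convention for static moduli, not something stated in this thread.

```python
# Back-of-envelope modulus estimate for a static hashed file.
# All figures (record size, group size, fill target) are assumed values.
import math


def is_prime(n):
    """Trial-division primality test, adequate for modulus-sized numbers."""
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return False
    return True


def estimate_modulus(row_count, avg_record_bytes, group_bytes, fill=0.8):
    """Groups needed so the data fills each group to roughly `fill` of capacity."""
    needed = math.ceil(row_count * avg_record_bytes / (group_bytes * fill))
    candidate = max(needed, 2)
    # Round up to the next prime (a convention assumed here, not a requirement).
    while not is_prime(candidate):
        candidate += 1
    return candidate


# Row count taken from the question above: 4 jobs of about 690000 rows each.
modulus = estimate_modulus(4 * 690_000, avg_record_bytes=100, group_bytes=2048)
print(modulus)
```

Treat the result as a starting point only; the real check is to load the file and look at how many groups have overflowed.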

In dynamic hashed files, the number of groups adjusts automatically depending on the volume of data in the hashed file compared to its overall capacity.
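The automatic adjustment can be pictured with a small Python sketch. This is a simplification under assumptions, not the engine's actual algorithm: `adjust_modulus` is a hypothetical name, the 80% split and 50% merge thresholds are illustrative values, and load is modelled simply as data bytes divided by total group capacity.

```python
# Simplified picture of how a dynamic hashed file grows and shrinks.
# Thresholds and the load formula are assumptions for illustration only.
def adjust_modulus(data_bytes, modulus, group_bytes, minimum_modulus=1,
                   split_load=0.80, merge_load=0.50):
    """Return the modulus after splitting or merging groups for this data volume."""

    def load(m):
        return data_bytes / (m * group_bytes)

    # Too full: add groups one at a time until the load is acceptable.
    while load(modulus) > split_load:
        modulus += 1
    # Too empty: remove groups, but never below the minimum modulus and
    # never so far that the next step would immediately trigger a split.
    while (modulus > minimum_modulus
           and load(modulus) < merge_load
           and load(modulus - 1) <= split_load):
        modulus -= 1
    return modulus


grown = adjust_modulus(100_000, modulus=1, group_bytes=2048)
shrunk = adjust_modulus(10_000, modulus=grown, group_bytes=2048)
floored = adjust_modulus(10_000, modulus=grown, group_bytes=2048,
                         minimum_modulus=20)
print(grown, shrunk, floored)
```

The `minimum_modulus` argument plays the role described above for MINIMUM.MODULUS: the file never shrinks below it, so it behaves like a pre-sized floor.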

There are many tools for monitoring; my preferred two are ANALYZE.FILE for single-shot monitoring, and ACCOUNT.FILE.STATS for periodic monitoring. The main thing to look for in monitoring is that there are enough groups; that is, that a small enough proportion of groups is "overflowed". As a rule of thumb, I use a limit of 25% overflowed groups for static hashed files and 40% for dynamic hashed files.
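The rule of thumb is easy to turn into a check once you have the group counts in hand. In this Python sketch, `overflow_check` and `OVERFLOW_LIMITS` are hypothetical names, and the two counts are assumed to have been read off the monitoring tool's report by hand; nothing here parses real ANALYZE.FILE output.

```python
# Apply the 25% (static) / 40% (dynamic) overflow rule of thumb.
# The counts are supplied by the caller; no tool output is parsed here.
OVERFLOW_LIMITS = {"static": 0.25, "dynamic": 0.40}


def overflow_check(file_type, total_groups, overflowed_groups):
    """Return (overflowed ratio, True if the file exceeds its limit)."""
    if total_groups <= 0:
        raise ValueError("total_groups must be positive")
    if not 0 <= overflowed_groups <= total_groups:
        raise ValueError("overflowed_groups must be between 0 and total_groups")
    ratio = overflowed_groups / total_groups
    return ratio, ratio > OVERFLOW_LIMITS[file_type]


for kind in ("static", "dynamic"):
    ratio, too_many = overflow_check(kind, total_groups=1000, overflowed_groups=300)
    print(f"{kind}: {ratio:.0%} overflowed, resize suggested: {too_many}")
```

The same 30% overflow reading is over the limit for a static file but still acceptable for a dynamic one.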

None of the monitoring tools is documented in the DataStage manual set; you need to download the UniVerse User Reference manual from IBM.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Both of these require a VOC record pointer to work, correct? I know the syntax is here somewhere (SET?) for those who work exclusively with pathed hashes...
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

analyze.file can also be executed from the operating system shell (and therefore use a pathname); the executable is in the DataStage Engine's bin directory.

ACCOUNT.FILE.STATS does require VOC pointers; search for SETFILE.