Hi,
I have doubts about static and dynamic hash files.
- What is the difference between these two types?
- When should I use static, and when should I use dynamic?
Regards,
Rajendran.
Hash File
Re: Hash File
A hash file has 2 basic building parameters:
1) modulus [how many groups/bins to create - 1 ? 100 ? N (unknown)?] and
2) separation (how large is a group/bin- 2000 bytes ? 4000 bytes ?).
- What is the difference between these two types?
In dynamic hash files, you let the engine decide how many bins (modulus) are needed to accommodate your records. You just decide on the separation.
In static hash files, you decide both parameters. The engine will create the number of bins you specify even if there is no data in them. If the bins you specify cannot accommodate all the rows you insert, the excess rows go to an overflow space.
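To make the two types concrete, here is a sketch of how each might be created at the engine's TCL prompt. This is illustrative only: the file names are hypothetical, and exact `CREATE.FILE` options can vary by UniVerse release, so check the UniVerse command reference for your version.

```shell
# Static hashed file: you choose everything up front.
# Type 18 (general hashing), modulus 101 (groups), separation 4
# (separation is in 512-byte units, so 4 x 512 = 2048 bytes per group).
CREATE.FILE CUST_STAT 18 101 4

# Dynamic hashed file: the engine splits and merges groups as rows arrive.
# You can still seed a starting modulus and pick the group size.
CREATE.FILE CUST_DYN DYNAMIC MINIMUM.MODULUS 101 GROUP.SIZE 2
```

The static file occupies space for all 101 groups immediately, whether or not they hold data; the dynamic file grows and shrinks its modulus on its own.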
- When should I use static, and when should I use dynamic?
Use dynamic when the number of records is unpredictable, or relatively small. (You should also consider how much will fit in a bin; I have seen people load very fat rows into hash files.)
Use static when you can predict that there will be a large number of records in the hash file. If you specify a dynamic file in this case, it will just take longer to build the hash file, and you get overflows when the number of bins is not enough.
I may be all wrong, but that's how I normally decide. There was an excellent post by Ken Bland on this site, but I am not able to find it now.
- Bibhu
Keeping it simple
There are two criteria: number of rows and maintenance.
Number of Rows
- Use a static hashed file if the total number of rows is known in advance and unlikely to change very much.
- Use a dynamic hashed file if the number of rows is not known in advance, or it is known that the number will vary markedly between jobs.

Maintenance
- Static hashed files need to be correctly pre-sized, and monitored to ensure that this sizing remains appropriate, resizing if necessary.
- Dynamic hashed files implement "automatic table space management" and so require much less frequent monitoring and maintenance.
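As a back-of-envelope sketch of what "correctly pre-sized" means (all numbers here are hypothetical, and the UniVerse `RESIZE` verb's exact behaviour should be checked against your release's documentation): estimate total data volume, divide by the usable bytes per group, and round up to get a modulus.

```shell
# Rough pre-sizing arithmetic for a static hashed file (illustrative numbers):
#   2,760,000 rows x ~120 bytes/row   = ~331,200,000 bytes of data
#   separation 4 => 2048-byte groups; aim for roughly 80% load per group
#   modulus ~= 331,200,000 / (2048 * 0.8) ~= 202,149 -> round up to, say, 202151
#
# Re-sizing an existing hashed file to the new geometry from the TCL prompt:
RESIZE CUST_STAT 18 202151 4
```

If the row count later grows well past the estimate, the file overflows and needs another resize; that recurring check is the "maintenance" cost of static files that dynamic files largely avoid.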
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Ray,
When you say number of rows: in my case I am running 4 jobs in parallel, and all of these jobs write to the same hash file. I know each job writes around 690,000 records into the hash file, so the total will be about 4 x 690,000, but this is a daily job and the numbers always change (not by much). I am using a static hash file. How does the number of records in my case help me choose between a static and a dynamic hash file?
And when you say monitoring, what do you mean by it? What should I monitor, and how? Is it something related to hash file size?
I am confused... can you please clarify?
Thanks
Nag.
Records within hashed files (either type) are stored in "groups" (pages), the size of which is determined when the hashed file is created.
With static hashed files (and for the MINIMUM.MODULUS setting for dynamic hashed files) you need to ensure that you have enough groups to store all the records.
In dynamic hashed files, the number of groups adjusts automatically depending on the volume of data in the hashed file compared to its overall capacity.
There are many tools for monitoring; my preferred two are ANALYZE.FILE for single-shot monitoring, and ACCOUNT.FILE.STATS for periodic monitoring. The main thing to look for is that there are enough groups, that is, that only a small enough proportion of groups is "overflowed". As a rule of thumb, I use 25% for static hashed files and 40% for dynamic hashed files.
None of the monitoring tools is documented in the DataStage manual set; you need to download the UniVerse User Reference manual from IBM.
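For illustration, the two monitoring verbs might be used like this from the engine's TCL prompt (the file name is hypothetical; report layout and options vary by UniVerse release, so consult the UniVerse User Reference):

```shell
# One-shot health check of a single hashed file:
ANALYZE.FILE CUST_STAT
# From the report, note the current modulus and what fraction of groups
# have overflowed. Rule of thumb from this thread: act when more than
# ~25% (static) or ~40% (dynamic) of groups are in overflow.

# Periodic monitoring of the files in the account:
ACCOUNT.FILE.STATS
```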
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
analyze.file can also be executed from the operating system shell (and therefore use a pathname); the executable is in the DataStage Engine's bin directory.
ACCOUNT.FILE.STATS does require VOC pointers; search for SETFILE.
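Putting those two points together, a hedged sketch (paths and names are hypothetical; `$DSHOME` is assumed to point at the DataStage Engine install directory):

```shell
# From the operating system shell, analyze.file takes a pathname directly,
# so no VOC pointer is needed:
$DSHOME/bin/analyze.file /data/hash/CUST_STAT

# ACCOUNT.FILE.STATS works through VOC entries, so first register the
# pathed file in the project's VOC with SETFILE, then run it at TCL:
SETFILE /data/hash/CUST_STAT CUST_STAT
ACCOUNT.FILE.STATS
```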
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.