Page 1 of 1

Hash File

Posted: Sat Aug 07, 2004 6:27 am
by Rajendran
Hi,

I have some doubts about Static & Dynamic hash files.

- What is the difference between these two types?
- When should we use Static & when should we use Dynamic?

by,
Rajendran.

Posted: Sat Aug 07, 2004 6:43 am
by rasi
Search this forum; there have been many topics discussing this already.

Rasi

Re: Hash File

Posted: Sat Aug 07, 2004 7:45 pm
by bibhudc
A hash file has 2 basic building parameters:
1) modulus (how many groups/bins to create: 1? 100? N (unknown)?) and
2) separation (how large each group/bin is: 2000 bytes? 4000 bytes? In UniVerse terms, separation is expressed in 512-byte units, so separation 4 means 2048-byte groups).

- What is the difference between these two types?
In dynamic hash files, you let the engine decide how many bins (the modulus) are needed to accommodate your records. You just decide on the separation.

In static hash files, you decide on both parameters. The engine will create the number of bins you specify even if there is no data in them. If the bins you specify cannot accommodate all the rows you insert, the excess rows go to an overflow space.
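For illustration, here is a sketch (from memory) of how the two variants might be created at the DataStage Engine / UniVerse TCL prompt. The file names and numbers are made-up examples, not recommendations; check the UniVerse documentation for the exact option list.

```
Static (file type 18, modulus 211 groups, separation 4 = 2048-byte groups):
  CREATE.FILE MYSTATIC 18 211 4

Dynamic (type 30; the engine manages the modulus, starting from MINIMUM.MODULUS):
  CREATE.FILE MYDYN DYNAMIC MINIMUM.MODULUS 101 GROUP.SIZE 2
```

With the static file you have fixed both parameters up front; with the dynamic file only the group size and the starting modulus are fixed, and the engine splits or merges groups as data volume changes.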

- When should we use Static & when should we use Dynamic?
Use dynamic when the number of records is unpredictable, or relatively small. (You should also consider how much will fit in a bin - I have seen people load very fat rows into hash files.)
Use static when you can predict that there will be a large number of records in the hash file. If you specify a dynamic file in this case, it will just take longer to build the hash file, and you get overflows when the number of bins is not enough.

I may be all wrong, but that's how I normally decide. There was an excellent post by Ken Bland on this site, but I am not able to find it now.

- Bibhu

Keeping it simple

Posted: Mon Aug 09, 2004 9:37 pm
by ray.wurlod
There are two criteria: number of rows and maintenance.

→ Number of Rows
  • Use a static hashed file if the total number of rows is known in advance and unlikely to change very much.

    Use a dynamic hashed file if the number of rows is not known in advance or it is known that the number will vary markedly between jobs.
→ Maintenance
  • Static hashed files need to be correctly pre-sized, and monitored to ensure that this sizing remains appropriate, resizing if necessary.

    Dynamic hashed files implement "automatic table space management" and so require much less frequent monitoring and maintenance.
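As a sketch of what "resizing if necessary" looks like for a static hashed file, UniVerse's RESIZE command changes the type, modulus, and separation in place, with an asterisk keeping the current value for that position. The file name and modulus here are hypothetical:

```
RESIZE MYSTATIC * 4099 *
```

This keeps the existing file type and separation and rebuilds the file with 4099 groups.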

Posted: Tue Aug 24, 2004 9:16 am
by nag0143
Ray,
When you said number of rows: in my case I am running 4 jobs in parallel, and all of these jobs write to the same hash file. I know each job is writing around 690000 records into the hash file, so the total number of records will be 4*690000, but this is a daily job and the numbers always change (not much). I am using a static hash file in my case, but how does the number of records help me choose between a static and a dynamic hash file?

And when you said monitoring, what do you mean by it - what and how should I monitor? Is it something related to hash file size?

I am confused... can you please clarify....

Thanks
Nag.

Posted: Tue Aug 24, 2004 2:47 pm
by ketfos
Hi,
Overall, a Static Hashed File can perform up to 80% faster than an equivalently configured Dynamic Hashed File.

Ketfos

Posted: Tue Aug 24, 2004 4:39 pm
by ray.wurlod
Records within hashed files (either type) are stored in "groups" (pages), the size of which is determined when the hashed file is created.

With static hashed files (and for the MINIMUM.MODULUS setting for dynamic hashed files) you need to ensure that you have enough groups to store all the records.
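Applied to Nag's figures earlier in the thread, a rough way to estimate "enough groups" might look like this. The ~150 bytes per record and the 80% target fill are assumptions for illustration only; substitute your real average record size:

```
rows       = 4 x 690,000                     = 2,760,000
data size  = 2,760,000 x 150 bytes           ~ 414 MB
groups     = 414,000,000 / (2048 x 0.8)      ~ 252,700
```

You would then round the group count up to the next prime and use that as the static modulus (or as MINIMUM.MODULUS for a dynamic file).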

In dynamic hashed files, the number of groups adjusts automatically depending on the volume of data in the hashed file compared to its overall capacity.

There are many tools for monitoring; my preferred two are ANALYZE.FILE for one-shot monitoring and ACCOUNT.FILE.STATS for periodic monitoring. The main thing to look for is that there are enough groups - that is, that a small enough proportion of groups is "overflowed". As a rule of thumb, I use 25% for static hashed files and 40% for dynamic hashed files.
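For example, a one-shot check from TCL might look like this (the file name is hypothetical; see the UniVerse User Reference for the full option list):

```
ANALYZE.FILE MYFILE STATISTICS
```

In the output, compare the number of overflowed groups against the total modulus, and apply the 25%/40% rules of thumb above.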

None of the monitoring tools is documented in the DataStage manual set; you need to download the UniVerse User Reference manual from IBM.

Posted: Tue Aug 24, 2004 5:08 pm
by chulett
Both of these require a VOC record pointer to work, correct? I know the syntax is here somewhere (SET?) for those who work exclusively with pathed hashes...

Posted: Tue Aug 24, 2004 8:58 pm
by ray.wurlod
analyze.file can also be executed from the operating system shell (and therefore use a pathname); the executable is in the DataStage Engine's bin directory.

ACCOUNT.FILE.STATS does require VOC pointers; search for SETFILE.
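For reference, a SETFILE invocation has roughly this shape (the path and VOC name here are hypothetical examples):

```
SETFILE /data/hash/MYFILE MYFILE OVERWRITING
```

This creates (or, with OVERWRITING, replaces) a VOC pointer named MYFILE to the pathed hashed file, after which TCL commands such as ACCOUNT.FILE.STATS can reference it by name.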