Dynamic Hash file still splitting !

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

hamzaqk
Participant
Posts: 249
Joined: Tue Apr 17, 2007 5:50 am
Location: islamabad

Dynamic Hash file still splitting !

Post by hamzaqk »

Hi all, I am generating a dynamic hashed file with the following settings: Min Modulus = 1, Group Size = 1, Split Load = 80, Merge Load = 50, etc. The problem is that even though the file is defined as dynamic, I was surprised to notice that even for a small amount of data (190 rows) the file was splitting, with 12 KB for the data file and 4 KB for the overflow file.

Anyone with any nice ideas on why this is happening? I wouldn't mind a small explanation of the parameters too :~)
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Many factors at work here. How 'fat' are your records? Why a minimum modulus of 1? There have been a number of postings on hashed file details - Ken Bland had an extensive article in a newsletter you should be able to find from the home page. Someone else posted this link, which should help. And then there's the HFC, or Hashed File Calculator, on your client CD-ROM, something you should install and check out. It will help you 'presize' your hashed files properly.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

An empty hashed file (one with zero records) will have a header in each of DATA.30 and OVER.30. The header is 2KB in size unless GROUP.SIZE is set to 2, in which case the header is 4KB in size.

Your hashed file is therefore using only one buffer in OVER.30. This might be an overflowed group, it might be an oversized record, or it might be the SICA if you used DataStage/SQL to create the hashed file as a table.

An analysis tool such as ANALYZE.FILE will determine which of these it is.
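The header-size rule above can be written down as a tiny helper (my own illustration, not any DataStage or UniVerse API; it simply encodes the rule Ray states):

```python
def empty_header_bytes(group_size: int) -> int:
    """Header size of DATA.30 or OVER.30 in an empty dynamic hashed file.

    Per the rule above: 2 KB header for GROUP.SIZE 1, 4 KB for GROUP.SIZE 2.
    (GROUP.SIZE can only be 1 or 2.)
    """
    if group_size not in (1, 2):
        raise ValueError("GROUP.SIZE must be 1 or 2")
    return group_size * 2048

# An empty file therefore occupies one header buffer in each of DATA.30 and OVER.30.
print(empty_header_bytes(1))  # 2048 bytes (2 KB)
print(empty_header_bytes(2))  # 4096 bytes (4 KB)
```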
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
hamzaqk
Participant
Posts: 249
Joined: Tue Apr 17, 2007 5:50 am
Location: islamabad

Post by hamzaqk »

Well, thanks... I did not know about the HFC before. Anyway, I am using the default values, which seem to be OK. There are only 190 records in the file (approx 5-10 KB), which shouldn't take much space. And as the number of groups (modulus) is defined as 1 and its size (Group Size) is set to 1 (i.e. 1024 KB), the file should not reach the load percentage of 80%, and thus should not split. As it is only consuming 5-10 KB of the defined group size, I see no reason it should split.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

GROUP.SIZE is in units of 2KB. Definitely not 1024KB.

There is a storage overhead of three four-byte pointers per record. Straight away there's (190 x 12 = 2280 bytes).

On average, a group will only fill to the SPLIT.LOAD threshold, so each group buffer will contain only approximately (2048 x 0.80 = 1638 bytes).

So, ignoring overflow (that is, assuming a perfectly tuned hashed file) I would expect your DATA.30 file to be (((10KB + 2280 bytes) / 1638 bytes/group) x 2048 bytes) in size. That is a modulus of 8, or approximately 16KB.

This is a rather simplified analysis, assuming perfect tuning, record sizes that are always multiples of four bytes, and no overflow or oversized records.
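The back-of-the-envelope arithmetic above can be sketched as a short calculation (a rough sketch only; the function and constant names are mine, not a DataStage API, and it assumes perfect tuning, 12 bytes of pointer overhead per record, and no overflow, as described):

```python
import math

GROUP_BYTES = 2048        # GROUP.SIZE of 1 means 2 KB group buffers
POINTER_OVERHEAD = 12     # three four-byte pointers stored per record

def estimate_data30(num_records: int, data_bytes: int,
                    split_load: float = 0.80) -> tuple[int, int]:
    """Estimate (modulus, DATA.30 size in bytes) for a dynamic hashed file.

    Assumes each group fills only to the SPLIT.LOAD threshold on average.
    """
    total = data_bytes + num_records * POINTER_OVERHEAD
    usable_per_group = GROUP_BYTES * split_load  # ~1638 bytes at 80%
    modulus = math.ceil(total / usable_per_group)
    return modulus, modulus * GROUP_BYTES

# The thread's example: 190 records, ~10 KB of data.
modulus, size_bytes = estimate_data30(190, 10 * 1024)
print(modulus, size_bytes)  # 8 groups, 16384 bytes (~16 KB)
```

Note this is only the idealised lower bound; real files gain size from imperfect hashing, overflow, and record sizes that are not multiples of four bytes.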
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.