Dynamic Hash file still splitting !

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

hamzaqk
Participant
Posts: 249
Joined: Tue Apr 17, 2007 5:50 am
Location: islamabad

Dynamic Hash file still splitting !

Post by hamzaqk »

Hi all, I am generating a dynamic hashed file with the following settings: Min Modulus = 1, Group Size = 1, Split Load = 80, Merge Load = 50, etc. The problem is that even though the file is defined as dynamic, I was surprised to notice that even for a small amount of data (190 rows) the file was splitting, with 12 KB for the data file and 4 KB for the overflow file.

Anyone with any nice ideas on why this is happening? I wouldn't mind a small explanation of the parameters too :~)
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Many factors at work here. How 'fat' are your records? Why a minimum modulus of 1? There have been a number of postings on hashed file details - Ken Bland had an extensive article in a newsletter you should be able to find from the home page. Someone else posted this link, which should help. And then there's the HFC, or Hashed File Calculator, on your client CD-ROM, something you should install and check out. It will help you 'presize' your hashed files properly.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

An empty hashed file (one with zero records) will have a header in each of DATA.30 and OVER.30. The header is 2KB in size unless GROUP.SIZE is set to 2, in which case the header is 4KB in size.

Your hashed file is therefore using only one buffer in OVER.30. This might be an overflowed group, it might be an oversized record, or it might be the SICA if you used DataStage/SQL to create the hashed file as a table.

An analysis tool such as ANALYZE.FILE will determine which of these it is.
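The header-size rule above can be written down as a tiny helper (my own illustration, not any DataStage or UniVerse API; it simply encodes the rule Ray states):

```python
def empty_header_bytes(group_size: int) -> int:
    """Header size of DATA.30 or OVER.30 in an empty dynamic hashed file.

    Per the rule above: 2 KB header for GROUP.SIZE 1, 4 KB for GROUP.SIZE 2.
    (GROUP.SIZE can only be 1 or 2.)
    """
    if group_size not in (1, 2):
        raise ValueError("GROUP.SIZE must be 1 or 2")
    return group_size * 2048

# An empty file therefore occupies one header buffer in each of DATA.30 and OVER.30.
print(empty_header_bytes(1))  # 2048 bytes (2 KB)
print(empty_header_bytes(2))  # 4096 bytes (4 KB)
```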
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
hamzaqk
Participant
Posts: 249
Joined: Tue Apr 17, 2007 5:50 am
Location: islamabad

Post by hamzaqk »

Well, thanks... I did not know about the HFC before. Anyway, I am using the default values, which seem to be OK. There are only 190 records in the file (approx 5-10 KB), which shouldn't take much space. And as the number of groups (modulus) is defined as 1 and its size (Group Size) is set to 1 (i.e. 1024 KB), the file should not reach the load percentage of 80%, and thus should not split. As it is only consuming 5-10 KB of the defined group size, I see no reason it should split.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

GROUP.SIZE is in units of 2KB. Definitely not 1024KB.

There is a storage overhead of three four-byte pointers per record. Straight away there's (190 x 12 = 2280 bytes).

On average, a group will only fill to the SPLIT.LOAD threshold, so each group buffer will contain only approximately (2048 x 0.80 = 1638 bytes).

So, ignoring overflow (that is, assuming a perfectly tuned hashed file) I would expect your DATA.30 file to be (((10KB + 2280 bytes) / 1638 bytes/group) x 2048 bytes) in size. That is a modulus of 8, or approximately 16KB.

This is a rather simplified analysis, assuming perfect tuning, record sizes that are always multiples of four bytes, and no overflow or oversized records.
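The back-of-the-envelope arithmetic above can be sketched as a short calculation (a rough sketch only; the function and constant names are mine, not a DataStage API, and it assumes perfect tuning, 12 bytes of pointer overhead per record, and no overflow, as described):

```python
import math

GROUP_BYTES = 2048        # GROUP.SIZE of 1 means 2 KB group buffers
POINTER_OVERHEAD = 12     # three four-byte pointers stored per record

def estimate_data30(num_records: int, data_bytes: int,
                    split_load: float = 0.80) -> tuple[int, int]:
    """Estimate (modulus, DATA.30 size in bytes) for a dynamic hashed file.

    Assumes each group fills only to the SPLIT.LOAD threshold on average.
    """
    total = data_bytes + num_records * POINTER_OVERHEAD
    usable_per_group = GROUP_BYTES * split_load  # ~1638 bytes at 80%
    modulus = math.ceil(total / usable_per_group)
    return modulus, modulus * GROUP_BYTES

# The thread's example: 190 records, ~10 KB of data.
modulus, size_bytes = estimate_data30(190, 10 * 1024)
print(modulus, size_bytes)  # 8 groups, 16384 bytes (~16 KB)
```

Note this is only the idealised lower bound; real files gain size from imperfect hashing, overflow, and record sizes that are not multiples of four bytes.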
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.