
Posted: Thu Feb 15, 2007 6:55 pm
by narasimha
There are two entries associated with a hashed file: D_HashedFileXXX (the file dictionary) and HashedFileXXX (the data portion).
Inside the HashedFileXXX directory there are two files, DATA.30 and OVER.30.
Check the sizes of these too.
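
If it helps, here is a minimal sketch for printing those sizes (the "HashedFileXXX" path is a placeholder; substitute the actual hashed file directory in your project/account directory):

import os

# Print the sizes of the data and overflow portions of the hashed file.
# "HashedFileXXX" is a placeholder directory name, not a real path.
for name in ("DATA.30", "OVER.30"):
    path = os.path.join("HashedFileXXX", name)
    print(name, os.path.getsize(path), "bytes")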

Posted: Thu Feb 15, 2007 6:58 pm
by paddu
Sorry about that.

Here are the sizes of the files:

DATA.30 691,870 KB
OVER.30 223,234 KB

Posted: Thu Feb 15, 2007 8:57 pm
by ray.wurlod
Just for interest, do the calculation as I did earlier. Allow one byte per delimiter and use 14 bytes/record as the storage overhead. Post your calculation so I can be certain you understand.

Posted: Fri Feb 16, 2007 3:02 pm
by paddu
Ray-"Just for interest, do the calculations like I did earlier. Keep one byte per delimiter and use 14 bytes/record as the storage overhead. Post your calculations, so I can be certain you understand. "

I did not exactly follow you, Ray :? Why do we need the delimiter when calculating sizes for hashed files?


We downloaded the DataStage client from the IBM site, so the Unsupported utilities were not provided to us.
I am not sure how to calculate sizes for hashed files.

I did something like this:

10 + 3 + 8 + 14 = 35 bytes per line
For 15,446,662 records that equates to 540,633,170 bytes

Maybe I need to learn more about hashed files.

Thanks
paddu

Posted: Fri Feb 16, 2007 7:27 pm
by ray.wurlod
paddu wrote: Why do we need the delimiter when calculating sizes for hashed files?

10 + 3 + 8 + 14 = 35 bytes per line
For 15,446,662 records that equates to 540,633,170 bytes
You need the delimiter because it's in there (though it's a "field mark" in hashed files).

By default hashed files are only filled to 80% of capacity, so your calculation is in the ballpark. Allowing for the 80% (the "split load" tuneable parameter), your calculation would yield 675,791,463 bytes, whereas you observed slightly more than this.
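
For what it's worth, here is the arithmetic as a minimal sketch (the three field marks per record are an assumption for a three-field record; real internals add further overheads, as noted below):

# Ballpark size estimate for the hashed file (a sketch; group headers,
# free space and other internals are deliberately ignored).
records = 15_446_662
per_record = 10 + 3 + 8 + 14          # field bytes + 14-byte overhead
raw = records * per_record            # 540,633,170 bytes, as calculated above
at_80_percent = raw / 0.8             # ~675,791,463 bytes after the split load

# Assumed: roughly one extra byte per field mark (3 marks assumed for a
# 3-field record) pushes the estimate a little higher still.
with_field_marks = records * (per_record + 3) / 0.8   # ~733,716,445 bytes

# Observed in the thread: DATA.30 = 691,870 KB and OVER.30 = 223,234 KB.
print(f"{at_80_percent:,.0f}", f"{with_field_marks:,.0f}")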

There are other complexities relating to headers, free space, overflowed groups and oversized records, that I deliberately avoided. Your calculation, as I noted, is in the ballpark.