Posted: Thu Jan 06, 2011 3:58 pm
Let's begin with the terminology. It's properly "hashed" file, not "hash" file.
To understand the purpose of DATA.30 and OVER.30 you need to understand the internal structure of a dynamic hashed file. Records are organized into "groups" (pages, if you like), whose size is set by the GROUP.SIZE parameter (1 = 2KB, 2 = 4KB). Ideally a group consists of only one page (the "primary buffer"). If the records that hash to a group need more space than one buffer, the group daisy-chains into as many "secondary buffers" as needed. Oversized records (those larger than the LARGE.RECORD parameter) have their key stored in the group and their data stored in separate secondary buffers. Primary buffers are stored in the DATA.30 file, which grows and shrinks as required (this is where the name "dynamic" comes from); secondary buffers are stored in the OVER.30 file. So a large OVER.30 may mean that you have overflowed groups, oversized records, or both. The ANALYZE.FILE utility with the STATS keyword reports the numbers of both.
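To make the primary/secondary buffer idea concrete, here is a toy model in Python. It is only a sketch: toy_group() is a made-up stand-in for the real hashing algorithm, the byte counting ignores all per-record and per-buffer overhead, and comparing key plus data against LARGE.RECORD is a simplification. None of it reflects the actual on-disk format.

```python
import math

# GROUP.SIZE -> buffer size in bytes, as described in the post.
GROUP_BYTES = {1: 2048, 2: 4096}


def toy_group(key, modulus):
    """Hypothetical stand-in hash: NOT the real hashing algorithm."""
    return sum(key.encode("utf-8")) % modulus


def simulate(records, modulus, group_size, large_record):
    """Estimate overflow for a dict of key -> data strings.

    large_record is a threshold in bytes supplied by the caller;
    no default value is assumed here.
    """
    buf = GROUP_BYTES[group_size]
    in_group = [0] * modulus      # bytes that must live in each group
    oversized = 0
    oversized_buffers = 0
    for key, data in records.items():
        g = toy_group(key, modulus)
        size = len(key) + len(data)
        if size > large_record:
            # Oversized: the key stays in the group, the data goes
            # to its own secondary buffers.
            in_group[g] += len(key)
            oversized += 1
            oversized_buffers += math.ceil(len(data) / buf)
        else:
            in_group[g] += size
    # Each group owns one primary buffer; anything beyond that is
    # daisy-chained into secondary buffers.
    chained = [max(0, math.ceil(used / buf) - 1) for used in in_group]
    return {
        "overflowed_groups": sum(1 for c in chained if c > 0),
        "oversized_records": oversized,
        "secondary_buffers": sum(chained) + oversized_buffers,
    }
```

In this model "secondary_buffers" plays the role of OVER.30: it grows either because groups overflow or because records are oversized, which is why the two cases need to be told apart with ANALYZE.FILE.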
RESIZE is ordinarily done at TCL level and requires that the hashed file have a VOC entry (search for the SETFILE command). $DSHOME/bin/resize can also be used, but its syntax is not documented. Whether the job has been compiled is not relevant to the resize operation. RESIZE will not affect the size of OVER.30 if OVER.30 contains oversized records (which cannot be stored in group buffers).
"Create File" in the Hashed File stage opens a dialog in which you can specify tuning parameters to be used when the job creates the hashed file. If "Create File" is not checked, no attempt is made to create the hashed file. If "Delete before Create" is not checked, no attempt is made to create the hashed file if it already exists. The "container directory" for the hashed file only disappears if you delete the hashed file using the DELETE.FILE command (if the hashed file is in the project) or the rm -r command (if the hashed file is in a directory and has no VOC entry).
A VOC entry is created only if the account (= project) name is provided. You can subsequently create a VOC entry for a directory-pathed hashed file using the SETFILE command. If you later want to delete this hashed file, you also need to delete the VOC entry using a DELETE query.
Average record size for the Hashed File Calculator (HFC) is a manual calculation, and you must remember that everything is stored as a string, so a number occupies one byte per character. The ANALYZE.FILE command with the STATS option will show record sizes.
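The manual calculation can be roughed out as follows. This is a sketch under stated assumptions: every value is converted to text, one separator byte is counted between adjacent fields, characters are measured as UTF-8 bytes, and no record header or other overhead is included. The function name and sample rows are invented for illustration.

```python
def average_record_size(rows):
    """Mean size in bytes of the sample rows, treating every field as text."""
    if not rows:
        raise ValueError("need at least one sample row")
    total = 0
    for row in rows:
        fields = [str(value) for value in row]
        data_bytes = sum(len(f.encode("utf-8")) for f in fields)
        # Assumed: one delimiter byte between adjacent fields.
        separators = max(0, len(fields) - 1)
        total += data_bytes + separators
    return total / len(rows)
```

For example, average_record_size([(12345, "Smith", 19.99), (7, "Li", 5)]) gives 11.5: the integer 12345 counts as five bytes because it is stored as the string "12345". Treat the result as a starting estimate and check it against what ANALYZE.FILE reports.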
Separation is the size of a group in a static hashed file. It is in units of 512 bytes (for historical reasons). It does not apply to dynamic hashed files, for which the GROUP.SIZE parameter specifies the group buffer size.
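The two sizing units are easy to confuse, so here is the arithmetic as a small Python sketch. The function names are invented for illustration; the only figures used are the ones given above (512-byte separation units, GROUP.SIZE 1 = 2KB and 2 = 4KB).

```python
SEPARATION_UNIT = 512                 # bytes per unit of separation
GROUP_SIZE_BYTES = {1: 2048, 2: 4096}  # GROUP.SIZE -> bytes


def static_group_bytes(separation):
    """Group size in bytes for a static hashed file."""
    if separation < 1:
        raise ValueError("separation must be a positive integer")
    return separation * SEPARATION_UNIT


def dynamic_group_bytes(group_size):
    """Group buffer size in bytes for a dynamic hashed file."""
    if group_size not in GROUP_SIZE_BYTES:
        raise ValueError("GROUP.SIZE must be 1 or 2")
    return GROUP_SIZE_BYTES[group_size]
```

So a static file with separation 4 has the same group size (2048 bytes) as a dynamic file with GROUP.SIZE 1.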