Hash file Cache V/s Hash file Size

Posted: Mon Mar 22, 2004 12:03 am
by MukundShastri
In DataStage Administrator there is a tunable parameter for the hash file cache, which can be set to a maximum of 999 MB. But the maximum size of a hash file is said to be 2GB, and I have seen hash files grow well above 999MB, though never beyond 2GB. Can anybody explain what this cache setting is for and whether it is related to the size of the hash file?

Posted: Mon Mar 22, 2004 12:44 am
by anupam
Hello Mukund,

The cache size setting can be used for configuring the size of the read and write caches.

When a Hashed File stage writes records to a hashed file, they can be
written to a cache rather than to the hashed file immediately.

Similarly, when a Hashed File stage is reading a hashed file there is an
option to pre-load the file into memory, which makes subsequent access much faster and is used when the file is providing a reference link to a Transformer stage.

This cache size setting is not related to the maximum size of a hashed file (2GB).

I hope I have given some valuable feedback.

Posted: Mon Mar 22, 2004 2:49 am
by MukundShastri
Hi Anupam,
I think you have given me the answer.
I sent you an email via rediffmail some time back. Please reply.

Thanks

Posted: Mon Mar 22, 2004 3:07 pm
by ray.wurlod
The cache limit is the maximum amount of memory that can be allocated for caching the hashed file in memory.

The maximum hashed file size refers to the largest hashed file that is possible on disk. This maximum is an artifact of the internal pointers used in the hashed file.
  • With 32-bit pointers the maximum address that can be specified is 2GB (2**31-1: the pointers are treated as signed integers).
  • With 64-bit pointers the maximum address that can be specified is theoretically about 9 exabytes (2**63-1), though very few operating systems will allow a file of this size.
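
The two figures above follow from treating the pointer as a signed integer that holds a byte offset. A minimal Python sketch of that arithmetic (the function name `max_signed_address` is made up for illustration and is not a DataStage call):

```python
def max_signed_address(bits: int) -> int:
    """Largest byte offset a signed pointer of the given width can hold."""
    # One bit is used for the sign, so only bits - 1 bits carry magnitude.
    return 2 ** (bits - 1) - 1


GIB = 2 ** 30  # bytes in one binary gigabyte
EIB = 2 ** 60  # bytes in one binary exabyte

limit_32 = max_signed_address(32)
limit_64 = max_signed_address(64)

# Print each limit in bytes and in a human-sized unit.
print(limit_32, "bytes, about", round(limit_32 / GIB, 2), "GiB")
print(limit_64, "bytes, about", round(limit_64 / EIB, 2), "EiB")
```

This only illustrates the addressing arithmetic; it says nothing about how DataStage lays out a hashed file internally.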

Posted: Tue Mar 23, 2004 5:34 am
by MukundShastri
Hi Ray,

32-Bit or 64-bit pointers are dependent upon the OS system hardware configuration. The OS can be of 32-bit or 64-bit configuration. Please correct me if my understanding is wrong.

Posted: Tue Mar 23, 2004 8:45 am
by kduke
The 2GB limit was originally a UNIX filesystem limit. Next they went to unsigned integers and got 4GB. So the version of OS and even how the filesystem was created is important but almost every version of UNIX supports large filesystems. If you are on Sun or HP or some mainstream version of UNIX then do not worry. If not then check with Ascential. Better safe than sorry.

Posted: Tue Mar 23, 2004 3:20 pm
by ray.wurlod
MukundShastri wrote: 32-Bit or 64-bit pointers are dependent upon the OS system hardware configuration. The OS can be of 32-bit or 64-bit configuration. Please correct me if my understanding is wrong.
The operating system must also be able to support files of the size implied by 64-bit internal addressing. This is not the same thing as saying that the operating system has 64-bit capability, which usually refers to numeric data types.
The maximum size of an operating system file is the criterion. This is set in the kernel, and may be reduced by ulimit. The operating system does not see the 64-bit pointers within hashed files (these are seen only by DataStage).

Posted: Tue Mar 23, 2004 11:37 pm
by MukundShastri
Hi Ray,
Our Sun Solaris operating system can store files of more than 10GB, yet the largest hash file we are able to store is 2GB. Does this not contradict the statement that the maximum size of an operating system file is the criterion for hash file size? Am I missing something from your earlier comments?
Thanks
Mukund

Posted: Wed Mar 24, 2004 12:14 am
by ray.wurlod
If your hashed files have 32-bit internal addressing (the default) they can only be up to 2GB in size.
If your hashed files have 64-bit internal addressing they can be larger than 2GB.
On Solaris the ulimit -f command will report the maximum size of an operating system file; however, a result of "unlimited" is misleading. There are other, built-in and hardware-specific limits. So, iirc, the maximum file size on Solaris is 1TB. This is theoretically the largest size that a hashed file with 64-bit internal addressing can be on Solaris. Of course, a file cannot be larger than a file system.
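
To see how these limits combine, here is a rough Python sketch, assuming the effective ceiling is simply the smallest of the internal addressing limit, the operating system file size limit and the file system size. All function names are invented for illustration; none of this is DataStage code. `resource.getrlimit` is the Unix-only standard library call for the limit that `ulimit -f` shows, and it reports bytes, whereas the shell may report blocks.

```python
from typing import Optional


def addressing_limit(bits: int) -> int:
    """Largest offset reachable with signed internal pointers of this width."""
    return 2 ** (bits - 1) - 1


def os_file_size_limit() -> Optional[int]:
    """Soft per-process file size limit in bytes, or None if 'unlimited'."""
    import resource  # Unix-only standard library module

    soft, _hard = resource.getrlimit(resource.RLIMIT_FSIZE)
    return None if soft == resource.RLIM_INFINITY else soft


def effective_limit(bits: int,
                    os_limit: Optional[int],
                    fs_size: Optional[int]) -> int:
    """Smallest of the limits that apply; None means 'no limit known'."""
    candidates = [addressing_limit(bits)]
    candidates += [c for c in (os_limit, fs_size) if c is not None]
    return min(candidates)


GIB = 2 ** 30

# The situation described above: the OS accepts files over 10GB,
# but the hashed file uses the default 32-bit internal addressing.
with_32_bit = effective_limit(32, None, 10 * GIB)
with_64_bit = effective_limit(64, None, 10 * GIB)
print(with_32_bit, with_64_bit)
```

With 32-bit addressing the pointer width is the binding limit; with 64-bit addressing the operating system or file system becomes the binding limit instead. As noted above, an "unlimited" result (returned here as `None`) does not mean no limit exists, only that this particular setting does not impose one.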