Hash file Cache V/s Hash file Size

MukundShastri
Premium Member
Posts: 103
Joined: Tue Oct 14, 2003 4:07 am

Post by MukundShastri »

In DataStage Administrator there is a tunable cache size parameter for hashed files, which can be set to a maximum of 999MB. However, the maximum size of a hashed file is said to be 2GB, and I have seen hashed files grow well beyond 999MB but never beyond 2GB. Can anybody explain what this cache setting is for, and whether it is related to the size of the hashed file?
anupam
Participant
Posts: 172
Joined: Fri Apr 04, 2003 10:51 pm
Location: India

Post by anupam »

Hello Mukund,

The cache size setting can be used for configuring the size of the read and write caches.

When a Hashed File stage writes records to a hashed file, the records can be written to a cache rather than to the hashed file immediately.

Similarly, when a Hashed File stage reads a hashed file there is an option to pre-load the file into memory. This makes subsequent access much faster and is used when the file provides a reference link to a Transformer stage.

This cache size setting is not related to the maximum size of a hashed file (2GB).

I hope I have given some useful feedback.
----------------
Rgds,
Anupam
----------------
The future is not something we enter. The future is something we create.
MukundShastri
Premium Member
Posts: 103
Joined: Tue Oct 14, 2003 4:07 am

Post by MukundShastri »

Hi Anupam,
I think you have given me the answer.
I sent you an email via rediffmail some time ago. Please reply.

Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The cache limit is the maximum amount of memory that can be allocated for caching the hashed file in memory.

The maximum hashed file size refers to the largest hashed file that is possible on disk. This maximum is an artifact of the internal pointers used in the hashed file.
  • With 32-bit pointers the maximum address that can be specified is 2GB (2**31-1: the pointers are treated as signed integers).
  • With 64-bit pointers the maximum address that can be specified is theoretically about 8EB (2**63-1, roughly 9.2 million TB), though very few operating systems will allow a file of this size.
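
The arithmetic behind these two limits can be checked with a short Python sketch. The function name `max_signed_offset` is made up for illustration and is not part of DataStage; the only assumption carried over from the post is that the pointers are signed integers.

```python
# Largest byte offset a signed pointer of the given width can address.
# One bit is used for the sign, so the maximum is 2**(bits - 1) - 1.
def max_signed_offset(bits: int) -> int:
    return 2 ** (bits - 1) - 1

GIB = 2 ** 30  # bytes in one binary gigabyte
EIB = 2 ** 60  # bytes in one binary exabyte

limit_32 = max_signed_offset(32)
limit_64 = max_signed_offset(64)

print(f"32-bit: {limit_32} bytes (about {limit_32 / GIB:.2f} GiB)")
print(f"64-bit: {limit_64} bytes (about {limit_64 / EIB:.2f} EiB)")
```

Note that the 999MB cache ceiling sits well below the 32-bit figure, which is consistent with the two settings being independent.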
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
MukundShastri
Premium Member
Posts: 103
Joined: Tue Oct 14, 2003 4:07 am

Post by MukundShastri »

Hi Ray,

Whether 32-bit or 64-bit pointers are used depends on the operating system and hardware configuration, since the OS can be either 32-bit or 64-bit. Please correct me if my understanding is wrong.
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

The 2GB limit was originally a UNIX filesystem limit. Next they went to unsigned integers and got 4GB. So the OS version, and even how the filesystem was created, matters, but almost every version of UNIX now supports large filesystems. If you are on Sun or HP or some other mainstream version of UNIX then do not worry. If not, check with Ascential. Better safe than sorry.
Mamu Kim
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

MukundShastri wrote: 32-Bit or 64-bit pointers are dependent upon the OS system hardware configuration. The OS can be of 32-bit or 64-bit configuration. Please correct me if my understanding is wrong.

The operating system must also be able to support files of the size implied by 64-bit internal addressing. This is not the same thing as saying that the operating system has 64-bit capability, which usually refers to numeric data types.
The maximum size of an operating system file is the criterion. This is set in the kernel, and may be reduced by ulimit. The operating system does not see the 64-bit pointers within hashed files (these are seen only by DataStage).
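
One way to inspect the per-process file size limit from a script is Python's standard `resource` module on UNIX, which reads the same limit that `ulimit -f` adjusts. This is only a sketch: the helper name `file_size_limit_bytes` is hypothetical, and the script must run as the same user and environment as the DataStage jobs for the answer to be relevant.

```python
import resource

def file_size_limit_bytes():
    """Return the soft file size limit in bytes, or None if it is unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_FSIZE)
    return None if soft == resource.RLIM_INFINITY else soft

limit = file_size_limit_bytes()
if limit is None:
    # "Unlimited" only means no ulimit is applied; kernel and
    # filesystem limits on file size still exist.
    print("No per-process file size limit is set")
else:
    print(f"Files created by this process are capped at {limit} bytes")
```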
MukundShastri
Premium Member
Posts: 103
Joined: Tue Oct 14, 2003 4:07 am

Post by MukundShastri »

Hi Ray,
Our Sun Solaris operating system can store files of 10GB or more, yet the largest hashed file we can store is 2GB. Doesn't that contradict the statement that the maximum size of an operating system file is the criterion for hashed file size? Am I missing something in your earlier comments?
Thanks
Mukund
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

If your hashed files have 32-bit internal addressing (the default) they can only be up to 2GB in size.
If your hashed files have 64-bit internal addressing they can be larger than 2GB.
On Solaris the ulimit -f command reports the maximum size of an operating system file; however, a result of "unlimited" is misleading. There are other built-in and hardware-specific limits. So, iirc, the maximum file size on Solaris is 1TB. This is theoretically the largest size that a hashed file with 64-bit internal addressing can be on Solaris. Of course, a file cannot be larger than the file system that contains it.
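
Put together, the effective ceiling is the smallest of the limits mentioned in this thread. The sketch below is illustrative only: `effective_max_size` and its parameters are hypothetical names, and the 10GB and 1TB figures are simply the values quoted in the posts above, not measured ones.

```python
def addressing_limit(bits: int) -> int:
    # Signed internal pointers: the largest addressable offset is 2**(bits-1) - 1.
    return 2 ** (bits - 1) - 1

def effective_max_size(addressing_bits, os_file_limit=None, filesystem_size=None):
    """Smallest of the applicable limits, in bytes (None means 'not known')."""
    candidates = [addressing_limit(addressing_bits)]
    if os_file_limit is not None:
        candidates.append(os_file_limit)
    if filesystem_size is not None:
        candidates.append(filesystem_size)
    return min(candidates)

GB = 2 ** 30
TB = 2 ** 40

# The OS accepts 10GB files, but a 32-bit hashed file still stops at 2GB.
print(effective_max_size(32, os_file_limit=10 * GB))

# With 64-bit addressing the OS limit (1TB here) becomes the ceiling.
print(effective_max_size(64, os_file_limit=1 * TB))
```

This is why a 10GB-capable operating system and a 2GB hashed file do not contradict each other: with the default 32-bit addressing, the internal pointer limit is reached first.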