Creating Hashfiles: size limitations

dwscblr
Participant
Posts: 19
Joined: Tue May 18, 2004 12:39 am

Post by dwscblr »

I have to load 40 million records into a hashfile. The size of each row in the Hashfile is 50 bytes.

When I created it as a dynamic hashfile, the job failed after a million rows with the error:
Abnormal termination of stage CopyOfint2outLoadAddrProcResHash..LoadHashCodedRecordsOut.IDENT1 detected

So I moved to a static hashfile. I used algorithm 16 (ASCII key, any variations) and created 4500 buckets, each of size 8196K. The job failed with the error:
CopyOfint2outLoadAddrProcResHash..LoadHashCodedRecordsOut.IDENT1:|CopyOfint2outLoadAddrProcResHash..LoadHashCodedRecordsOut.LoadTableFile: DSD.UVOpen mkdbfile: unable to create a 32-bit file greater than 2 gigabytes.
.|

I have a couple of questions:
1. Is there a limit on the size of hashfiles? I thought the 2 GB limit no longer existed.
2. Could you suggest approaches for loading 40 million rows into a hashfile?
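
For a rough sense of scale, the figures quoted in this post can be multiplied out. This is a back-of-the-envelope sketch only: it assumes 50 bytes is the full stored size of a row and ignores per-record headers, group overhead and key storage, all of which would make the real file larger.

```python
# Back-of-the-envelope estimate using only the figures from the post.
# Assumption: 50 bytes is the full stored size of a row; real hashed files
# add per-record and per-group overhead, so the true size is larger.
ROWS = 40_000_000
BYTES_PER_ROW = 50
LIMIT_32BIT = 2**31  # 2 GB, the limit named in the mkdbfile error

raw_bytes = ROWS * BYTES_PER_ROW
headroom = LIMIT_32BIT - raw_bytes

print(f"raw data : {raw_bytes:,} bytes")
print(f"2 GB cap : {LIMIT_32BIT:,} bytes")
print(f"headroom : {headroom:,} bytes ({headroom / LIMIT_32BIT:.1%} of the cap)")
```

The raw data alone comes to 2,000,000,000 bytes, roughly 93% of the 2 GB cap, so any file overhead at all is likely to push a single 32-bit file over the limit.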
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

Select 64bit. It does not have this limit.
Mamu Kim
Amos.Rosmarin
Premium Member
Posts: 385
Joined: Tue Oct 07, 2003 4:55 am

Post by Amos.Rosmarin »

Hi,

Are you using a 32-bit machine? The 2 GB limit is still there. It's gone on 64-bit machines.

The best I can think of is splitting the file, either vertically or horizontally.
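
A minimal sketch of the horizontal variant (splitting by rows), written in Python purely for illustration rather than as DataStage code. The partition count, the file base name `AddrHash` and the use of CRC-32 are all assumptions made for the example; none of them come from this thread. The idea is that the same function routes a key on load and on lookup, so each lookup touches only one of the smaller files.

```python
import zlib

# Hypothetical values for illustration; not taken from the thread.
NUM_PARTS = 4
BASE_NAME = "AddrHash"

def part_for_key(key: str, num_parts: int = NUM_PARTS) -> int:
    """Map a key to a partition number with a stable checksum (CRC-32)."""
    return zlib.crc32(key.encode("utf-8")) % num_parts

def file_for_key(key: str) -> str:
    """Name of the smaller hashed file that would hold this key."""
    return f"{BASE_NAME}_{part_for_key(key)}"

# Simulate routing rows on load; a lookup later calls file_for_key() with
# the same key and therefore consults only one of the smaller files.
rows = [(f"K{i:08d}", f"value-{i}") for i in range(1000)]
parts = {n: {} for n in range(NUM_PARTS)}
for key, value in rows:
    parts[part_for_key(key)][key] = value

for n, contents in parts.items():
    print(f"{BASE_NAME}_{n}: {len(contents)} rows")
```

A stable checksum is used instead of Python's built-in `hash()` because the latter is randomised per process for strings, which would send the same key to different partitions on different runs.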


HTH
Amos
ariear
Participant
Posts: 237
Joined: Thu Dec 26, 2002 2:19 pm

Post by ariear »

If you go for 64-bit you'll have to create the hashed file manually (mkdbfile .....64bit) and you still won't be able to cache it (1 GB limit).
I'd go for a splitting algorithm.
dwscblr
Participant
Posts: 19
Joined: Tue May 18, 2004 12:39 am

Post by dwscblr »

It is a 64-bit machine. But the DataStage software is 32-bit, or so my DS administrator says.

How do I manually create Hashfiles?
ariear
Participant
Posts: 237
Joined: Thu Dec 26, 2002 2:19 pm

Post by ariear »

Every CD has an unsupported folder.
There you'll find a utility called something like HFR. It generates the syntax of a specific command (MKDBFILE or CREATE.FILE) for hashed file creation, which you can later use from your before-job subroutine (OS or TCL).
ketfos
Participant
Posts: 562
Joined: Mon May 03, 2004 8:58 pm
Location: san francisco

Post by ketfos »

Hi,
In DataStage Administrator, go to the command window, type
CREATE.FILE and hit EXECUTE.
It will ask you for the file parameters:
FILE NAME
FILE TYPE
FILE MODULO
SEPARATION


Ketfos
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

ariear wrote: There you'll find a utility called something like HFR.
HFC - the very useful but 'unsupported' Hash File Calculator utility, written by our dear friend Mr Wurlod. As noted, it will generate the create statement needed based on your input parameters.
-craig

"You can never have too many knives" -- Logan Nine Fingers
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

64bit is an option on the CREATE.FILE which you can set within the job. You can also do it manually. It has nothing to do with 64bit UNIX or any other thing like that. It should work on all current versions of DataStage and some older ones.
Mamu Kim
dwscblr
Participant
Posts: 19
Joined: Tue May 18, 2004 12:39 am

Post by dwscblr »

Many thanks for the information. It helped me a lot.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

64-bit refers only to the internal forward and backward pointers within the file structure. Therefore, it does not matter that DataStage is a "32-bit application" (I'm not sure what that means - is it 16 times better than a "two-bit application"?). However, it is essential that the Operating System can support 64-bit quantities for this to work.

Incidentally, it's the 32-bit pointers that cause the 2 GB limit. Take away the sign bit from the pointer and that leaves 31 bits with which to determine an offset; observe that 2^31 bytes is 2 GB.
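
The pointer arithmetic can be checked with a few lines of Python. This is only an illustration of the sign-bit reasoning, assuming a signed pointer that gives up one bit for the sign; it says nothing about how the file format actually lays out its pointers.

```python
# Largest byte offset a signed pointer of a given width can address.
# One bit is reserved for the sign, leaving (bits - 1) bits for the offset.
def max_addressable_bytes(pointer_bits: int) -> int:
    return 2 ** (pointer_bits - 1)

GIB = 2 ** 30  # "GB" in this thread is used in the 2^30-byte sense

for bits in (32, 64):
    size = max_addressable_bytes(bits)
    print(f"{bits}-bit pointers: {size:,} bytes = {size // GIB:,} GiB")
```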
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

Ray

64-bit hash files work on all supported platforms that DataStage ships on, correct?
Mamu Kim
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Not certain, but I believe so for the current version.

There is a list of platforms somewhere in the UniVerse manuals for which 64-bit files are supported; I guess you could compare this to the list of supported platforms for DataStage.

Some of the older versions of DataStage and operating systems may not support 64-bit pointers.

If anyone wants to test a particular system, all you have to do is try to create a hashed file (or UV table) with the 64-bit option set. If it fails because the O/S doesn't support 64-bit files, the diagnostic message will say so.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

I did not know that. I thought they were supported on all platforms. Thanks, Ray.
Mamu Kim