Creating Hashfiles: size limitations

Posted: Thu Jul 15, 2004 1:49 pm
by dwscblr
I have to load 40 million records into a hashfile. The size of each row in the Hashfile is 50 bytes.

When I created it as a dynamic hashfile, it failed after a million rows with the error:
Abnormal termination of stage CopyOfint2outLoadAddrProcResHash..LoadHashCodedRecordsOut.IDENT1 detected

So I moved to a static hashfile. I used algorithm 16 (ASCII key, any variations) and created 4500 buckets, each of size 8196K. The job failed with the error:
CopyOfint2outLoadAddrProcResHash..LoadHashCodedRecordsOut.IDENT1:|CopyOfint2outLoadAddrProcResHash..LoadHashCodedRecordsOut.LoadTableFile: DSD.UVOpen mkdbfile: unable to create a 32-bit file greater than 2 gigabytes.
.|

I have a couple of questions:
1. Is there a limit on the size of hashfiles? I thought the 2GB limit no longer existed.
2. Could you suggest approaches for loading 40 million rows into a hashfile?

Posted: Thu Jul 15, 2004 2:40 pm
by kduke
Select 64bit. It does not have this limit.

Posted: Thu Jul 15, 2004 2:41 pm
by Amos.Rosmarin
Hi,

Are you using a 32-bit machine? The 2G limit is still there. It's gone in the 64 bit machines.

The best I can think of is splitting the file either vertically or horizontally.


HTH
Amos

Posted: Thu Jul 15, 2004 2:48 pm
by ariear
If you go for 64bit you'll have to create the hash file manually (mkdbfile .....64bit), and you still won't be able to cache it (1GB limit).
I'd go for a splitting algorithm.
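
A horizontal split of this kind can be sketched as follows. This is only an illustration in Python, not DataStage code: the function names, the choice of CRC32 as the hash, and the `AddrHash` base name are all assumptions made for the example. The idea is that each key is routed to one of N smaller hash files by a stable function of the key, and every later lookup uses the same function to decide which file to read.

```python
import zlib

def partition_for(key: str, num_parts: int) -> int:
    """Map a key to a partition index in [0, num_parts) using a stable hash.

    CRC32 is an arbitrary choice for this sketch; any function that always
    returns the same value for the same key would do.
    """
    return zlib.crc32(key.encode("utf-8")) % num_parts

def hashfile_name(base: str, key: str, num_parts: int) -> str:
    """Build the (hypothetical) name of the hash file that holds this key."""
    return f"{base}_{partition_for(key, num_parts)}"

if __name__ == "__main__":
    # Route a few sample keys across 4 smaller files instead of one large one.
    for sample_key in ["CUST0001", "CUST0002", "CUST0003"]:
        print(sample_key, "->", hashfile_name("AddrHash", sample_key, 4))
```

With 40 million rows of 50 bytes, four partitions would hold roughly 500 MB of row data each if the keys spread evenly, which keeps every file well below the 2GB limit.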

Posted: Thu Jul 15, 2004 3:11 pm
by dwscblr
It is a 64-bit machine, but the DataStage software is 32-bit, or so my DS administrator says.

How do I manually create Hashfiles?

Posted: Thu Jul 15, 2004 3:19 pm
by ariear
Every CD has an unsupported folder.
There you'll find a utility called something like HFR. It generates the syntax of a specific command (MKDBFILE or CREATE.FILE) for hash file creation, which you can later use from your before-job subroutine (OS or TCL).

Posted: Thu Jul 15, 2004 3:27 pm
by ketfos
Hi,
In DataStage Administrator, go to the command window, type
CREATE.FILE and hit EXECUTE.
It will ask you for the file parameters:
FILE NAME
FILE TYPE
FILE MODULO
SEPARATION


Ketfos

Posted: Thu Jul 15, 2004 4:16 pm
by chulett
ariear wrote: There you'll find a utility called something like HFR.
HFC - the very useful but 'unsupported' Hash File Calculator utility, written by our dear friend Mr Wurlod. As noted, it will generate the create statement needed based on your input parameters.

Posted: Thu Jul 15, 2004 5:37 pm
by kduke
64bit is an option on CREATE.FILE, which you can set within the job. You can also do it manually. It has nothing to do with 64bit UNIX or anything like that. It should work on all current versions of DataStage and some older ones.

Posted: Thu Jul 15, 2004 6:56 pm
by dwscblr
Many thanks for the information. It helped me a lot.

Posted: Fri Jul 16, 2004 4:47 am
by ray.wurlod
64-bit refers only to the internal forward and backward pointers within the file structure. Therefore, it does not matter that DataStage is a "32-bit application" (I'm not sure what that means - is it 16 times better than a "two-bit application"?). However, it is essential that the Operating System can support 64-bit quantities for this to work.

Incidentally, it's the 32-bit pointers that cause the 2GB limit. Take away the sign bit from a 32-bit pointer and that leaves 31 bits with which to determine an offset; observe that 2^31 bytes is 2GB.
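
The arithmetic can be checked against the numbers in the original question. The short Python sketch below uses only the figures quoted in this thread (40 million rows, 50 bytes per row); it does not model any overhead that the hashed file structure itself adds, so it shows only how little room the raw data leaves.

```python
# Largest offset reachable with a signed 32-bit pointer: 31 usable bits.
LIMIT_32BIT = 2 ** 31            # 2,147,483,648 bytes, i.e. 2GB

ROWS = 40_000_000                # row count from the original question
ROW_BYTES = 50                   # row size from the original question

raw_bytes = ROWS * ROW_BYTES     # row data only, no file overhead modeled
headroom = LIMIT_32BIT - raw_bytes

print(f"32-bit limit: {LIMIT_32BIT:,} bytes")
print(f"raw row data: {raw_bytes:,} bytes")
print(f"headroom:     {headroom:,} bytes")
```

The raw rows come to 2,000,000,000 bytes, which leaves under 150 MB below the limit for everything else the file has to store, so it is unsurprising that a single 32-bit file could not be created for this load.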

Posted: Fri Jul 16, 2004 5:48 am
by kduke
Ray

64-bit hash files work on all supported platforms that DataStage ships on, correct?

Posted: Fri Jul 16, 2004 4:01 pm
by ray.wurlod
Not certain, but I believe so for the current version.

There is a list of platforms somewhere in the UniVerse manuals for which 64-bit files are supported; I guess you could compare this to the list of supported platforms for DataStage.

Some of the older versions of DataStage and operating systems may not support 64-bit pointers.

If anyone wants to test a particular system, all you have to do is try to create a hashed file (or UV table) with the 64-bit option set. If it fails because the O/S doesn't support them, the diagnostic message will say so.

Posted: Fri Jul 16, 2004 4:48 pm
by kduke
I did not know that. I thought they were supported on all platforms. Thanks Ray.