Hash File which can hold more than 2 GB data

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

abhilashnair
Participant
Posts: 284
Joined: Fri Oct 13, 2006 4:31 am

Hash File which can hold more than 2 GB data

Post by abhilashnair »

I have data larger than 2GB to be stored in a hash file. The hash file I am using is 64-bit. Still, it is not able to store the data, and the server job which loads it aborts with a WriteHash() error.

Should I go for a PX job with datasets, or is there any way to make the hash file store that much data? I am talking about 18GB of data here.

My job looks like this:

ODBC Stage -> Transformer -> Hash File
Maveric
Participant
Posts: 388
Joined: Tue Mar 13, 2007 1:28 am

Post by Maveric »

Search for "hash file size limit". See if one of those posts helps.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

If you have a 64-bit hashed file then it will be able to store 18GB of data. If your OS does not allow files larger than 2GB, then using a 64-bit hashed file won't help, and this is what I think has happened on your system.
If you have PX installed and implemented, it might make more sense to use datasets for this - but that depends upon how you intend to use the file.
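
One quick way to check whether it is the 2GB boundary being hit (a sketch only; the path and file name below are placeholders, and it assumes a dynamic type 30 hashed file, which is stored on disk as a directory containing DATA.30 and OVER.30):

    cd /your/project/path/MyHashedFile   # placeholder path and hashed file name
    ls -l DATA.30 OVER.30                # if either file is stuck at about
                                         # 2147483647 bytes, you have hit the
                                         # 32-bit / 2GB file size limit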
abhilashnair
Participant
Posts: 284
Joined: Fri Oct 13, 2006 4:31 am

Post by abhilashnair »

ArndW wrote:If you have a 64bit hashed file then it will be able to store 18Gb of data. If your OS does not allow files larger than 2Gb then using a 64bit hashed file won't help; and this is what I think has happ ...
I thought that the max limit for a 64-bit hash file is no more than 4GB.
BTW, why won't the OS allow a file size of 18GB? I am not clear on that.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

If you do the math, a 64-bit pointer can address significantly more than 4GB. Just call up your handy calculator and try 2^64.
I never said that the OS won't allow 18GB; that was your interpretation. Historically, many OSes limited files to 2GB. With the advent of 64-bit addressing they allow files to grow larger, but on some you need to enable that feature at the OS level, and I assume that is what has happened in your case, if you actually did create a 64-bit hashed file.
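
To make the arithmetic concrete, a quick sketch using the standard bc calculator (the figures are just 2^32 and 2^64, the largest byte offsets addressable with 32-bit and 64-bit pointers):

    $ echo "2^32" | bc    # 32-bit addressing: 4294967296 bytes, i.e. 4GB
    4294967296
    $ echo "2^64" | bc    # 64-bit addressing: far more than 18GB
    18446744073709551616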
abhilashnair
Participant
Posts: 284
Joined: Fri Oct 13, 2006 4:31 am

Post by abhilashnair »

OK, now I get your point, ArndW. We have DS set up on Unix. Any idea how we go about checking whether the OS allows files this large, or is it something for the Unix geeks to answer?
Cr.Cezon
Participant
Posts: 101
Joined: Mon Mar 05, 2007 4:59 am
Location: Madrid

Post by Cr.Cezon »

You must run ulimit -a as the root user to see the kernel parameters. You can set the file size limit to unlimited; then your OS will not have this limitation.

Also, you can RESIZE the hash file so that it can grow beyond 2GB, but a hash file of more than 2GB does not perform well; you will have problems working with it.
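
For illustration, a hedged sketch of that RESIZE step, run at the DataStage/UniVerse TCL prompt from the project directory (MyHashedFile is a placeholder name; confirm the exact syntax against your engine's documentation before running it):

    RESIZE MyHashedFile * * * 64BIT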

If you have PX installed, it is better to use Datasets.

Cristina.
abhilashnair wrote:Ok. Now I get your point ArndW. We have DS set up on Unix. Any idea how do we go about checking whether the OS allows this much data...or is it something for the Unix Geeks to answer?
abhilashnair
Participant
Posts: 284
Joined: Fri Oct 13, 2006 4:31 am

Post by abhilashnair »

time(seconds) unlimited
file(blocks) unlimited
data(kbytes) unlimited
stack(kbytes) unlimited
memory(kbytes) unlimited
coredump(blocks) unlimited
nofiles(descriptors) unlimited

The above is the output of ulimit -a in the directory where my hash file is present
Cr.Cezon
Participant
Posts: 101
Joined: Mon Mar 05, 2007 4:59 am
Location: Madrid

Post by Cr.Cezon »

Have you run this command as the root user?

abhilashnair wrote:time(seconds) unlimited
file(blocks) unlimited
data(kbytes) unlimited
stack(kbytes) unlimited
memory(kbytes) unlimited
coredump(blocks) unlimited
nofiles(descriptors) unlimited

The above is the output of ulimit -a in the directory where my hash file is present
abhilashnair
Participant
Posts: 284
Joined: Fri Oct 13, 2006 4:31 am

Post by abhilashnair »

I logged in with my own ID and executed this command. I am not sure what exactly you mean by root user?
Cr.Cezon
Participant
Posts: 101
Joined: Mon Mar 05, 2007 4:59 am
Location: Madrid

Post by Cr.Cezon »

abhilashnair wrote:I logged in with my own id and executed this command. I am not aware what exactly do you mean by root user?
root is the administrator user of your Unix machine.

The ulimit -a for your user is not relevant, because DataStage does not use it.
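
If you cannot log in as root, one hedged way to check the limits DataStage actually runs under is to query them for the account that starts the engine (assumed here to be dsadm; the account name varies by installation):

    su - dsadm -c "ulimit -a"    # limits inherited by the DataStage engine
                                 # processes started by this user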
abhilashnair
Participant
Posts: 284
Joined: Fri Oct 13, 2006 4:31 am

Post by abhilashnair »

Oh! Then we have an issue here. I have to contact the Unix admin. He is in the US and I am in India.
Cr.Cezon
Participant
Posts: 101
Joined: Mon Mar 05, 2007 4:59 am
Location: Madrid

Post by Cr.Cezon »

abhilashnair wrote:Oh! Then we have an issue here. I have to contact Unix admin. He is in the US and myself in India.
Then ask him, but DataStage has another limitation with hash files that is not in the OS:
for hash files greater than 2GB the performance is not good.

If you want to work with it you must RESIZE the hash file after you have created it, but the performance is not good.
Better to use another stage.
abhilashnair
Participant
Posts: 284
Joined: Fri Oct 13, 2006 4:31 am

Post by abhilashnair »

I just tried running the same DS job with the 'Allow Stage Write Cache' option unchecked. It was checked earlier, when the job aborted.

The job is running fine. The point where the job earlier aborted has now been passed, and the job is still running...

But it is very slow....
Cr.Cezon
Participant
Posts: 101
Joined: Mon Mar 05, 2007 4:59 am
Location: Madrid

Post by Cr.Cezon »

Yes, the performance will not be very good, and when you reach the 2GB limit (or higher, if you have 64-bit files) the job will abort...