Hashfile cache - how much memory is used

Posted: Tue Feb 15, 2005 9:54 am
by netland
Hi,

I'm trying to find out how much memory is used when defining cache sizes in the Administrator (Tunables).

I've set the read and write cache to 128MB (the default),
and my job is using some 50 hashfiles of different sizes (most of them are quite small).

Does my job allocate 50*128MB, or only the total size of the 50 hashfiles, but with a project maximum of 128MB?

Is the 128MB allocated for ALL hashfiles that are cache enabled, or is it a maximum for each hashfile, each job, or the project?

br
Tom

Posted: Tue Feb 15, 2005 10:47 am
by ArndW
Hello netland,

The cache size you specify is per hash file, and it is not persistent: the cache is not pre-allocated, but is built at runtime and discarded once the job finishes.

Posted: Tue Feb 15, 2005 11:09 am
by scottr
In a 32-bit environment the max hash file size is 2.2GB.

Posted: Tue Feb 15, 2005 11:26 am
by ArndW
Scottr -
In a 32-bit environment the max hash file size is 2.2GB.
That is not quite true. The maximum size of a file system object is limited to 2GB, but a default dynamic (type 30) hashed file is actually composed of two file system files, so you can get significantly more data into a hashed file, depending on the key types and the data. You never know how much you can fit in until you *bang* hit the limit, though, and then you most likely have a very lengthy fixfile process ahead of you :shock:
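
As a sketch of what that looks like on disk (assuming a UNIX server and a hypothetical hashed file named MyHash; the path and name are made up): a dynamic hashed file is really a directory, and each of the two data files inside it is individually subject to the 2GB limit.

    # MyHash is a hypothetical dynamic (type 30) hashed file; on disk it
    # is a directory containing the two data files described above.
    ls -la /project/path/MyHash
    #   .Type30   marker identifying the directory as a type 30 file
    #   DATA.30   primary data file (limited to 2GB on 32-bit)
    #   OVER.30   overflow file (also limited to 2GB on 32-bit)

    # Check each file separately - either one reaching 2GB breaks the
    # hashed file, regardless of the size of the other.
    du -k /project/path/MyHash/DATA.30 /project/path/MyHash/OVER.30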

Re: Hashfile cache - how much memory is used

Posted: Tue Feb 15, 2005 11:58 am
by kcbland
netland wrote: Does my job allocate 50*128MB
No
netland wrote: or only the total size of the 50 hashfiles, but with a project maximum of 128MB?
No
netland wrote: Is the 128MB allocated for ALL hashfiles that are cache enabled,
No
netland wrote: or is it a maximum for each hashfile
Yes
netland wrote: each job, or the project?
No

Posted: Tue Feb 15, 2005 12:01 pm
by kcbland
Okay, enough fun. Each hash file has a maximum size that can be cached; once the file exceeds that size at preload time, you get a message telling you that the file is too big and it won't preload.
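
A quick worked example, assuming the 128MB default and the 50 files from the original post: if those hashed files total, say, 400MB of data, the job uses roughly 400MB of read cache, because each file only occupies its actual size rather than the 128MB ceiling. The limit is tested per file at preload time, so the only casualties are individual files bigger than 128MB, and those simply won't preload.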

As for the write cache, that setting is the maximum amount of data whose writes can be delayed before DataStage has to start writing to disk.

You will find that DataStage uses very little memory. Where you can, though, enable hash file sharing within a job in case the same hash file is referenced many times in the job; that way the file has only one footprint in memory. You can also look at the hash cache daemon to manage a shared hash file across jobs, so that a job doesn't incur any preload time because the file is already cached.

Posted: Tue Feb 15, 2005 2:39 pm
by throbinson
Can you tell me the practical differences between Link Private, Link Public and System caching?

Posted: Tue Feb 15, 2005 7:43 pm
by ray.wurlod
Those are well described in the "manual" dsdskche.pdf. System caching allows one hashed file to be shared between multiple jobs. I haven't attempted it, but it may even allow sharing between jobs in multiple projects!