Some Hashed File questions

vnspn
Participant
Posts: 165
Joined: Mon Feb 12, 2007 11:42 am

Post by vnspn »

Hi,

We are using hashed files for lookups and would like to get a few things clarified.

1) Do hashed files perform well only while the number of records loaded into them is small? We currently have around 5,000 to 10,000 records that need to be loaded into the hashed file. Would performance gradually go down as the record count grows over time?

2) We leave the read and write cache size for hashed files at the default of 128 MB in DS Administrator. Does that mean this is the amount of data it can hold in memory at any point in time?

3) When does a hashed file put data into the cache, and when does it put data on disk? Does it hold data in the cache up to the configured cache size and put the remainder on disk?

Thanks.
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

1) No. Lookup performance seems to be consistent regardless of the number of rows. The catch is that the hashed file has to be loaded first, and that takes longer the more rows there are.

2) If your whole hashed file will not fit in 128 MB then it will put it in cache. Also, when you are writing to a hashed file in cache, it will take it out of cache once it exceeds 128 MB.

3) It writes to disk at the end of the job. It is all or nothing: either the hashed file fits in 128 MB or it doesn't.

So change 128 to 999 or whatever to make it perform better.

I think all of that is correct, but I'm sure someone will correct me if I've forgotten how all this works.
Mamu Kim
vnspn
Participant
Posts: 165
Joined: Mon Feb 12, 2007 11:42 am

Post by vnspn »

kduke wrote: 2) If your whole hashed file will not fit in 128 MB then it will put it in cache. Also, when you are writing to a hashed file in cache, it will take it out of cache once it exceeds 128 MB.
Going by the point quoted above, if I try to write around 1 million records into the hashed file, then all the records that don't fit into the 128 MB would sit in the cache, and this 128 MB is the amount of data the hashed file can hold. Is that correct?


My understanding is that when I create a hashed file, it creates files like DATA.30 and OVER.30. So does that mean that when I write to a hashed file, the data is written to disk, and when I read from the hashed file, the data is first loaded into memory and then processed? Is this correct?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You need to remove the 'not' from Kim's quote. :wink:

Reference hashed files will only be cached for reading if the entire thing fits, otherwise nothing will be cached. Write caching is another beast.
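
To make the all-or-nothing read-cache rule concrete, here is a rough Python sketch. It is not DataStage code or any DataStage API: the function names, the 200-byte average record size, and the size estimate (which ignores hashed file overhead) are all assumptions for illustration only.

```python
MB = 1024 * 1024

def estimated_size_bytes(record_count, avg_record_bytes):
    """Rough payload size only; ignores hashed file overhead (assumption)."""
    return record_count * avg_record_bytes

def read_cache_used(file_size_bytes, cache_limit_mb=128):
    """All-or-nothing rule from this thread: the whole file is cached or none of it."""
    return file_size_bytes <= cache_limit_mb * MB

# Hypothetical volumes: 10,000 rows today versus 1 million rows later,
# both at an assumed 200 bytes per record.
small = estimated_size_bytes(10_000, 200)
large = estimated_size_bytes(1_000_000, 200)

print(read_cache_used(small))                      # True: about 2 MB fits in 128 MB
print(read_cache_used(large))                      # False: about 190 MB does not fit
print(read_cache_used(large, cache_limit_mb=999))  # True once the limit is raised
```

With numbers like these, the 5,000 to 10,000 row file is nowhere near the default limit, while a 1 million row file might need the setting raised before it is cached for reading at all.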
-craig

"You can never have too many knives" -- Logan Nine Fingers