HASH File Lookup is very Slow

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
bapajju
Participant
Posts: 82
Joined: Wed Nov 19, 2003 11:58 pm

HASH File Lookup is very Slow

Post by bapajju »

Hi All,

I am trying to do a lookup against a hashed file. I am reading 196,000 records, and the hashed file contains 20,000 records. The job runs very slowly, at about 24 rows/second.

Can someone help me tune this type of job properly? It would be very kind of you.

Thanks in advance
Bapajju
aartlett
Charter Member
Charter Member
Posts: 152
Joined: Fri Apr 23, 2004 6:44 pm
Location: Australia

Post by aartlett »

bapajju,
Are you sure the hashed file lookup is the slow point? Make sure you have preload to memory enabled; it speeds things up a bit. I have normally found another reason for slow lookups: database access, and especially upserts, is usually what causes the problem.

Remember the job will only run as fast as the slowest link.
Andrew

Think outside the Datastage you work in.

There is no True Way, but there are true ways.
kiran_418
Participant
Posts: 40
Joined: Tue Feb 22, 2005 11:12 am

Post by kiran_418 »

Hi
May I know the total number of columns you are loading?
As we know, a hashed file can accommodate 2 GB by default. Estimate the total amount of data you will be loading into the hashed file by adding up the column lengths and multiplying by the row count, as in the sketch below. If it comes to more than 2 GB then you have a problem with the hashed file. You should also preload it to memory, adjust the minimum modulus, and try a static hashed file; a well-tuned dynamic hashed file is another option.
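For example, a rough DataStage BASIC sketch of the check (the row count, column widths and overhead below are purely illustrative assumptions, not figures from this thread):

* Estimate whether the lookup data fits the default 2 GB (32-bit) hashed file limit.
RowCount    = 20000              ;* rows written to the hashed file
BytesPerRow = 200                ;* sum of the trimmed column lengths, including the key
Overhead    = 20                 ;* rough per-record header overhead
EstBytes    = RowCount * (BytesPerRow + Overhead)
Limit2GB    = 2147483648
If EstBytes > Limit2GB Then
   Call DSLogWarn("Estimated data exceeds the 2 GB hashed file limit", "SizeCheck")
End Else
   Call DSLogInfo("Estimated hashed file data: " : EstBytes : " bytes", "SizeCheck")
End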
dsdesigner
Participant
Posts: 34
Joined: Thu Jul 29, 2004 1:03 pm

Post by dsdesigner »

Hi Bapajju,
There has been a lot of discussion in the forum regarding hash file performance. You will find a lot of valuable information if you search this forum.

However, to answer your question:

What is the value of the modulus you are specifying when the hashed file is created?
You need to use a modulus large enough that none of the records end up in the overflow file (a rough estimate is sketched below).
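For example, a rough DataStage BASIC sketch of a MINIMUM.MODULUS estimate for a dynamic (type 30) file; it assumes the default GROUP.SIZE of 1 (2048-byte groups) and roughly 80% fill, and the row count and record size are only illustrative:

RowCount    = 20000               ;* rows in the hashed file (illustrative)
AvgRecBytes = 220                 ;* average record size including overhead (illustrative)
GroupBytes  = 2048                ;* GROUP.SIZE 1
MinModulus  = Int((RowCount * AvgRecBytes) / (GroupBytes * 0.8)) + 1
Call DSLogInfo("Suggested MINIMUM.MODULUS: " : MinModulus, "ModulusEstimate")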

Also, the width of the hashed file (the number and size of its columns) needs to be as small as possible.

Please search the forum and you will find a lot more info.

Thanks
Shekar
bapajju
Participant
Posts: 82
Joined: Wed Nov 19, 2003 11:58 pm

Post by bapajju »

Yes, I have checked that. I tried writing the data into a simple sequential file instead, and the speed of the job was still the same.

Thanks
Amitav
bapajju
Participant
Posts: 82
Joined: Wed Nov 19, 2003 11:58 pm

Post by bapajju »

I appreciate your view. But does that mean that if the number of columns is high (say 10-15), the lookup will be very slow? Could you please explain a bit more, for example what the modulus could be for a hashed file that uses 15 columns as the lookup key? Or it would be very nice of you to suggest some methods for tuning the hashed file lookups.

Thanks in advance
Bapajju
dsdesigner
Participant
Posts: 34
Joined: Thu Jul 29, 2004 1:03 pm

Post by dsdesigner »

Hi Bapajju,
Please refer to the following link. It has answers to almost all your questions.
viewtopic.php?t=85364
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

bapajju,

The number of columns is not what matters for lookup speed; the number of bytes of data in those columns is. It will always take longer to transfer 10 columns of 1,000 characters each than 10 columns of only 1 character each.

That said, it is always best to reduce your data as far as possible: if you only need the key and one column, never load all your columns from a hashed file or table. Make sure all strings are TRIMmed (it sounds silly, but I've seen VarChar(1024) strings that contained only 1 character, padded with spaces rather than trimmed, which took up a huge amount of unnecessary storage and consequently made the lookup file quite large).
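For example, trimming trailing blanks in the derivations of the link that writes the hashed file (the link and column names here are made up):

* Derivation for HashLoad.CUSTOMER_NAME:
Trim(SrcIn.CUSTOMER_NAME, " ", "T")
* Derivation for HashLoad.ADDRESS_LINE1:
Trim(SrcIn.ADDRESS_LINE1, " ", "T")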

Unless your file is incredibly badly hashed, you will not notice a great difference when you pre-load your hashed file to memory for lookups; the big, noticeable differences are in write times, and in lookups when the hashed file is not loaded into memory.
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Before loading to memory, make sure that you are not writing into the file in the same job.

Secondly, try using the hash file calculator to identify the best spread and creation options. You can also try creating a static hashed file.

Also delete the old one and recreate it if you can.
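For example, something along these lines, wrapped in a DataStage BASIC routine here only as a sketch; the file name, type (18 = static), modulo 4001 and separation 4 are purely illustrative, and you should check the exact DELETE.FILE/CREATE.FILE syntax for your release:

* Drop and recreate the lookup hashed file (illustrative names and sizing).
Cmd = "DELETE.FILE LKP_CUSTOMER"
EXECUTE Cmd CAPTURING DelOut
Cmd = "CREATE.FILE LKP_CUSTOMER 18 4001 4"
EXECUTE Cmd CAPTURING CreOut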
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
On top of what was already said, and keeping in mind that the lookup is performed through a Transformer stage ...

Do you have complex derivations or stage variables?
Are you using any routines in the Transformer (or as before/after-stage subroutines)?

The above also affects your performance, since the job depends on the slowest participant in the process. With this in mind, what is your job flow?

IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

By the way, did you check the read speed of the source on its own? Also check whether writing into the target has any difficulty.

Check too whether you have any constraints on the rows that restrict the flow.
Post Reply