HASH File Lookup is very Slow

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
bapajju
Participant
Posts: 82
Joined: Wed Nov 19, 2003 11:58 pm

HASH File Lookup is very Slow

Post by bapajju »

Hi All,

I am trying to do a lookup against a hashed file. I am reading 196,000 records, and the hashed file contains 20,000 records. The job runs very slowly, at about 24 rows/second.

Can someone help me tune this type of job properly? It would be very kind of you.

Thanks in advance
Bapajju
aartlett
Charter Member
Charter Member
Posts: 152
Joined: Fri Apr 23, 2004 6:44 pm
Location: Australia

Post by aartlett »

bapajju,
Are you sure the hashed file lookup is the slow point? Make sure you have preload to memory enabled; it speeds things up a bit. I have normally found another reason for slow lookups: database access, and especially upserts, is usually what causes the problem.

Remember the job will only run as fast as the slowest link.
Andrew

Think outside the Datastage you work in.

There is no True Way, but there are true ways.
kiran_418
Participant
Posts: 40
Joined: Tue Feb 22, 2005 11:12 am

Post by kiran_418 »

Hi
May I know the total number of columns you are loading?
As we know, a hashed file can accommodate 2 GB by default. Estimate the total amount of data you will be loading into the hashed file by adding up the column lengths and multiplying by the row count, as in the sketch below. If it comes to more than 2 GB then you have a problem with the hashed file. You should also preload it to memory, adjust the minimum modulus, and try a static hashed file; a well-tuned dynamic hashed file is another option.
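For example, a rough DataStage BASIC sketch of the check (the row count, column widths and overhead below are purely illustrative assumptions, not figures from this thread):

* Estimate whether the lookup data fits the default 2 GB (32-bit) hashed file limit.
RowCount    = 20000              ;* rows written to the hashed file
BytesPerRow = 200                ;* sum of the trimmed column lengths, including the key
Overhead    = 20                 ;* rough per-record header overhead
EstBytes    = RowCount * (BytesPerRow + Overhead)
Limit2GB    = 2147483648
If EstBytes > Limit2GB Then
   Call DSLogWarn("Estimated data exceeds the 2 GB hashed file limit", "SizeCheck")
End Else
   Call DSLogInfo("Estimated hashed file data: " : EstBytes : " bytes", "SizeCheck")
End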
dsdesigner
Participant
Posts: 34
Joined: Thu Jul 29, 2004 1:03 pm

Post by dsdesigner »

Hi Bapajju,
There has been a lot of discussion in the forum regarding hash file performance. You will find a lot of valuable information if you search this forum.

However, to answer your question:

What is the value of the modulus you are specifying when the hashed file is created?
You need to use a modulus large enough that none of the records end up in the overflow file (a rough estimate is sketched below).
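For example, a rough DataStage BASIC sketch of a MINIMUM.MODULUS estimate for a dynamic (type 30) file; it assumes the default GROUP.SIZE of 1 (2048-byte groups) and roughly 80% fill, and the row count and record size are only illustrative:

RowCount    = 20000               ;* rows in the hashed file (illustrative)
AvgRecBytes = 220                 ;* average record size including overhead (illustrative)
GroupBytes  = 2048                ;* GROUP.SIZE 1
MinModulus  = Int((RowCount * AvgRecBytes) / (GroupBytes * 0.8)) + 1
Call DSLogInfo("Suggested MINIMUM.MODULUS: " : MinModulus, "ModulusEstimate")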

Also, the width of the hashed file (the number and size of its columns) needs to be as small as possible.

Please search the forum and you will find a lot more info.

Thanks
Shekar
bapajju
Participant
Posts: 82
Joined: Wed Nov 19, 2003 11:58 pm

Post by bapajju »

Yes, I have checked that. I tried writing the data into a simple sequential file instead, and the speed of the job was still the same.

Thanks
Amitav
bapajju
Participant
Posts: 82
Joined: Wed Nov 19, 2003 11:58 pm

Post by bapajju »

I appreciate your view. But does that mean that if the number of columns is high (say 10-15), the lookup will be very slow? Could you please explain a bit more, for example what the modulus could be for a hashed file that uses 15 columns as the lookup key? Or it would be very nice of you to suggest some methods for tuning the hashed file lookups.

Thanks in advance
Bapajju
dsdesigner
Participant
Posts: 34
Joined: Thu Jul 29, 2004 1:03 pm

Post by dsdesigner »

Hi Bapajju,
Please refer to the following link. It has answers to almost all your questions.
viewtopic.php?t=85364
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

bapajju,

The number of columns is not what matters for lookup speed; the number of bytes of data in those columns is. It will always take longer to transfer 10 columns of 1,000 characters each than 10 columns of only 1 character each.

That said, it is always best to reduce your data as far as possible: if you only need the key and one column, never load all your columns from a hashed file or table. Make sure all strings are TRIMmed (it sounds silly, but I've seen VarChar(1024) strings that contained only 1 character, padded with spaces rather than trimmed, which took up a huge amount of unnecessary storage and consequently made the lookup file quite large).
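For example, trimming trailing blanks in the derivations of the link that writes the hashed file (the link and column names here are made up):

* Derivation for HashLoad.CUSTOMER_NAME:
Trim(SrcIn.CUSTOMER_NAME, " ", "T")
* Derivation for HashLoad.ADDRESS_LINE1:
Trim(SrcIn.ADDRESS_LINE1, " ", "T")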

Unless your file is incredibly badly hashed, you will not notice a great difference when you pre-load your hashed file to memory for lookups; the big, noticeable differences are in write times, and in lookups when the hashed file is not loaded into memory.
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Before loading to memory, make sure that you are not writing into the file in the same job.

Secondly, try using the hash file calculator to identify the best spread and creation options. You can also try creating a static hashed file.

Also delete the old one and recreate it if you can.
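For example, something along these lines, wrapped in a DataStage BASIC routine here only as a sketch; the file name, type (18 = static), modulo 4001 and separation 4 are purely illustrative, and you should check the exact DELETE.FILE/CREATE.FILE syntax for your release:

* Drop and recreate the lookup hashed file (illustrative names and sizing).
Cmd = "DELETE.FILE LKP_CUSTOMER"
EXECUTE Cmd CAPTURING DelOut
Cmd = "CREATE.FILE LKP_CUSTOMER 18 4001 4"
EXECUTE Cmd CAPTURING CreOut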
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
On top of what was already said, and keeping in mind that the lookup is performed through a Transformer stage ...

Do you have complex derivations or stage variables?
Are you using any routines in the Transformer (or as before/after-stage subroutines)?

The above also affects your performance, since the job depends on the slowest participant in the process. With this in mind, what is your job flow?

IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

By the way, did you check the read speed of the source on its own? Also check whether writing into the target has any difficulty.

Check too whether you have any constraints on the rows that restrict the flow.
Post Reply