HASH File Lookup is very Slow
Moderators: chulett, rschirm, roy
Hi All,
I am trying to do a lookup through a hashed file. I am reading 196,000 records, and the hashed file contains 20,000 records. The job runs very slowly; the throughput comes to about 24 rows/second.
Can someone help me tune this type of job properly? It would be very kind of you.
Thanks in advance
Bapajju
bapajju,
Are you sure the hash lookup is the slow point? Make sure you have "preload to memory" enabled; it speeds things up a bit. I have normally found another reason for slow lookups: database access, especially upserts, tends to cause the problem.
Remember the job will only run as fast as the slowest link.
Andrew
Think outside the Datastage you work in.
There is no True Way, but there are true ways.
Hi
May I know the total number of columns you are loading?
As we know, a hashed file can accommodate 2 GB. Calculate the total amount of data you will be loading into the hashed file by adding up all the column lengths; you will get a figure. If it is more than 2 GB, then you have a problem with the hashed file. You should preload it into memory, change the minimum modulus, and try using a static hashed file. You can also go with a dynamic hashed file.
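To make the sizing check above concrete, here is a quick sketch (Python, not part of DataStage itself; the column widths are assumed for illustration) that sums the column lengths, multiplies by the row count, and compares the result against the 2 GB hashed file limit:

```python
# Rough sizing check for a hashed file, using hypothetical column widths.
column_widths = [10, 30, 30, 8, 15]   # assumed byte width of each column
rows = 20000                          # records in the hashed file

bytes_per_record = sum(column_widths)
total_bytes = bytes_per_record * rows
limit = 2 * 1024 ** 3                 # the 2 GB hashed file limit

print(f"~{total_bytes:,} bytes of lookup data; "
      f"{'well under' if total_bytes < limit else 'over'} the 2 GB limit")
```

With figures of this order, 20,000 records are nowhere near the 2 GB limit, which suggests the slowness lies elsewhere (as other replies in this thread point out).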
HI Bapajju
There has been a lot of discussion in the forum regarding hash file performance. You will find a lot of valuable information if you search this forum.
However, to answer your question:
What value of modulus are you specifying when the hash file is created?
You need to use a modulus such that none of the records end up in the overflow file.
Also, the number of columns (the width) of the hash file needs to be as small as possible.
Please search the forum and you will find a lot more info.
Thanks
Shekar
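As a rough illustration of choosing a minimum modulus so records stay out of overflow, here is a back-of-the-envelope sketch (Python; the 2048-byte group size, 80% target fill, and average record size are all assumptions for illustration, not values from this thread):

```python
import math

# Estimate a minimum modulus: enough groups that the data fits
# at the target fill factor, so records avoid the overflow file.
records = 20000
avg_record_bytes = 100   # assumed average record size, keys included
group_bytes = 2048       # assumed group (bucket) size
fill = 0.8               # aim for 80% full groups

min_modulus = math.ceil(records * avg_record_bytes / (group_bytes * fill))
print(min_modulus)
```

The real figure depends on your actual record sizes and file type, so treat this as a starting point and verify with the file's own statistics.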
Appreciate your view. But does that mean that if the number of columns is high (say 10-15) for the lookup, then the lookup will be very slow? Could you please explain a bit more, e.g. what the modulus size could be for a hash file that uses 15 columns as the lookup key? Or it would be very nice of you if you could suggest some methods to tune the hash lookups.
Thanks in advance
Bapajju
Hi Bappaju,
Please refer to the following link. It has answers to almost all your questions.
viewtopic.php?t=85364
bapajju,
The number of columns is irrelevant to lookup speed; the number of bytes of data in those columns is what matters. It will always take longer to transfer 10 columns of 1000 characters each than 10 columns of only 1 character each.
That said, it is always best to reduce your data as far as possible: if you only need the key and one column, never load all your columns from a hash file or table. Make sure all strings are TRIMmed (sounds silly, but I've seen varchar(1024) strings that contained only 1 character but were padded with spaces, not trimmed, taking up a huge amount of unnecessary storage and consequently making the lookup file quite large).
Unless your file is incredibly badly hashed, you will not notice a great difference when you pre-load your hash file to memory for lookups; the big noticeable difference is with write times, and with lookups when the hash file is not loaded into memory.
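The TRIM point is easy to demonstrate outside DataStage. This sketch (Python, purely illustrative) shows how much dead weight a single untrimmed, space-padded varchar(1024) value carries compared with its trimmed form:

```python
# A 1-character value stored untrimmed in a space-padded 1024-byte field
# carries ~1023 wasted bytes per row into the lookup file.
padded = "A".ljust(1024)     # what an untrimmed varchar(1024) can hold
trimmed = padded.rstrip()    # what TRIMming reduces it to

print(len(padded), len(trimmed))
```

Multiplied across 20,000 records and several columns, that padding is the difference between a lookup file that fits comfortably in memory and one that doesn't.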
Hi,
On top of what was already said, and keeping in mind that a lookup is performed using a Transformer stage:
Do you have complex derivations or stage variables?
Are you using any routines in the Transformer (before/after-stage routines as well)?
The above also affects your performance, since the job depends on the slowest participant of the process. With this in mind, what is your job flow?
IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.
Search before posting:)
Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
![Image](http://www.worldcommunitygrid.org/images/logo.gif)