Drastic slowdown of large hash file lookups

quiztime
Premium Member
Posts: 4
Joined: Fri Nov 16, 2007 2:26 am
Location: Sydney, Australia

Drastic slowdown of large hash file lookups

Post by quiztime »

Hi,

We have just experienced a dramatic drop in the performance of large hash file lookups on our Test server and are having a lot of trouble diagnosing the cause.

Summary:
- As an example, one step in a job looks up a 4.7M-row hash file with a DATA.30 file of 650 MB and an OVER.30 file of 208 MB.
- We have stripped the job down to just the lookup step, with sequential files as both input and output, to eliminate any doubt about network/database performance. The problem is definitely the slow lookup of hash files.
- The lookup used to run at 10000 rows per second on the Test server and suddenly started running at 350 rows per second FOR THE SAME DATA - yes, we are rerunning identical data.
(This job is one example; however, all jobs with large hash file lookups seem to have been affected on the Test server.)
- The same job with the same data on our Development server consistently runs at around 10000 rows per second.
- Monitoring resources on the Test server shows that it is not being pushed for CPU, memory, or seemingly I/O.
- There have been no system changes to the Test server since the job was running normally that would explain the slowdown (antivirus is turned off).
- I have tried the following combinations on the Test server to see if they make a difference, but none has:
a.) Storing the hash file on different drives (E drive is a local 150 GB 15k SCSI drive and F drive is 300 GB SAN-attached storage)
b.) Running the job from a different project after re-deploying the master copy of the job from source control
c.) Running with the hash file's preload-to-memory option turned both on and off (it seems a bit slower when not preloading to memory, i.e. around 250 rows/s)
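
To put the reported rates in perspective, here is a rough sketch of the arithmetic. The three rates are the ones quoted above; the input row count is a hypothetical figure chosen purely for illustration, since the size of the input file is not stated.

```python
# Lookup rates reported above, in rows per second.
NORMAL_RATE = 10_000      # Test server before the slowdown, and Development
SLOW_RATE = 350           # Test server now, with preload to memory
PRELOAD_OFF_RATE = 250    # Test server now, without preload to memory


def slowdown(baseline_rate: float, observed_rate: float) -> float:
    """How many times slower the observed rate is than the baseline."""
    return baseline_rate / observed_rate


def elapsed_seconds(rows: int, rate: float) -> float:
    """Time needed to push `rows` input rows through the lookup at `rate`."""
    return rows / rate


if __name__ == "__main__":
    # Hypothetical input size; the real row count is not given in the post.
    input_rows = 1_000_000

    for label, rate in [
        ("normal", NORMAL_RATE),
        ("slow, preload on", SLOW_RATE),
        ("slow, preload off", PRELOAD_OFF_RATE),
    ]:
        factor = slowdown(NORMAL_RATE, rate)
        minutes = elapsed_seconds(input_rows, rate) / 60
        print(f"{label}: {factor:.1f}x slower than normal, {minutes:.1f} min")
```
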

Any help or suggestions on what else we can test will be much appreciated.

Thanks,

Alex
First make it run, then make it right and then make it fast.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Re: Drastic slowdown of large hash file lookups

Post by ray.wurlod »

quiztime wrote: The lookup used to run at 10000 rows per second on the Test server and suddenly started running at 350 rows per second FOR THE SAME DATA - yes, we are rerunning identical data.
What changed? "Nothing" is not a correct answer. Something changed on the system somewhere.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Is the lookup returning rows? A lookup that does not find a row reads no data from disk and is less CPU-intensive.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
asitagrawal
Premium Member
Posts: 273
Joined: Wed Oct 18, 2006 12:20 pm
Location: Porto

Post by asitagrawal »

Check the "Tunables" settings for the project and the jobs.
Share to Learn, and Learn to Share.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Can you resize the file to a static hashed file or at least try a "RESIZE {filename} * * *" and see if that makes a difference?
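
To see whether a resize changes anything, it may help to record how much of the hashed file sits in overflow before and after. Below is a minimal sketch, assuming the dynamic hashed file is a directory containing DATA.30 and OVER.30 (as in the original post). The path in the usage example is hypothetical, and a large overflow share is only a hint to investigate, not proof of the cause.

```python
import os


def overflow_report(hashed_file_dir: str) -> dict:
    """Report DATA.30 and OVER.30 sizes for a dynamic hashed file directory."""
    data_bytes = os.path.getsize(os.path.join(hashed_file_dir, "DATA.30"))
    over_bytes = os.path.getsize(os.path.join(hashed_file_dir, "OVER.30"))
    total = data_bytes + over_bytes
    return {
        "data_bytes": data_bytes,
        "over_bytes": over_bytes,
        # Fraction of the file's total bytes held in the overflow file.
        "overflow_share": over_bytes / total if total else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical location; substitute the real hashed file directory.
    path = r"E:\hashfiles\MyLookupHash"
    if os.path.isdir(path):
        report = overflow_report(path)
        print(f"DATA.30: {report['data_bytes']:,} bytes")
        print(f"OVER.30: {report['over_bytes']:,} bytes")
        print(f"Overflow share: {report['overflow_share']:.1%}")
    else:
        print(f"Directory not found: {path}")
```

Running it once before and once after the resize gives two sets of figures to compare alongside the rows-per-second rate of the lookup job.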