Lookup is very slow

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Das
Participant
Posts: 87
Joined: Tue Oct 24, 2006 9:58 pm
Location: India

Lookup is very slow

Post by Das »

Hi,
I want to do a lookup against a table having 50 million records, with the source flat file having 45 million records.

First I implemented it through a hashed file, with the hashed file built from the table. It was very slow.

Then I implemented a direct Oracle stage lookup. It is based on two keys, and the keys are B-tree indexed.

Here also the performance is very poor; the lookup speed is 280 rows per second.

Please let me know your suggestions for handling this scenario.
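(For context, a Server-job reference lookup against an Oracle stage normally issues one keyed query per incoming row, roughly like the sketch below; the table and column names are placeholders only.)

    -- one parameterised SELECT per source row (illustrative names)
    SELECT val_col1, val_col2
    FROM   lookup_table
    WHERE  key1 = :key1
    AND    key2 = :key2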
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

What speed do you get if you do no lookups? Maybe the bottleneck is in the stream input rather than in the reference input.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Das
Participant
Posts: 87
Joined: Tue Oct 24, 2006 9:58 pm
Location: India

Post by Das »

Thanks...

Without the lookup it processes 3,000 to 4,000 records per second.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Is the hashed file created/populated in the same job or in a different job? If it's created in the same job then rows/sec is meaningless because it includes the wait time while the hashed file is populated. Populate the hashed file in a separate job and be prepared to be amazed.
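A minimal sketch of the split design (stage names here are illustrative only):

    Job A:  Oracle OCI stage  -->  Transformer  -->  Hashed File stage   (builds the reference data)
    Job B:  Sequential File  -->  Transformer (hashed file on the reference link)  -->  target

Run Job A once per load; Job B's rows/sec then reflects only the lookup itself.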
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Das
Participant
Posts: 87
Joined: Tue Oct 24, 2006 9:58 pm
Location: India

Post by Das »

The job logic is as given below.

Case 1:

Job 1 - Create a hashed file from a table of 60 million records (fetching around 45 million records based on a condition).
Job 2 - Use the hashed file created in Job 1 for the lookup.

Job 1 is taking a long time to create the hashed file, running at 400 rows/sec.

So I tried the option below:

Without a hashed file, do a direct lookup on the indexed columns in the table (see the index sketch below).

Please suggest...
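(If the direct-lookup route is pursued, the two key columns would usually be covered by a single composite index rather than two separate ones; a sketch with placeholder names:

    -- Oracle SQL; table and column names are illustrative only
    CREATE INDEX lkp_tab_k1_k2 ON lookup_table (key1, key2);

A two-key equality lookup can then be resolved from one index probe.)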
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Here are the three main ideas for getting the best write performance out of hashed files.
  • Critically examine what rows and columns are loaded into the hashed file, and eliminate any that are not required.
  • Pre-size the hashed file using the MINIMUM.MODULUS setting; get an approximate value from the current size of the hashed file divided by the group size (2 KB). (See the worked example below.)
  • If the hashed file is under 999 MB, consider maximizing your write cache buffer size and employing cached writes.
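For illustration, a rough worked sizing (the 1.6 GB figure below is made up, not taken from this job): 1,600,000,000 bytes / 2,048 bytes per group ≈ 780,000, so a MINIMUM.MODULUS of around 800,000 would be a sensible starting point. If you pre-create the file from the engine's TCL prompt (or the Administrator command window), the command for a dynamic (type 30) file would look something like this; check the exact options against your release:

    CREATE.FILE MyLookupHash DYNAMIC MINIMUM.MODULUS 800000

The same value can also be set in the Hashed File stage's create-file options.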
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.