Maximum performance available in DS server for a lookup

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

kjaouhari
Participant
Posts: 325
Joined: Thu Mar 16, 2006 10:22 am

Maximum performance available in DS server for a lookup

Post by kjaouhari »

Hello all !

I'm currently running a simple job with a lookup between 2 tables in the same database.

I have 8,000,000 input rows and 115,000 rows in the hash file reference.

When I look at the performance statistics I see about 3,000 rows per second.

So I can estimate the job will take 8,000,000 / 3,000 / 60 ≈ 44 minutes.
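
For reference, the arithmetic as a trivial sketch (the 3,000 rows/sec figure is just what the monitor showed; nothing here is DataStage-specific):

# Rough ETA from an observed throughput figure (numbers from this job).
input_rows = 8_000_000
rows_per_sec = 3_000           # observed in the performance statistics
eta_minutes = input_rows / rows_per_sec / 60
print(f"Estimated run time: {eta_minutes:.1f} minutes")   # ~44.4 minutes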

Also, I just want to know (without taking database performance into consideration) what maximum speed is attainable in DataStage Server 7.5.2.

thanks
Raghavendra
Participant
Posts: 147
Joined: Sat Apr 30, 2005 1:23 am
Location: Bangalore,India

Post by Raghavendra »

Hi.

I have observed that the rows-per-second figure is high at the start of the job and decreases as the job runs.
I believe we cannot estimate accurately based on the rows-per-second figure in the performance statistics.
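
To see why, here is a hypothetical sketch (not DataStage output): if the monitor figure is a cumulative average (rows so far divided by elapsed seconds), an early burst inflates it and it drifts down as the job settles.

# Hypothetical per-second row counts: a fast, cached start, then steady state.
interval_rows = [9000, 9000, 4000, 4000, 4000, 4000]
total = 0
for sec, rows in enumerate(interval_rows, start=1):
    total += rows
    print(f"t={sec}s  cumulative rows/sec = {total / sec:,.0f}")
# The displayed figure starts at 9,000 and keeps falling toward 4,000,
# even though the steady-state rate never changes.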
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Re: Maximum performance available in DS server for a lookup

Post by ArndW »

kjaouhari wrote:...what maximum speed is attainable in DataStage Server 7.5.2...
The answer to that question is easy: the maximum speed is about 5 rows per second; it is impossible to get any more throughput on my 5-year-old notebook.

(Perhaps the sarcasm wasn't clear enough. If your question were transposed to vehicles, it would be "how fast will a car drive?" Even the same model of car will give different answers depending on the fuel, or on whether 2 or 3 tires have been removed.)
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

As Arnd pointed out, rows/sec is meaningless and platform dependent. In other words, it all depends upon what's under the hood.
If this concerns you and both your source and lookup are in the same database, just pass a SQL command to do the lookup. That will be much faster, provided such a move is OK with your DBAs.
You can do a
SELECT * FROM sourceTable WHERE key IN (SELECT key FROM lookupTable)
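
As a minimal sketch of pushing the lookup down to the database (using Python and pyodbc purely for illustration; the DSN, table, and column names are placeholders):

import pyodbc

conn = pyodbc.connect("DSN=my_dsn;UID=user;PWD=secret")   # placeholder DSN
cur = conn.cursor()
cur.execute(
    "SELECT s.* FROM sourceTable s "
    "WHERE s.key IN (SELECT l.key FROM lookupTable l)"
)
while True:
    batch = cur.fetchmany(1000)   # stream in batches rather than all at once
    if not batch:
        break
    for row in batch:
        pass                      # hand each row to the rest of the job
conn.close()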
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Even that was too detailed.

<soapbox>

Rows/sec is meaningless, especially in small jobs.

Row sizes vary, the clock keeps running even when no rows are being processed, the time is rounded to whole seconds, job designs are different, and so on.

Prefer something like MB/min as a throughput metric, and take the time from the active stage, not from the job.

</soapbox>
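
A rough way to compute it, as a sketch (assuming you know the average row size and take the elapsed time from the active stage; all three inputs here are made up):

# MB/min from row count, average row size, and active-stage elapsed time.
rows_processed = 8_000_000
avg_row_bytes = 200              # measured or estimated average row size
active_stage_seconds = 2_640     # elapsed time of the active stage, not the job
mb_per_min = (rows_processed * avg_row_bytes) / (1024 * 1024) / (active_stage_seconds / 60)
print(f"Throughput: {mb_per_min:,.1f} MB/min")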
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kjaouhari
Participant
Posts: 325
Joined: Thu Mar 16, 2006 10:22 am

Post by kjaouhari »

I think 5 rows per second is very good.

My question was more about your experiences with DataStage, and also the highest speed you have reached in a job ...

I know that speed depends on several parameters.
I_Server_Whale
Premium Member
Posts: 1255
Joined: Wed Feb 02, 2005 11:54 am
Location: United States of America

Post by I_Server_Whale »

kjaouhari wrote:I think 5 rows per second is very good.
:shock:
Anything that won't sell, I don't want to invent. Its sale is proof of utility, and utility is success.
Author: Thomas A. Edison 1847-1931, American Inventor, Entrepreneur, Founder of GE
caltog
Premium Member
Posts: 39
Joined: Thu May 22, 2003 7:32 am

Post by caltog »

On average I get more or less the same rows/sec, but from time to time the loading time can be impacted by server capacity and resource availability.

Most of the time I first copy the source data from the database into flat files; this speeds up the data mapping with hash files.
(And I read only what I need, no extra columns.)
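
As an illustration of the "read only what I need" part (a hypothetical sketch; in practice this would be a DataStage extract job, and every name here is a placeholder):

import csv
import pyodbc

NEEDED = ["key", "amount", "status"]    # hypothetical column list

conn = pyodbc.connect("DSN=my_dsn;UID=user;PWD=secret")   # placeholder DSN
cur = conn.cursor()
cur.execute(f"SELECT {', '.join(NEEDED)} FROM sourceTable")

with open("source_extract.txt", "w", newline="") as fh:
    writer = csv.writer(fh, delimiter="|")    # pipe-delimited flat file
    writer.writerow(NEEDED)
    while True:
        batch = cur.fetchmany(10_000)
        if not batch:
            break
        writer.writerows(batch)
conn.close()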
rachitha
Participant
Posts: 30
Joined: Wed Jun 28, 2006 10:53 am

Post by rachitha »

caltog wrote:On average I get more or less the same rows/sec, but from time to time the loading time can be impacted by server capacity and resource availability.

Most of the time I first copy the source data from the database into flat files; this speeds up the data mapping with hash files.
(And I read only what I need, no extra columns.)
ODBC -> flat file, then doing the lookup. You said this will increase performance.
But we can't proceed until the flat file is loaded, right? That means:
time(ODBC -> flat file) + remaining processing time = total time.
I think this will take more time to run; please explain it to me clearly.
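
In rough numbers (purely illustrative; the point is that staging only pays off if the faster mapping more than repays the extract time):

# Total-time comparison: direct lookup vs. extract-to-flat-file first.
# All durations are made-up examples, not measurements.
direct_total = 44             # minutes: lookup straight against the database
extract_time = 10             # minutes: ODBC -> flat file
staged_mapping = 20           # minutes: mapping from flat file + hash file
staged_total = extract_time + staged_mapping
print(f"direct: {direct_total} min, staged: {staged_total} min")
# Staging wins here (30 < 44); with a slow extract it could easily lose.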

Thank You,
Rachitha.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

kjaouhari wrote:I think 5 rows per second is very good....
Some server jobs at my current site do 30,000 rows per second, and PX jobs can do in excess of 60,000 rows/sec (given a very big DataStage server and an even bigger DB2 server).

If you are working with a database, try to see what speed you can get with the fastest native tool that the database has to offer, and then measure your DataStage job against that to get a percentage of the maximum attainable speed for your system configuration. That is a far better starting point than some rows-per-second number that can't be compared.
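
For example, as a simple sketch (both timings are things you measure yourself; the numbers are invented):

# Percentage of maximum attainable speed: compare the DataStage job
# against the fastest native database tool on the same data volume.
native_tool_seconds = 1_200     # e.g. a bulk unload utility (measured)
datastage_seconds = 2_640       # the same volume through the DataStage job
pct_of_max = native_tool_seconds / datastage_seconds * 100
print(f"DataStage job runs at {pct_of_max:.0f}% of the native tool's speed")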