
Maximum performance available in DS server for a lookup

Posted: Fri Dec 01, 2006 9:32 am
by kjaouhari
Hello all !

I'm currently running a simple job with a lookup between two tables in the same database.

I have 8,000,000 rows as input and 115,000 rows in the hashed file reference.

Looking at the performance statistics, I see about 3,000 rows per second.

So I can estimate the job will take 8,000,000 / 3,000 / 60 ≈ 44 minutes.
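
The estimate above is simple arithmetic; as a trivial sketch (figures taken from the post, and rows/sec taken at face value even though, as noted below, it fluctuates over the life of a job):

```python
# Rough ETA for the job from observed throughput.
total_rows = 8_000_000    # input rows
rows_per_sec = 3_000      # observed in the performance statistics

eta_minutes = total_rows / rows_per_sec / 60
print(f"estimated runtime: {eta_minutes:.1f} minutes")  # ~44.4 minutes
```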

Also, I'd just like to know (without taking database performance into consideration) what the maximum speed attainable in DataStage server 7.5.2 is.

thanks

Posted: Fri Dec 01, 2006 9:52 am
by Raghavendra
Hi.

I have observed that the rows-per-second figure is high at the start of the job and decreases later on.
I don't believe we can estimate accurately based on the rows-per-second number in the performance statistics.

Re: Maximum performance available in DS server for a lookup

Posted: Fri Dec 01, 2006 11:03 am
by ArndW
kjaouhari wrote:...what is the maximum speed available in DS server 7.5.2...
The answer to that question is easy: the maximum speed is about 5 rows per second; it is impossible to get any more throughput on my 5-year-old notebook.

(Perhaps the sarcasm wasn't clear enough. If your question were transposed to vehicles, it would be "how fast will a car drive?" Even the same model of car will give different answers depending on the fuel, or on whether 2 or 3 tires are removed.)

Posted: Fri Dec 01, 2006 11:19 am
by DSguru2B
As Arnd pointed out, rows/sec is meaningless and platform dependent. In other words, it all depends upon what's under the hood.
If this concerns you and both your source and lookup are in the same database, just push a SQL command to the database to do the lookup. That will be much faster, provided such a move is OK with your DBAs.
You can do a
select * from sourceTable where key in (select key from lookupTable)
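
The effect of that IN-subquery can be demonstrated with an in-memory SQLite database (a sketch only; `sourceTable`, `lookupTable`, and the column names are the placeholder names from the post, not from any actual job):

```python
import sqlite3

# Toy demonstration of pushing the lookup down into the database
# with an IN-subquery. All table/column names are placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sourceTable (key INTEGER, payload TEXT)")
conn.execute("CREATE TABLE lookupTable (key INTEGER)")
conn.executemany("INSERT INTO sourceTable VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c"), (4, "d")])
conn.executemany("INSERT INTO lookupTable VALUES (?)", [(2,), (4,)])

rows = conn.execute(
    "SELECT * FROM sourceTable "
    "WHERE key IN (SELECT key FROM lookupTable)"
).fetchall()
print(sorted(rows))  # [(2, 'b'), (4, 'd')] -- only rows with a match survive
```

The database does the matching in one pass, instead of DataStage probing a hashed file once per input row.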

Posted: Fri Dec 01, 2006 2:28 pm
by ray.wurlod
Even that was too detailed.

<soapbox>

Rows/sec is meaningless, especially in small jobs.

Row sizes vary, the clock keeps running even when no rows are being processed, the time is rounded to whole seconds, job designs are different, and so on.

Prefer something like MB/min as a throughput metric, and take the time from the active stage, not from the job.

</soapbox>
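
Ray's MB/min metric can be computed as in the sketch below. The row size and active-stage elapsed time are hypothetical example figures, not measurements from the poster's job:

```python
# Throughput as MB/min rather than rows/sec.
# avg_row_bytes and active_stage_seconds are hypothetical examples;
# take the elapsed time from the active stage, not from the whole job.
rows_processed = 8_000_000
avg_row_bytes = 200            # hypothetical average row size
active_stage_seconds = 2_640   # hypothetical active-stage elapsed time

mb_per_min = (rows_processed * avg_row_bytes) / (1024 * 1024) \
             / (active_stage_seconds / 60)
print(f"{mb_per_min:.1f} MB/min")
```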

Posted: Fri Dec 01, 2006 4:01 pm
by kjaouhari
I think 5 rows per second is very good.

My question was more about your experiences with DataStage, and also the highest speed you have achieved in a job ...

I know that speed depends on several parameters.

Posted: Fri Dec 01, 2006 4:40 pm
by I_Server_Whale
kjaouhari wrote:I think 5 rows per second is very good.
:shock:

Posted: Mon Dec 04, 2006 4:45 am
by caltog
On average I see more or less the same rows/sec, but from time to time the loading time can be affected by server capacity and resource availability.

Most of the time I first copy the source data from the database into flat files; this speeds up the data mapping with hashed files.
(And I read only what I need, no extra columns.)

Posted: Mon Dec 04, 2006 5:14 am
by rachitha
caltog wrote:On average I see more or less the same rows/sec, but from time to time the loading time can be affected by server capacity and resource availability.

Most of the time I first copy the source data from the database into flat files; this speeds up the data mapping with hashed files.
(And I read only what I need, no extra columns.)
So: ODBC -> flat file, then doing the lookup. You said this will increase performance.
But until the flat file is loaded we can't proceed, right? That means:
time(ODBC -> flat file) + remaining processing time = total time.
I think this will take longer to run; please explain clearly.

Thank You,
Rachitha.

Posted: Mon Dec 04, 2006 5:30 am
by ArndW
kjaouhari wrote:I think 5 rows per second is very good....
Some server jobs where I currently work do 30,000 rows per second, and PX jobs can do in excess of 60,000 rows/sec (given a very big DataStage server and an even bigger DB2 server).

If you are working with a database, try to see what speed you can get with the fastest native tool the database has to offer, then measure your DataStage job against that to get a percentage of the maximum attainable speed for your system configuration. That is a far better starting point than some number of rows per second that can't be compared.
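
The percentage comparison is just a ratio; as a sketch (both throughput figures below are hypothetical, e.g. one measured with the database's own bulk loader and one from the DataStage job):

```python
# Express the DataStage job's throughput as a percentage of the
# fastest native database tool's throughput. Figures are hypothetical.
native_tool_rps = 12_000   # e.g. measured with the database's bulk loader
datastage_rps = 3_000      # measured from the DataStage job

pct_of_max = datastage_rps / native_tool_rps * 100
print(f"{pct_of_max:.0f}% of attainable speed")  # 25% of attainable speed
```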