
Performance Issue

Posted: Mon Jan 08, 2007 12:09 pm
by pradkumar
Hi

I am trying to extract data from an Oracle source.
Initially I used the Dynamic RDBMS stage as the source stage. It was pulling around 4 rows per second, so I changed the array size from 1 to 1000, but it made no difference. I took the query, ran it in Toad and checked the explain plan.
Everything looks OK. The query took only a fraction of a second to output data.

What might be the problem from a performance point of view?

Posted: Mon Jan 08, 2007 12:22 pm
by kcbland
Did you measure the time to get the first row or the last row? Just because Toad returned 100 rows in 2 seconds doesn't mean the next 10,000 rows are going to come as quickly. You measure performance by the time to get the last row, not the first row.

Take the same query in Toad, but put an ORDER BY into the statement. This forces the entire set of data to be prepared first, before any rows start coming back. That gives you a measure of how long it takes to get the entire result set, minus the time to send it over the network to DataStage.
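For example, the wrapped form would look something like this (a sketch only; the inner query and the sort column are placeholders for your actual extract statement):

```sql
-- The ORDER BY forces Oracle to materialize every row of the inner
-- query before the first row can be returned, so the elapsed time
-- approximates the time to produce the full result set.
SELECT *
  FROM (
        -- your original extract query goes here
        SELECT col1, col2 FROM source_table
       )
 ORDER BY col1;
```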

Posted: Mon Jan 08, 2007 12:28 pm
by chulett
Or the other 'Toad trick' is to wrap your query in one that just does a 'select count(*) from (your query)'. Either way you add the (hopefully negligible) time it takes to sort or count, but you'll get a better picture of how long it takes to build the entire result set, not just get the first cursor-full back.
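Concretely, the count wrapper looks like this (a sketch; the inner query is a placeholder for whatever you run from DataStage):

```sql
-- Toad gets back a single COUNT row, so the whole inner result set
-- has to be processed before anything is displayed.
SELECT COUNT(*)
  FROM (
        -- your original extract query goes here
        SELECT col1, col2 FROM source_table
       );
```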

Posted: Mon Jan 08, 2007 12:35 pm
by pradkumar
Thanks For the reply

It's not taking a lot of time. It's taking around 2 minutes for 1 million records.

Posted: Mon Jan 08, 2007 12:39 pm
by narasimha
If I am not mistaken, Toad gives you only a few result rows when you run a query; it pulls only around 500 rows initially, and maybe that's why the response time looks good.
Correct me if my understanding is wrong here.

Posted: Mon Jan 08, 2007 12:45 pm
by I_Server_Whale
narasimha wrote:If I am not mistaken, Toad gives you only a few result rows when you run a query; it pulls only around 500 rows initially, and maybe that's why the response time looks good.
Correct me if my understanding is wrong here.
Kenneth and Craig have previously asserted the same, so you are right. :)

Posted: Mon Jan 08, 2007 12:51 pm
by narasimha
pradkumar wrote:Thanks for the reply

It's not taking a lot of time. It's taking around 2 minutes for 1 million records.
Good, so does that resolve your performance issue?
If yes, mark the topic as Resolved. :wink:

Posted: Mon Jan 08, 2007 12:57 pm
by pradkumar
Narasimha, what do you mean by "it picks up only 500 rows initially" for the result set?
It's actually returning me the entire data set of 1 million records within 2 minutes.
I am unable to get that point.

Posted: Mon Jan 08, 2007 1:19 pm
by narasimha
Let me try and give an example.

If you do a select * from a large table, you get results almost instantaneously (depending on your data volume).
What I assume, and what shows in the data grid, is "Row 1 of 500 fetched so far (more rows exist)".
If you now drag the scroll bar of your data grid, it starts fetching more rows... (this may take a while to give you the entire result set).
Again, these are my assumptions; I'm not sure exactly how Toad applies these constraints/filters.

Re: Performance Issue

Posted: Mon Jan 08, 2007 1:35 pm
by Krazykoolrohit
pradkumar wrote:Hi

I am trying to extract data from an Oracle source.
Initially I used the Dynamic RDBMS stage as the source stage. It was pulling around 4 rows per second, so I changed the array size from 1 to 1000, but it made no difference. I took the query, ran it in Toad and checked the explain plan.
Everything looks OK. The query took only a fraction of a second to output data.

What might be the problem from a performance point of view?
What is your job design? Are you using any transformations in the same job? Are you loading to another table in the same job?

Posted: Mon Jan 08, 2007 1:41 pm
by pradkumar
My actual job design is

Code: Select all

Source ----> Transformer (self-lookup for Key; InsertNew and Update output links) ----> Insert_New (KeyGeneration) ----> TARGET
                                                                                  ----> Update_Existing ---------------> same TARGET

Posted: Mon Jan 08, 2007 1:42 pm
by DSguru2B
Great. Now put @FALSE in the constraint of the output links inside the transformer, then tell us what the select rows/sec figure is.

Posted: Mon Jan 08, 2007 1:51 pm
by chulett
In other words, what you are going to find is that your 'performance problem' isn't on the source side, it's on the target side. And dollars to doughnuts it's your update link. Are the fields you are using as your keys (the ones used in the WHERE clause) indexed?
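One way to check in Oracle is to look at the data dictionary (a sketch; TARGET_TABLE and the key column names are placeholders for your own objects):

```sql
-- List the indexed columns on the target table, in index column order.
SELECT i.index_name,
       ic.column_name,
       ic.column_position
  FROM user_indexes     i
  JOIN user_ind_columns ic ON ic.index_name = i.index_name
 WHERE i.table_name = 'TARGET_TABLE'
 ORDER BY i.index_name, ic.column_position;

-- If the key columns used in the update's WHERE clause are not listed,
-- an index on them avoids a full table scan per updated row:
CREATE INDEX target_keys_idx ON target_table (key_col1, key_col2);
```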

Posted: Mon Jan 08, 2007 2:22 pm
by kcbland
If you have a query like this:

Code: Select all

SELECT A.*, B.*, C.* FROM TABLE_A A, TABLE_B B, TABLE_C C
WHERE A.XYZ=B.XYZ AND B.MNO=C.MNO
AND B.EFFDT BETWEEN C.EFFDT AND C.ENDDT
The time to get the first row is the time to find the first prefetch-sized batch of rows, assemble all of the columns, and conduct all of the joins.

If you add an ORDER BY clause, all of the rows have to be computed and moved into a temporary holding space. This gives you an approximation of the time required to gather up all of the results.

If you add SELECT COUNT(*) FROM (the above query), the optimizer will bypass assembling the results, use vastly less space, and can rewrite the explain plan and cut a lot of corners, because the only things it needs to look at are the join columns. This potentially means indexes that were previously avoided (because full table scans were needed anyway) can now be used. You get an unreliable answer. If you're using Toad for Oracle, take a complicated query and look at the explain plan, then put the SELECT COUNT(*) around it and recheck. You'll see the plan and cost can change dramatically.
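You can see the difference directly with Oracle's EXPLAIN PLAN and DBMS_XPLAN (a sketch using the example tables above; compare the operations and cost columns of the two outputs):

```sql
-- Plan for the original query:
EXPLAIN PLAN FOR
SELECT A.*, B.*, C.* FROM TABLE_A A, TABLE_B B, TABLE_C C
 WHERE A.XYZ = B.XYZ AND B.MNO = C.MNO
   AND B.EFFDT BETWEEN C.EFFDT AND C.ENDDT;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Plan for the counted version: the optimizer no longer has to
-- assemble every column, so the access paths may change.
EXPLAIN PLAN FOR
SELECT COUNT(*) FROM (
    SELECT A.*, B.*, C.* FROM TABLE_A A, TABLE_B B, TABLE_C C
     WHERE A.XYZ = B.XYZ AND B.MNO = C.MNO
       AND B.EFFDT BETWEEN C.EFFDT AND C.ENDDT
);
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```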

This poster's problem is the all-in-one job that new users always write. They never understand that their transformation pipeline runs at the speed of its slowest consuming process. In this case, it's obviously the receiving database putting back-pressure all the way up the pipeline.

Posted: Mon Jan 08, 2007 2:27 pm
by chulett
Yah, there's nothing like back-pressure all the way up your pipeline to ruin a perfectly good day. :wink:

ps. Ken, I didn't mean replace what you are selecting with a COUNT(*); I meant wrap the whole original query in a count select of the result set. I don't believe that changes the explain plan of the original query... guess I need to go double-check for my own sanity's sake. :?