Performance Issue

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

pradkumar
Charter Member
Charter Member
Posts: 393
Joined: Wed Oct 18, 2006 1:09 pm

Performance Issue

Post by pradkumar »

Hi

I am trying to extract the data from Oracle SOurce.
Initially I used Dynamic RDBMS stgae as source stage. It was pulling like 4 rows per sec. I changed the array size to 1000 from 1.
But it was of no use. I took the query and ran it in Toad and checked the Explain Plan.
Everything looks ok. The Query took only fractions of seconds to output the data.

What might be the problem with the performance point of view?
Pradeep Kumar
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Did you measure the time to get the first row or the last row? Just because Toad returned 100 rows in 2 seconds doesn't mean that the next 10000 rows are going to come any sooner. You measure the performance to get the last row, not the first row.

Take the same query in Toad, but put an order by into the statement. This forces the entire set of data to be prepared first, then start returning rows. This would give you a measure of how long it takes to get the entire set of results, minus the time to send over a network and to DataStage.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Or the other 'Toad Trick' is to wrap your query in one that just does a 'select count(*) from (your query)'. Either way you have the (hopefully negligible) added time it takes to sort or count, but in either case you'll get a better picture of how long it takes to build the entire result set, not just get the first cursor-full back.
-craig

"You can never have too many knives" -- Logan Nine Fingers
pradkumar
Charter Member
Charter Member
Posts: 393
Joined: Wed Oct 18, 2006 1:09 pm

Post by pradkumar »

Thanks For the reply

Its not taking a lot of time..Its taking around 2min for 1 million records
Pradeep Kumar
narasimha
Charter Member
Charter Member
Posts: 1236
Joined: Fri Oct 22, 2004 8:59 am
Location: Staten Island, NY

Post by narasimha »

If I have not mistaken, Toad gives you only a few result rows when you run a query, it pulls only like 500 rows initially, may be thats why the response time looks good.
Correct me if my understanding is wrong here.
Narasimha Kade

Finding answers is simple, all you need to do is come up with the correct questions.
I_Server_Whale
Premium Member
Premium Member
Posts: 1255
Joined: Wed Feb 02, 2005 11:54 am
Location: United States of America

Post by I_Server_Whale »

narasimha wrote:If I have not mistaken, Toad gives you only a few result rows when you run a query, it pulls only like 500 rows initially, may be thats why the response time looks good.
Correct me if my understanding is wrong here.
Kenneth and Craig have previously asserted the same, so you are right. :)
Anything that won't sell, I don't want to invent. Its sale is proof of utility, and utility is success.
Author: Thomas A. Edison 1847-1931, American Inventor, Entrepreneur, Founder of GE
narasimha
Charter Member
Charter Member
Posts: 1236
Joined: Fri Oct 22, 2004 8:59 am
Location: Staten Island, NY

Post by narasimha »

pradkumar wrote:Thanks For the reply

Its not taking a lot of time..Its taking around 2min for 1 million records
Good, so does that resolve your Performance Issue?
If yes mark as Resolved. :wink:
Narasimha Kade

Finding answers is simple, all you need to do is come up with the correct questions.
pradkumar
Charter Member
Charter Member
Posts: 393
Joined: Wed Oct 18, 2006 1:09 pm

Post by pradkumar »

Narasimha, What is meant by it picks up only 500 rows initially for the result set.
Its actually returning me the entire data set of 1 million records within 2min of time.
I am unable to get that point.
Pradeep Kumar
narasimha
Charter Member
Charter Member
Posts: 1236
Joined: Fri Oct 22, 2004 8:59 am
Location: Staten Island, NY

Post by narasimha »

Let me try and give an example.

If you do a select * from a large table, you get results almost instantanously(depends on your data volume)
What I assume and what shows in the Datagrid is "Row 1 of 500 fetched so far (more rows exists)"
If you now try to pull the scroll bar of your Data Grid, it starts fetching more rows.... (This may take a while now to give you the entire results)
Again these are my assumptions, not sure how exactly Toad applies these constraints/filters
Narasimha Kade

Finding answers is simple, all you need to do is come up with the correct questions.
Krazykoolrohit
Charter Member
Charter Member
Posts: 560
Joined: Wed Jul 13, 2005 5:36 am
Location: Ohio

Re: Performance Issue

Post by Krazykoolrohit »

pradkumar wrote:Hi

I am trying to extract the data from Oracle SOurce.
Initially I used Dynamic RDBMS stgae as source stage. It was pulling like 4 rows per sec. I changed the array size to 1000 from 1.
But it was of no use. I took the query and ran it in Toad and checked the Explain Plan.
Everything looks ok. The Query took only fractions of seconds to output the data.

What might be the problem with the performance point of view?
what is your job design? are you using any transformations in the same job? are you loading to another table in the same job?
pradkumar
Charter Member
Charter Member
Posts: 393
Joined: Wed Oct 18, 2006 1:09 pm

Post by pradkumar »

My actual job design is

Code: Select all

Source---->Transformer(selflkup for Key, InsertNew and Update output links)------>Insert_new(KeyGeneration)--->TARGET
									   ------->uPDATE EXISTING------------>SAME TARGET
Pradeep Kumar
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Great. Now put @FALSE in the constraint of the output link inside the transformer. Then tell us what is the select rows/sec ?
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

In other words, what you are going to find is your 'performance problem' isn't on the source side, it's on the target side. And dollars to doughnuts it's your update link. Are the fields you are using as your Keys - the ones used on the where clause - indexed?
-craig

"You can never have too many knives" -- Logan Nine Fingers
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

If you have a query like this:

Code: Select all

SELECT A.*, B.*, C.* FROM TABLE_A, TABLE_B, TABLE_C
WHERE A.XYZ=B.XYZ AND B.MNO=C.MNO
AND B.EFFDT BETWEEN C.EFFDT AND C.ENDDT
The time to get the first row is the time find the first prefetch number of rows and assemble all of the columns and conduct all of the joins.

If you add an ORDER BY clause, all of the rows have to be computed and moved into a temporary holding space. This gives you an approximation of the time required to gather up all of the results.

If you add SELECT COUNT(*) FROM (the above query) the optimizer will bypass assembling the results, it will use vastly less space, and it can re-write the explain plan and cut a lot of corners because the only things it needs to look at is the join columns. This potentially means indexes that were avoided because full table scans were needed anyway can now be used. You get an unreliable answer. If using Toad Oracle, take a complicated query and look at the explain plan, then put the SELECT COUNT(*) and and recheck. You'll see the plan and cost can change dramatically.

This posters problem is the all-in-one job that new users always write. They never understand that their transformation pipeline runs at the speed of the slowest consuming process. In this case, it's obviously the receiving database putting back-pressure all the way up the pipeline.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Yah, there's nothing like back-pressure all the way up your pipeline to ruin a perfectly good day. :wink:

ps. Ken, I don't mean replace what you are selecting with a COUNT(*), I mean wrap the whole original query in a count select of the result set. Don't believe that changes the explain plan of the original query... guess I need to go double-check for my own sanity's sake. :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply