Perfomace Improvement??

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Are you saying that 'select * from table' takes 25 minutes to start !!??
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

balu536 wrote:...Query looks something like "Select * from table".This Query takes 25 mins for execution...
There has to be more there, since a SELECT * without other selection criteria would start pushing rows to DataStage almost immediately. It sounds like your query is doing a full-table-scan (or more) before passing the first row. If you post the query I am sure that you will get some suggestions.
But if your query is the bottleneck and the customer is unwilling to change either the query or the database then you cannot do anything else to make the job run faster. End of discussion on the subject, there are no other options (well, I suppose the customer could buy more hardware, but that is unlikely).
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

What if you place a big file (say 1 or 2Gb for now) in the Oracle server and attempt to ftp it across to DataStage server (and viceversa)?

How long does that take ?

Or if you design a job with "select dummy from dual", how long does that take to start?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

What makes you guys think it doesn't "start" immediately? If it takes 25 minutes to push everything out, all you can do to improve a query like that is to use a parallel hint. And perhaps bump up the array size.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

I thought that the poster meant it took 25 minutes before the first row arrives in DataStage - but perhaps I misunderstood.
balu536
Premium Member
Premium Member
Posts: 103
Joined: Tue Dec 02, 2008 5:01 am

Post by balu536 »

Sainath:
"Are you saying that 'select * from table' takes 25 minutes to start !!?? "

Balu : Here 25 minutes i mentioned implies the total time required to extract the complete data.In mean time the records are extracted(i.e for 5 minutes some 15 million,10 minutes- 29 million...etc)

chulett:
"What makes you guys think it doesn't "start" immediately? If it takes 25 minutes to push everything out, all you can do to improve a query like that is to use a parallel hint. And perhaps bump up the ... "

Balu : Chulett,thanks for the suggestion,i'll try to re-run the job after implementing the hints and will let you know if any perfomance improval is observed.


Regards,
Balakrishna
priyadarshikunal
Premium Member
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

if it is "select * from table" try chaging the read mode to table instead of putting user defined query and see if it takes less time.
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Balu,

DataStage runs pipeline parallelism mode. So the 29 minutes of Oracle extration will have impact due to stages further down the line.

Did you try applying @FALSE in the transformer stage?

If that shows any improvement, we can work on other stages to carry it till the end.
Post Reply