Perfomace Improvement??
Moderators: chulett, rschirm, roy
Perfomace Improvement??
Hi All,
I've designed a job which handles 80 million data.The jod design flow is as mentioned below
Oracle(Source) --> Transformer --> Sort --> Aggregator --> Oracle(Target)
As of now the job is taking 31 Minutes for execution.
The business logic involved in this job is,the transformer assigns values to few extra columns (are added in the transformer for the incoming data) and Sort stage performs the Sort operation based on key column and Aggregator performs a Max operation on the data grouped by key column
Note:
1.) no stage variables used in transformer
2.)The source table doesn't has any Indexes and my client is not ready to create any indexes on that table as it has dependencies.
Please help me in perfomance improval regarding the datastage wise(like changing the job design etc)
I've designed a job which handles 80 million data.The jod design flow is as mentioned below
Oracle(Source) --> Transformer --> Sort --> Aggregator --> Oracle(Target)
As of now the job is taking 31 Minutes for execution.
The business logic involved in this job is,the transformer assigns values to few extra columns (are added in the transformer for the incoming data) and Sort stage performs the Sort operation based on key column and Aggregator performs a Max operation on the data grouped by key column
Note:
1.) no stage variables used in transformer
2.)The source table doesn't has any Indexes and my client is not ready to create any indexes on that table as it has dependencies.
Please help me in perfomance improval regarding the datastage wise(like changing the job design etc)
Replace your output oracle with a dummy copy or a sequential file stage going to /dev/null. Rerun the program. Does it still take 31 minutes? How busy is your CPU during this time? Is Oracle on the same machine as DataStage?
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
-
- Participant
- Posts: 3337
- Joined: Mon Jan 17, 2005 4:49 am
- Location: United Kingdom
Sainath has already stated the next step. If the time still remains the same, then it isn't your transform or sort and that would mean your SELECT in Oracle is the bottleneck.
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
-
- Participant
- Posts: 3337
- Joined: Mon Jan 17, 2005 4:49 am
- Location: United Kingdom
Personally I think the source is not the bottleneck.
It must be the sort and agg in DS.
As you said you are not doing anything in tx, you may be better-off doing the whole in Oracle SQL itself.
Something like
Insert into......
as
Select ......
Group By ......
This way you avoid the transfer over network and also can use the parallel and tuning options in Oracle.
It must be the sort and agg in DS.
As you said you are not doing anything in tx, you may be better-off doing the whole in Oracle SQL itself.
Something like
Insert into......
as
Select ......
Group By ......
This way you avoid the transfer over network and also can use the parallel and tuning options in Oracle.
I might push the sort up front to Oracle - but again, without knowing the answers to questions posed earlier we are just guessing.
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
-
- Premium Member
- Posts: 1
- Joined: Thu Nov 09, 2006 12:57 pm
- Location: hyderabad
- Contact:
Re: Perfomace Improvement??
How about,
Oracle(Source) --> Sort --> Transformer (to pick only the maximum values)--> Oracle(Target)
* Earlier aggregate performs only max operation if I am not wrong, which can be handled in transformer using stage variables.
Oracle(Source) --> Sort --> Transformer (to pick only the maximum values)--> Oracle(Target)
* Earlier aggregate performs only max operation if I am not wrong, which can be handled in transformer using stage variables.
Perhaps an obvious question, but it wasn't covered in the first post and I haven't seen it since either:
How many nodes are you running it against - single or multiple?
Is the Oracle read against a partitioned table and is this running sequentially or in parallel.
How many nodes are you running it against - single or multiple?
Is the Oracle read against a partitioned table and is this running sequentially or in parallel.
Mark Winter
<i>Nothing appeases a troubled mind more than <b>good</b> music</i>
<i>Nothing appeases a troubled mind more than <b>good</b> music</i>
Hi All,
Sorry for the delayed reply
Below are my ans in the order wise of the replies given.
Sainath: Client and Server are not running on the same PC.
ArndW: As i already stated in initial post that source table doesn't have any indexes and client is not intrested in creating the same and has asked to handle the perfomance in the datstage itself.
Sainath : In this case for extracting the entire data from source it's nearly taking 25 mins and the rest 6-7 minutes is taken by the Aggregator and Sort Stage
ArndW,setty.ramesh : The approach you mentioned like keeping the sort stage immediately after Oracle,I haven't tried it.Will test with the same today.But i doubt i have is is the perfomance better if we use stage variables in transformer for the Max case or instead use aggregator.Using the aggregator was the approach initially in considered with out any stage variables in transformer.Please confirm me on this front.
miwinter:I'm running the job against multiple nodes.The source table is not partitioned and the Source oracle stage by default will be Sequential.
Sorry for the delayed reply
Below are my ans in the order wise of the replies given.
Sainath: Client and Server are not running on the same PC.
ArndW: As i already stated in initial post that source table doesn't have any indexes and client is not intrested in creating the same and has asked to handle the perfomance in the datstage itself.
Sainath : In this case for extracting the entire data from source it's nearly taking 25 mins and the rest 6-7 minutes is taken by the Aggregator and Sort Stage
ArndW,setty.ramesh : The approach you mentioned like keeping the sort stage immediately after Oracle,I haven't tried it.Will test with the same today.But i doubt i have is is the perfomance better if we use stage variables in transformer for the Max case or instead use aggregator.Using the aggregator was the approach initially in considered with out any stage variables in transformer.Please confirm me on this front.
miwinter:I'm running the job against multiple nodes.The source table is not partitioned and the Source oracle stage by default will be Sequential.
-
- Participant
- Posts: 3337
- Joined: Mon Jan 17, 2005 4:49 am
- Location: United Kingdom
-
- Participant
- Posts: 3337
- Joined: Mon Jan 17, 2005 4:49 am
- Location: United Kingdom
Before worrying about your query you need to ensure that the query is the bottleneck. Have you done so?
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>