Perfomace Improvement??

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

balu536
Premium Member
Premium Member
Posts: 103
Joined: Tue Dec 02, 2008 5:01 am

Perfomace Improvement??

Post by balu536 »

Hi All,
I've designed a job which handles 80 million data.The jod design flow is as mentioned below

Oracle(Source) --> Transformer --> Sort --> Aggregator --> Oracle(Target)

As of now the job is taking 31 Minutes for execution.

The business logic involved in this job is,the transformer assigns values to few extra columns (are added in the transformer for the incoming data) and Sort stage performs the Sort operation based on key column and Aggregator performs a Max operation on the data grouped by key column

Note:
1.) no stage variables used in transformer
2.)The source table doesn't has any Indexes and my client is not ready to create any indexes on that table as it has dependencies.

Please help me in perfomance improval regarding the datastage wise(like changing the job design etc)
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Replace your output oracle with a dummy copy or a sequential file stage going to /dev/null. Rerun the program. Does it still take 31 minutes? How busy is your CPU during this time? Is Oracle on the same machine as DataStage?
balu536
Premium Member
Premium Member
Posts: 103
Joined: Tue Dec 02, 2008 5:01 am

Post by balu536 »

I have initially checked the job using the Peek Stage as output,but no change in the time taken for job execution. :(

And no other applications are running in parallel on my PC when i'm running the job and mine is a high configuration one.
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

What if you set a constraint in the transformer as @FALSE ?

Also is both client and server of DataStage running on same PC ?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Sainath has already stated the next step. If the time still remains the same, then it isn't your transform or sort and that would mean your SELECT in Oracle is the bottleneck.
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Personally I think the source is not the bottleneck.

It must be the sort and agg in DS.

As you said you are not doing anything in tx, you may be better-off doing the whole in Oracle SQL itself.

Something like

Insert into......
as
Select ......
Group By ......

This way you avoid the transfer over network and also can use the parallel and tuning options in Oracle.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

I might push the sort up front to Oracle - but again, without knowing the answers to questions posed earlier we are just guessing.
setty.ramesh
Premium Member
Premium Member
Posts: 1
Joined: Thu Nov 09, 2006 12:57 pm
Location: hyderabad
Contact:

Re: Perfomace Improvement??

Post by setty.ramesh »

How about,

Oracle(Source) --> Sort --> Transformer (to pick only the maximum values)--> Oracle(Target)

* Earlier aggregate performs only max operation if I am not wrong, which can be handled in transformer using stage variables.
miwinter
Participant
Posts: 396
Joined: Thu Jun 22, 2006 7:00 am
Location: England, UK

Post by miwinter »

Perhaps an obvious question, but it wasn't covered in the first post and I haven't seen it since either:

How many nodes are you running it against - single or multiple?
Is the Oracle read against a partitioned table and is this running sequentially or in parallel.
Mark Winter
<i>Nothing appeases a troubled mind more than <b>good</b> music</i>
balu536
Premium Member
Premium Member
Posts: 103
Joined: Tue Dec 02, 2008 5:01 am

Post by balu536 »

Hi All,
Sorry for the delayed reply
Below are my ans in the order wise of the replies given.

Sainath: Client and Server are not running on the same PC.

ArndW: As i already stated in initial post that source table doesn't have any indexes and client is not intrested in creating the same and has asked to handle the perfomance in the datstage itself. :(

Sainath : In this case for extracting the entire data from source it's nearly taking 25 mins and the rest 6-7 minutes is taken by the Aggregator and Sort Stage

ArndW,setty.ramesh : The approach you mentioned like keeping the sort stage immediately after Oracle,I haven't tried it.Will test with the same today.But i doubt i have is is the perfomance better if we use stage variables in transformer for the Max case or instead use aggregator.Using the aggregator was the approach initially in considered with out any stage variables in transformer.Please confirm me on this front.

miwinter:I'm running the job against multiple nodes.The source table is not partitioned and the Source oracle stage by default will be Sequential.
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

In that case, try to optimize your SQL in Oracle itself. Alternatively you can break into datastage references and joins - if it simplifies and improves
balu536
Premium Member
Premium Member
Posts: 103
Joined: Tue Dec 02, 2008 5:01 am

Post by balu536 »

Hi Sainath,
My SQL Query is Perfectly tuned.In the above post you have mentioned as "Alternatively you can break into datastage references and joins - if it simplifies and improves ".Will you please through some light on the same.


Regards,
Balakrishna
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

To do you, we need information about your query and any issues with the query - such as doing a full table scan due to missing indexes etc.

Can you provide a glimpse of your query so others can see what is happening ?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Before worrying about your query you need to ensure that the query is the bottleneck. Have you done so?
balu536
Premium Member
Premium Member
Posts: 103
Joined: Tue Dec 02, 2008 5:01 am

Post by balu536 »

The source table has 80 million data and it has no indexes and client is not ready to create them.

Query looks something like "Select * from table".This Query takes 25 mins for execution and the rest 5-6 mins for the later portion of job execution


Regards,
Balakrishna
Post Reply