Perfomace Improvement??

balu536 · Post by **balu536** » Wed Jul 08, 2009 8:23 am

Hi All,
I've designed a job which handles 80 million data.The jod design flow is as mentioned below

Oracle(Source) --> Transformer --> Sort --> Aggregator --> Oracle(Target)

As of now the job is taking 31 Minutes for execution.

The business logic involved in this job is,the transformer assigns values to few extra columns (are added in the transformer for the incoming data) and Sort stage performs the Sort operation based on key column and Aggregator performs a Max operation on the data grouped by key column

Note:
1.) no stage variables used in transformer
2.)The source table doesn't has any Indexes and my client is not ready to create any indexes on that table as it has dependencies.

Please help me in perfomance improval regarding the datastage wise(like changing the job design etc)

ArndW · Post by **ArndW** » Wed Jul 08, 2009 8:27 am

Replace your output oracle with a dummy copy or a sequential file stage going to /dev/null. Rerun the program. Does it still take 31 minutes? How busy is your CPU during this time? Is Oracle on the same machine as DataStage?

balu536 · Post by **balu536** » Wed Jul 08, 2009 8:34 am

I have initially checked the job using the Peek Stage as output,but no change in the time taken for job execution.

And no other applications are running in parallel on my PC when i'm running the job and mine is a high configuration one.

Sainath.Srinivasan · Post by **Sainath.Srinivasan** » Wed Jul 08, 2009 8:52 am

What if you set a constraint in the transformer as @FALSE ?

Also is both client and server of DataStage running on same PC ?

ArndW · Post by **ArndW** » Wed Jul 08, 2009 8:54 am

Sainath has already stated the next step. If the time still remains the same, then it isn't your transform or sort and that would mean your SELECT in Oracle is the bottleneck.

Sainath.Srinivasan · Post by **Sainath.Srinivasan** » Wed Jul 08, 2009 8:59 am

Personally I think the source is not the bottleneck.

It must be the sort and agg in DS.

As you said you are not doing anything in tx, you may be better-off doing the whole in Oracle SQL itself.

Something like

Insert into......
as
Select ......
Group By ......

This way you avoid the transfer over network and also can use the parallel and tuning options in Oracle.

ArndW · Post by **ArndW** » Wed Jul 08, 2009 9:01 am

I might push the sort up front to Oracle - but again, without knowing the answers to questions posed earlier we are just guessing.

setty.ramesh · Post by **setty.ramesh** » Wed Jul 08, 2009 9:24 am

How about,

Oracle(Source) --> Sort --> Transformer (to pick only the maximum values)--> Oracle(Target)

* Earlier aggregate performs only max operation if I am not wrong, which can be handled in transformer using stage variables.

miwinter · Post by **miwinter** » Wed Jul 08, 2009 10:28 am

Perhaps an obvious question, but it wasn't covered in the first post and I haven't seen it since either:

How many nodes are you running it against - single or multiple?
Is the Oracle read against a partitioned table and is this running sequentially or in parallel.

balu536 · Post by **balu536** » Wed Jul 08, 2009 11:12 pm

Hi All,
Sorry for the delayed reply
Below are my ans in the order wise of the replies given.

Sainath: Client and Server are not running on the same PC.

ArndW: As i already stated in initial post that source table doesn't have any indexes and client is not intrested in creating the same and has asked to handle the perfomance in the datstage itself.

Sainath : In this case for extracting the entire data from source it's nearly taking 25 mins and the rest 6-7 minutes is taken by the Aggregator and Sort Stage

ArndW,setty.ramesh : The approach you mentioned like keeping the sort stage immediately after Oracle,I haven't tried it.Will test with the same today.But i doubt i have is is the perfomance better if we use stage variables in transformer for the Max case or instead use aggregator.Using the aggregator was the approach initially in considered with out any stage variables in transformer.Please confirm me on this front.

miwinter:I'm running the job against multiple nodes.The source table is not partitioned and the Source oracle stage by default will be Sequential.

Sainath.Srinivasan · Post by **Sainath.Srinivasan** » Thu Jul 09, 2009 2:54 am

In that case, try to optimize your SQL in Oracle itself. Alternatively you can break into datastage references and joins - if it simplifies and improves

balu536 · Post by **balu536** » Thu Jul 09, 2009 2:57 am

Hi Sainath,
My SQL Query is Perfectly tuned.In the above post you have mentioned as "Alternatively you can break into datastage references and joins - if it simplifies and improves ".Will you please through some light on the same.

Regards,
Balakrishna

Sainath.Srinivasan · Post by **Sainath.Srinivasan** » Thu Jul 09, 2009 3:04 am

To do you, we need information about your query and any issues with the query - such as doing a full table scan due to missing indexes etc.

Can you provide a glimpse of your query so others can see what is happening ?

ArndW · Post by **ArndW** » Thu Jul 09, 2009 4:05 am

Before worrying about your query you need to ensure that the query is the bottleneck. Have you done so?

balu536 · Post by **balu536** » Thu Jul 09, 2009 4:27 am

The source table has 80 million data and it has no indexes and client is not ready to create them.

Query looks something like "Select * from table".This Query takes 25 mins for execution and the rest 5-6 mins for the later portion of job execution

Regards,
Balakrishna

DSXchange

Perfomace Improvement??

Perfomace Improvement??

Re: Perfomace Improvement??