How to improve performance

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
pradkumar
Charter Member
Charter Member
Posts: 393
Joined: Wed Oct 18, 2006 1:09 pm

How to improve performance

Post by pradkumar »

Hi Everyone,

I have a job design as

DRS(DB2 database) ---> Copy stage--->DRS( Sqlserver)
|
Aggregator --->ODBC(audit Source rows)

The job is taking around 15 mins to complete in datastage where as the same job is taking 8mins in SSIS. The volume of records is 2million and the rows for transaction is around 1000/sec. After monitoring the job i got to know that 14mins of time is taking while extracting process only. Even i played by changing the DRS stage to ODBC connector /ODBC EE and also with the array size ( 1 to 5000) but no luck .

Can any one please suggest how to improve the performance, ( need to reduce the processing time to 8 minutes) .

Thanks In Advance
Nagaraj
Premium Member
Premium Member
Posts: 383
Joined: Thu Nov 08, 2007 12:32 am
Location: Bangalore

Post by Nagaraj »

Why dont you Pull the second link for AGG from COpy stage.? I assume that you are using the second link (Agg) to get the count rows.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

What read method are you using to get the data out of DB2? Use the Resource Estimator and/or Job Monitor to give more targeted metrics. If the import really is the resource hog, this is the area upon which your investigations need to be more closely focussed.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
pradkumar
Charter Member
Charter Member
Posts: 393
Joined: Wed Oct 18, 2006 1:09 pm

Post by pradkumar »

Hi Ray,


Iam using user defined SQL as read method to pull data from DB2 database.In the job monitor i found that 11.34 mins of total time 12 mins is taking for the source stage. can i play by changing the array size , read committed ? What changes have to make to improve it. Please sugest.


Thanks
pradkumar
Charter Member
Charter Member
Posts: 393
Joined: Wed Oct 18, 2006 1:09 pm

Post by pradkumar »

Hi Ray,


Iam using user defined SQL as read method to pull data from DB2 database.In the job monitor i found that 11.34 mins of total time 12 mins is taking for the source stage. can i play by changing the array size , read committed ? What changes have to make to improve it. Please sugest.


Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

No idea. What is your user-defined SQL doing? Are any constraints, joins, groupings or sorts supported by indexes in the database? Could this be faster using bulk unload and processing in DataStage?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
DEEPTI
Participant
Posts: 9
Joined: Wed Nov 21, 2007 12:19 am

Post by DEEPTI »

pradkumar wrote:Hi Ray,


Iam using user defined SQL as read method to pull data from DB2 database.In the job monitor i found that 11.34 mins of total time 12 mins is taking for the source stage. can i play by changing the array size , read committed ? What changes have to make to improve it. Please sugest.


Thanks

If u r using joins in the source query
then specify alias for each table and have each column with alias table name.column name ...
This reduces the time taken to execute the query ..as it will not search each table for the column

Thanks
Deepti
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Not "u" but "you".
Not "r" but "are".

Do you use these SMS-style abbreviations when documenting your work? Then please don't use them here.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
DEEPTI
Participant
Posts: 9
Joined: Wed Nov 21, 2007 12:19 am

Post by DEEPTI »

My apology ..
Thanks Ray for suggesting..
Deepti
pradkumar
Charter Member
Charter Member
Posts: 393
Joined: Wed Oct 18, 2006 1:09 pm

Post by pradkumar »

Hi

Thanks for your replys ..

But my source(db2 database/odbc EE stage) has 2 million records flows to a copy stage ; from their one link directly loads in to target sql server(drs stage) and 2nd link from copy stage to aggregator where it takes record count as aggregation type and loads just 1 record to the audit table... The same job design is taking 8 mins of time in SSIS where as its taking 15 mins in datastage .. is there any way can we improve the performace ( like doing partitioning etc)

Thanks in Advance
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

What partitioning do you already have? Is the DB2 table partitioned?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Gazelle
Premium Member
Premium Member
Posts: 108
Joined: Mon Nov 24, 2003 11:36 pm
Location: Australia (Melbourne)

Post by Gazelle »

I don't know if this will help in your case, but we found a bottleneck in how the DB2 database was configured.

The symptoms were similar: our jobs would spend >80% of their time waiting for rows to be returned from the DB2 read.

We reduced this to about 20%, by completely restructuring DB2:
- defining Indexes, Logs and Data to be on separate filesystems (ideally different physical disk spindles);
- creating multiple "containers" for the tablespaces on separate filesystems (again, ideally on different disk spindles);
- maximising some of the db2 parameters (such as the buffer size), and the way that Unix manages its memory. [Sorry, but I do not remember all of the parameters].

So this might be more of a DB2 issue rather than a DataStage issue.
Note that if DB2 is on the same server as DataStage, then there's a bit of disagreement over the optimum Unix settings. They don't really play nicely together.
Post Reply