How to improve performance

Posted: Fri Nov 14, 2008 2:18 pm
by pradkumar
Hi Everyone,

I have a job design as:

DRS (DB2 database) ---> Copy stage ---> DRS (SQL Server)
                            |
                            +---> Aggregator ---> ODBC (audit source rows)

The job takes around 15 minutes to complete in DataStage, whereas the same job takes 8 minutes in SSIS. The volume is 2 million records and the transaction rate is around 1000 rows/sec. After monitoring the job, I found that about 14 minutes is spent in the extraction alone. I also tried changing the DRS stage to an ODBC Connector / ODBC EE stage and varying the array size (1 to 5000), but no luck.

Can anyone please suggest how to improve the performance? (I need to reduce the processing time to 8 minutes.)

Thanks In Advance

Posted: Fri Nov 14, 2008 4:33 pm
by Nagaraj
Why don't you pull the second link for the Aggregator from the Copy stage? I assume you are using that second link (to the Aggregator) to get the row count.

Posted: Fri Nov 14, 2008 9:51 pm
by ray.wurlod
What read method are you using to get the data out of DB2? Use the Resource Estimator and/or Job Monitor to give more targeted metrics. If the import really is the resource hog, this is the area upon which your investigations need to be more closely focussed.

Posted: Mon Nov 17, 2008 1:25 pm
by pradkumar
Hi Ray,


I am using user-defined SQL as the read method to pull data from the DB2 database. In the Job Monitor I found that 11.34 minutes of the 12-minute total is spent in the source stage. Can I try changing the array size or the read committed isolation level? What changes should I make to improve it? Please suggest.
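For example, I understand DB2 lets you set the isolation level directly in the user-defined SQL, roughly like this (the table and column names are only placeholders, and WITH UR is appropriate only if dirty reads are acceptable for this extract):

    SELECT order_id, order_date, amount
    FROM   myschema.orders
    WITH UR    -- uncommitted read; avoids waiting on row locks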


Thanks

Posted: Tue Nov 18, 2008 1:12 am
by ray.wurlod
No idea. What is your user-defined SQL doing? Are any constraints, joins, groupings or sorts supported by indexes in the database? Could this be faster using bulk unload and processing in DataStage?
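For example, something along these lines run through the DB2 command line processor would unload to a delimited file that a Sequential File stage could then read (the path, schema and column list here are placeholders only):

    EXPORT TO /tmp/orders.del OF DEL
    SELECT order_id, order_date, amount
    FROM   myschema.orders;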

Posted: Tue Nov 18, 2008 2:38 am
by DEEPTI
pradkumar wrote: I am using user-defined SQL as the read method to pull data from the DB2 database. In the Job Monitor I found that 11.34 minutes of the 12-minute total is spent in the source stage. Can I try changing the array size or the read committed isolation level? What changes should I make to improve it?

If u r using joins in the source query, then specify an alias for each table and qualify each column as alias.column_name.
This reduces the time taken to execute the query, as the database will not have to search each table for the column.
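For example (the table and column names here are only illustrative):

    SELECT o.order_id,
           o.amount,
           c.customer_name
    FROM   orders o
    JOIN   customers c
           ON c.customer_id = o.customer_id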

Thanks
Deepti

Posted: Tue Nov 18, 2008 3:01 pm
by ray.wurlod
Not "u" but "you".
Not "r" but "are".

Do you use these SMS-style abbreviations when documenting your work? Then please don't use them here.

Posted: Tue Nov 18, 2008 11:36 pm
by DEEPTI
My apologies.
Thanks, Ray, for the suggestion.
Deepti

Posted: Wed Nov 19, 2008 5:26 pm
by pradkumar
Hi

Thanks for your replies.

My source (DB2 database, ODBC EE stage) has 2 million records flowing to a Copy stage. From there, one link loads directly into the target SQL Server (DRS stage), and the second link goes from the Copy stage to an Aggregator, which uses a record count as the aggregation type and loads just one row into the audit table. The same job design takes 8 minutes in SSIS, whereas it takes 15 minutes in DataStage. Is there any way we can improve the performance, for example by partitioning the read as sketched below?
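For instance, would splitting the read into one modulus-based query per reader help? Something like this, where the table, key and column names are just placeholders:

    -- reader 1 of 4; the other readers would use = 1, = 2 and = 3
    SELECT col_a, col_b, col_c
    FROM   myschema.source_table
    WHERE  MOD(numeric_key_col, 4) = 0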

Thanks in Advance

Posted: Wed Nov 19, 2008 5:37 pm
by ray.wurlod
What partitioning do you already have? Is the DB2 table partitioned?

Posted: Thu Feb 12, 2009 7:13 pm
by Gazelle
I don't know if this will help in your case, but we found a bottleneck in how the DB2 database was configured.

The symptoms were similar: our jobs would spend >80% of their time waiting for rows to be returned from the DB2 read.

We reduced this to about 20% by completely restructuring DB2:
- defining indexes, logs and data to be on separate filesystems (ideally different physical disk spindles);
- creating multiple "containers" for the tablespaces on separate filesystems (again, ideally on different disk spindles);
- increasing some of the DB2 parameters (such as the buffer pool size) and tuning the way that Unix manages its memory. [Sorry, but I do not remember all of the parameters; a rough sketch of the kind of changes is below.]
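From memory, the kind of DDL involved looked roughly like this (the paths, page counts and buffer pool size here are invented, not our real values):

    CREATE TABLESPACE ts_data
      MANAGED BY DATABASE
      USING (FILE '/fs1/db2/ts_data_c1' 25600,   -- container on one filesystem
             FILE '/fs2/db2/ts_data_c2' 25600);  -- container on another filesystem

    ALTER BUFFERPOOL IBMDEFAULTBP SIZE 50000;    -- enlarge the default buffer pool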

So this might be more of a DB2 issue than a DataStage issue.
Note that if DB2 is on the same server as DataStage, then there's a bit of disagreement over the optimum Unix settings. They don't really play nicely together.