How to improve performance

pradkumar · Post by **pradkumar** » Fri Nov 14, 2008 2:18 pm

Hi Everyone,

I have a job design as

DRS(DB2 database) ---> Copy stage--->DRS( Sqlserver)
|
Aggregator --->ODBC(audit Source rows)

The job is taking around 15 mins to complete in datastage where as the same job is taking 8mins in SSIS. The volume of records is 2million and the rows for transaction is around 1000/sec. After monitoring the job i got to know that 14mins of time is taking while extracting process only. Even i played by changing the DRS stage to ODBC connector /ODBC EE and also with the array size ( 1 to 5000) but no luck .

Can any one please suggest how to improve the performance, ( need to reduce the processing time to 8 minutes) .

Thanks In Advance

Nagaraj · Post by **Nagaraj** » Fri Nov 14, 2008 4:33 pm

Why dont you Pull the second link for AGG from COpy stage.? I assume that you are using the second link (Agg) to get the count rows.

ray.wurlod · Post by **ray.wurlod** » Fri Nov 14, 2008 9:51 pm

What read method are you using to get the data out of DB2? Use the Resource Estimator and/or Job Monitor to give more targeted metrics. If the import really is the resource hog, this is the area upon which your investigations need to be more closely focussed.

pradkumar · Post by **pradkumar** » Mon Nov 17, 2008 1:25 pm

Hi Ray,

Iam using user defined SQL as read method to pull data from DB2 database.In the job monitor i found that 11.34 mins of total time 12 mins is taking for the source stage. can i play by changing the array size , read committed ? What changes have to make to improve it. Please sugest.

Thanks

pradkumar · Post by **pradkumar** » Mon Nov 17, 2008 1:28 pm

Hi Ray,

Iam using user defined SQL as read method to pull data from DB2 database.In the job monitor i found that 11.34 mins of total time 12 mins is taking for the source stage. can i play by changing the array size , read committed ? What changes have to make to improve it. Please sugest.

Thanks

ray.wurlod · Post by **ray.wurlod** » Tue Nov 18, 2008 1:12 am

No idea. What is your user-defined SQL doing? Are any constraints, joins, groupings or sorts supported by indexes in the database? Could this be faster using bulk unload and processing in DataStage?

DEEPTI · Post by **DEEPTI** » Tue Nov 18, 2008 2:38 am

pradkumar wrote:Hi Ray,

Iam using user defined SQL as read method to pull data from DB2 database.In the job monitor i found that 11.34 mins of total time 12 mins is taking for the source stage. can i play by changing the array size , read committed ? What changes have to make to improve it. Please sugest.

Thanks

If u r using joins in the source query
then specify alias for each table and have each column with alias table name.column name ...
This reduces the time taken to execute the query ..as it will not search each table for the column

Thanks
Deepti

ray.wurlod · Post by **ray.wurlod** » Tue Nov 18, 2008 3:01 pm

Not "u" but "you".
Not "r" but "are".

Do you use these SMS-style abbreviations when documenting your work? Then please don't use them here.

DEEPTI · Post by **DEEPTI** » Tue Nov 18, 2008 11:36 pm

My apology ..
Thanks Ray for suggesting..
Deepti

pradkumar · Post by **pradkumar** » Wed Nov 19, 2008 5:26 pm

Hi

Thanks for your replys ..

But my source(db2 database/odbc EE stage) has 2 million records flows to a copy stage ; from their one link directly loads in to target sql server(drs stage) and 2nd link from copy stage to aggregator where it takes record count as aggregation type and loads just 1 record to the audit table... The same job design is taking 8 mins of time in SSIS where as its taking 15 mins in datastage .. is there any way can we improve the performace ( like doing partitioning etc)

Thanks in Advance

ray.wurlod · Post by **ray.wurlod** » Wed Nov 19, 2008 5:37 pm

What partitioning do you already have? Is the DB2 table partitioned?

Gazelle · Post by **Gazelle** » Thu Feb 12, 2009 7:13 pm

I don't know if this will help in your case, but we found a bottleneck in how the DB2 database was configured.

The symptoms were similar: our jobs would spend >80% of their time waiting for rows to be returned from the DB2 read.

We reduced this to about 20%, by completely restructuring DB2:
- defining Indexes, Logs and Data to be on separate filesystems (ideally different physical disk spindles);
- creating multiple "containers" for the tablespaces on separate filesystems (again, ideally on different disk spindles);
- maximising some of the db2 parameters (such as the buffer size), and the way that Unix manages its memory. [Sorry, but I do not remember all of the parameters].

So this might be more of a DB2 issue rather than a DataStage issue.
Note that if DB2 is on the same server as DataStage, then there's a bit of disagreement over the optimum Unix settings. They don't really play nicely together.