
DB2 to SQLServer ETL runs slow..using wrong logical

Posted: Wed Jul 25, 2007 8:48 am
by rmcclure
We are running a simple ETL server job that takes data from a source DB2 table via ODBC and puts it in a target SQL Server 2005 table.

The ETL job runs very slowly, processing fewer than 100 rows per second on a job that needs to process about 300K rows.

Working with our DB administrator we found that the ETL job was not using the most efficient logical file for that physical table.

We are trying to understand better why this is happening. Does anyone have any idea what might be the problem? The ETL job? The ODBC driver? DB2?

Posted: Wed Jul 25, 2007 9:17 am
by DSguru2B
That's internal to the database and I don't think it's tunable from your end. From your DBA's end, it might be.
You could do a few things to identify the bottleneck:
-Do a simple select to a text file and see how fast the select is.
-Use that text file and load it into your target and see how much time it takes to load the file.

This will identify whether the select or the upsert is the problem.
More than likely, it will be the target that's creating back pressure all the way to your source. For that you could:
-Increase the array size and transaction size
-Use separate streams for inserts and updates
-Don't use deletes for huge volumes; prefer truncate, or a stored procedure that truncates the table.
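The two isolation steps above (select to a flat file, then load the flat file) can be sketched as a quick timing harness. This is a minimal sketch in Python, with sqlite3 standing in for the real DB2 (ODBC) source and SQL Server target; the table names, column names, and row counts are invented for illustration:

```python
# Isolate the bottleneck: time the extract and the load separately.
# sqlite3 is a stand-in for the real source/target databases.
import csv
import sqlite3
import time

# Hypothetical source table with some sample rows.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE src (id INTEGER, qty INTEGER)")
source.executemany("INSERT INTO src VALUES (?, ?)",
                   [(i, i * 2) for i in range(10000)])

# Step 1: dump the SELECT to a text file and time it.
t0 = time.perf_counter()
with open("dump.csv", "w", newline="") as f:
    csv.writer(f).writerows(source.execute("SELECT id, qty FROM src"))
select_secs = time.perf_counter() - t0

# Step 2: bulk-load the text file into the target and time that.
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE tgt (id INTEGER, qty INTEGER)")
t0 = time.perf_counter()
with open("dump.csv", newline="") as f:
    target.executemany("INSERT INTO tgt VALUES (?, ?)", csv.reader(f))
target.commit()
load_secs = time.perf_counter() - t0

print(f"select: {select_secs:.3f}s  load: {load_secs:.3f}s")
```

Whichever step dominates the wall-clock time points at the side of the job worth tuning first.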

Re: DB2 to SQLServer ETL runs slow..using wrong logical

Posted: Wed Jul 25, 2007 9:21 am
by chulett
rmcclure wrote:Working with our DB administrator we found that the ETL job was not using the most efficient logical file for that physical table.
We are trying to understand just what exactly this means. Can you explain what your DBA feels it is doing wrong, or not doing correctly, with regard to this 'most efficient logical file'?

Posted: Thu Jul 26, 2007 6:51 am
by rmcclure
After a little trial-and-error testing we are sure that the problem lies with the source and not the target. When we try even just a View Data in Designer on the source table, it takes forever to display the results.

The SELECT statement contains the following WHERE clause:
Where (Createdate > #GetLastUpdateTimeE1# OR Createdate IS NULL OR Createdate = 0 OR GLDate > #GetLastUpdateTimeE1#)
AND NOT(UniqueId = 0 AND TransQty = 0 AND AmountPaid = 0)

If I run the select with this where clause, it takes about half an hour to display the results in View Data.

If I remove the GLDate > #GetLastUpdateTimeE1# condition from the select statement it returns the results in less than 5 seconds.

Posted: Thu Jul 26, 2007 7:09 am
by chulett
So, you just have a SQL tuning problem? This has nothing to do with DataStage, obviously. Sounds like there's no index on the GLDate field, so its use basically forces a full scan of the table. You would need to verify.

Forget about DataStage for the moment and take that query into your favorite SQL tool. Work with whoever can help you to tune the query, including the possible addition of an index or indexes to support it. Once you've got it running 'properly' there, migrate it back to your job.
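The full-scan hypothesis is easy to reproduce outside DataStage. Here is a hedged sketch using Python's sqlite3 as a stand-in for DB2, with EXPLAIN QUERY PLAN playing the role of DB2's access-plan tools; the table and index names are invented. Without an index on GLDate the optimizer scans the whole table, and adding one turns the same predicate into an index search. On the real query, keep in mind that every branch of the OR needs an indexable path before the full scan can go away.

```python
# Show how a predicate on an unindexed column forces a table scan,
# and how adding an index changes the access plan.
# sqlite3 is only a stand-in for DB2; names are illustrative.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE trans "
            "(UniqueId INTEGER, Createdate INTEGER, GLDate INTEGER)")
con.execute("CREATE INDEX idx_createdate ON trans(Createdate)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail);
    # keep just the detail text describing the access path.
    return [row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT * FROM trans WHERE GLDate > 20070101"

before = plan(query)   # no index on GLDate: full table scan
con.execute("CREATE INDEX idx_gldate ON trans(GLDate)")
after = plan(query)    # same predicate now resolved via the index

print(before)   # e.g. a SCAN of the whole table
print(after)    # e.g. a SEARCH using idx_gldate
```

The same before/after comparison against the real DB2 access plan would confirm (or refute) whether GLDate is the culprit.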

Posted: Thu Jul 26, 2007 6:14 pm
by vmcburney
You might run into this problem on other jobs. The easiest way to spot it is via the job monitor showing that your job processes 0 rows for a long time. It will report that it ran at 200 rows a second, but it was probably 0 rows a second for a long time followed by 1000 rows a second once it got going.