DB2 to SQLServer ETL runs slow..using wrong logical

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
rmcclure
Participant
Posts: 48
Joined: Fri Dec 01, 2006 7:50 am

DB2 to SQLServer ETL runs slow..using wrong logical

Post by rmcclure »

We are running a simple ETL server job that takes data from the source DB2 table using ODBC and puts it in a target SQLserver 2005 table.

The ETL job runs very slow processing less than 100 rows per second for a job that needs to process about 300K rows.

Working with our DB administrator we found that the ETL job was not using the most efficient logical file for that physical table.

We are trying to understand better why this is happening. Does anyone have any idea what might be the problem? The ETL job? The ODBC driver? DB2?
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Thats internals and I dont think is tunable from your end. Maybe from your DBA's end, it might be.
You could do a few things to identify the bottleneck:
-Do a simple select to a text file and see how fast the select is.
-Use that text file and load it into your target and see how much time it takes to load the file.

This will identify whether the select or the upsert is the problem.
More than likely, it will be the target thats creating a back pressure all the way to your source. For that you could:
-Increase the array size and transaction size
-Use seperate streams for inserts and updates
-Do not pass delete for huge volumes, prefer truncate or a stored procedure that truncates the table.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: DB2 to SQLServer ETL runs slow..using wrong logical

Post by chulett »

rmcclure wrote:Working with our DB administrator we found that the ETL job was not using the most efficient logical file for that physical table.
We are trying to understand better just what exactly this means. Can you explain what your DBA feels it is doing wrong or not doing correctly with regards to this 'most efficient logical file'?
-craig

"You can never have too many knives" -- Logan Nine Fingers
rmcclure
Participant
Posts: 48
Joined: Fri Dec 01, 2006 7:50 am

Post by rmcclure »

After a little trial and error testing we are sure that the problem lies with the source and not the target. When we try even just a view data is Designer on the source table, it takes forever to display the results.

the select statement contains the following where clause:
Where (Createdate > #GetLastUpdateTimeE1# OR Createdate IS NULL OR Createdate = 0 OR GLDate > #GetLastUpdateTimeE1#)
AND NOT(UniqueId = 0 AND TransQty = 0 AND AmountPaid = 0)

If I run the select with this where clause it will take about 1/2 an hour to display the results in a view data.

If I remove the GLDate > #GetLastUpdateTimeE1# condition from the select statement it returns the results in less than 5 seconds.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

So, you just have a SQL tuning problem? This has nothing to do with DataStage, obviously. Sounds like there's no index on the GLDate field so it's use basically forces a full scan of the table. You would need to verify.

Forget about DataStage for the moment and take that query into your favorite SQL tool. Work with whomever can help you to tune the query, including the possible addition of an index or indexes to support it. Once you've got it running 'properly' there, then migrate it back to your job.
-craig

"You can never have too many knives" -- Logan Nine Fingers
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

You might run into this problem on other jobs. The easiest way to spot it is via the job minotor showing that your job processes 0 rows for a long time. It will report that it ran at 200 rows a second but it was probably 0 rows a second for a long time followed by 1000 rows a second once it got going.
Post Reply