Optimizing performance using DS

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
laiko
Premium Member
Premium Member
Posts: 35
Joined: Sun May 25, 2008 10:55 am

Optimizing performance using DS

Post by laiko »

Hi,

I have a source table which resides in a remote DB. I get the records from this table based on the last_update_date column. Unfortunately, this table is not range-partitioned in the last_update_date column. My query is simple, no joins and so gets only from one table. As simple as:

Code: Select all

SELECT col1, col2, col3 
FROM sourcetab 
WHERE last_update_date >= "<cut-off-date>"
Though, I have some TRUNC, TO_CHAR and TO_DATE functions on some of the columns. But apart from those, no other functions are used.

It takes a long time for my job to fetch the records from this remote source. We would like to tune it first at the DataStage level as much as possible, before going to the database level (such as using dblink, partitioning - ifever they may help).

I cannot think of any way to optimize this job. I tried using an intermediary container such as a temp file:

SOURCE --> temp file --> DW Staging

But this does not help..rather takes longer to finish. Any idea??
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

You need to differentiate between the time required to select the data and the time required to transfer it to DataStage. I suspect your limiting factor is the transfer, not the database selection.
laiko
Premium Member
Premium Member
Posts: 35
Joined: Sun May 25, 2008 10:55 am

Post by laiko »

Maybe.. I just need to optimize the entire process, so either an improvement on Selection or transfer will help. Any advise??
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You seem to be implying that the bottleneck here is your source query. What database are you using? What stage? Without adding an index over the last_update_date pretty much the only way to speed up a simple select like that is to introduce parallelism. In Oracle, that would mean a PARALLEL hint. And the index would help regardless.

As to the transfer speed, have you optimized your Array Size? I'm also curious, when you landed the data between the passive stages, what were your timings between the two 'halves'? That exercise should have gone a long way towards letting you know which 'side' was the issue here.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply