Job optimization

Post questions here related to DataStage Server Edition, covering areas such as Server job design, DS BASIC, Routines, Job Sequences, etc.


qutesanju
Participant
Posts: 373
Joined: Tue Aug 26, 2008 4:52 am

Job optimization

Post by qutesanju »

I have certain queries that return millions of rows, and I'm updating target tables based on the data derived through those queries.

The whole job is taking a long time, around 2-3 days.

How can I optimize this process?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

What's your target database? What does your current job design look like?
-craig

"You can never have too many knives" -- Logan Nine Fingers
satya99
Participant
Posts: 104
Joined: Thu Nov 30, 2006 1:22 pm

Post by satya99 »

Split the source file --> Allow Multiple Instance
satya
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

"Certain queries", eh? That's not very informative - could it be the ten way full outer join query from hell? How long does the same query take if DataStage is not in the mix? You will not be able to do anything in DataStage to improve this time. Parallel execution is a possibility but, in a server job (or several) you need to design this in.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
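
For reference, one quick way to time the extraction query on its own, outside DataStage, is to run it directly against SQL Server with timing statistics switched on. The table and column names below are hypothetical stand-ins for the real query.

-- Run in Management Studio or sqlcmd against the source database.
-- dbo.orders, order_id, amount and updated_on are made-up names standing
-- in for the actual extraction query.
SET STATISTICS TIME ON;   -- reports parse/compile and elapsed execution time
SET STATISTICS IO ON;     -- reports logical/physical reads per table

SELECT o.order_id, o.amount
FROM   dbo.orders AS o
WHERE  o.updated_on >= '2009-01-01';

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;

If the query alone accounts for most of the 2-3 days, tuning it (or its indexes) will matter far more than anything done inside the job.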
qutesanju
Participant
Posts: 373
Joined: Tue Aug 26, 2008 4:52 am

Post by qutesanju »

The target database is SQL Server 2005, and the current job design is: input query ---> Transformer ---> target tables (update action).
qutesanju
Participant
Posts: 373
Joined: Tue Aug 26, 2008 4:52 am

Post by qutesanju »

Is splitting the source file applicable to Server Edition?
satya99 wrote: Split the source file --> Allow Multiple Instance
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

What 'update action' exactly?
-craig

"You can never have too many knives" -- Logan Nine Fingers
satya99
Participant
Posts: 104
Joined: Thu Nov 30, 2006 1:22 pm

Post by satya99 »

Yes.

Step 1:

Source --> Link_Partitioner --> Source1, Source2, ...

Step 2:

Source1 --> Transformer (your logic) --> Target table (Allow Multiple Instance)

Source2 --> Transformer (your logic) --> Target table
...
satya
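
As an alternative sketch of the same idea (splitting the source so several copies run in parallel): if the job is marked Allow Multiple Instance, each invocation could be given its own non-overlapping slice of the extraction query via a modulus filter on an integer key. The table and key names below are hypothetical.

-- Invocation 1 of 4 (dbo.orders and order_id are made-up names):
SELECT o.*
FROM   dbo.orders AS o
WHERE  o.order_id % 4 = 0;

-- Invocation 2 of 4 would use o.order_id % 4 = 1, and so on.
-- Each invocation drives its own Transformer and target link, so the four
-- copies of the job run concurrently against the same target.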
mfavero
Premium Member
Posts: 45
Joined: Thu Jan 16, 2003 1:20 pm
Location: Minneapolis

Post by mfavero »

Are you only doing updates? If not, you are probably doing upserts. Split the file into two: one for inserts and one for updates. Then run each file through the appropriate load.

If you are doing all updates, consider generating a key file, deleting all the target rows with those keys, and then bulk inserting all the rows. Of course, if you do not have all the columns for those rows in the source, you would need to bring down the missing columns and 'join' them to the source in DataStage.

The point is, you need to understand what is taking so long. Upserts are expensive, plain updates are the next most expensive, and inserts through a non-bulk OCI stage are pretty good. But nothing beats the bulk load utilities.
Michael Favero

2852 Humboldt Ave So
Minneapolis, MN 55408
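
As a rough sketch of the two options above (routing inserts and updates separately, and the delete-by-key-then-bulk-insert approach), assuming the source rows are first landed in a staging table. All table and column names here (stg_orders, dbo.orders, order_id) are hypothetical.

-- Split staged rows into an update feed and an insert feed:
SELECT s.*
FROM   stg_orders AS s
WHERE  EXISTS (SELECT 1 FROM dbo.orders AS t WHERE t.order_id = s.order_id);     -- keys already present: updates

SELECT s.*
FROM   stg_orders AS s
WHERE  NOT EXISTS (SELECT 1 FROM dbo.orders AS t WHERE t.order_id = s.order_id); -- new keys: inserts

-- For the all-updates case: delete the rows that are about to be reloaded,
-- then reload them with a bulk method (bcp or BULK INSERT).
DELETE t
FROM   dbo.orders AS t
INNER JOIN stg_orders AS s ON s.order_id = t.order_id;

INSERT INTO dbo.orders
SELECT * FROM stg_orders;   -- assumes staging and target have the same column layout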