
Job optimization

Posted: Mon Jun 01, 2009 8:36 am
by qutesanju
I have certain queries that return millions of rows, and I'm updating target tables based on the data derived through those queries.

But the whole job is taking a long time, around 2-3 days.

How can I optimize this process?

Posted: Mon Jun 01, 2009 8:58 am
by chulett
What's your target database? What does your current job design look like?

Posted: Mon Jun 01, 2009 2:47 pm
by satya99
Split the source file --> Allow Multiple Instance

Posted: Mon Jun 01, 2009 4:18 pm
by ray.wurlod
"Certain queries", eh? That's not very informative - could it be the ten way full outer join query from hell? How long does the same query take if DataStage is not in the mix? You will not be able to do anything in DataStage to improve this time. Parallel execution is a possibility but, in a server job (or several) you need to design this in.

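If the source is also SQL Server, one quick way to answer that is to time the bare query from Management Studio, with no DataStage in the picture. A minimal sketch (the table name is only a placeholder for whatever the job's real source query is):

    -- Report elapsed/CPU time and I/O for the bare query, outside DataStage.
    SET STATISTICS TIME ON;
    SET STATISTICS IO ON;

    SELECT * FROM source_table;   -- placeholder for the job's actual source query

If most of the 2-3 days is spent here, tuning the query (or its indexes) matters far more than anything in the job design.
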
Posted: Tue Jun 02, 2009 1:18 am
by qutesanju
The target database is SQL Server 2005, and the current job design is: input query ---> Transformer ---> target tables (update action).
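
With an update action, the target stage typically sends one parameterized UPDATE per incoming row, roughly the shape sketched below (table and column names are made up, not from the actual job); with millions of rows that means millions of individual statements, which is usually where this kind of job loses its time.

    -- Rough shape of the per-row parameterized statement an update action issues;
    -- "target_table", "col1", "col2" and "key_col" are hypothetical names, and
    -- the ? markers are bound with each row's values at run time.
    UPDATE target_table
    SET    col1 = ?,
           col2 = ?
    WHERE  key_col = ?;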

Posted: Tue Jun 02, 2009 1:37 am
by qutesanju
Is splitting the source file applicable for Server Edition?
satya99 wrote: Split the source file --> Allow Multiple Instance

Posted: Tue Jun 02, 2009 7:19 am
by chulett
What 'update action' exactly?

Posted: Tue Jun 02, 2009 11:28 am
by satya99
Yes.


Step 1:

Source --> Link_Partitioner --> Source1, Source2, ...

Step 2:

Source1 --> Transformer (your logic) --> Target table (Allow Multiple Instance)


Source2 --> Transformer (your logic) --> Target table
...
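
If the source is a query rather than a flat file, an alternative to a physical split is to give each job instance its own slice of the source data, for example by a modulo on a numeric key. A minimal sketch, assuming a hypothetical numeric key column called row_id:

    -- Source query for instance 1 of 4; each instance keeps a different
    -- remainder so the slices do not overlap.
    SELECT *
    FROM   source_table
    WHERE  row_id % 4 = 0;

    -- Instance 2 would use row_id % 4 = 1, and so on for instances 3 and 4.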

Posted: Tue Jun 02, 2009 1:42 pm
by mfavero
Are you only doing updates? If not, you are probably doing upserts. Split the file into two, one for inserts and one for updates. Then run each file through the proper load.
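
One way to make that split, if the incoming rows are landed in a staging table first, is an outer join against the target on the business key. A rough sketch with made-up table and column names ("staging", "target_table", "key_col"):

    -- Rows whose key is not yet in the target become the insert set.
    SELECT s.*
    FROM   staging s
    LEFT JOIN target_table t ON t.key_col = s.key_col
    WHERE  t.key_col IS NULL;

    -- Rows whose key already exists become the update set.
    SELECT s.*
    FROM   staging s
    JOIN   target_table t ON t.key_col = s.key_col;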

If you are doing all updates, consider generating a key file and deleting all the rows with those keys, then bulk inserting all the rows. Of course, if you do not have all the columns for the rows in the source, you would need to bring down the missing columns and 'join' them to the source in DataStage.
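
A hedged sketch of that delete-then-reload pattern on SQL Server, again with placeholder names for the staging table, target table and key column:

    -- Delete the target rows that are about to be replaced...
    DELETE t
    FROM   target_table t
    JOIN   staging s ON s.key_col = t.key_col;

    -- ...then reload from staging in one set-based insert
    -- (or via bcp / BULK INSERT when the replacement rows sit in a flat file).
    INSERT INTO target_table (key_col, col1, col2)
    SELECT key_col, col1, col2
    FROM   staging;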

The point is, you need to understand what is taking so long. Upserts are the most expensive, updates are next, and inserts through a non-bulk OCI stage are pretty good. But nothing beats the bulk load utilities.