Job optimization

Post questions here related to DataStage Server Edition, covering areas such as Server job design, DS BASIC, Routines, Job Sequences, etc.


qutesanju
Participant
Posts: 373
Joined: Tue Aug 26, 2008 4:52 am

Job optimization

Post by qutesanju »

I have certain queries that return millions of rows, and I'm updating target tables based on the data derived through those queries.

The whole job is taking a long time, around 2-3 days.

How can I optimize this process?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

What's your target database? What does your current job design look like?
-craig

"You can never have too many knives" -- Logan Nine Fingers
satya99
Participant
Posts: 104
Joined: Thu Nov 30, 2006 1:22 pm

Post by satya99 »

Split the source file --> Allow Multiple Instance
satya
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

"Certain queries", eh? That's not very informative - could it be the ten way full outer join query from hell? How long does the same query take if DataStage is not in the mix? You will not be able to do anything in DataStage to improve this time. Parallel execution is a possibility but, in a server job (or several) you need to design this in.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
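
For reference, one quick way to time the extraction query on its own, outside DataStage, is to run it directly against SQL Server with timing statistics switched on. The table and column names below are hypothetical stand-ins for the real query.

-- Run in Management Studio or sqlcmd against the source database.
-- dbo.orders, order_id, amount and updated_on are made-up names standing
-- in for the actual extraction query.
SET STATISTICS TIME ON;   -- reports parse/compile and elapsed execution time
SET STATISTICS IO ON;     -- reports logical/physical reads per table

SELECT o.order_id, o.amount
FROM   dbo.orders AS o
WHERE  o.updated_on >= '2009-01-01';

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;

If the query alone accounts for most of the 2-3 days, tuning it (or its indexes) will matter far more than anything done inside the job.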
qutesanju
Participant
Posts: 373
Joined: Tue Aug 26, 2008 4:52 am

Post by qutesanju »

The target database is SQL Server 2005, and the current job design is: input query ---> Transformer ---> target tables (update action).
qutesanju
Participant
Posts: 373
Joined: Tue Aug 26, 2008 4:52 am

Post by qutesanju »

Is splitting the source file applicable to Server Edition?
satya99 wrote: Split the source file --> Allow Multiple Instance
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

What 'update action' exactly?
-craig

"You can never have too many knives" -- Logan Nine Fingers
satya99
Participant
Posts: 104
Joined: Thu Nov 30, 2006 1:22 pm

Post by satya99 »

Yes.

Step 1:

Source --> Link_Partitioner --> Source1, Source2, ...

Step 2:

Source1 --> Transformer (your logic) --> Target table (Allow Multiple Instance)

Source2 --> Transformer (your logic) --> Target table
...
satya
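
As an alternative sketch of the same idea (splitting the source so several copies run in parallel): if the job is marked Allow Multiple Instance, each invocation could be given its own non-overlapping slice of the extraction query via a modulus filter on an integer key. The table and key names below are hypothetical.

-- Invocation 1 of 4 (dbo.orders and order_id are made-up names):
SELECT o.*
FROM   dbo.orders AS o
WHERE  o.order_id % 4 = 0;

-- Invocation 2 of 4 would use o.order_id % 4 = 1, and so on.
-- Each invocation drives its own Transformer and target link, so the four
-- copies of the job run concurrently against the same target.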
mfavero
Premium Member
Posts: 45
Joined: Thu Jan 16, 2003 1:20 pm
Location: Minneapolis

Post by mfavero »

Are you only doing updates? If not, you are probably doing upserts. Split the file into two: one for inserts and one for updates. Then run each file through the appropriate load.

If you are doing all updates, consider generating a key file, deleting all the target rows with those keys, and then bulk inserting all the rows. Of course, if you do not have all the columns for those rows in the source, you would need to bring down the missing columns and 'join' them to the source in DataStage.

The point is, you need to understand what is taking so long. Upserts are expensive, plain updates are the next most expensive, and inserts through a non-bulk OCI stage are pretty good. But nothing beats the bulk load utilities.
Michael Favero

2852 Humboldt Ave So
Minneapolis, MN 55408
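
As a rough sketch of the two options above (routing inserts and updates separately, and the delete-by-key-then-bulk-insert approach), assuming the source rows are first landed in a staging table. All table and column names here (stg_orders, dbo.orders, order_id) are hypothetical.

-- Split staged rows into an update feed and an insert feed:
SELECT s.*
FROM   stg_orders AS s
WHERE  EXISTS (SELECT 1 FROM dbo.orders AS t WHERE t.order_id = s.order_id);     -- keys already present: updates

SELECT s.*
FROM   stg_orders AS s
WHERE  NOT EXISTS (SELECT 1 FROM dbo.orders AS t WHERE t.order_id = s.order_id); -- new keys: inserts

-- For the all-updates case: delete the rows that are about to be reloaded,
-- then reload them with a bulk method (bcp or BULK INSERT).
DELETE t
FROM   dbo.orders AS t
INNER JOIN stg_orders AS s ON s.order_id = t.order_id;

INSERT INTO dbo.orders
SELECT * FROM stg_orders;   -- assumes staging and target have the same column layout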