DataStage Performance

bdixon
Participant
Posts: 35
Joined: Thu Nov 20, 2003 5:45 pm
Location: Australia, Sydney

DataStage Performance

Post by bdixon »

Hi,

I am looking at speeding up my DataStage jobs. I have already looked into the loading of the data and got that down to 40 minutes, but my batch still runs for 4 hours.
Does anyone have any ideas on how to speed up DataStage or make better use of the available memory?

Thanks
Brad
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Re: DataStage Performance

Post by Teej »

bdixon wrote:I am looking at speeding up my DataStage jobs. I have already looked into the loading of the data and got that down to 40 minutes, but my batch still runs for 4 hours.
Does anyone have any ideas on how to speed up DataStage or make better use of the available memory?
Then it is all in your calculation process. Test just loading the same amount of data into the database on its own. Are you using Update rather than Insert? How many rows are we talking about here? Is the source a flat file? A database?

There are so many variations here that we cannot really help you unless you ask specific questions or provide specific details.

-T.J.
Developer of DataStage Parallel Engine (Orchestrate).
bdixon
Participant
Posts: 35
Joined: Thu Nov 20, 2003 5:45 pm
Location: Australia, Sydney

Re: DataStage Performance

Post by bdixon »

Teej wrote:
bdixon wrote:I am looking at speeding up my DataStage jobs. I have already looked into the loading of the data and got that down to 40 minutes, but my batch still runs for 4 hours.
Does anyone have any ideas on how to speed up DataStage or make better use of the available memory?
Then it is all in your calculation process. Test just loading the same amount of data into the database on its own. Are you using Update rather than Insert? How many rows are we talking about here? Is the source a flat file? A database?

There are so many variations here that we cannot really help you unless you ask specific questions or provide specific details.

-T.J.
Each night we insert about 3 million rows and update about 1.5 million from flat files into a Red Brick database, which in total takes about 40 minutes - I am happy with this performance.
But I am after any tips on speeding up the DataStage processing time. Can I make writing to hash files any quicker, or is there a setting I can check to speed up the calculation process, i.e. the other 3 hours 20 minutes?
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ

Post by mhester »

You might try using some form of CDC or CDD (change data capture / change data detection) so that you do not update all 1.5 million rows if they have not changed. As the other posts pointed out, we really do not have a good understanding of your jobs beyond the database and the number of rows. We need a bit more information so that we can help you more effectively.
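
For example, one common server-job pattern (a sketch only - the link and column names here are hypothetical, not from your jobs) keeps a reference hashed file keyed on the natural key with a CHECKSUM of the non-key columns, and the transformer routes each row off a lookup against it:

* Stage variable NewChk - checksum of the concatenated non-key columns:
NewChk = CHECKSUM(in.Col1 : "|" : in.Col2 : "|" : in.Col3)

* Constraint on the insert link - the key has never been seen:
lkpRef.NOTFOUND

* Constraint on the update link - the key exists but the row changed:
Not(lkpRef.NOTFOUND) And NewChk <> lkpRef.StoredChk

Rows that match on both key and checksum satisfy neither constraint and never reach the database, so the 1.5 million updates shrink to just the rows that actually changed.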

Regards,

Michael Hester
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

Brad

If you delete your hash files and recreate them with a minimum modulus, that can easily speed up hash file loading. ANALYZE.FILE will tell you how big to make the minimum modulus. Do a search on this site; it has been covered many times, along with why it works.
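
For example, from the TCL prompt (the file name and modulus below are made up - the ANALYZE.FILE report is what tells you the real number to use):

ANALYZE.FILE MyHashFile
   (reports the file type, current modulus, and how well-sized it is)
DELETE.FILE MyHashFile
CREATE.FILE MyHashFile DYNAMIC MINIMUM.MODULUS 20011
   (recreates the file pre-sized to roughly its final modulus)

The reason this works is that a dynamic (type 30) hashed file created at the default size has to split groups over and over as rows arrive; pre-sizing it with MINIMUM.MODULUS avoids doing all that splitting during the load.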

Hester's idea will probably help the most, but it is harder to implement. Mine may get you the most gain for the amount of time spent.

Kim.
Mamu Kim
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

The easiest way to achieve multiples of improvement is to instantiate your jobs. You're on a Windows machine, I see, so let's pretend you have 4 CPUs. Since you're on DS 5, you don't have to worry about inter-process stages or row buffering.

If you're on DS 5.2, you can enable job instantiation. If you're on DS 5.1 or earlier, you can just use SAVE-AS to make copies of your job. You parameterize the output sequential file names and partition the source data by some means (I like a round-robin approach using MOD(@INROWNUM, NumberOfClones) = ThisJobCloneNumber - 1 in a transformer constraint). Jobs are single-threaded, so if you have 4 CPUs you're only ever using 1 of them; running 4 copies of the job simultaneously means you could net 4X the throughput.
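
As a rough sketch of the job-control code that drives the copies (assumed names: four SAVE-AS copies called LoadJob1 to LoadJob4, each taking NumberOfClones and ThisJobCloneNumber as job parameters - none of these names come from your jobs; in a standalone routine you would also need $INCLUDE DSINCLUDE JOBCONTROL.H for the DSJ.* constants):

NumberOfClones = 4
Dim Handle(NumberOfClones)

* Attach, parameterize, and start every copy without waiting on any of them
For Clone = 1 To NumberOfClones
   Handle(Clone) = DSAttachJob("LoadJob" : Clone, DSJ.ERRFATAL)
   ErrCode = DSSetParam(Handle(Clone), "NumberOfClones", NumberOfClones)
   ErrCode = DSSetParam(Handle(Clone), "ThisJobCloneNumber", Clone)
   ErrCode = DSRunJob(Handle(Clone), DSJ.RUNNORMAL)
Next Clone

* Block until all copies have finished before the next batch step
For Clone = 1 To NumberOfClones
   ErrCode = DSWaitForJob(Handle(Clone))
Next Clone

Each copy reads the full source, but its transformer constraint MOD(@INROWNUM, NumberOfClones) = ThisJobCloneNumber - 1 keeps only its own share of the rows, so the copies between them process every row exactly once.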

Good luck!
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Post by Teej »

Or you can buy PX, make Ascential richer, and make your boss happier*.

* After going through the pain of learning the different rules, and figuring out the best policy for utilizing the tool, which may take you upwards of a year or two to master.

;-)

Seriously, Ken's and Kim's solutions could be exactly what you need with the tools you have. However, this is something that requires specific TECHNICAL information.

-T.J.
Developer of DataStage Parallel Engine (Orchestrate).
bdixon
Participant
Posts: 35
Joined: Thu Nov 20, 2003 5:45 pm
Location: Australia, Sydney

Post by bdixon »

Thanks everyone. I think I have what I need now... I will go away and try out some of your suggestions.

Thanks again
Brad