DataStage Performance

bdixon
Participant
Posts: 35
Joined: Thu Nov 20, 2003 5:45 pm
Location: Australia, Sydney

DataStage Performance

Post by bdixon »

Hi,

I am looking at speeding up my DataStage jobs. I have already looked into the loading of the data and got that down to 40 minutes, but my batch still runs for 4 hours.
Does anyone have any ideas on how to speed up DataStage or make better use of the available memory?

Thanks
Brad
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Re: DataStage Performance

Post by Teej »

bdixon wrote:I am looking at speeding up my DataStage jobs. I have already looked into the loading of the data and got that down to 40 minutes, but my batch still runs for 4 hours.
Does anyone have any ideas on how to speed up DataStage or make better use of the available memory?
Then it is all in your calculation process. Test just loading the same amount of data into the database on its own. Are you using Update rather than Insert? How many rows are we talking about here? Is the source a flat file? A database?

There are so many variations here that we cannot really help you unless you ask specific questions or provide specific details.

-T.J.
Developer of DataStage Parallel Engine (Orchestrate).
bdixon
Participant
Posts: 35
Joined: Thu Nov 20, 2003 5:45 pm
Location: Australia, Sydney

Re: DataStage Performance

Post by bdixon »

Teej wrote:
bdixon wrote:I am looking at speeding up my DataStage jobs. I have already looked into the loading of the data and got that down to 40 minutes, but my batch still runs for 4 hours.
Does anyone have any ideas on how to speed up DataStage or make better use of the available memory?
Then it is all in your calculation process. Test just loading the same amount of data into the database on its own. Are you using Update rather than Insert? How many rows are we talking about here? Is the source a flat file? A database?

There are so many variations here that we cannot really help you unless you ask specific questions or provide specific details.

-T.J.
Each night we insert about 3 million rows and update about 1.5 million from flat files into a Red Brick database, which in total takes about 40 minutes - I am happy with this performance.
But I am after any tips on speeding up the DataStage processing time. Can I make writing to hash files any quicker, or is there a setting I can check to speed up the calculation process, i.e. the other 3 hours 20 minutes?
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ

Post by mhester »

You might try using some form of CDC or CDD (change data capture / change data detection) so that you do not update all 1.5 million rows if they have not changed. As the other posts pointed out, we really do not have a good understanding of your jobs beyond the database and the number of rows. We need a bit more information so that we can help you more effectively.
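
For example, one common server-job pattern (a sketch only - the link and column names here are hypothetical, not from your jobs) keeps a reference hashed file keyed on the natural key with a CHECKSUM of the non-key columns, and the transformer routes each row off a lookup against it:

* Stage variable NewChk - checksum of the concatenated non-key columns:
NewChk = CHECKSUM(in.Col1 : "|" : in.Col2 : "|" : in.Col3)

* Constraint on the insert link - the key has never been seen:
lkpRef.NOTFOUND

* Constraint on the update link - the key exists but the row changed:
Not(lkpRef.NOTFOUND) And NewChk <> lkpRef.StoredChk

Rows that match on both key and checksum satisfy neither constraint and never reach the database, so the 1.5 million updates shrink to just the rows that actually changed.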

Regards,

Michael Hester
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

Brad

If you delete your hash files and recreate them with a minimum modulus, that can easily speed up hash file loading. ANALYZE.FILE will tell you how big to make the minimum modulus. Do a search on this site; it has been covered many times, along with why it works.
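
For example, from the TCL prompt (the file name and modulus below are made up - the ANALYZE.FILE report is what tells you the real number to use):

ANALYZE.FILE MyHashFile
   (reports the file type, current modulus, and how well-sized it is)
DELETE.FILE MyHashFile
CREATE.FILE MyHashFile DYNAMIC MINIMUM.MODULUS 20011
   (recreates the file pre-sized to roughly its final modulus)

The reason this works is that a dynamic (type 30) hashed file created at the default size has to split groups over and over as rows arrive; pre-sizing it with MINIMUM.MODULUS avoids doing all that splitting during the load.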

Hester's idea will probably help the most, but it is harder to implement. Mine may get you the most gain for the amount of time spent.

Kim.
Mamu Kim
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

The easiest way to achieve multiples of improvement is to instantiate your jobs. You're on a Windows machine, I see, so let's pretend you have 4 CPUs. Since you're on DS 5, you don't have to worry about inter-process stages or row buffering.

If you're on DS 5.2, you can enable job instantiation. If you're on DS 5.1 or earlier, you can just use SAVE-AS to make copies of your job. You parameterize the output sequential file names and partition the source data by some means (I like a round-robin approach using MOD(@INROWNUM, NumberOfClones) = ThisJobCloneNumber - 1 in a transformer constraint). Jobs are single-threaded, so if you have 4 CPUs you're only ever using 1 of them; running 4 copies of the job simultaneously means you could net 4X the throughput.
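
As a rough sketch of the job-control code that drives the copies (assumed names: four SAVE-AS copies called LoadJob1 to LoadJob4, each taking NumberOfClones and ThisJobCloneNumber as job parameters - none of these names come from your jobs; in a standalone routine you would also need $INCLUDE DSINCLUDE JOBCONTROL.H for the DSJ.* constants):

NumberOfClones = 4
Dim Handle(NumberOfClones)

* Attach, parameterize, and start every copy without waiting on any of them
For Clone = 1 To NumberOfClones
   Handle(Clone) = DSAttachJob("LoadJob" : Clone, DSJ.ERRFATAL)
   ErrCode = DSSetParam(Handle(Clone), "NumberOfClones", NumberOfClones)
   ErrCode = DSSetParam(Handle(Clone), "ThisJobCloneNumber", Clone)
   ErrCode = DSRunJob(Handle(Clone), DSJ.RUNNORMAL)
Next Clone

* Block until all copies have finished before the next batch step
For Clone = 1 To NumberOfClones
   ErrCode = DSWaitForJob(Handle(Clone))
Next Clone

Each copy reads the full source, but its transformer constraint MOD(@INROWNUM, NumberOfClones) = ThisJobCloneNumber - 1 keeps only its own share of the rows, so the copies between them process every row exactly once.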

Good luck!
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Post by Teej »

Or you can buy PX, make Ascential richer, and make your boss happier*.

* After going through the pain of learning the different rules, and figuring out the best policy for utilizing the tool, which may take you upwards of a year or two to master.

;-)

Seriously, Ken's and Kim's solutions could be exactly what you need with the tools you have. However, this is something that requires specific TECHNICAL information.

-T.J.
Developer of DataStage Parallel Engine (Orchestrate).
bdixon
Participant
Posts: 35
Joined: Thu Nov 20, 2003 5:45 pm
Location: Australia, Sydney

Post by bdixon »

Thanks everyone. I think I have what I need now... I will go away and try out some of your suggestions.

Thanks again
Brad