Performance Issue

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

rasi
Participant
Posts: 464
Joined: Fri Oct 25, 2002 1:33 am
Location: Australia, Sydney

Post by rasi »

Breaking the 25 lookups into several small jobs is more efficient than having one single monster job.
Regards
Siva

Listening to the Learned

"The most precious wealth is the wealth acquired by the ear. Indeed, of all wealth that wealth is the crown." - Thirukural by Thiruvalluvar
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

One job, but with multiple Transformer stages (say, not more than four lookups per Transformer stage) would also be OK. You can enable inter-process row buffering (and, if desired, interpolate IPC stages to make it obvious that there are separate processes involved). Of course, splitting into multiple processes is not a great gain if you only have a single CPU.
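As a rough illustration of that layout (not DataStage code), here is a minimal Python sketch that groups 25 lookups into stages of at most four and connects the stages with bounded row buffers. Everything in it is an assumption made for illustration: the names `chunk`, `stage` and `run_pipeline`, the 128-row buffer size, and the use of threads as a stand-in for the separate processes that inter-process row buffering gives you.

```python
import queue
import threading

SENTINEL = object()  # marks end of data on a link


def chunk(items, size):
    """Split a list of lookups into groups of at most `size`."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def stage(lookups, inbox, outbox):
    """One 'Transformer': apply a few lookups to each row, pass the row on."""
    while True:
        row = inbox.get()
        if row is SENTINEL:
            outbox.put(SENTINEL)
            return
        for column, table in lookups:
            row[column] = table.get(row["key"])  # None when the key is missing
        outbox.put(row)


def run_pipeline(rows, lookups, per_stage=4, buffer_rows=128):
    """Run rows through a chain of stages joined by bounded buffers."""
    groups = chunk(lookups, per_stage)
    links = [queue.Queue(maxsize=buffer_rows) for _ in range(len(groups) + 1)]

    def feed():
        for row in rows:
            links[0].put(dict(row))  # copy so the caller's rows are untouched
        links[0].put(SENTINEL)

    workers = [threading.Thread(target=feed)]
    workers += [
        threading.Thread(target=stage, args=(group, links[i], links[i + 1]))
        for i, group in enumerate(groups)
    ]
    for worker in workers:
        worker.start()

    # Drain the last link while the stages are still running.
    results = []
    while True:
        row = links[-1].get()
        if row is SENTINEL:
            break
        results.append(row)
    for worker in workers:
        worker.join()
    return results
```

With 25 lookups and `per_stage=4` this gives seven stages (six with four lookups, one with a single lookup). Python threads share one interpreter, so the sketch shows only the shape of the design, not the speed-up you would hope for from separate processes on a multi-CPU box.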
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rajkraj
Premium Member
Posts: 98
Joined: Wed Jun 15, 2005 1:41 pm

Post by rajkraj »

Thanks for your responses. Ray, I have a question: which is the better approach, a single job with multiple Transformer stages, or splitting the big job into multiple jobs?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

"Best" is too subjective a term. 25 jobs is very many to maintain. 25 separate jobs will make troubleshooting easier. What/where are your priorities?
aartlett
Charter Member
Posts: 152
Joined: Fri Apr 23, 2004 6:44 pm
Location: Australia

Post by aartlett »

I have to agree with Ray: one job, multiple transformations.
Load as many hashed files into RAM (enable caching) as possible.

If volumes are huge, you can even containerise the lookup transforms: split the stream using a Link Partitioner, process each stream through the lookups, and then use a Link Collector to bring it all together.

This is not required if the hashed files load into RAM. Use Administrator to increase the cache buffer size.
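The partition, lookup, collect pattern described above can be sketched in Python as follows. This is only an illustration of the data flow, not DataStage code: the function names, the round-robin choice and the `name` column are assumptions, and the streams are processed one after another here, whereas in a job each stream would run through its own copy of the lookup container.

```python
from itertools import zip_longest

_MISSING = object()  # padding marker for streams of unequal length


def partition_round_robin(rows, n_streams):
    """Deal rows out to n_streams lists in turn, like a link partitioner."""
    streams = [[] for _ in range(n_streams)]
    for i, row in enumerate(rows):
        streams[i % n_streams].append(row)
    return streams


def apply_lookup(stream, lookup):
    """Add a 'name' column from an in-memory lookup; None if the key is absent."""
    return [dict(row, name=lookup.get(row["key"])) for row in stream]


def collect_round_robin(streams):
    """Read one row from each stream in turn, like a link collector."""
    collected = []
    for group in zip_longest(*streams, fillvalue=_MISSING):
        collected.extend(row for row in group if row is not _MISSING)
    return collected


def partitioned_lookup(rows, lookup, n_streams=4):
    streams = partition_round_robin(rows, n_streams)
    streams = [apply_lookup(stream, lookup) for stream in streams]
    return collect_round_robin(streams)
```

Because the same round-robin scheme is used on both sides, the collected rows come back in their original order. Other partitioning or collection schemes would not necessarily preserve it.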

I like to hammer the CPU out of a box; remember, a lost CPU cycle is a wasted CPU cycle. Try to keep the box at about 5% idle.
Andrew

Think outside the Datastage you work in.

There is no True Way, but there are true ways.
Ocean
Participant
Posts: 18
Joined: Tue Jul 18, 2006 1:51 am

Post by Ocean »

Hi Ray,

I created a job with four Transformers, one having 5 to 7 lookups. With inter-process row buffering enabled, overall performance is about 50 rows/sec. With it disabled, the job runs at about 250 rows/sec.

Is there any issue with this? Any suggestions?

Thanks,
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Ocean - if you address a post to Ray, does that mean you don't want to hear from anyone else? The performance change is expected when you have a multi cpu system. What is your question? Oops, I'll take that back since I'm not Ray.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

How many rows? Rows/sec is an almost meaningless metric for a whole lot of reasons. For example, the start-up time and close-down time are counted in the elapsed time, even though no rows are processed. For a small number of rows, you might also be waiting for the (IPC) buffers to fill.
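To put numbers on the start-up and close-down effect, here is a small Python sketch. The figures are hypothetical (10 seconds of fixed overhead and a steady rate of 1000 rows/sec are assumptions, not measurements from any job), and `apparent_rows_per_sec` is just an illustrative helper.

```python
def apparent_rows_per_sec(rows, overhead_secs, steady_rows_per_sec):
    """Rows/sec as reported when fixed overhead is included in elapsed time."""
    elapsed = overhead_secs + rows / steady_rows_per_sec
    return rows / elapsed


# Hypothetical job: 10 s of start-up plus close-down, 1000 rows/sec once running.
small_run = apparent_rows_per_sec(1_000, 10, 1_000)
large_run = apparent_rows_per_sec(1_000_000, 10, 1_000)
```

Under those assumptions the 1,000-row run reports about 90.9 rows/sec and the 1,000,000-row run about 990.1 rows/sec, even though the job processes rows at the same speed in both cases.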
Ocean
Participant
Posts: 18
Joined: Tue Jul 18, 2006 1:51 am

Post by Ocean »

Hi Ray,

It is not only the rows/sec figure; the elapsed time is also longer with inter-process row buffering enabled than without it.

Hi ArndW,

I only addressed the post to Ray because I was reading his advice. I really appreciate your help.

The development server has 1 processor and production has 4, so I expect better performance in production. Can we conclude that this is a processor issue?


Thanks all for the advice,