Does reducing the number of stages improve performance?

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Ravindar
Participant
Posts: 30
Joined: Tue Mar 23, 2004 6:14 am
Location: Chennai, India

Post by Ravindar »

Hi all,

In one of my jobs I have three consecutive transformer stages.
It is possible to combine the three transformers into one transformer.

Is there any difference in performance between
----------having three separate transformer stages
and
----------having one transformer?

If so, how?
Can anybody explain in detail?

Thanks,
Ravin
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Technically YES, but the gains vary depending on the operations being done. Filtering out rows in one transformer may reduce the rows that flow to a subsequent transformer where the derivations are more intense. In this case, you're better off having more transformers. More transformers also let you simplify some logic, and you can mitigate the extra performance overhead by using row buffering.

You should simply benchmark a job like seq-->xfm-->xfm-->xfm-->seq where the rows are simply passed thru the transformers. Benchmark it with 3 and then remove one at a time and see the performance differences for yourself.

Then, repeat the same test, scattering some logic across all three transformers and benchmark the performance changes as you move the logic towards the first transformer.
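
The two experiments above can be rehearsed outside DataStage with a toy model. The Python sketch below is only an illustration, not DataStage code: generator functions stand in for transformer stages, and every name in it (`make_passthru`, `make_filter`, `make_derivation`, `expensive`, `run_chain`, `compare_filter_position`) is hypothetical. It times a chain of pass-through stages, then counts how often a costly derivation runs when the filter sits before it versus after it. Absolute timings say nothing about a real server job, so the actual benchmark still has to be run in DataStage.

```python
import time


def make_passthru():
    """Toy stand-in for a transformer that copies every row unchanged."""
    def stage(rows):
        for row in rows:
            yield row
    return stage


def make_filter(predicate):
    """Toy stand-in for a transformer that only lets matching rows through."""
    def stage(rows):
        for row in rows:
            if predicate(row):
                yield row
    return stage


def make_derivation(fn, counter):
    """Toy stand-in for a transformer with a costly derivation on the value column."""
    def stage(rows):
        for key, value in rows:
            counter["calls"] += 1  # count how many rows reach this stage
            yield key, fn(value)
    return stage


def expensive(value):
    # Hypothetical costly derivation; any CPU-heavy function would do.
    return sum((value * i) % 7 for i in range(50))


def run_chain(rows, stages):
    """Feed rows through the stages in order; return (output rows, elapsed seconds)."""
    start = time.perf_counter()
    stream = iter(rows)
    for stage in stages:
        stream = stage(stream)
    out = list(stream)
    return out, time.perf_counter() - start


def compare_filter_position(n_rows):
    """Run the same logic with the filter first and with the filter last."""
    rows = [(k, k + 1) for k in range(n_rows)]
    keep_even_keys = make_filter(lambda row: row[0] % 2 == 0)
    early, late = {"calls": 0}, {"calls": 0}
    out_early, t_early = run_chain(rows, [keep_even_keys, make_derivation(expensive, early)])
    out_late, t_late = run_chain(rows, [make_derivation(expensive, late), keep_even_keys])
    return out_early, out_late, early["calls"], late["calls"], t_early, t_late


if __name__ == "__main__":
    rows = [(k, k + 1) for k in range(50_000)]
    for n in (3, 2, 1):
        _, secs = run_chain(rows, [make_passthru() for _ in range(n)])
        print(f"{n} pass-through stage(s): {secs:.3f}s")
    _, _, c_early, c_late, t_early, t_late = compare_filter_position(50_000)
    print(f"filter first: {c_early} derivations, {t_early:.3f}s")
    print(f"filter last:  {c_late} derivations, {t_late:.3f}s")
```

Both orderings produce the same output rows; the difference is how many rows the expensive stage has to handle, which is the effect described above when logic is moved towards the first transformer.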

In my experience, you're better off using easier-to-manage designs and tuning performance through other means. Reducing transformers is not the silver bullet for performant jobs.
Kenneth Bland

chucksmith
Premium Member
Posts: 385
Joined: Wed Jun 16, 2004 12:43 pm
Location: Virginia, USA

Post by chucksmith »

Also, consider the parallel approach using a link partitioner and a link collector. If your data supports partitioning, and you have multiple CPUs, then more may be better.
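
As a rough picture of the partition/collect data flow, here is a minimal Python sketch. It is an illustration under stated assumptions, not DataStage code: round-robin partitioning and collecting are assumed, and `partition_round_robin`, `process_partition`, `collect_round_robin`, `run_partitioned` and the doubling `transform` are all hypothetical names. Threads are used only to keep the example self-contained; in Python they will not give a CPU-bound speedup, so this shows the shape of the flow rather than the performance gain.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import zip_longest

_MISSING = object()  # padding marker for partitions of unequal length


def partition_round_robin(rows, n):
    """Deal rows out to n partitions in turn, like dealing cards."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def transform(row):
    # Hypothetical derivation applied in each parallel branch.
    return row * 2


def process_partition(part):
    """One branch of the job: apply the same transform to every row."""
    return [transform(row) for row in part]


def collect_round_robin(parts):
    """Read one row from each partition in turn until all are exhausted."""
    out = []
    for group in zip_longest(*parts, fillvalue=_MISSING):
        out.extend(row for row in group if row is not _MISSING)
    return out


def run_partitioned(rows, n=3):
    """Partition, process each partition concurrently, then collect."""
    parts = partition_round_robin(rows, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(process_partition, parts))
    return collect_round_robin(results)


if __name__ == "__main__":
    print(run_partitioned(range(10), n=3))
```

With round-robin on both sides the collected rows come back in their original order. That only holds because each row is processed independently, which is the sense in which the data has to support partitioning.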
sumitgulati
Participant
Posts: 197
Joined: Mon Feb 17, 2003 11:20 pm
Location: India

Post by sumitgulati »

I totally agree with Kenneth Bland. If you have intense transformations, using multiple transformers prevents overloading a single transformer. By splitting the work across transformers you are effectively load balancing. This definitely gives better performance if you use row buffering. However, for simple transformations, avoid using multiple transformers.

Regards,
Sumit