transformer question

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.


dnat
Participant
Posts: 200
Joined: Thu Sep 06, 2007 2:06 am


Post by dnat »

Hi,

I have a server job where I am reading a fixed-width file and separating the columns using a Transformer. I then use a second Transformer stage to do the validations. If I do all of this in a single Transformer, will performance improve?
This is an existing design, and I am trying to reduce the time the job takes to run.
Minhajuddin
Participant
Posts: 467
Joined: Tue Mar 20, 2007 6:36 am
Location: Chennai

Post by Minhajuddin »

I am not a pro at server jobs, but let me give it a shot.

If you think about it, even with pipeline parallelism in place for your two Transformers, the rows are going to be processed by Transformer1 and then by Transformer2 in sequence. And you can always put the logic of two consecutive Transformers into one. So I am not sure that moving the logic from two Transformers into one will really help; you may not see much difference in performance. If you want to improve performance, you can instead use a Link Partitioner to split the data into two streams, process each stream with its own Transformer, and then merge the streams back together using a Link Collector.
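The Link Partitioner / Link Collector layout described above can be sketched outside DataStage as follows. This is a minimal Python illustration only, assuming round-robin partitioning and a Transformer that emits exactly one output row per input row; `link_partitioner`, `transformer` and `link_collector` are hypothetical stand-ins, not DataStage APIs. Here the two streams are processed one after the other; the point of the real job layout is that the two Transformer copies can work at the same time.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def link_partitioner(rows: Sequence[T], n: int) -> List[List[T]]:
    """Hypothetical partitioner: deal rows round-robin into n streams."""
    streams: List[List[T]] = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        streams[i % n].append(row)
    return streams


def link_collector(streams: List[List[U]]) -> List[U]:
    """Hypothetical collector: merge streams round-robin back into one.

    Because the partitioner dealt rows round-robin and each stream keeps
    its order, this restores the original row order.
    """
    merged: List[U] = []
    longest = max((len(s) for s in streams), default=0)
    for i in range(longest):
        for s in streams:
            if i < len(s):
                merged.append(s[i])
    return merged


def transformer(row: str) -> str:
    """Hypothetical Transformer logic: trim and upper-case the record."""
    return row.strip().upper()


def run(rows: Sequence[str], n: int = 2,
        fn: Callable[[str], str] = transformer) -> List[str]:
    streams = link_partitioner(rows, n)
    # In the real job each stream would get its own Transformer copy.
    processed = [[fn(r) for r in s] for s in streams]
    return link_collector(processed)


if __name__ == "__main__":
    print(run(["  ab", "cd  ", "ef"]))  # ['AB', 'CD', 'EF']
```

Round-robin is used here only because it makes the merge order-preserving and easy to check; other partitioning rules would need a different collection strategy if row order matters.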

Server gurus, correct me if I am wrong.
Minhajuddin

gateleys
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Post by gateleys »

Minhajuddin wrote: Server Gurus, Correct me if I am wrong.
I am not a Guru, but you stand corrected. :wink:
Minhajuddin wrote: Even with the Pipeline parallelism in place for your two transformers, The rows are going to be processed by Transformer1 and then by Transformer2 in a sequential fashion.
However, if you use inter-process buffering between the two Transformers, you are certainly invoking the second Transformer as a separate process running in pipeline fashion, thereby giving you a performance gain.
gateleys
vijay.barani
Participant
Posts: 78
Joined: Wed Jun 04, 2008 2:59 am

Post by vijay.barani »

You can put an IPC stage between these two Transformer stages; it will improve the performance.
Warm Regards,
Vijay
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

I am a server job guru (!) and wish to clarify as follows.

If two Transformer stages - or any kind of active stages - are directly connected by a link then, by default, they will run in the same process.

Enabling inter-process row buffering, whether at the job level or by placing an IPC stage between the two active stages, will cause them to run in separate processes.

Whether or not this results in improved throughput will depend on a number of factors, primarily the total load on the machine. If it's already maxed out, then no gain is possible by adding processes.
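To make the in-process versus buffered distinction concrete, here is a rough Python analogy, not a model of how DataStage is implemented. It assumes a hypothetical `parse` step (splitting a fixed-width record into fields) and a hypothetical `validate` step, matching the original question. `run_in_process` applies both steps row by row in one flow of control, like two directly linked active stages; `run_buffered` runs each step in its own worker, connected by bounded queues that play the role of the row buffer. Python threads stand in for separate processes here, and because of CPython's global interpreter lock this sketch will not speed up CPU-bound work; it only shows the producer/consumer shape.

```python
import queue
import threading
from typing import Any, Callable, List, Optional, Sequence, Tuple

_END = object()  # sentinel marking end of data on a "link"


def parse(record: str) -> Tuple[str, str]:
    """Hypothetical fixed-width layout: 3-char code, then 5-char quantity."""
    return record[0:3].strip(), record[3:8].strip()


def validate(fields: Tuple[str, str]) -> Tuple[str, Optional[int]]:
    """Hypothetical validation: quantity must be all digits, else None."""
    code, qty = fields
    return (code, int(qty)) if qty.isdigit() else (code, None)


def run_in_process(rows: Sequence[str], p: Callable, v: Callable) -> List[Any]:
    # Both "stages" run in the same flow of control, one row at a time.
    return [v(p(r)) for r in rows]


def _stage(fn: Callable, inbox: queue.Queue, outbox: queue.Queue) -> None:
    # One "active stage": read rows from the inbound link, write results out.
    while True:
        row = inbox.get()
        if row is _END:
            outbox.put(_END)
            return
        outbox.put(fn(row))


def run_buffered(rows: Sequence[str], p: Callable, v: Callable,
                 buffer_size: int = 128) -> List[Any]:
    # Bounded queues stand in for the row buffers between stages.
    link1: queue.Queue = queue.Queue(maxsize=buffer_size)
    link2: queue.Queue = queue.Queue(maxsize=buffer_size)
    out: queue.Queue = queue.Queue()  # unbounded sink, so no deadlock
    workers = [
        threading.Thread(target=_stage, args=(p, link1, link2)),
        threading.Thread(target=_stage, args=(v, link2, out)),
    ]
    for w in workers:
        w.start()
    for r in rows:
        link1.put(r)  # blocks when the buffer is full
    link1.put(_END)
    for w in workers:
        w.join()
    results = []
    while True:
        item = out.get()
        if item is _END:
            return results
        results.append(item)


if __name__ == "__main__":
    rows = ["AB 00012", "CD 0x003"]
    print(run_buffered(rows, parse, validate))  # [('AB', 12), ('CD', None)]
```

Both functions produce the same rows in the same order; only the execution shape differs, which is why any throughput difference depends on whether the machine has spare capacity to run the stages side by side.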
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.