transformer question

dnat · Post by **dnat** » Mon Jun 09, 2008 2:02 am

Hi,

I have a server job where I am reading a fixed-width file and separating the columns using a Transformer. I then use another basic Transformer to do the validations. If I do all of this in a single Transformer, will the performance improve?
This is an existing design, and I am trying to reduce the time the job takes to run.
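To make the two designs concrete, here is a minimal Python analogy (not DataStage BASIC) of "split columns, then validate" done as two chained steps versus one fused step. The record layout, the column names (`cust_id`, `name`, `amount`) and the validation rules are hypothetical. The sketch only shows that the logic composes and produces identical rows; it says nothing about which version DataStage runs faster.

```python
# Hypothetical fixed-width layout: (column name, start offset, end offset).
LAYOUT = [("cust_id", 0, 5), ("name", 5, 15), ("amount", 15, 23)]


def split_columns(line: str) -> dict:
    """First-Transformer analogue: cut a fixed-width record into trimmed columns."""
    return {name: line[start:end].strip() for name, start, end in LAYOUT}


def validate(row: dict) -> dict:
    """Second-Transformer analogue: flag rows whose id or amount is not numeric."""
    ok = row["cust_id"].isdigit()
    try:
        float(row["amount"])
    except ValueError:
        ok = False
    return {**row, "valid": ok}


def two_stages(lines):
    # Two linked steps: each row's stage-1 output is fed into stage 2.
    stage1 = (split_columns(line) for line in lines)
    return [validate(row) for row in stage1]


def one_stage(lines):
    # Fused: both derivations applied in a single step per row.
    return [validate(split_columns(line)) for line in lines]


records = [
    "00001Alice     00012.50",
    "0000XBob       00007.00",
    "00003Carol     abc     ",
]
assert two_stages(records) == one_stage(records)
for row in one_stage(records):
    print(row)
```

Because the fused version is just the composition of the two steps, any difference in run time comes from per-link overhead in the engine, not from the logic itself.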

Minhajuddin · Post by **Minhajuddin** » Mon Jun 09, 2008 11:21 am

I am not a pro at server jobs, but let me give it a shot.

If you think about it, even with pipeline parallelism in place for your two transformers, the rows are going to be processed by Transformer1 and then by Transformer2 in sequential fashion. And you can always put the logic of two consecutive transformers into one. So I am not sure that moving the logic from two transformers into one will really help; you may not see much difference in performance. If you want to improve performance, you can instead use a Link Partitioner to split the data into two streams, process each stream with its own transformer, and then merge the streams back together using a Link Collector.
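The partition/process/collect idea can be sketched in Python as follows, assuming a round-robin algorithm on both the partitioner and the collector. Everything here is an analogy: `partition_round_robin`, `transform` and `collect_round_robin` are hypothetical names, and threads stand in for the separate processes DataStage would run (in CPython, threads do not execute Python code in parallel, so this shows the data flow rather than the speedup).

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import zip_longest

_MISSING = object()  # filler for streams that run out of rows first


def partition_round_robin(rows, n):
    """Link Partitioner analogue: deal rows out to n streams in turn."""
    streams = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        streams[i % n].append(row)
    return streams


def transform(stream):
    """Per-stream Transformer analogue (hypothetical derivation: upper-case)."""
    return [row.upper() for row in stream]


def collect_round_robin(streams):
    """Link Collector analogue: take one row from each stream in turn."""
    merged = []
    for group in zip_longest(*streams, fillvalue=_MISSING):
        merged.extend(r for r in group if r is not _MISSING)
    return merged


rows = ["a", "b", "c", "d", "e"]
streams = partition_round_robin(rows, 2)
# Two workers stand in for the two Transformer processes.
with ThreadPoolExecutor(max_workers=2) as pool:
    processed = list(pool.map(transform, streams))
print(collect_round_robin(processed))
```

Round-robin collection restores the original row order here only because the partitioning was also round-robin and each transformer emits exactly one output row per input row; if a transformer filters rows, the merged order will differ.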

Server gurus, correct me if I am wrong.

gateleys · Post by **gateleys** » Mon Jun 09, 2008 11:53 am

Minhajuddin wrote: Server gurus, correct me if I am wrong.

I am not a Guru, but you stand corrected.

Minhajuddin wrote: Even with pipeline parallelism in place for your two transformers, the rows are going to be processed by Transformer1 and then by Transformer2 in sequential fashion.

However, if you use inter-process buffering between the two transformers, you are certainly running the second transformer as a separate process in pipeline fashion, thereby giving you a performance gain.

vijay.barani · Post by **vijay.barani** » Mon Mar 02, 2009 12:37 am

You can put an IPC stage between these two Transformer stages; it will improve the performance.

ray.wurlod · Post by **ray.wurlod** » Mon Mar 02, 2009 12:42 am

I am a server job guru (!) and wish to clarify as follows.

If two Transformer stages - or any kind of active stages - are directly connected by a link then, by default, they will run in the same process.

Enabling inter-process row buffering, whether at the job level or by placing an IPC stage between the two active stages, will cause them to run in separate processes.

Whether or not this results in improved throughput will depend on a number of factors, primarily the total load on the machine. If it's already maxed out, then no gain is possible by adding processes.
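The difference between the default (both stages in one process) and inter-process row buffering can be sketched in Python. This is an analogy under stated assumptions: `stage_a` and `stage_b` are hypothetical derivations, a bounded `queue.Queue` plays the role of the row buffer, and threads stand in for the separate OS processes DataStage would start. In CPython the threads will not give a real CPU speedup, and, as noted above, even with real processes there is no gain if the machine is already fully loaded.

```python
import queue
import threading

_END = object()  # end-of-data marker passed down the link


def stage_a(row):
    """Hypothetical first active stage."""
    return row * 2


def stage_b(row):
    """Hypothetical second active stage."""
    return row + 1


def run_same_process(rows):
    # Default: directly linked stages share one process, so each row
    # passes through stage_a and then stage_b before the next row starts.
    return [stage_b(stage_a(r)) for r in rows]


def run_buffered(rows, buffer_size=4):
    # Row-buffering analogue: stage_a and stage_b run in separate workers
    # joined by a bounded buffer, so stage_a can work on row n+1 while
    # stage_b is still handling row n.
    link = queue.Queue(maxsize=buffer_size)
    out = []

    def producer():
        for r in rows:
            link.put(stage_a(r))  # blocks while the buffer is full
        link.put(_END)

    def consumer():
        while True:
            item = link.get()
            if item is _END:
                break
            out.append(stage_b(item))

    workers = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return out


data = list(range(10))
assert run_same_process(data) == run_buffered(data)
print(run_buffered(data))
```

Both versions return the same rows in the same order; only the scheduling changes, which is why any throughput gain depends on spare CPU capacity rather than on the job logic.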