Multiple Transformers?

Post by **rcil** » Sat Jul 02, 2005 6:34 pm

Hello All,

If I have 6 hash file lookups to perform in a job, how good is it to use 6 Transformers, one for each lookup, compared with one Transformer for all the lookups? Which is the better approach, and what is the difference?

I usually use one Transformer for all 6 hash lookups, but one of my seniors uses an individual Transformer for each lookup, i.e. 6 Transformers. I am not confident enough to argue with him because at this point I do not know which performs better.

Hope I will get some help.

thanks,
Rcil

Post by **ray.wurlod** » Sat Jul 02, 2005 7:50 pm

First point is that both methods work. So would a design with two Transformer stages each performing three lookups.

Technically one Transformer stage can have 1 stream input, N outputs and (127 - N) reference inputs. However, by cramming all of these into a single process (for larger N) you probably introduce a bottleneck.

By using more than one Transformer, particularly if row buffering is enabled and/or IPC stages are used to enforce separate processes, you can spread the CPU load. This will be of most benefit where you have more than one CPU, of course. Even if you're not using row buffering, a multi-Transformer design allows you to add it later when searching for improved throughput (pipeline parallelism).
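The hand-off between stages can be sketched outside DataStage. The following is only an analogy in Python, not DataStage code: each hypothetical `lookup_stage` thread stands in for one Transformer doing one lookup, the dictionaries stand in for hashed files, and the bounded queues stand in for row buffers. All names here are invented for illustration. Note that in standard CPython, threads show the pipelining pattern but do not by themselves spread CPU load the way separate processes do.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the row stream


def lookup_stage(inbox, outbox, reference, key_col, out_col):
    """Hypothetical stand-in for one Transformer performing one lookup."""
    while True:
        row = inbox.get()
        if row is SENTINEL:
            outbox.put(SENTINEL)  # pass end-of-data downstream
            return
        enriched = dict(row)
        # A failed lookup yields None, much like a NULL from a missed reference
        enriched[out_col] = reference.get(row[key_col])
        outbox.put(enriched)


def run_pipeline(rows, stages, buffer_size=128):
    """Chain the stages with bounded queues, one thread per stage."""
    queues = [queue.Queue(maxsize=buffer_size) for _ in range(len(stages) + 1)]
    workers = [
        threading.Thread(target=lookup_stage,
                         args=(queues[i], queues[i + 1], *stage))
        for i, stage in enumerate(stages)
    ]

    def feed():
        # Feed from a separate thread so a full buffer cannot block the reader
        for row in rows:
            queues[0].put(row)
        queues[0].put(SENTINEL)

    feeder = threading.Thread(target=feed)
    for t in workers + [feeder]:
        t.start()

    results = []
    while True:
        row = queues[-1].get()
        if row is SENTINEL:
            break
        results.append(row)

    for t in workers + [feeder]:
        t.join()
    return results
```

While the second stage is working on row 1, the first stage can already be working on row 2; that overlap is what a multi-Transformer design with row buffering makes possible.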

There must be a limit where the one-Transformer design becomes cumbersome to maintain (or simply not aesthetically pleasing on the design canvas).

As a rule of thumb I tend to use four as that limit, but I have created jobs in which up to eight lookups were performed in one Transformer stage; in those cases, however, each of the hashed files was small (only a few tens of rows at most).

When you have a lookup dependent on the result of a prior lookup, using more than one Transformer stage is cleaner, in that the Designer doesn't complain about using an earlier reference input in a key expression (even though it's legal to do so and works). By using a method that the Designer doesn't flag, you're assisting the next developer by not violating the "law of least astonishment".
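To make the dependency concrete, here is a minimal sketch in Python (again an analogy, not DataStage code). The column names, the two dictionaries standing in for hashed files, and the function names are all hypothetical. The point is only that the second lookup's key is a column produced by the first, so splitting the work into two stages keeps each stage's key expression based purely on its own input row.

```python
def lookup_customer(row, customers):
    """Stage 1: derive region_code from cust_id."""
    out = dict(row)
    out["region_code"] = customers.get(row["cust_id"])
    return out


def lookup_region(row, regions):
    """Stage 2: the key is region_code, which only exists after stage 1."""
    out = dict(row)
    out["region_name"] = regions.get(row["region_code"])
    return out


def two_stage(rows, customers, regions):
    """Run the dependent lookups as two separate, ordered stages."""
    stage1 = (lookup_customer(r, customers) for r in rows)
    return [lookup_region(r, regions) for r in stage1]
```

A miss in the first lookup simply propagates: the second lookup is keyed on `None` and also misses.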

In summary, restricting each Transformer to one reference lookup seems a little too restrictive to me, unless there's a reason for doing so (such as dependent downstream lookups). Otherwise, "a few" - say no more than four as a rule of thumb - should be OK.

You can identify the processes associated with each Transformer stage and monitor their resource consumption. When running the job, open the Tracing tab on the Job Run Options dialog and select Statistics for each of the Transformer stages.