I have a CFF source file streaming into a Transformer (TFM) that writes to 8 destination SEQ files, with each output selected by a constraint in the TFM. How would performance differ if I rewrote the job so that the source file streams into TFM1, which has three outputs: two constrained on the column (to SEQ files) and a third [Reject] output? I repeat this pattern until I have 4 TFM stages, each with two SEQ destinations and, except for the last, a third [Reject] output that feeds the next TFM.
I could, of course, add IPC stages between all reads and writes, which may improve performance, but I am more concerned with how the design of such a job affects performance. I suppose I am looking for best practice.
So, from:
    CFF ---- TFM ---- SEQ1
                 ---- SEQ2
                 ---- SEQ3
                 ---- SEQ4
                 ---- SEQ5
                 ---- SEQ6
                 ---- SEQ7
                 ---- SEQ8
to:
    CFF ---- TFM1 ---- SEQ1
                  ---- SEQ2
                  ---- TFM2 ---- SEQ3
                            ---- SEQ4
                            ---- TFM3 ---- SEQ5
                                      ---- SEQ6
                                      ---- TFM4 ---- SEQ7
                                                ---- SEQ8
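To make the trade-off between the two designs concrete, here is a toy Python model (not DataStage code). It assumes, hypothetically, that a single routing value from 1 to 8 selects exactly one SEQ file per row, that the single TFM evaluates all 8 output constraints for every row, and that each cascaded TFM evaluates its two constraints and passes everything else down its [Reject] link. `single_tfm` and `cascaded_tfms` are illustrative names only.

```python
"""Toy model of the two routing designs (not DataStage code).

Hypothetical assumptions: each row carries a routing value 1..8 that picks
exactly one of SEQ1..SEQ8; the single Transformer tests all 8 output
constraints for every row; each cascaded Transformer tests its 2 constraints
and passes unmatched rows down its [Reject] link to the next Transformer.
"""
from collections import Counter


def single_tfm(rows):
    """One Transformer with 8 constrained output links."""
    outputs = Counter()
    tests = 0
    for route in rows:
        for dest in range(1, 9):  # assumption: every link constraint is evaluated
            tests += 1
            if route == dest:
                outputs[dest] += 1
    return outputs, tests, len(rows)  # each row passes through exactly 1 TFM


def cascaded_tfms(rows):
    """TFM1..TFM4, each owning two destinations; rejects feed the next TFM."""
    outputs = Counter()
    tests = 0
    passes = 0
    stages = [(1, 2), (3, 4), (5, 6), (7, 8)]  # destinations owned by TFM1..TFM4
    for route in rows:
        for pair in stages:
            passes += 1  # the row enters this Transformer
            matched = False
            for dest in pair:  # both constraints in this TFM are tested
                tests += 1
                if route == dest:
                    outputs[dest] += 1
                    matched = True
            if matched:
                break  # written to a SEQ file, so not rejected onward
    return outputs, tests, passes


rows = [1, 2, 3, 4, 5, 6, 7, 8] * 1000  # evenly spread sample of 8000 rows
out1, tests1, passes1 = single_tfm(rows)
out2, tests2, passes2 = cascaded_tfms(rows)
assert out1 == out2  # both designs put the same rows in the same SEQ files
print(tests1, passes1)  # 64000 8000
print(tests2, passes2)  # 40000 20000
```

Under these assumptions the cascade performs fewer constraint tests (5 per row on average instead of 8), but each row passes through 2.5 Transformers on average instead of 1. Which effect dominates depends on the relative cost of constraint evaluation versus per-stage row handling in the real job, and this model does not answer that.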
To the eye, a simple job is a lot more appealing than a complex design of links and stages.
The source file has 40 million rows. I should add that the CFF has 19 columns, of which I need only 14; the first TFM drops the columns I don't require for the rest of the process.
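Scaling the same idea to 40 million rows and the 19-to-14 column pruning gives a rough count of the data moved across links in each design. This is back-of-envelope arithmetic under stated assumptions, not a measurement: every row is assumed to go to exactly one SEQ file, the share of rows per destination is the hypothetical `shares` parameter, and the first TFM (TFM1 in the cascade) is assumed to drop the 5 unused columns, so every link after it carries 14.

```python
"""Back-of-envelope link traffic for the two designs at 40 million rows.

Hypothetical assumptions: every row goes to exactly one SEQ file; the share
of rows per destination SEQ1..SEQ8 is given by `shares`; the first
Transformer drops 5 of the 19 CFF columns, so every later link carries 14.
"""
from fractions import Fraction

ROWS = 40_000_000
IN_COLS, OUT_COLS = 19, 14


def link_traffic(shares, cascade):
    """Return (rows moved between Transformers, column values moved on all links)."""
    if len(shares) != 8 or sum(shares) != 1:
        raise ValueError("expected 8 shares that sum to 1")
    inter_tfm_rows = 0
    if cascade:
        remaining = Fraction(1)
        # TFM1, TFM2 and TFM3 each keep two destinations and reject the rest onward.
        for i in range(0, 6, 2):
            remaining -= shares[i] + shares[i + 1]
            inter_tfm_rows += remaining * ROWS
    source_values = ROWS * IN_COLS               # CFF -> first Transformer
    seq_values = ROWS * OUT_COLS                 # Transformers -> SEQ files
    reject_values = inter_tfm_rows * OUT_COLS    # Transformer -> Transformer
    return int(inter_tfm_rows), int(source_values + seq_values + reject_values)


uniform = [Fraction(1, 8)] * 8
print(link_traffic(uniform, cascade=False))  # (0, 1320000000)
print(link_traffic(uniform, cascade=True))   # (60000000, 2160000000)

busy_first = [Fraction(7, 20)] * 2 + [Fraction(1, 20)] * 6  # SEQ1/SEQ2 get 70%
busy_last = busy_first[::-1]                                 # SEQ7/SEQ8 get 70%
print(link_traffic(busy_first, cascade=True)[0])  # 24000000
print(link_traffic(busy_last, cascade=True)[0])   # 96000000
```

With an even split, the cascade moves an extra 60 million rows (840 million column values) between Transformers. If the split is skewed, the arithmetic suggests assigning the busiest destinations to TFM1, since that minimizes the rows passed down the [Reject] links (24 million versus 96 million in the example).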