
Best practices for 2 consecutive transforms

Posted: Tue May 30, 2006 3:38 am
by kjaouhari
Hi all,

My question is about best practices for two consecutive Transformer stages.
The Transformers are used to do lookups.

Which is better, option 1 or option 2?
1. Put a Hashed File or Sequential File stage between the two lookups
2. Put nothing between the Transformers

Thanks!

Posted: Tue May 30, 2006 4:47 am
by ashwin141
Hi

I would suggest avoiding two Transformers and implementing all the transformations in a single Transformer.

If you really can't avoid it, I think linking the first Transformer directly to the second will be better than putting a stage between them and then linking that stage to the second Transformer.

I hope that answers your question.

Regards
Ashwin

Posted: Tue May 30, 2006 5:21 am
by balajisr
Can you tell us exactly what you are trying to do?

As ashwin mentioned, it is better to use only one Transformer rather than two.

Posted: Tue May 30, 2006 6:00 am
by kumar_s
Two Transformers placed one after the other are combined into a single process at runtime unless inter-process row buffering is introduced. Placing an intermediate passive stage between them only adds unnecessary I/O.

Posted: Tue May 30, 2006 6:11 am
by loveojha2
Kumar, I have a question here.
I know that consecutive Transformers are executed by a single process unless row buffering is enabled.

My question is how it is actually executed:
1. Are all the transformations combined together and then executed?
2. Or are the first Transformer's transformations applied to a row first, after which the row goes through the transformations of the succeeding Transformers, in order?

Thanks

Posted: Tue May 30, 2006 6:31 am
by kumar_s
Not only Transformers: all active stages are merged into a single process during compilation, and passive stages mark the boundaries. All the transformation logic is built into that single process, so there is no pipeline parallelism among those stages.
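One way to picture loveojha2's option 2, together with kumar_s's point that there is no pipeline parallelism inside the merged process, is the Python sketch that follows. It is a simplified mental model, not the code DataStage actually generates; the stage functions and the columns `upper_name` and `name_len` are hypothetical. Each row passes through every stage's logic before the next row is read.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]
Stage = Callable[[Row], Row]

trace: List[Tuple[str, object]] = []  # records (stage name, row id) in execution order

def transformer_1(row: Row) -> Row:
    # Hypothetical derivation in the first Transformer.
    trace.append(("T1", row["id"]))
    return {**row, "upper_name": str(row["name"]).upper()}

def transformer_2(row: Row) -> Row:
    # Hypothetical derivation in the second Transformer; it reads a
    # column produced by the first one.
    trace.append(("T2", row["id"]))
    return {**row, "name_len": len(str(row["upper_name"]))}

def run_single_process(rows: Iterable[Row], stages: List[Stage]) -> Iterator[Row]:
    """Push each row through every stage in order before reading the next row."""
    for row in rows:
        for stage in stages:
            row = stage(row)
        yield row

rows = [{"id": 1, "name": "ab"}, {"id": 2, "name": "xyz"}]
out = list(run_single_process(rows, [transformer_1, transformer_2]))
print(trace)  # [('T1', 1), ('T2', 1), ('T1', 2), ('T2', 2)]
```

The trace shows that row 2 is not touched by the first stage until row 1 has left the second stage, which is what "no pipeline parallelism" means here.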

Posted: Tue May 30, 2006 6:32 am
by kjaouhari
First, the first Transformer (lookup) is used to get ColA from FileA.
Then ColA is used in the second Transformer (lookup) to get ColB from FileB.

So the process has two sequential steps.

Thanks for your help!
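The two lookup steps described above can be sketched in Python, assuming both lookup files fit in memory as dictionaries. `FILE_A`, `FILE_B`, the key values and `lookup_chain` are hypothetical stand-ins for the hashed files and the two Transformers. Because the second lookup only needs the ColA value of the current row, each row can go through both lookups in one pass, with no intermediate file.

```python
from typing import Dict, Iterable, Iterator, Optional

# Hypothetical reference data standing in for the two lookup files:
# FileA maps the input key to ColA, FileB maps ColA to ColB.
FILE_A: Dict[str, str] = {"k1": "a1", "k2": "a2"}
FILE_B: Dict[str, str] = {"a1": "b1"}

def lookup_chain(keys: Iterable[str]) -> Iterator[Dict[str, Optional[str]]]:
    """Run both lookups for each row before moving on to the next row."""
    for key in keys:
        col_a = FILE_A.get(key)  # first lookup: key -> ColA
        # Second lookup depends on the first one's result for this row only.
        col_b = FILE_B.get(col_a) if col_a is not None else None
        yield {"key": key, "ColA": col_a, "ColB": col_b}

for row in lookup_chain(["k1", "k2", "k3"]):
    print(row)
```

Here a failed lookup just leaves `None` in the column; how a failed lookup should be handled in the real job is a separate design choice the thread does not cover.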

Posted: Tue May 30, 2006 6:56 am
by ray.wurlod
Nothing between the Transformer stages. If you have multiple CPUs, or the job in its current form consumes less than 40% of a single CPU, enable inter-process row buffering.
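Inter-process row buffering can be pictured as a producer and a consumer connected by a bounded buffer: the upstream stage keeps deriving rows while the downstream stage works on earlier ones. The Python sketch that follows only models that structure. The stage logic, the `buffer_size` value and the use of threads are assumptions for illustration, not DataStage internals. Python threads do not run CPU-bound code in parallel because of the GIL, so this shows the shape of the pipeline, not the speedup that separate processes on multiple CPUs can give.

```python
import queue
import threading
from typing import Dict, List

SENTINEL = None  # marks the end of the data on the buffer

def upstream(rows: List[Dict[str, int]], buf: "queue.Queue") -> None:
    # Hypothetical first stage: derive a column, then hand the row on.
    for row in rows:
        buf.put({**row, "doubled": row["x"] * 2})  # blocks while the buffer is full
    buf.put(SENTINEL)

def downstream(buf: "queue.Queue", out: List[Dict[str, int]]) -> None:
    # Hypothetical second stage: consume rows as soon as they arrive.
    while True:
        row = buf.get()
        if row is SENTINEL:
            break
        out.append({**row, "plus_one": row["doubled"] + 1})

def run_buffered(rows: List[Dict[str, int]], buffer_size: int = 4) -> List[Dict[str, int]]:
    buf: "queue.Queue" = queue.Queue(maxsize=buffer_size)  # bounded row buffer
    out: List[Dict[str, int]] = []
    producer = threading.Thread(target=upstream, args=(rows, buf))
    consumer = threading.Thread(target=downstream, args=(buf, out))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return out

print(run_buffered([{"x": 1}, {"x": 2}]))
```

With one producer and one consumer on a FIFO queue, rows come out in the same order they went in.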