
Different output with different nodes configuration files

Posted: Mon Jan 05, 2009 6:28 am
by Anoop3d
I am using 3 different stage variables in a Transformer, with initial values of, say, 1, 2 and 3 respectively, and incrementing each of them by 1 per row. The output goes to 3 different output files.
For these 3 files, I should get this output:
First file - 1,2,3,4
Second file - 2,3,4,5
Third File - 3,4,5,6
I get this output when I run my job with a 1-node configuration file.
But when I run it with a 2-node configuration file, the output is:
First file - 3,3,3,3
Second file - 2,2,3,3
Third File - 3,3,4,4
I don't want my output to change when I use a different node configuration file.
Please help.

Posted: Mon Jan 05, 2009 4:25 pm
by ray.wurlod
Identify (from the score) exactly what partitioning is being used at each stage in your job. Post that information here. Without it it is not possible to provide cogent advice.

Posted: Mon Jan 05, 2009 5:05 pm
by shankar_iyer
Along with the partitioning, you should also post the constraints on each of your output links, if any.

Posted: Mon Jan 05, 2009 8:26 pm
by vmcburney
There is a thread in the FAQ forum on implementing a counter in a Transformer of a parallel job. You are safer referring to the special macros for parallel jobs @NUMPARTITIONS and @PARTITIONNUM. You set the three stage variable starting values to:
@PARTITIONNUM - @NUMPARTITIONS +1
@PARTITIONNUM - @NUMPARTITIONS +2
@PARTITIONNUM - @NUMPARTITIONS +3

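Then increment each variable by @NUMPARTITIONS (the number of partitions, not the partition number, so that each partition steps over the values that belong to the other partitions):
StageVar1 = StageVar1 + @NUMPARTITIONS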

This should give you three stage variables starting at 1, 2 and 3 and delivering unique numbers across the partitions. With round robin partitioning I think this will deliver numbers in sequence, but there is a chance that numbers will be output out of sequence (if one partition is faster than the others). There is also a chance that some numbers will be skipped at the end of a dataset if the partitions are not balanced. You should test it to see what happens.
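
To see what the counter scheme does without running a job, here is a rough Python simulation. It is only a sketch, not DataStage code. It assumes that @PARTITIONNUM is zero-based, that every partition keeps its own private copy of the stage variable, and that the increment per row is the partition count (@NUMPARTITIONS). The function name partition_counters and its parameters are made up for illustration.

```python
def partition_counters(num_partitions, rows_per_partition, offset=1):
    """Simulate one stage variable on every partition.

    rows_per_partition[p] is the number of rows partition p processes.
    offset is the +1, +2 or +3 used in the initial value.
    Returns one list of generated values per partition.
    """
    results = []
    for partition_num in range(num_partitions):  # assumed zero-based, like @PARTITIONNUM
        # Initial value: @PARTITIONNUM - @NUMPARTITIONS + offset
        stage_var = partition_num - num_partitions + offset
        values = []
        for _ in range(rows_per_partition[partition_num]):
            # Assumed increment: the partition count (@NUMPARTITIONS)
            stage_var += num_partitions
            values.append(stage_var)
        results.append(values)
    return results


def merged(per_partition):
    """All generated values in sorted order, ignoring arrival order."""
    return sorted(v for values in per_partition for v in values)


if __name__ == "__main__":
    for offset in (1, 2, 3):
        one_node = merged(partition_counters(1, [4], offset))
        two_node = merged(partition_counters(2, [2, 2], offset))
        print(offset, one_node, two_node)
    # Unbalanced partitions: 3 rows on partition 0, 1 row on partition 1
    print(merged(partition_counters(2, [3, 1], 1)))
```

Under those assumptions, 1 node and 2 balanced nodes produce the same set of values for each variable, and the unbalanced run shows the gap mentioned above (a value is skipped near the end). The order in which rows actually reach the output files still depends on partitioning and collection, which this sketch does not model.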