
Data inconsistency with Transformer in parallel

Posted: Wed May 19, 2010 11:13 pm
by Gokul
Hi,

I am generating a unique identification number (UID) in a Transformer.
The job design is:

Dataset---->Sort ---> Transformer---> Target.

I am sorting on columns A and B and generating a key change indicator.
In the Transformer, when the key change indicator is 1, the UID is incremented; otherwise the previous value is reused. This is done with stage variables.

When I run the Transformer in parallel, duplicate UIDs are created, whereas when I run it sequentially, unique UIDs are generated.

The Transformer and Sort stages are both partitioned on keys A and B.

Running the Transformer sequentially will hurt performance. Any suggestions on this?

Re: Data inconsistency with Transformer in parallel

Posted: Thu May 20, 2010 12:02 am
by ravindras83
Since the Transformer is running in parallel, each partition keeps its own copy of the stage variables, so the numbers they generate are unique only within a partition.

If you want unique numbers across all partitions, use the system variables @PARTITIONNUM and @NUMPARTITIONS in the Transformer.

That is, set the initial value of the stage variable to
@PARTITIONNUM

and increment the stage variable by
@NUMPARTITIONS
instead of 1.
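
To see why this works, here is a minimal Python sketch that simulates the stage-variable logic outside DataStage. It is not Transformer code: the function name `assign_uids`, the `use_partition_offsets` flag, and the sample key change flags are all made up for illustration. Each inner list stands for the key change indicators seen by one partition.

```python
def assign_uids(partitions, use_partition_offsets=True):
    """Simulate per-partition stage variables that build a UID.

    partitions: one list of key change flags (1 = new key group) per partition.
    Returns one list of UIDs per partition.
    """
    num_partitions = len(partitions)  # stands in for @NUMPARTITIONS
    results = []
    for partition_num, key_changes in enumerate(partitions):  # stands in for @PARTITIONNUM
        if use_partition_offsets:
            # Suggested approach: start at the partition number, step by the partition count.
            uid, step = partition_num, num_partitions
        else:
            # Original approach: every partition starts at 0 and steps by 1.
            uid, step = 0, 1
        uids = []
        for key_change in key_changes:
            if key_change == 1:
                uid += step  # new key group: move to the next UID
            uids.append(uid)  # same key group: reuse the current UID
        results.append(uids)
    return results


# Hypothetical key change flags for two partitions.
flags = [[1, 0, 1], [1, 1, 0]]
naive = assign_uids(flags, use_partition_offsets=False)  # [[1, 1, 2], [1, 2, 2]]
fixed = assign_uids(flags)                               # [[2, 2, 4], [3, 5, 5]]
```

In the naive run both partitions hand out 1 and 2, which is the duplication reported above. With the offsets, partition 0 only ever produces even numbers and partition 1 only odd numbers, so the two ranges cannot collide. This assumes, as in the job described, that all rows for a given key land in the same partition.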

Re: Data inconsistency with Transformer in parallel

Posted: Thu May 20, 2010 3:43 am
by oracle
Hi Ravindra,

Could you please give some more information on @PARTITIONNUM and @NUMPARTITIONS?

My understanding is that if there are 4 partitions, then the value of @NUMPARTITIONS is 4, but I have no idea about @PARTITIONNUM. I am trying to understand how this value is determined.

Thanks in advance

Re: Data inconsistency with Transformer in parallel

Posted: Thu May 20, 2010 4:20 am
by ravindras83
oracle wrote: Hi Ravindra,

Could you please give some more information on @PARTITIONNUM and @NUMPARTITIONS?

My understanding is that if there are 4 partitions, then the value of @NUMPARTITIONS is 4, but I have no idea about @PARTITIONNUM. I am trying to understand how this value is determined.

Thanks in advance
@PARTITIONNUM is the number of the partition in which the row is being processed, counting from 0.
If there are two nodes defined, you will have partition numbers 0 and 1.

You will have to use this value for the first record processed in each partition.
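
As a rough illustration of the two-node case, the Python sketch below lists the values each partition would produce if the first record takes the partition number and every later new key adds the partition count. The generator name `uid_sequence` is hypothetical, and this is a simulation rather than Transformer syntax.

```python
from itertools import islice


def uid_sequence(partition_num, num_partitions):
    """Yield the UIDs one partition hands out, one per new key group."""
    uid = partition_num  # first record uses the partition number itself
    while True:
        yield uid
        uid += num_partitions  # later key groups step by the partition count


# Two nodes: partition numbers 0 and 1, partition count 2.
first_four_p0 = list(islice(uid_sequence(0, 2), 4))  # [0, 2, 4, 6]
first_four_p1 = list(islice(uid_sequence(1, 2), 4))  # [1, 3, 5, 7]
```

With four partitions the same pattern would give 0, 4, 8, ... on partition 0, then 1, 5, 9, ... on partition 1, and so on.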