Hi,
I am generating a unique identification number (UID) in a Transformer.
The job design is:
Dataset ---> Sort ---> Transformer ---> Target
I am sorting on columns A and B and generating a key change indicator.
In the Transformer, when the key change indicator is 1 the UID is incremented; otherwise the previous value is carried forward using stage variables.
When I run the Transformer in parallel, duplicate UIDs are created, whereas when I run it sequentially, unique UIDs are generated.
The Transformer and Sort stages are both partitioned on keys A and B.
Running the Transformer sequentially will hamper performance. Any suggestions on this?
Data inconsistency with Transformer in parallel
Re: Data inconsistency with Transformer in parallel
Since the Transformer is running in parallel, the numbers generated through stage variables are unique only within each partition.
If you want unique numbers across all partitions, use the system variables @PARTITIONNUM and @NUMPARTITIONS in the Transformer: set the initial value of the stage variable to @PARTITIONNUM and increment it by @NUMPARTITIONS instead of 1.
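The collision and the fix can be simulated outside DataStage. The following is a minimal Python sketch, not Transformer code: `assign_uids` is a hypothetical helper that mimics one stage-variable counter per partition driven by the key change indicator, and the sample flags are made-up data. It assumes rows with the same (A, B) key always land in the same partition and that the first row of each partition carries a key change of 1.

```python
def assign_uids(partitions, start_of, step_of):
    """Mimic one independent stage-variable counter per partition.

    partitions: list of partitions, each a list of key-change flags (1 or 0).
    start_of(p, n): first UID used in partition p out of n (hypothetical hook).
    step_of(p, n):  amount added on every later key change.
    """
    n = len(partitions)
    result = []
    for p, flags in enumerate(partitions):
        uid = None
        uids = []
        for flag in flags:
            if flag == 1:
                # New key group: take the start value once, then add the step.
                uid = start_of(p, n) if uid is None else uid + step_of(p, n)
            uids.append(uid)  # flag == 0 carries the previous UID forward
        result.append(uids)
    return result


# Made-up key-change flags for two partitions.
parts = [[1, 0, 1, 1], [1, 1, 0]]

# Start at 1 and add 1 in every partition: the counters overlap.
naive = assign_uids(parts, lambda p, n: 1, lambda p, n: 1)
print(naive)  # [[1, 1, 2, 3], [1, 2, 2]]

# Start at the partition number and add the partition count: no overlap.
fixed = assign_uids(parts, lambda p, n: p, lambda p, n: n)
print(fixed)  # [[0, 0, 2, 4], [1, 3, 3]]
```

In the second run, partition 0 only ever produces even values and partition 1 only odd values, so the two counters cannot collide.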
Re: Data inconsistency with Transformer in parallel
Hi Ravindra,
Could you please give some more information on @PARTITIONNUM and @NUMPARTITIONS?
My understanding is that if there are 4 partitions then @NUMPARTITIONS is 4, but I have no idea about @PARTITIONNUM. I am trying to understand how this value is determined.
Thanks in advance.
Re: Data inconsistency with Transformer in parallel
@PARTITIONNUM is the partition number.
If there are two nodes defined, you will have partition numbers 0 and 1.
Use this value as the UID for the first record processed in each partition.
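For the two-node case, the sequences each partition would produce can be listed directly. This is a small Python illustration rather than Transformer code; `uid_stream` and `NUM_PARTITIONS` are hypothetical names standing in for the stage variable and @NUMPARTITIONS.

```python
from itertools import count, islice

NUM_PARTITIONS = 2  # stands in for @NUMPARTITIONS with two nodes defined


def uid_stream(partition_num, num_partitions=NUM_PARTITIONS):
    """UIDs for one partition: start at its number, step by the partition count."""
    return count(start=partition_num, step=num_partitions)


# First three UIDs handed out by each partition.
first_three = {p: list(islice(uid_stream(p), 3)) for p in range(NUM_PARTITIONS)}
print(first_three)  # {0: [0, 2, 4], 1: [1, 3, 5]}
```

If the UIDs should start at 1 rather than 0, adding 1 to the starting value shifts both sequences without making them overlap.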