Data inconsistency with Transformer in parallel

Gokul
Participant
Posts: 74
Joined: Wed Feb 23, 2005 10:58 pm
Location: Mumbai

Data inconsistency with Transformer in parallel

Post by Gokul »

Hi,

I am generating a unique identification number (UID) in a Transformer.
The job design is:

Dataset ----> Sort ----> Transformer ----> Target

I am sorting on columns A and B and generating a key change indicator.
In the Transformer, when the key change indicator is 1, the UID is incremented; otherwise the previous value is reused via stage variables.

When I run the Transformer in parallel, duplicate UIDs are created, whereas when I run it sequentially, unique UIDs are generated.

The Transformer and Sort stages are both partitioned on keys A and B.

Running the Transformer sequentially will hamper performance. Any suggestions on this?
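
To make the symptom concrete, here is a rough Python simulation of the behaviour described above. It is not DataStage code: the function names, the sample rows, and the `(a + b) % num_partitions` rule (a stand-in for hash partitioning on A,B) are all assumptions made for illustration. Each partition keeps its own counter, which is why the same UID can appear on more than one partition.

```python
def partition_of(key, num_partitions):
    # Hypothetical stand-in for hash partitioning on (A, B):
    # any deterministic function of the key keeps a key group on one partition.
    a, b = key
    return (a + b) % num_partitions


def naive_uids(rows, num_partitions):
    """Simulate a per-partition counter that adds 1 on every key change."""
    partitions = {p: [] for p in range(num_partitions)}
    for key in sorted(rows):
        partitions[partition_of(key, num_partitions)].append(key)

    uid_by_key = {}
    for keys in partitions.values():
        uid = 0          # every partition starts its own counter from scratch
        previous = None
        for key in keys:
            if key != previous:   # plays the role of the key change indicator
                uid += 1
            uid_by_key[key] = uid
            previous = key
    return uid_by_key


rows = [(1, 1), (1, 1), (1, 2), (2, 2), (2, 3), (3, 3)]  # made-up sample data
sequential = naive_uids(rows, 1)
parallel = naive_uids(rows, 2)

print(len(set(sequential.values())), "distinct UIDs for", len(sequential), "key groups (sequential)")
print(len(set(parallel.values())), "distinct UIDs for", len(parallel), "key groups (parallel)")
```

With one partition every key group gets its own UID; with two partitions both counters hand out the same low values, so different key groups collide.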
ravindras83
Participant
Posts: 15
Joined: Tue Sep 22, 2009 5:54 am

Re: Data inconsistency with Transformer in parallel

Post by ravindras83 »

Since the Transformer is running in parallel, the numbers generated through stage variables are unique only within each partition.

If you want unique numbers across all partitions, use the system variables @PARTITIONNUM and @NUMPARTITIONS in the Transformer:

i.e. give the stage variable an initial value of
@PARTITIONNUM

and increment the stage variable by
@NUMPARTITIONS
instead of 1
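
The suggestion above can be sketched in Python as follows. This is only a simulation under stated assumptions, not DataStage syntax: `interleaved_uids`, the sample rows, and the `(a + b) % num_partitions` partitioning rule are hypothetical, while `partition_num` and `num_partitions` play the roles of @PARTITIONNUM and @NUMPARTITIONS.

```python
def interleaved_uids(rows, num_partitions):
    """Simulate counters that start at the partition number and step by the partition count."""
    partitions = {p: [] for p in range(num_partitions)}
    for a, b in sorted(rows):
        # Hypothetical stand-in for hash partitioning on (A, B).
        partitions[(a + b) % num_partitions].append((a, b))

    uid_by_key = {}
    for partition_num, keys in partitions.items():
        uid = partition_num               # initial value, like @PARTITIONNUM
        previous = None
        for key in keys:
            if key != previous:           # key change indicator is 1
                uid += num_partitions     # step, like @NUMPARTITIONS
            uid_by_key[key] = uid
            previous = key
    return uid_by_key


rows = [(1, 1), (1, 1), (1, 2), (2, 2), (2, 3), (3, 3)]  # made-up sample data
uids = interleaved_uids(rows, 2)
print(uids)
```

Each partition only ever produces values with its own remainder modulo the number of partitions, so two partitions can never emit the same UID. The values are unique but not necessarily contiguous: if one partition holds more key groups than another, there will be gaps.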
oracle
Participant
Posts: 43
Joined: Sat Jun 25, 2005 11:52 pm

Re: Data inconsistency with Transformer in parallel

Post by oracle »

Hi Ravindra,

Could you please give some more info on @PARTITIONNUM and @NUMPARTITIONS?

My understanding is that if there are 4 partitions then @NUMPARTITIONS is 4, but I have no idea about @PARTITIONNUM. I am trying to understand how this value is determined.

Thanks in advance
ravindras83
Participant
Posts: 15
Joined: Tue Sep 22, 2009 5:54 am

Re: Data inconsistency with Transformer in parallel

Post by ravindras83 »

oracle wrote: Hi Ravindra,

Could you please give some more info on @PARTITIONNUM and @NUMPARTITIONS?

My understanding is that if there are 4 partitions then @NUMPARTITIONS is 4, but I have no idea about @PARTITIONNUM. I am trying to understand how this value is determined.

Thanks in advance

@PARTITIONNUM is the number of the partition on which the current instance of the Transformer is running, counted from 0.
If there are two nodes defined, you will have partition numbers 0 and 1.

Use this value as the UID for the first record processed on each partition.
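
As a rough illustration of seeding on the first record, here is a small Python sketch for the two-node case. `PartitionCounter` and `next_uid` are hypothetical names, not DataStage features; the class simply mimics one Transformer instance whose stage variable takes the partition number on the first record and then steps by the number of partitions on each key change.

```python
class PartitionCounter:
    """Mimics the stage variable logic of one Transformer instance (one partition)."""

    def __init__(self, partition_num, num_partitions):
        self.partition_num = partition_num    # role of @PARTITIONNUM (0-based)
        self.num_partitions = num_partitions  # role of @NUMPARTITIONS
        self.current = None                   # no record processed yet

    def next_uid(self, key_change):
        if self.current is None:
            # First record on this partition: seed with the partition number.
            self.current = self.partition_num
        elif key_change:
            # New key group: jump ahead by the number of partitions.
            self.current += self.num_partitions
        # No key change: reuse the previous value.
        return self.current


# Two nodes defined, so the partition numbers are 0 and 1.
counters = [PartitionCounter(p, 2) for p in range(2)]
first = [c.next_uid(True) for c in counters]    # first record on each partition
second = [c.next_uid(True) for c in counters]   # next key group on each partition
repeat = [c.next_uid(False) for c in counters]  # same key group again
print(first, second, repeat)
```

Partition 0 walks through the even numbers and partition 1 through the odd numbers, so the two instances never need to coordinate with each other.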