Remove duplicates using Transformer Stage


kashif007
Premium Member
Posts: 216
Joined: Wed Jun 07, 2006 5:48 pm
Location: teaneck


Post by kashif007 »

Is it possible to remove duplicates using the Transformer stage? If so, how can we accomplish that? I was thinking of sorting the data with a Sort stage and then writing logic in a Transformer stage variable and constraint to filter out the unwanted duplicates, retaining only one record from each set of duplicates. Is that correct?
Regards
Kashif Khan
jaybee223
Participant
Posts: 5
Joined: Fri Jul 11, 2008 6:54 am

Re: Remove duplicates using Transformer Stage

Post by jaybee223 »

Why don't you use the "Remove Duplicates" stage?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

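If the data is sorted and partitioned on the column with the duplicates, then use two stage variables, "IsDup" and "LastValue", defined in that order. Derive "IsDup" as "IF In.ColumnName=LastValue THEN 1 ELSE 0" and "LastValue" as "In.ColumnName". Stage variables are evaluated top to bottom, so "IsDup" still sees the previous row's value when it is computed. The constraint would be "IsDup=0" to pass on only non-duplicate rows.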
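
For readers who want to see the row-by-row logic outside DataStage, here is a minimal Python sketch of the two-stage-variable approach. It is only an illustration, not Transformer code: the function name dedupe_sorted, the dict-based rows, and the sentinel initial value are assumptions made for the example, and the input is assumed to be already sorted on the key.

```python
from typing import Any, Dict, Iterable, Iterator


def dedupe_sorted(rows: Iterable[Dict[str, Any]], key: str) -> Iterator[Dict[str, Any]]:
    """Yield the first row of each run of equal key values.

    Assumes `rows` is already sorted (grouped) on `key`, mirroring the
    sorted and partitioned input the Transformer would receive.
    """
    # Hypothetical initial value: a sentinel that never equals real data.
    last_value: Any = object()
    for row in rows:
        # "IsDup" is derived first, so it compares against the previous row.
        is_dup = 1 if row[key] == last_value else 0
        # "LastValue" is updated afterwards, ready for the next row.
        last_value = row[key]
        # Constraint "IsDup=0": only non-duplicate rows pass.
        if is_dup == 0:
            yield row


if __name__ == "__main__":
    sample = [
        {"id": 1, "v": "a"},
        {"id": 1, "v": "b"},
        {"id": 2, "v": "c"},
    ]
    for kept in dedupe_sorted(sample, "id"):
        print(kept)
```

Swapping the two assignments inside the loop would make every row compare equal to itself, which is why the order of the stage variables matters.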
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Generate a Key Change column in the Sort stage and apply a constraint in the Transformer stage that this column has the value 1. Or simply specify a unique sort.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
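
The Key Change suggestion above can be pictured with a rough Python sketch, again only as an illustration. The function names and the flag column name "keyChange" are assumptions for the example; the sketch sorts the rows, flags the first row of each key group with 1 and the rest with 0, and then keeps the rows where the flag is 1.

```python
from typing import Any, Dict, Iterable, Iterator, List


def add_key_change(rows: Iterable[Dict[str, Any]], key: str,
                   flag: str = "keyChange") -> Iterator[Dict[str, Any]]:
    """Sort rows on `key` and add a flag column: 1 on the first row of
    each key group, 0 on the remaining rows of that group."""
    previous: Any = object()  # sentinel that never equals real data
    # sorted() is stable, so rows with equal keys keep their input order.
    for row in sorted(rows, key=lambda r: r[key]):
        flagged = dict(row)  # copy so the caller's rows are not modified
        flagged[flag] = 1 if row[key] != previous else 0
        previous = row[key]
        yield flagged


def keep_first_of_group(rows: Iterable[Dict[str, Any]],
                        flag: str = "keyChange") -> List[Dict[str, Any]]:
    """Equivalent of the Transformer constraint: flag column equals 1."""
    return [row for row in rows if row[flag] == 1]


if __name__ == "__main__":
    sample = [
        {"id": 2, "v": "c"},
        {"id": 1, "v": "a"},
        {"id": 1, "v": "b"},
    ]
    for kept in keep_first_of_group(add_key_change(sample, "id")):
        print(kept)
```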
kashif007
Premium Member
Posts: 216
Joined: Wed Jun 07, 2006 5:48 pm
Location: teaneck

Post by kashif007 »

Thanks, everybody.
Regards
Kashif Khan