Remove duplicates using Transformer

Post questions here related to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

siva7143
Participant
Posts: 35
Joined: Wed Aug 09, 2006 12:20 am


Post by siva7143 »

Hi,

How can I remove duplicate rows using a Transformer stage?


Thanks in advance,
Siva Kumar N :)
mahadev.v
Participant
Posts: 111
Joined: Tue May 06, 2008 5:29 am
Location: Bangalore

Post by mahadev.v »

There is a Remove Duplicates stage that does this for you. Is there a specific reason you want to use a Transformer?
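As a rough idea of what the dedicated stage does, here is a Python sketch. It is not DataStage code and not how the stage is implemented internally. Assuming the input is already sorted on the key, it keeps one row per key group, either the first or the last, which mirrors the stage's choice of which duplicate to retain. The function `remove_duplicates` and its `retain` parameter are hypothetical names.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Literal

Row = dict[str, object]

def remove_duplicates(rows: Iterable[Row], key: str,
                      retain: Literal["first", "last"] = "first") -> list[Row]:
    """Keep one row per run of equal keys; assumes rows are sorted on `key`."""
    result: list[Row] = []
    # groupby only groups *adjacent* equal keys, which is why sorted input matters
    for _, group in groupby(rows, key=itemgetter(key)):
        members = list(group)
        result.append(members[0] if retain == "first" else members[-1])
    return result

rows = [
    {"id": 1, "name": "a"},
    {"id": 1, "name": "b"},
    {"id": 2, "name": "c"},
]
print(remove_duplicates(rows, "id"))          # [{'id': 1, 'name': 'a'}, {'id': 2, 'name': 'c'}]
print(remove_duplicates(rows, "id", "last"))  # [{'id': 1, 'name': 'b'}, {'id': 2, 'name': 'c'}]
```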
"given enough eyeballs, all bugs are shallow" - Eric S. Raymond
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

If you search the forum you will find that this question has been answered before. Searching means that you can get your answer faster than posting and waiting for someone to answer.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
singhald
Participant
Posts: 180
Joined: Tue Aug 23, 2005 2:50 am
Location: Bangalore

Post by singhald »

Hello Mahadev,


It is very tricky to remove duplicate records using only a Transformer stage. To do it, you need two stages:

1) Sort stage: sort your incoming data on the key columns on which you want to remove duplicate records, and enable the KeyChange option (Create Key Change Column) in the Sort stage.

2) Transformer stage: use the KeyChange column in the output constraint:

KeyChange = 1

With this constraint, the output contains only the first record of each key group, i.e. one record per unique key.
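The two steps above can be mimicked outside DataStage with a short Python sketch: sort on the key, add a KeyChange flag that is 1 on the first row of each key group and 0 otherwise (how the Sort stage's key change column behaves), then keep only rows where KeyChange = 1, as the Transformer constraint would. The function names are hypothetical. In a real parallel job the data also needs to be partitioned on the key so that duplicates end up in the same partition.

```python
from typing import Iterable

Row = dict[str, object]

def sort_with_key_change(rows: Iterable[Row], key: str) -> list[Row]:
    """Sort on `key` and add KeyChange: 1 on the first row of each key group, else 0."""
    out: list[Row] = []
    previous: object = object()  # sentinel that never equals a real key
    for row in sorted(rows, key=lambda r: r[key]):
        out.append({**row, "KeyChange": 1 if row[key] != previous else 0})
        previous = row[key]
    return out

def transformer_constraint(rows: Iterable[Row]) -> list[Row]:
    """Pass only the rows that satisfy the constraint KeyChange = 1."""
    return [r for r in rows if r["KeyChange"] == 1]

data = [{"cust": "B", "amt": 5}, {"cust": "A", "amt": 1},
        {"cust": "B", "amt": 7}, {"cust": "A", "amt": 2}]
flagged = sort_with_key_change(data, "cust")
unique = transformer_constraint(flagged)
print(unique)  # [{'cust': 'A', 'amt': 1, 'KeyChange': 1}, {'cust': 'B', 'amt': 5, 'KeyChange': 1}]
```

Because Python's `sorted` is stable, the row kept for each key is the first one that arrived, which matches "keep the first record of each group".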


Please mark this post as Resolved if you are able to remove duplicate records using the Transformer.

Regards,
Deepak Singhal
Everything is okay in the end. If it's not okay, then it's not the end.
fareeda_b
Participant
Posts: 48
Joined: Sat Feb 23, 2008 4:25 pm

Need clarification in Transformer

Post by fareeda_b »

Hi Deepak,

The Sort stage gives us a key change column, right? But the Transformer has no built-in KeyChange = 1 option, so how do we remove duplicates in the Transformer?
My logic is to take 3 stage variables:
st1 = st3
st2 = if st1 = <key column> then duplicate else not-duplicate
st3 = <key column>
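One caveat when turning this into stage variables: DataStage evaluates stage variables top to bottom for each row, and a variable not yet assigned on the current row still holds its value from the previous row. So st1 = st3 picks up the previous row's key only because st3 is assigned after it. st2 must compare st1 with the current input column; comparing it with st3 would not work, because at that point st3 still holds the previous row's key, so every row would look like a duplicate. The Python sketch below simulates that per-row evaluation order. Like the KeyChange approach, it assumes input sorted on the key, and `dedupe_with_stage_variables` is an illustrative name only.

```python
from typing import Iterable

Row = dict[str, object]

def dedupe_with_stage_variables(rows: Iterable[Row], key: str) -> list[Row]:
    """Simulate three Transformer stage variables evaluated top to bottom per row.

    Assumes `rows` is already sorted on `key`.
    """
    st3: object = object()  # initial value chosen so it cannot match any real key
    output: list[Row] = []
    for row in rows:
        st1 = st3                                            # previous row's key (st3 not yet updated)
        st2 = "duplicate" if st1 == row[key] else "not-duplicate"
        st3 = row[key]                                       # remember current key for the next row
        if st2 == "not-duplicate":                           # output link constraint
            output.append(row)
    return output

sorted_rows = [{"k": 1, "v": "a"}, {"k": 1, "v": "b"}, {"k": 2, "v": "c"},
               {"k": 3, "v": "d"}, {"k": 3, "v": "e"}]
print([r["v"] for r in dedupe_with_stage_variables(sorted_rows, "k")])  # ['a', 'c', 'd']
```

In the job itself, the equivalent of the sentinel is giving st3 an initial value that can never occur as a key.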

If I'm wrong, please correct me.

Thanks in advance,
fareeda
Thanks
battaliou
Participant
Posts: 155
Joined: Mon Feb 24, 2003 7:28 am
Location: London

Re: Need clarification in Transformer

Post by battaliou »

Your logic is fine
3NF: Every non-key attribute must provide a fact about the key, the whole key, and nothing but the key. So help me Codd.
singhald
Participant
Posts: 180
Joined: Tue Aug 23, 2005 2:50 am
Location: Bangalore

Post by singhald »

I have a question: why can't you use KeyChange = 1 in the Transformer instead of using stage variables? Is there any problem with that?
Regards,
Deepak Singhal
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Only that there was no mention of a Sort stage in the original question. Without a Sort stage you don't get a KeyChange column. But it would be a viable solution.
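To illustrate why the sort matters: both the KeyChange flag and the stage-variable comparison only look at the immediately preceding row, so they remove duplicates only when equal keys are adjacent. The Python sketch below (a hypothetical helper, not DataStage code) applies the same previous-row comparison to unsorted and to sorted input.

```python
from typing import Sequence

def keep_key_changes(keys: Sequence[str]) -> list[str]:
    """Keep a value only when it differs from the value on the previous row."""
    kept: list[str] = []
    previous: object = object()  # sentinel: the first row always counts as a key change
    for k in keys:
        if k != previous:
            kept.append(k)
        previous = k
    return kept

unsorted_keys = ["A", "B", "A", "B"]
print(keep_key_changes(unsorted_keys))          # ['A', 'B', 'A', 'B']  (duplicates survive)
print(keep_key_changes(sorted(unsorted_keys)))  # ['A', 'B']
```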