I have a requirement in which if a duplicate value occurs in a particular field, I need to drop both (or all) of the records with that value.
I'm guessing there may be a sql statement that could be devised to handle this, but I'd like to do it without that if possible. Anyone have suggestions?
Thanks in advance.
Tom
Removing both duplicates
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 3337
- Joined: Mon Jan 17, 2005 4:49 am
- Location: United Kingdom
How about sorting the data on your key criteria, adding a cluster key change column (1 for first duplicate, 0 for subsequent ones), then sort again but add the change column, then a transform stage using a stage variable to store previous record key data. If change column = 1 and previous record is change column 0 and has the same key, drop the record. Drop the record if the change column is 0 as well.
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
Hi Tom,
Welcome on board.
I have implemented this solution before. I will suggest you to use two sort stages and one transformer stage to remove both instance incase of duplication.
1st Sort Stage: you need to sort the data on keys on which you are identifying duplicate records & enable KEY CHANGE column in this sort stage.
2nd Sort Stage: use the same fields on which you sorted the data and keep dont sort previously sorted data and sort only KeyChange Field in ascending order.
Transformer Stage: use two stage variables:
1stStage variable--> DupRec:=>If keyChange =0 Or (keyChange=1 And PreKeyChange = 0) Then "DUPE" Else "UNIQUE"
2ndStageVar--->PreKeyChange =>Map it to keychange field
In your transformer Stage, you can put the variable DUPE in to constraint to get all duplicate records and "UNIQUE" to get all unique records.
Let me know if you face any problem.
Welcome on board.
I have implemented this solution before. I will suggest you to use two sort stages and one transformer stage to remove both instance incase of duplication.
1st Sort Stage: you need to sort the data on keys on which you are identifying duplicate records & enable KEY CHANGE column in this sort stage.
2nd Sort Stage: use the same fields on which you sorted the data and keep dont sort previously sorted data and sort only KeyChange Field in ascending order.
Transformer Stage: use two stage variables:
1stStage variable--> DupRec:=>If keyChange =0 Or (keyChange=1 And PreKeyChange = 0) Then "DUPE" Else "UNIQUE"
2ndStageVar--->PreKeyChange =>Map it to keychange field
In your transformer Stage, you can put the variable DUPE in to constraint to get all duplicate records and "UNIQUE" to get all unique records.
Let me know if you face any problem.
Regards,
Deepak Singhal
Everything is okay in the end. If it's not okay, then it's not the end.
Deepak Singhal
Everything is okay in the end. If it's not okay, then it's not the end.
-
- Participant
- Posts: 152
- Joined: Tue Jan 13, 2009 8:59 am
Re: Removing both duplicates
hobocamp wrote:I have a requirement in which if a duplicate value occurs in a particular field, I need to drop both (or all) of the records with that value.
I'm guessing there may be a sql statement that could be devised to handle this, but I'd like to do it without that if possible. Anyone have suggestions?
Thanks in advance.
Tom