Here is the data:
State City
a 1
a 2
a 3
b 4
b 5
c 6
I want to remove duplicates (which I can do using a Remove Duplicates stage), and I also want to capture the removed duplicates, i.e.:
Output 1:
a 1
b 4
c 6
Output 2:
a 2
a 3
a 4
a 5
b 5
Any suggestions?
Capture Duplicates
Re: Capture Duplicates
bikan wrote:
Output 2:
a 2
a 3
a 4
a 5
b 5
Any suggestions?
On what logic do you want to extract that Output 2? That doesn't seem to be just duplicates...
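To make the question concrete, here is a minimal sketch in plain Python (not DataStage) of one possible reading of the requirement: keep the first row for each State, as a Remove Duplicates stage would if it is set to retain the first row of each group, and route every later row with the same State to a second output. The helper name `split_duplicates` is hypothetical. Under that reading, Output 2 would hold `a 2`, `a 3` and `b 5`, which is not the list in the question.

```python
from itertools import groupby
from operator import itemgetter


def split_duplicates(rows, key=itemgetter(0)):
    """Hypothetical helper: keep the first row per key, collect the rest.

    Rows are stably sorted by key first, so rows with the same key become
    adjacent while keeping their original relative order.
    """
    kept, rejected = [], []
    for _, group in groupby(sorted(rows, key=key), key=key):
        first, *rest = list(group)
        kept.append(first)       # the row a "keep first" dedupe would retain
        rejected.extend(rest)    # the rows it would discard
    return kept, rejected


data = [("a", 1), ("a", 2), ("a", 3), ("b", 4), ("b", 5), ("c", 6)]
output1, output2 = split_duplicates(data)
print(output1)  # [('a', 1), ('b', 4), ('c', 6)]
print(output2)  # [('a', 2), ('a', 3), ('b', 5)]
```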
I haven't failed, I've found 10,000 ways that don't work.
Thomas Alva Edison (1847-1931)
The most commonly used method, as far as I am aware, is a "fork join": split the data stream into two and, on one branch, count the rows in each group. Downstream, join that count back onto the original rows. You can then use the count to identify the duplicates: they are the rows whose count is greater than or equal to two (this might be a Filter stage WHERE clause, for example). To remove the duplicates, use a Remove Duplicates stage against another copy of the data. Make sure that your data are partitioned by the grouping key.
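The fork join described above can be sketched in plain Python as a rough illustration of the data flow, not as DataStage job design. The counting branch stands in for whatever stage does the grouping (in DataStage this would typically be an Aggregator stage), the list comprehension plays the role of the join and the Filter, and the loop at the end mimics a Remove Duplicates stage that keeps the first row per key. Variable names are illustrative only.

```python
from collections import Counter

data = [("a", 1), ("a", 2), ("a", 3), ("b", 4), ("b", 5), ("c", 6)]

# Fork, branch 1: count the rows in each State group.
counts = Counter(state for state, _ in data)

# Join the group count back onto every original row (branch 2 of the fork).
joined = [(state, city, counts[state]) for state, city in data]

# Filter: rows whose group has two or more members are the duplicates.
duplicates = [(state, city) for state, city, n in joined if n >= 2]

# Another copy of the data: keep only the first row seen for each State.
seen = set()
deduped = []
for state, city in data:
    if state not in seen:
        seen.add(state)
        deduped.append((state, city))

print(duplicates)  # [('a', 1), ('a', 2), ('a', 3), ('b', 4), ('b', 5)]
print(deduped)     # [('a', 1), ('b', 4), ('c', 6)]
```

Note that the count filter returns every member of a duplicated group, including the row that the deduplicated copy keeps, so `a 1` and `b 4` appear in both outputs.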
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.