How to Remove Duplicates?
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 467
- Joined: Tue Mar 20, 2007 6:36 am
- Location: Chennai
- Contact:
How to Remove Duplicates?
Hi,
I have a scenario where I need to delete all the duplicates from the source file.
Say I have this i/p data -->
EmpID
====
1
1
2
3
4
4
O/P Data-->
EmpID
====
2
3
Here since 1 and 4 are repeating I need to remove all the rows which have 1 or 4 as the EmpID.
I am implementing it this way -->
Here I have an Aggregator which counts the EmpIDs and then do a join on the key column EmpID.
In the next transformer I pass the rows only if count=1.
Is there a better and more elegant way to do it?
Appreciate all your time and help.
Thanks a lot.[/img]
I have a scenario where I need to delete all the duplicates from the source file.
Say I have this i/p data -->
EmpID
====
1
1
2
3
4
4
O/P Data-->
EmpID
====
2
3
Here since 1 and 4 are repeating I need to remove all the rows which have 1 or 4 as the EmpID.
I am implementing it this way -->
Here I have an Aggregator which counts the EmpIDs and then do a join on the key column EmpID.
In the next transformer I pass the rows only if count=1.
Is there a better and more elegant way to do it?
Appreciate all your time and help.
Thanks a lot.[/img]
Minhajuddin
<a href="http://feeds.feedburner.com/~r/MyExperi ... ~6/2"><img src="http://feeds.feedburner.com/MyExperienc ... lrow.3.gif" alt="My experiences with this DLROW" border="0"></a>
<a href="http://feeds.feedburner.com/~r/MyExperi ... ~6/2"><img src="http://feeds.feedburner.com/MyExperienc ... lrow.3.gif" alt="My experiences with this DLROW" border="0"></a>
-
- Participant
- Posts: 467
- Joined: Tue Mar 20, 2007 6:36 am
- Location: Chennai
- Contact:
Thank you Maverick.
But I forgot to mention that I had more than one column(Almost 10 columns). And I need to check the duplicates on a single column. Just for the sake of simplicity I posted it using a single column.
Thanks for all the help.
But I forgot to mention that I had more than one column(Almost 10 columns). And I need to check the duplicates on a single column. Just for the sake of simplicity I posted it using a single column.
Thanks for all the help.
Minhajuddin
<a href="http://feeds.feedburner.com/~r/MyExperi ... ~6/2"><img src="http://feeds.feedburner.com/MyExperienc ... lrow.3.gif" alt="My experiences with this DLROW" border="0"></a>
<a href="http://feeds.feedburner.com/~r/MyExperi ... ~6/2"><img src="http://feeds.feedburner.com/MyExperienc ... lrow.3.gif" alt="My experiences with this DLROW" border="0"></a>
One thing i am not sure is if u can propagate all the input fields to the output in the aggregator stage. If you can propagate all the input fields then the logic above should work. If you can propagate only the key fields and the output field then you need to join it to main data and then use filter stage.
-
- Participant
- Posts: 467
- Joined: Tue Mar 20, 2007 6:36 am
- Location: Chennai
- Contact:
Isn't there a special stage in Datastage which accomplishes this?
Minhajuddin
<a href="http://feeds.feedburner.com/~r/MyExperi ... ~6/2"><img src="http://feeds.feedburner.com/MyExperienc ... lrow.3.gif" alt="My experiences with this DLROW" border="0"></a>
<a href="http://feeds.feedburner.com/~r/MyExperi ... ~6/2"><img src="http://feeds.feedburner.com/MyExperienc ... lrow.3.gif" alt="My experiences with this DLROW" border="0"></a>
-
- Participant
- Posts: 467
- Joined: Tue Mar 20, 2007 6:36 am
- Location: Chennai
- Contact:
I doubt if we can accomplish this with Remove Duplicate stages.
Minhajuddin
<a href="http://feeds.feedburner.com/~r/MyExperi ... ~6/2"><img src="http://feeds.feedburner.com/MyExperienc ... lrow.3.gif" alt="My experiences with this DLROW" border="0"></a>
<a href="http://feeds.feedburner.com/~r/MyExperi ... ~6/2"><img src="http://feeds.feedburner.com/MyExperienc ... lrow.3.gif" alt="My experiences with this DLROW" border="0"></a>
I think Maveric suggested a good solution. The question shouldn't have been "how to remove duplicates", since the goal is to remove duplicate rows as well as the original row if it has duplicates, which is a more complex operation.
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
-
- Participant
- Posts: 467
- Joined: Tue Mar 20, 2007 6:36 am
- Location: Chennai
- Contact:
Hi Josh,
That sounds like a good idea. I will try it today and let you know.
And thanks everybody for all your help.
That sounds like a good idea. I will try it today and let you know.
And thanks everybody for all your help.
Minhajuddin
<a href="http://feeds.feedburner.com/~r/MyExperi ... ~6/2"><img src="http://feeds.feedburner.com/MyExperienc ... lrow.3.gif" alt="My experiences with this DLROW" border="0"></a>
<a href="http://feeds.feedburner.com/~r/MyExperi ... ~6/2"><img src="http://feeds.feedburner.com/MyExperienc ... lrow.3.gif" alt="My experiences with this DLROW" border="0"></a>
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Ray - because the poster also wants to remove the original row as well if duplicates exist.
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
-
- Participant
- Posts: 467
- Joined: Tue Mar 20, 2007 6:36 am
- Location: Chennai
- Contact:
Thank you Josh,
It's working fine! :D
Thank you all for all you valuable suggestions.
It's working fine! :D
Thank you all for all you valuable suggestions.
Minhajuddin
<a href="http://feeds.feedburner.com/~r/MyExperi ... ~6/2"><img src="http://feeds.feedburner.com/MyExperienc ... lrow.3.gif" alt="My experiences with this DLROW" border="0"></a>
<a href="http://feeds.feedburner.com/~r/MyExperi ... ~6/2"><img src="http://feeds.feedburner.com/MyExperienc ... lrow.3.gif" alt="My experiences with this DLROW" border="0"></a>