Hi,
I have an issue with the Remove Duplicates stage in my job. It is not removing duplicates based on the key, even though identical key values are coming in.
Could you please let me know why this could happen?
Thanks
Remove Duplicates not removing duplicates
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 437
- Joined: Fri Oct 21, 2005 10:00 pm
Re: Remove Duplicates not removing duplicates
Two reasons I can think of:
1. The data is not hash partitioned on the key, so records with the same key value do not land on the same partition.
2. The data is not sorted properly. The Remove Duplicates stage only removes duplicates when the data is sorted by the key; it essentially drops duplicate records that sit one after another.
So is the data sorted and hash partitioned correctly?
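To illustrate both points outside of DataStage, here is a rough Python sketch. It is only a simulation of the behaviour described above, not DataStage code: the function names (remove_adjacent_duplicates, round_robin_partition, hash_partition, dedupe_in_parallel) and the sample rows are made up for the example, and integer keys are used so that Python's hash() is predictable.

```python
from itertools import groupby

def remove_adjacent_duplicates(rows, key):
    """Keep the first row of each run of consecutive rows with the same key."""
    return [next(group) for _, group in groupby(rows, key=key)]

def round_robin_partition(rows, n):
    """Deal rows out to n partitions in turn, ignoring the key."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def hash_partition(rows, n, key):
    """Send every row with the same key value to the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(key(row)) % n].append(row)
    return parts

def dedupe_in_parallel(partitions, key):
    """Sort each partition on the key, then drop adjacent duplicates in it."""
    out = []
    for part in partitions:
        out.extend(remove_adjacent_duplicates(sorted(part, key=key), key))
    return out

def by_id(row):
    return row[0]

rows = [(1, "a"), (1, "b"), (2, "c"), (2, "d")]

# Point 1: sorted, but not hash partitioned on the key.
# Each key ends up on both partitions, so the duplicates survive.
not_hashed = dedupe_in_parallel(round_robin_partition(rows, 2), by_id)

# Hash partitioned on the key and sorted: one row per key remains.
hashed = dedupe_in_parallel(hash_partition(rows, 2, by_id), by_id)

# Point 2: a single partition that is not sorted on the key.
# The two rows with key 1 are not adjacent, so nothing is removed.
unsorted_result = remove_adjacent_duplicates([(1, "a"), (2, "c"), (1, "b")], by_id)

print(len(not_hashed), len(hashed), len(unsorted_result))
```

In the sketch, only the run that is both hash partitioned and sorted on the same key ends up with one row per key, which is why both conditions need to be checked in the job.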
Keith Williams
keith@peacefieldinc.com
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia