Remove Duplicates
Posted: Fri Oct 21, 2016 7:08 pm
Hi,
I have a scenario where I need to remove duplicates on the complete record, based on a Y/N indicator column. A few columns in the record contain nulls. How should I partition the data?
This is what I have done: I sorted the input rows with a Sort stage on the indicator in descending order and partitioned the data on all columns except the indicator. I then used a Remove Duplicates stage with the same partitioning, sorted on the indicator and the other key columns (text columns).
The Remove Duplicates stage is not working correctly: sometimes it works and sometimes it doesn't. Can anyone tell me where exactly I'm going wrong?
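To make the intended logic concrete, here is a rough sketch in plain Python (not DataStage). It assumes the goal is to keep the 'Y' row when a 'Y' and an 'N' row share the same key columns. The column names (name, city, ind) and the partition count are made up for illustration. It hash-partitions on the key columns only, sorts each partition on the key columns first and the indicator second, then keeps the first row of each key group. Nulls are mapped to a placeholder so they compare consistently.

```python
from collections import defaultdict

KEY_COLS = ("name", "city")   # hypothetical key columns (everything except the indicator)
NUM_PARTITIONS = 4            # hypothetical number of partitions


def null_safe(value):
    # Map None to a tuple that sorts before any real value,
    # so rows with nulls can be hashed and compared like any other row.
    return (0, "") if value is None else (1, value)


def key_of(row):
    # Key built from the key columns only; the indicator is excluded.
    return tuple(null_safe(row[col]) for col in KEY_COLS)


def partition(rows):
    # Hash partition on the key so that all duplicates land in the same partition.
    parts = defaultdict(list)
    for row in rows:
        parts[hash(key_of(row)) % NUM_PARTITIONS].append(row)
    return parts


def remove_duplicates(rows):
    result = []
    for part in partition(rows).values():
        # Sort on the key columns first so duplicates are adjacent,
        # then on the indicator so 'Y' comes before 'N' within a key group.
        part.sort(key=lambda r: (key_of(r), r["ind"] != "Y"))
        previous = None
        for row in part:
            current = key_of(row)
            if current != previous:   # first row of a new key group is retained
                result.append(row)
                previous = current
    return result


rows = [
    {"name": "A", "city": None, "ind": "N"},
    {"name": "A", "city": None, "ind": "Y"},
    {"name": "B", "city": "X", "ind": "N"},
    {"name": "B", "city": "X", "ind": "N"},
    {"name": None, "city": "X", "ind": "Y"},
]
deduped = remove_duplicates(rows)
```

In this sketch the indicator is only a secondary sort column. If it were the leading sort column, rows with the same key but different indicators would not necessarily sit next to each other, and a "keep first" pass could miss them.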
Thanks