Storing duplicate records
Using the Remove Duplicates stage we can remove the duplicates. However, since that stage cannot have a reject link, how can we collect the duplicate records it drops and store them in a file?
You can build a "manual" Remove Duplicates with a Transformer stage.
Make sure your data is sorted and partitioned on the grouping keys before it enters the stage. Then use stage variables to compare each incoming row's key with the key of the preceding row:
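- Create a stage variable called NewID and set it to the current row's ID.
- Evaluate OldID against NewID (for example, a flag that is true when they are equal).
- Create a stage variable called OldID and set it to the current row's ID.
Stage variables are evaluated top to bottom for each row, so OldID still holds the previous row's ID when the comparison runs. Give OldID an initial value that cannot occur as a real ID so the first row is never flagged. Rows where OldID equals NewID are duplicates: send them down a separate output link (for example, to a Sequential File stage) and the rest to your main output.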
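A rough Python sketch of the stage-variable logic above, assuming the rows are already sorted on the key. `split_duplicates` and its parameters are hypothetical names for illustration, not DataStage APIs; the sentinel object plays the role of OldID's initial value.

```python
from typing import Callable, Hashable, Iterable, List, Tuple, TypeVar

Row = TypeVar("Row")


def split_duplicates(
    rows: Iterable[Row], key: Callable[[Row], Hashable]
) -> Tuple[List[Row], List[Row]]:
    """Split key-sorted rows into (first occurrences, duplicates).

    Mirrors the Transformer pattern: NewID, then the comparison, then OldID.
    """
    unique: List[Row] = []
    duplicates: List[Row] = []
    old_id: object = object()  # sentinel: never equal to a real key, so the first row is kept
    for row in rows:
        new_id = key(row)            # NewID = current row's key
        is_dup = new_id == old_id    # compare while OldID still holds the previous key
        old_id = new_id              # OldID updated last, ready for the next row
        if is_dup:
            duplicates.append(row)
        else:
            unique.append(row)
    return unique, duplicates


if __name__ == "__main__":
    rows = [(1, "a"), (1, "b"), (2, "c"), (3, "d"), (3, "e")]
    unique, dups = split_duplicates(rows, key=lambda r: r[0])
    print(unique)  # [(1, 'a'), (2, 'c'), (3, 'd')]
    print(dups)    # [(1, 'b'), (3, 'e')]
```

If the input is not sorted on the key, duplicates that are not adjacent will be missed, which is why the sort and partitioning step matters.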
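This has been discussed many times. Another option: in a Sort stage, set the Create Key Change Column option to True. The stage then adds a keyChange column that is 1 for the first record of each key group and 0 for the rest, so in a downstream Transformer you can filter on it and send the rows with keyChange = 0 to your duplicates file.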
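A Python approximation of what the key change column gives you, assuming the records are already sorted on the key columns. `add_key_change` and `route` are illustrative helpers, not part of DataStage; only the `keyChange` semantics (1 on the first record of a group, 0 otherwise) are meant to mirror the Sort stage.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Sequence, Tuple

Record = Dict[str, object]


def add_key_change(records: Iterable[Record], keys: Sequence[str]) -> Iterator[Record]:
    """Yield copies of sorted records with a keyChange field:
    1 for the first record of each key group, 0 for later ones."""
    previous: Optional[Tuple[object, ...]] = None
    for rec in records:
        current = tuple(rec[k] for k in keys)
        flagged = dict(rec)  # copy so the input records are not modified
        flagged["keyChange"] = 1 if current != previous else 0
        previous = current
        yield flagged


def route(records: Iterable[Record]) -> Tuple[List[Record], List[Record]]:
    """Transformer-style constraints: keyChange = 1 to the main output, 0 to the duplicates link."""
    kept: List[Record] = []
    dups: List[Record] = []
    for rec in records:
        if rec["keyChange"] == 1:
            kept.append(rec)
        else:
            dups.append(rec)
    return kept, dups


if __name__ == "__main__":
    data = [{"id": 1, "v": "a"}, {"id": 1, "v": "b"}, {"id": 2, "v": "c"}]
    kept, dups = route(add_key_change(data, keys=["id"]))
    print([r["v"] for r in kept])  # ['a', 'c']
    print([r["v"] for r in dups])  # ['b']
```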
Or remove the duplicates, then compare the result with the original file using a Difference stage to find the records that were removed.
There are many other options too.
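The "remove, then compare" idea amounts to a multiset difference: the original data minus the de-duplicated data leaves exactly the extra copies. The Python sketch below uses `collections.Counter` purely as an analogy; it is not how the Difference stage works internally, and `duplicates_by_difference` is a hypothetical helper that treats whole records as the comparison key.

```python
from collections import Counter
from typing import Hashable, Iterable, List


def duplicates_by_difference(
    original: Iterable[Hashable], deduped: Iterable[Hashable]
) -> List[Hashable]:
    """Return the extra copies: original minus de-duplicated, counting multiplicity."""
    leftover = Counter(original) - Counter(deduped)  # keys whose count drops to 0 are discarded
    return list(leftover.elements())


if __name__ == "__main__":
    original = [(1, "a"), (1, "a"), (2, "b"), (3, "c"), (3, "c"), (3, "c")]
    deduped = list(dict.fromkeys(original))  # order-preserving de-duplication
    print(duplicates_by_difference(original, deduped))
    # [(1, 'a'), (3, 'c'), (3, 'c')]
```

`dict.fromkeys` stands in for the Remove Duplicates stage here; any de-duplicated copy that keeps one record per key would work the same way.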