Storing duplicate records

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
aaikat
Participant
Posts: 47
Joined: Tue Mar 07, 2006 2:49 am

Storing duplicate records

Post by aaikat »

Using 'Remove Duplicate' stage we can remove the duplicates. Now since we can not have Reject link with that stage so how to collect (and store them in a file) the duplicate records that are being rejected.
Daddy Doma
Premium Member
Premium Member
Posts: 62
Joined: Tue Jun 14, 2005 7:17 pm
Location: Australia
Contact:

Post by Daddy Doma »

You can create a "Manual" Remove Duplicates by using a Transformer.

Ensure your data is sorted and partitioned on the grouping keys before entering the stage. Then, use stage variables to assess each new attribute against the preceeding rows.
  • - Create a stage variable called NewID and set as current row ID.
    - Evaluate OldID against NewID.
    - Create a stage variable called OldID and set as current row ID.
This is possible because stage variables are processes in sequence. When you get to row 2, the key from row 1 is still in the OldID field, so it can compare them.
Daddy Doma
Premium Member
Premium Member
Posts: 62
Joined: Tue Jun 14, 2005 7:17 pm
Location: Australia
Contact:

Post by Daddy Doma »

Oops! Forgot to say: The value of your assessment stage variable sends the rows to different output links from the transformer. If the OldID and NewID match, send it as a duplicate. If they don't match, then it is the first record and any other duplicate records will follow it.
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

This has been discussed many times. One other option, Sort and Enable the KeyChange option. And in transformer, you can track the value of this field and filter based on it.
Or Remove the duplicate, and find a difference with the original file using Difference stage.
And there are many options too.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
Post Reply