To find out duplicate records
I have some duplicate records in my source, which are getting rejected at the target. I can use the Remove Duplicates stage to remove those records, but I need to capture those records in some other table or file. How can I do this? A reject link from the Transformer is not working, as the records are getting rejected at the target (at the end) itself.
Would appreciate any suggestions....
Thanks,
Re: To find out duplicate records
Hi Amarpreet,
In the final Transformer stage you can concatenate all key columns into one long string and store it in a stage variable named STG1. Define another stage variable STG2 that holds the STG1 value from the previous row.
Compare the two: when they match, the current row is a duplicate of the previous one. Set the constraints on the Transformer's output links accordingly to capture those rows.
But it is necessary to sort the data on the key columns before doing this.
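For readers outside DataStage, the stage-variable logic above can be sketched in plain Python. This is a hypothetical illustration under the same assumption (input sorted on the key columns); the function and field names are mine, not DataStage's:

```python
# Sketch of the stage-variable technique on data sorted by the key columns:
# build the concatenated key (the role of STG1), compare it with the key kept
# from the previous row (the role of STG2), and flag a duplicate on a match.
def flag_duplicates(sorted_rows, key_cols):
    """Yield (is_duplicate, row) for rows already sorted on key_cols."""
    prev_key = None                                      # plays the role of STG2
    for row in sorted_rows:
        key = "|".join(str(row[c]) for c in key_cols)    # plays the role of STG1
        yield key == prev_key, row
        prev_key = key

rows = [{"id": 1, "val": "a"}, {"id": 1, "val": "b"}, {"id": 2, "val": "c"}]
dupes = [r for is_dup, r in flag_duplicates(rows, ["id"]) if is_dup]
# dupes holds the second id=1 row, which would go down the "capture" link
```

The flag corresponds to the Transformer link constraint: duplicates go to the capture table or file, the rest continue to the target.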
regards,
Jayant S Thakore
Re: To find out duplicate records
Amarpreet wrote: I can use remove duplicate stage to remove those records. But I need to capture those records in some other table/file.
Since you want a whole copy of the records and also want to remove the duplicates, put a Copy stage before the Remove Duplicates stage:
file --> Copy stage --> 1) file or table
                        2) Remove Duplicates stage
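A rough Python analogue of this fork (a sketch with illustrative names, not a DataStage API): one output keeps every record, the other feeds a dedup step keyed on the key columns.

```python
# Sketch of the Copy-stage pattern: one link keeps all records intact,
# the other feeds a Remove Duplicates step keyed on key_cols.
def copy_then_dedup(rows, key_cols):
    full_copy = list(rows)             # link 1: file/table with every record
    seen, deduped = set(), []
    for row in full_copy:              # link 2: Remove Duplicates stage
        key = tuple(row[c] for c in key_cols)
        if key not in seen:
            seen.add(key)
            deduped.append(row)
    return full_copy, deduped

rows = [{"id": 1}, {"id": 1}, {"id": 2}]
everything, unique = copy_then_dedup(rows, ["id"])
# everything keeps all 3 rows; unique keeps the first id=1 row and the id=2 row
```

Comparing the two outputs afterwards is what lets you recover the duplicates themselves.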
Re: To find out duplicate records
Hi
If you are using the target Oracle Enterprise stage, it has a reject option for duplicate records, which you can load into a dataset or sequential file based on the key columns of the target table.
SOURCE ---> ORACLE ENTERPRISE STAGE ---> SEQUENTIAL FILE
I think this would help with your requirement.
Thanks
Using the Oracle stage limits you to upserts and the rejected data is for rows that could not be inserted or updated. This means the duplicate could already reside in the database and is not quite the same as the problem described in this post.
A Remove Duplicates stage requires your data to be sorted, so presumably you have a Sort stage prior to it? In that Sort stage, generate a key change column on the columns that define the key. If you are sorting by more than the key columns, for instance to ensure the first record is the one you want to keep, you can achieve the same thing using two sorts and generating a cluster key change column on the key fields.
After the sort, simply filter on the key change or cluster key change column to identify the duplicates (values of 0).
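As a hedged sketch of the key-change idea (my own function and column names, not DataStage's), the logic on sorted input looks like this in Python:

```python
# Sketch of the key-change approach: on sorted input, add a keyChange
# column that is 1 on the first row of each key group and 0 on repeats;
# rows with keyChange == 0 are the duplicates to capture.
def with_key_change(sorted_rows, key_cols):
    prev = object()                    # sentinel that matches no real key
    for row in sorted_rows:
        key = tuple(row[c] for c in key_cols)
        yield dict(row, keyChange=int(key != prev))
        prev = key

rows = [{"id": 1}, {"id": 1}, {"id": 2}]
flagged = list(with_key_change(rows, ["id"]))
duplicates = [r for r in flagged if r["keyChange"] == 0]
# duplicates contains only the second id=1 row
```

The downstream Filter or Switch stage then plays the role of the list comprehension: keyChange 1 continues to the target, keyChange 0 goes to the capture file.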
It's a roundabout way, but you can achieve what you want. From the Transformer, output your data into both a dataset (hashed file) and a sequential file; the dataset will automatically remove the duplicates based on the key column.
Now you can compare/diff the dataset and the sequential file to output the duplicated records into a file. From the dataset (hashed file) you can load into the Oracle stage.
Re: To find out duplicate records
Use a Sort stage with the Hash method and set the "Create Key Change Column" property to true, so that the first row of each key group gets the new column value 1 and duplicates get 0. Then use a Switch or Filter stage to separate the rows with 1 and 0. It is a simple way and shows good performance too. I have used it.
Sree
Amarpreet wrote: I have some duplicate records in my source, which are getting rejected at the target. I can use remove duplicate stage to remove those records. But I need to capture those records in some other table/file. How can I do this? Reject link from transformer is not working, as the records are getting rejected at the target (at end) itself.
Would appreciate any suggestions....
Thanks,
mujeebur wrote: It's a roundabout way, but you can achieve what you want. From the transformer output your data into both DataSet (hashed file) and a sequential file. DataSet will automatically remove the duplicates based on the key column. Now you can compare/diff both DataSet and Seq File to output the duplicated record into a file. From DataSet (hashed) you can load into Oracle Stage.

This is a PX job, so it is using datasets, not hashed files. Datasets do not automatically remove duplicates.