Capturing the Duplicates
Hi,
I am using the Remove Duplicates stage to eliminate duplicates. Among its properties is an option called "Duplicate To Retain", which takes one of two values: First or Last.
I want to capture the duplicate records that the stage discards. How can I achieve this?
Any answer is appreciated.
Thanks in advance,
Ravi
Hi Ravi,
I do not think it is possible to capture the duplicate data in the Remove Duplicates stage.
The Remove Duplicates stage takes a single dataset as input and outputs a single dataset with the duplicates removed.
When two records are duplicates of each other, by default the first record is retained and the other is discarded. The "Duplicate To Retain" option lets you retain the last record rather than the first.
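The first/last behaviour described above can be sketched in plain Python. This is only an illustration of the logic, not DataStage code: the function name `remove_duplicates`, its `retain` parameter and the sample rows are all assumptions made for the example. Unlike the stage, the sketch also returns the discarded rows, which is the part the original question asks for.

```python
from itertools import groupby
from operator import itemgetter


def remove_duplicates(rows, key, retain="first"):
    """Split rows (already sorted on `key`) into kept and discarded lists.

    Hypothetical helper: `retain` mimics the "Duplicate To Retain" option.
    """
    kept, discarded = [], []
    for _, group in groupby(rows, key=key):
        group = list(group)
        # Index of the record to keep within this key group.
        idx = 0 if retain == "first" else len(group) - 1
        kept.append(group[idx])
        discarded.extend(group[:idx] + group[idx + 1:])
    return kept, discarded


# Sample data (made up): first field is the key.
rows = [(1000, "a"), (1000, "b"), (2000, "c"),
        (3000, "d"), (3000, "e"), (3000, "f")]

kept_first, dropped_first = remove_duplicates(rows, key=itemgetter(0), retain="first")
kept_last, dropped_last = remove_duplicates(rows, key=itemgetter(0), retain="last")
```

With `retain="first"` the rows tagged `a`, `c` and `d` are kept; with `retain="last"` the rows tagged `b`, `c` and `f` are kept. In both cases everything else ends up in the discarded list.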
Hi Rajiv,
The Remove Duplicates stage has no option to capture the duplicates, and no reject link either.
One approach is to take the difference between the original dataset and the final dataset from which the duplicates have been removed.
Another option is to use a Sort stage to flag each change in the key, and a Transformer to collect the duplicates.
A further workaround is to join or merge the two datasets and extract the unmatched rows, which is what a difference accomplishes internally.
-Kumar
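The "difference" idea can be sketched in plain Python as follows. This is a generic illustration and makes no claim about how the Difference, Join or Merge stages work internally; `multiset_difference` and the sample rows are hypothetical. A counter is used instead of a set so that rows identical in every column are still reported as duplicates.

```python
from collections import Counter


def multiset_difference(original, deduplicated):
    """Return the rows of `original` that are not accounted for in `deduplicated`.

    Each row in `deduplicated` cancels exactly one matching row in `original`.
    """
    remaining = Counter(deduplicated)
    leftovers = []
    for row in original:
        if remaining[row] > 0:
            remaining[row] -= 1      # this occurrence survived de-duplication
        else:
            leftovers.append(row)    # this occurrence was dropped as a duplicate
    return leftovers


# Sample data (made up): the last two rows are identical in every column.
original = [(1000, "a"), (1000, "b"), (2000, "c"), (3000, "d"), (3000, "d")]
deduplicated = [(1000, "a"), (2000, "c"), (3000, "d")]

duplicates = multiset_difference(original, deduplicated)
```

Note that this needs two passes over the data, which is one reason the key change approach mentioned in the same post may be preferable.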
Hi,
The Sort stage has an option called Create Key Change Column.
Set it to True. It will tell you where the key column changes, i.e.,
Key   KeyChangeCol
1000  1
1000  2
2000  1
3000  1
3000  2
3000  3
You can use a Transformer to collect the rows whose KeyChangeCol is greater than 1.
-Kumar
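As a rough Python sketch of a key change column: this assumes the flag is 1 on the first row of each key group and 0 on every following row, which is what the follow-up question in this thread suggests. That differs from the running count shown in the table, so check the behaviour in your own version. `add_key_change` and the sample keys are made up for the illustration.

```python
def add_key_change(sorted_rows, key):
    """Pair each row with a key change flag (1 = first row of a key group)."""
    flagged = []
    previous = object()  # sentinel that never equals a real key value
    for row in sorted_rows:
        current = key(row)
        flagged.append((row, 1 if current != previous else 0))
        previous = current
    return flagged


# Same keys as in the table above, already sorted.
keys = [1000, 1000, 2000, 3000, 3000, 3000]
flagged = add_key_change(keys, key=lambda k: k)

flags = [flag for _, flag in flagged]
duplicates = [row for row, flag in flagged if flag == 0]
```

Under that assumption, the duplicates are the rows whose flag is 0, not the rows whose value is greater than 1.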
Kumar,
I have a doubt.
Won't KeyChangeCol be 0 for the duplicates, rather than 2, 3, etc.?
In that case, shouldn't the Transformer be changed to accept rows greater than 0?
Correct me if I am wrong.
--Balaji S.R
To then capture the duplicates, you could use a Filter stage with the condition key change column = 1. Set Output Rejects = True, hang a reject link off the Filter, and send the rejected rows (the duplicates) wherever you want.
Keith Williams
keith@peacefieldinc.com
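The filter-plus-reject idea can be sketched in Python like this. It is an illustration of the routing logic only, not of the Filter stage itself; `filter_with_reject` and the flagged sample rows are hypothetical, and the flag is assumed to be 1 for the first row of each key group and 0 otherwise.

```python
def filter_with_reject(rows, predicate):
    """Route rows to a main output or a reject output, like a filter with a reject link."""
    passed, rejected = [], []
    for row in rows:
        (passed if predicate(row) else rejected).append(row)
    return passed, rejected


# (key, key_change) pairs, as a sorted stream with a key change flag might look.
flagged = [(1000, 1), (1000, 0), (2000, 1), (3000, 1), (3000, 0), (3000, 0)]

# Main output: key_change == 1 (one row per key). Reject output: the duplicates.
unique_rows, duplicate_rows = filter_with_reject(flagged, lambda r: r[1] == 1)
```

The appeal of this design is that a single pass produces both the de-duplicated stream and the captured duplicates.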
Hi,
I am capturing duplicate records with the Sort stage key change column. To achieve this I am using two Sort stages: one sorts the records on key1, key2, key3 and key4 (price), and the other creates the key change column.
My actual requirement is to find duplicates on key1, key2 and key3, with key4 (price) sorted descending so that I can capture the maximum price. If I use only one Sort stage, records with different key4 (price) values each get a key change value of 1, so I get both price records even though they share the same key1, key2 and key3. That is why I use the second Sort stage.
I am able to achieve my target, but a warning message says that the data is already sorted on the Sort stage keys. How can I eliminate this warning message?
Thanks,
somaraju
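The requirement described above (sort on key1, key2, key3 and descending price, but detect the key change on the first three keys only) can be sketched in Python as follows. This only illustrates the intended logic and does not address the DataStage warning; `flag_max_price`, the tuple layout `(key1, key2, key3, price)` and the sample rows are assumptions made for the example.

```python
def flag_max_price(rows):
    """Sort on key1..key3 ascending and price descending, then flag group starts.

    The flag is computed on (key1, key2, key3) only, so the first row of each
    group is the one with the maximum price.
    """
    ordered = sorted(rows, key=lambda r: (r[0], r[1], r[2], -r[3]))
    flagged = []
    previous = None  # a tuple of keys never equals None
    for row in ordered:
        group = row[:3]
        flagged.append(row + (1 if group != previous else 0,))
        previous = group
    return flagged


# Sample rows (made up): (key1, key2, key3, price)
rows = [("A", "x", 1, 10.0), ("A", "x", 1, 12.5),
        ("B", "y", 2, 7.0), ("A", "x", 1, 11.0)]

flagged = flag_max_price(rows)
max_price_rows = [r[:4] for r in flagged if r[4] == 1]
duplicate_rows = [r[:4] for r in flagged if r[4] == 0]
```

Including the price in the key change comparison would flag every distinct price as a new group, which matches the behaviour described with a single Sort stage.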