Remove duplicates

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

uppalapati2003
Participant
Posts: 70
Joined: Thu Nov 09, 2006 2:14 am

Remove duplicates

Post by uppalapati2003 »

Hello All,

In my source I have duplicates. If there are any duplicates in the source, I want to reject both of those records.
Kindly help me with that.

Thanks
Srini
keshav0307
Premium Member
Posts: 783
Joined: Mon Jan 16, 2006 10:17 pm
Location: Sydney, Australia

Post by keshav0307 »

This has been discussed many, many times in this forum. Try a search.
keshav0307
Premium Member
Posts: 783
Joined: Mon Jan 16, 2006 10:17 pm
Location: Sydney, Australia

Post by keshav0307 »

Your question is not very clear to me.

"if any duplicates in the source i want reject those two records"

Do you want to reject both of the records,

or

only remove the duplicate?
uppalapati2003
Participant
Posts: 70
Joined: Thu Nov 09, 2006 2:14 am

Post by uppalapati2003 »

I want to remove both records.
Srini
sreddy
Participant
Posts: 144
Joined: Sun Oct 21, 2007 9:13 am

Re: Remove duplicates

Post by sreddy »

Uppalapati
  • Use a Sort stage instead of the Remove Duplicates stage. The Sort stage has more grouping options and sort indicator options.

    Sort the records on the key field and, in the Sort stage, set the key change column option to True. Zero will then be assigned to the duplicate records. Downstream, put a condition that sends any record whose key change value is zero to the reject link, as in the sketch below.
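
Outside DataStage, the key change idea works roughly like this. This is only an illustrative Python sketch (not DataStage code), using the sample data posted later in this thread; note that it flags only the second and later occurrences of a key, not the first one.

# Python sketch of the Sort stage key change logic (illustration only).
# Assumes the rows are already sorted on the key column.
rows = [(10, "AAA"), (20, "BBB"), (20, "BBB"), (30, "CCC")]

prev_key = None
for key, value in rows:
    key_change = 1 if key != prev_key else 0   # 1 = first row of the key group
    if key_change == 0:
        print("reject:", key, value)           # duplicate occurrence
    else:
        print("output:", key, value)
    prev_key = key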
-------------------------------------------------------------------------------------

The Remove Duplicates stage doesn't have a reject option, nor does the Sort stage with remove duplicates checked.

To capture rejected duplicates, use a Transformer. Partition and sort on your primary key. In the Transformer, keep the primary key stored in a stage variable and compare the incoming primary key to it. If it is the same, output the incoming row as a duplicate; if it is different, output the row as unique and save the new primary key.

You need at least two stage variables, one to do the comparison and the other to store the key value:

Variable: Derivation
IsDuplicate: input.keyfield = SavedKey
SavedKey: input.keyfield
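
For illustration only, the same stage-variable evaluation order can be sketched in Python (not DataStage code; "keyfield" stands in for your real key column). IsDuplicate is evaluated before SavedKey is overwritten, so the comparison always uses the previous row's key.

# Python sketch of the Transformer stage-variable logic (illustration only).
rows = [(10, "AAA"), (20, "BBB"), (20, "BBB"), (30, "CCC")]

saved_key = None
for keyfield, value in rows:
    is_duplicate = (keyfield == saved_key)   # IsDuplicate: input.keyfield = SavedKey
    saved_key = keyfield                     # SavedKey: input.keyfield
    if is_duplicate:
        print("duplicate link:", keyfield, value)
    else:
        print("unique link:", keyfield, value)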


uppalapati2003 wrote:Hello All,

in my source i am having duplicates,if any duplicates in the source i want reject those two records
kindly help me on that

Thanks
SReddy
dwpractices@gmail.com
Analyzing Performance
uppalapati2003
Participant
Posts: 70
Joined: Thu Nov 09, 2006 2:14 am

Post by uppalapati2003 »

First of all, thanks for your response.
I am not sure whether, in this scenario, both records will be rejected or only a single record.

For example, I have data like this:

10,AAA
20,BBB
20,BBB
30,CCC

My output should be:

10,AAA
30,CCC

The two records with ID 20 have to go to the rejected file.

Thanks
Srini
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You need a "fork join" design. Use a Copy stage to send the first column through an aggregator to get counted, then join back to the detail rows with a Join stage. You will have the count along with each detail row. Then filter based on the value of the count.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.