Remove duplicates

Posted: Mon Jun 23, 2008 1:22 am
by uppalapati2003
Hello all,

In my source I have duplicates. If a key is duplicated in the source, I want to reject both of those records.
Kindly help me with that.

Thanks

Posted: Mon Jun 23, 2008 1:28 am
by keshav0307
This has been discussed many, many times here in this forum. Try a search.

Posted: Mon Jun 23, 2008 1:38 am
by keshav0307
Your question is not very clear to me.

"if any duplicates in the source i want reject those two records"

Do you want to reject both records,

or

do you only want to remove the duplicate?

Posted: Mon Jun 23, 2008 1:44 am
by uppalapati2003
I want to remove both records.

Re: Remove duplicates

Posted: Mon Jun 23, 2008 2:01 am
by sreddy
Uppalapati
  • Use Sort stages instead of Remove Duplicates stages. The Sort stage has more grouping options and sort indicator options.

    Sort the records on the key field. In the Sort stage, set "key change column = true". Zero will then be assigned to the duplicate records. Then add a condition so that any record with a zero is sent to the reject link.
-------------------------------------------------------------------------------------

The Remove Duplicates stage doesn't have a reject option, nor does the Sort stage with remove duplicates checked.

To capture rejected duplicates, use a Transformer. Partition and sort on your primary key. In the Transformer, keep the primary key stored in a stage variable. Compare the incoming primary key to the stored one. If it is the same, output the incoming row as a duplicate; if it is different, output the row as unique and save the new primary key.

You need at least two stage variables, one to do the comparison and the other to store the key value:

Variable: Derivation
IsDuplicate: input.keyfield = SavedKey
SavedKey: input.keyfield
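
As a rough illustration of the stage-variable logic above, here is a minimal Python sketch (not DataStage code). The function name split_on_previous_key and the (key, value) tuple layout are hypothetical, and the rows are assumed to be already sorted on the key. Note that this comparison only flags the second and later rows of each key; the first row of a duplicated key still goes out as unique.

```python
def split_on_previous_key(sorted_rows):
    """Split key-sorted rows into (unique, duplicates) by comparing each
    key with the previously seen key, mimicking the two stage variables."""
    unique, duplicates = [], []
    saved_key = None   # plays the role of SavedKey
    first_row = True   # nothing has been saved before the first row
    for row in sorted_rows:
        key = row[0]
        # IsDuplicate is evaluated BEFORE SavedKey is overwritten,
        # which is why the variable order matters.
        is_duplicate = (not first_row) and key == saved_key
        saved_key = key
        first_row = False
        if is_duplicate:
            duplicates.append(row)
        else:
            unique.append(row)
    return unique, duplicates


# Hypothetical sample data, sorted on the first field.
rows = [(10, "AAA"), (20, "BBB"), (20, "BBB"), (30, "CCC")]
unique, duplicates = split_on_previous_key(rows)
```

With this sample, one of the two rows with key 20 ends up in unique and the other in duplicates, so this approach on its own does not reject both copies.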


Posted: Mon Jun 23, 2008 2:18 am
by uppalapati2003
First of all, thanks for your response.
I am not sure whether, in this scenario, both records will be rejected or only a single record.

For example,
I have data like this:

10,AAA
20,BBB
20,BBB
30,CCC
From this, my output should be:
10,AAA
30,CCC

Both records with Id 20 have to go to the rejected file.

Thanks

Posted: Mon Jun 23, 2008 6:00 am
by ray.wurlod
You need a "fork join" design. Use a Copy stage to send the first column through an aggregator to get counted, then join back to the detail rows with a Join stage. You will have the count along with each detail row. Then filter based on the value of the count.
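
To make the fork-join idea concrete, here is a minimal Python sketch of the same logic (not DataStage code). The function name fork_join_split and the (key, value) tuple layout are hypothetical; the Counter stands in for the aggregator's count per key, and the lookup on each detail row stands in for the join.

```python
from collections import Counter


def fork_join_split(rows):
    """Return (kept, rejected): rows whose key occurs exactly once,
    and every row whose key occurs more than once."""
    # "Fork": count how many times each key occurs (the aggregator branch).
    counts = Counter(row[0] for row in rows)
    # "Join": look up the count for each detail row, then filter on it.
    kept = [row for row in rows if counts[row[0]] == 1]
    rejected = [row for row in rows if counts[row[0]] > 1]
    return kept, rejected


# Hypothetical sample data matching the example in this thread.
rows = [(10, "AAA"), (20, "BBB"), (20, "BBB"), (30, "CCC")]
kept, rejected = fork_join_split(rows)
```

Because the count is attached to every detail row, both rows with key 20 land in rejected, which matches the output asked for earlier in the thread.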