
remove duplicates

Posted: Mon Jul 27, 2009 3:27 am
by sunitha_cts
Hi,

Instead of using the Remove Duplicates stage, is there any other way to remove duplicates from a table?

Thanks
sunitha

Posted: Mon Jul 27, 2009 3:38 am
by ArndW
Yes, there are other ways. All involve sorting the data. The Sort stage can remove duplicate rows, and you can use stage variables in a Transformer stage to detect duplicate column values and drop those rows.
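
To illustrate the stage-variable idea outside DataStage, here is a minimal Python sketch. It is not DataStage code; the function name `drop_duplicates_sorted` and the sample rows are made up for illustration. It sorts the rows on the key, remembers the previous key value (the role a stage variable would play), and keeps a row only when the key changes.

```python
from operator import itemgetter


def drop_duplicates_sorted(rows, key):
    """Sort rows by `key`, then keep only the first row of each key group."""
    get_key = itemgetter(key)
    previous = object()  # sentinel that never equals a real key value
    kept = []
    for row in sorted(rows, key=get_key):
        current = get_key(row)
        if current != previous:  # key changed: first row of a new group
            kept.append(row)
        previous = current  # plays the role of the "previous key" stage variable
    return kept


# Hypothetical sample data, two rows share id 2
rows = [
    {"id": 2, "name": "b"},
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b2"},
]
unique_rows = drop_duplicates_sorted(rows, "id")
```

Because Python's sort is stable, the row kept for each key is the one that appeared first in the input.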

Re: remove duplicates

Posted: Mon Jul 27, 2009 3:51 am
by algfr
sunitha_cts wrote: Hi,

Instead of using the Remove Duplicates stage, is there any other way to remove duplicates from a table?

Thanks
sunitha
If you are reading from a database, you can also use the DISTINCT keyword in the query. This helps restrict the number of rows retrieved as early as possible.
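
A small sketch of the DISTINCT approach, using Python's built-in sqlite3 module with an in-memory database. The table name `customers`, its columns, and the sample rows are assumptions for illustration only; in a real job the same SELECT DISTINCT would go into the source database stage's query.

```python
import sqlite3

# In-memory database with a hypothetical table containing one duplicate row
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Pune"), (1, "Pune"), (2, "Chennai")],
)

# DISTINCT removes rows that are identical across all selected columns,
# so the duplicates never leave the database
distinct_rows = conn.execute(
    "SELECT DISTINCT id, city FROM customers ORDER BY id"
).fetchall()
conn.close()
```

Note that DISTINCT compares every selected column, so it only removes rows that are complete duplicates, not rows that merely share a key.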

Posted: Mon Jul 27, 2009 4:00 am
by ShaneMuir
It will also depend on whether you want to capture those duplicates for reporting purposes.
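
If the duplicates do need to be captured, the logic amounts to routing rows to two outputs instead of one. A rough Python sketch follows; it is not DataStage code, and it uses an in-memory set of seen keys rather than a sort, which is a substitution made here for brevity. The name `split_duplicates` and the sample rows are hypothetical.

```python
def split_duplicates(rows, key):
    """Return (first_rows, duplicate_rows), split on the value of `key`."""
    seen = set()
    first_rows, duplicate_rows = [], []
    for row in rows:
        value = row[key]
        if value in seen:
            duplicate_rows.append(row)  # would go to a reject/report output
        else:
            seen.add(value)
            first_rows.append(row)  # would go to the main output
    return first_rows, duplicate_rows


# Hypothetical sample data
rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 1, "name": "a-again"},
]
first_rows, duplicate_rows = split_duplicates(rows, "id")
```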

Posted: Mon Jul 27, 2009 4:07 am
by asnrece1
By using the Sort stage and setting the cluster index column property, then applying a Filter or Transformer stage, you can pass on the first record of each group and filter out the duplicate records. This is useful in the middle of a job.

Posted: Mon Jul 27, 2009 4:46 am
by nagarjuna
You can use the Sort stage and the unique option within it.

Posted: Mon Jul 27, 2009 5:22 am
by sunitha_cts
In which stage property is 'unique'?

Posted: Mon Jul 27, 2009 5:35 am
by nagarjuna
I think there is a unique option in the Sort stage.

Posted: Mon Jul 27, 2009 5:38 am
by sunitha_cts
No I guess

remove duplicates

Posted: Mon Jul 27, 2009 5:50 am
by chrisjones
Hi Sunita,


In the Sort stage, set ALLOW DUPLICATES to False; this will remove duplicates.

Thanks,
Chris

Posted: Mon Jul 27, 2009 5:55 am
by sunitha_cts
Thanks Cris

Yes

Posted: Tue Jul 28, 2009 9:44 am
by aladap12
sunitha_cts wrote:Thanks Cris
Yes, we can remove duplicates using the Sort stage and also on every output link, but make sure that your partition type is other than Auto, and select perform sort and unique. That will give you unique records.

Hope this makes sense.
krishna

Posted: Tue Jul 28, 2009 4:37 pm
by ray.wurlod
Sorting is specified on input links, not on output links.
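
As a rough illustration of why the partition type matters for a link-level sort and unique: in a parallel run each partition is deduplicated on its own, so rows with the same key need to land in the same partition. The Python sketch below is not DataStage code. It assumes key-based (hash) partitioning as the alternative to Auto, and the function names and sample rows are hypothetical.

```python
def partition_by_key(rows, key, n_partitions):
    """Hash-partition rows so equal key values always share a partition."""
    partitions = [[] for _ in range(n_partitions)]
    for row in rows:
        partitions[hash(row[key]) % n_partitions].append(row)
    return partitions


def unique_per_partition(partition, key):
    """Sort one partition on the key and keep the first row of each key."""
    result = []
    previous = object()  # sentinel that never equals a real key value
    for row in sorted(partition, key=lambda r: r[key]):
        if row[key] != previous:
            result.append(row)
        previous = row[key]
    return result


# Hypothetical sample data: ids 1 and 2 each appear twice
rows = [{"id": i} for i in (1, 2, 1, 3, 2)]
deduplicated = []
for part in partition_by_key(rows, "id", n_partitions=2):
    deduplicated.extend(unique_per_partition(part, "id"))
unique_ids = sorted(row["id"] for row in deduplicated)
```

If the rows were instead spread across partitions without regard to the key, two copies of the same id could end up in different partitions and both would survive the per-partition unique step.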

Re: Yes

Posted: Tue Aug 11, 2009 5:45 am
by SBSA_DW
aladap12 wrote:
sunitha_cts wrote:Thanks Cris
Yes, we can remove duplicates using the Sort stage and also on every output link, but make sure that your partition type is other than Auto, and select perform sort and unique. That will give you unique records.

Hope this makes sense.
krishna
Thanks for the info, but do you trap those duplicate records from the Transformer stage?