remove duplicates

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

sunitha_cts
Participant
Posts: 98
Joined: Thu Feb 05, 2009 1:14 am
Location: visakhapatnam

remove duplicates

Post by sunitha_cts »

Hi,

Instead of using the Remove Duplicates stage, is there any other way to remove duplicates from a table?

Thanks
sunitha
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Yes, there are other ways. All involve sorting the data. The Sort stage can remove duplicate rows, and you can use stage variables in a Transformer stage to detect duplicate column values and drop those rows.
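The stage-variable idea above can be sketched outside DataStage as follows. This is only an illustration in Python, not DataStage syntax: the rows are sorted on the key, a variable remembers the previous key, and a row is kept only when its key differs from that remembered value. The function name `drop_duplicates_sorted` and the sample rows are assumptions made up for this example.

```python
def drop_duplicates_sorted(rows, key):
    """Keep the first row of each key group. Rows must already be sorted on key."""
    kept = []
    prev_key = None      # plays the role of a stage variable holding the previous key
    first_row = True
    for row in rows:
        current_key = key(row)
        # A row is a duplicate when its key equals the previous row's key.
        if first_row or current_key != prev_key:
            kept.append(row)
        prev_key = current_key
        first_row = False
    return kept


# Hypothetical sample data: two rows share id 1 and two share id 2.
rows = [
    {"id": 2, "name": "b"},
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b2"},
    {"id": 1, "name": "a"},
]
rows_sorted = sorted(rows, key=lambda r: r["id"])   # sorting comes first
unique_rows = drop_duplicates_sorted(rows_sorted, key=lambda r: r["id"])
```

The comparison only works because equal keys are adjacent after the sort, which is why every variant of this approach involves sorting.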
algfr
Participant
Posts: 106
Joined: Fri Sep 09, 2005 7:42 am

Re: remove duplicates

Post by algfr »

If you are reading from a database, you can also use the DISTINCT keyword in the query. This restricts the number of rows retrieved as early as possible.
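A minimal sketch of the DISTINCT approach, using Python's built-in `sqlite3` module and an in-memory database so that it is self-contained. The table name `customers` and its columns are hypothetical; in a real job the query would go into the source database stage instead.

```python
import sqlite3

# In-memory database with a hypothetical table that contains one duplicate row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "a"), (2, "b"), (1, "a")],
)

# DISTINCT compares the whole selected row, so only exact repeats are collapsed.
distinct_rows = conn.execute(
    "SELECT DISTINCT id, name FROM customers ORDER BY id"
).fetchall()
conn.close()
```

Note that DISTINCT applies to every selected column together. Rows that share a key but differ in another selected column are not treated as duplicates.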
ShaneMuir
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

It will also depend on whether you want to capture those duplicates for reporting purposes.
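If the duplicates do need to be captured, the general idea is to route them to a second output rather than discard them. The sketch below shows that idea in plain Python, not in DataStage; `split_duplicates` and the sample records are hypothetical names chosen for this example.

```python
def split_duplicates(records, key):
    """Return (unique, duplicates): the first row per key, and every later repeat."""
    seen_keys = set()
    unique, duplicates = [], []
    for record in records:
        k = key(record)
        if k in seen_keys:
            duplicates.append(record)   # would go to a reject/report output
        else:
            seen_keys.add(k)
            unique.append(record)       # would go to the main output
    return unique, duplicates


# Hypothetical sample data keyed on the first field.
records = [(1, "a"), (2, "b"), (1, "a2"), (3, "c"), (2, "b2")]
unique_records, duplicate_records = split_duplicates(records, key=lambda r: r[0])
```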
asnrece1
Participant
Posts: 3
Joined: Thu Nov 20, 2008 6:11 am

Post by asnrece1 »

You can use the Sort stage with the cluster key change column property set, then apply a Filter or Transformer stage that passes the first record of each group and filters out the duplicate records. This is useful in the middle of a job.
nagarjuna
Premium Member
Posts: 533
Joined: Fri Jun 27, 2008 9:11 pm
Location: Chicago

Post by nagarjuna »

You can use the Sort stage and the unique option within it.
Nag
sunitha_cts
Participant
Posts: 98
Joined: Thu Feb 05, 2009 1:14 am
Location: visakhapatnam

Post by sunitha_cts »

Under which stage property is 'unique'?
nagarjuna
Premium Member
Posts: 533
Joined: Fri Jun 27, 2008 9:11 pm
Location: Chicago

Post by nagarjuna »

I think there is a unique option in the Sort stage.
Nag
sunitha_cts
Participant
Posts: 98
Joined: Thu Feb 05, 2009 1:14 am
Location: visakhapatnam

Post by sunitha_cts »

No I guess
chrisjones
Participant
Posts: 194
Joined: Thu May 11, 2006 9:42 am

remove duplicates

Post by chrisjones »

Hi Sunita,


In the Sort stage, set Allow Duplicates to False. This will remove the duplicates.

Thanks,
Chris
sunitha_cts
Participant
Posts: 98
Joined: Thu Feb 05, 2009 1:14 am
Location: visakhapatnam

Post by sunitha_cts »

Thanks, Chris.
aladap12
Participant
Posts: 60
Joined: Fri Jul 20, 2007 1:15 pm
Location: NO

Yes

Post by aladap12 »

Yes, we can remove duplicates using the Sort stage, and also on every output link. But make sure that your partition type is something other than Auto, and select Perform Sort and Unique. That will give you unique records.

Hope this makes sense.
krishna
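The reason the partition type matters can be illustrated in general terms: a per-partition sort and unique only removes duplicates that land in the same partition. The Python sketch below is an assumption-laden illustration of that principle and does not model how DataStage's Auto partitioning actually behaves. All function names and the sample rows are hypothetical.

```python
def partition_by_key(rows, key, n_partitions):
    """Key-based partitioning: rows with equal keys always share a partition."""
    parts = [[] for _ in range(n_partitions)]
    for row in rows:
        parts[hash(key(row)) % n_partitions].append(row)
    return parts


def partition_round_robin(rows, n_partitions):
    """Round-robin partitioning: rows are dealt out in turn, ignoring the key."""
    parts = [[] for _ in range(n_partitions)]
    for i, row in enumerate(rows):
        parts[i % n_partitions].append(row)
    return parts


def sort_unique_per_partition(parts, key):
    """Sort each partition on the key and keep one row per key within it."""
    result = []
    for part in parts:
        seen_in_partition = set()
        for row in sorted(part, key=key):
            k = key(row)
            if k not in seen_in_partition:
                seen_in_partition.add(k)
                result.append(row)
    return result


# Hypothetical sample data: three distinct keys, each appearing twice.
keyed_rows = [(1, "a"), (2, "b"), (1, "a"), (2, "b"), (3, "c"), (3, "c")]
get_key = lambda r: r[0]

deduped_by_key = sort_unique_per_partition(
    partition_by_key(keyed_rows, get_key, 2), get_key
)
deduped_round_robin = sort_unique_per_partition(
    partition_round_robin(keyed_rows, 2), get_key
)
```

With key-based partitioning each key survives once. With round-robin, the two rows for key 3 fall into different partitions in this sample, so both survive.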
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Sorting is specified on input links, not on output links.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
SBSA_DW
Premium Member
Posts: 6
Joined: Thu Jul 24, 2008 1:53 am

Re: Yes

Post by SBSA_DW »

Thanks for the info, but do you trap those duplicate records from the Transformer stage?