remove all duplicate records

ds_dwh
Participant
Posts: 39
Joined: Fri May 14, 2010 6:06 am

remove all duplicate records

Post by ds_dwh »

Hi,

The source looks like this:
col1,col2,col3
100,a,hyd
101,b,hyd
102,c,blore

If I use the Remove Duplicates stage, the output is:
100,a,hyd
102,c,blore

But the required output is:
102,c,blore

Can anyone help?
ANJI
antonyraj.deva
Premium Member
Posts: 138
Joined: Wed Jul 16, 2008 9:51 pm
Location: Kolkata

Post by antonyraj.deva »

ds_dwh wrote:
col1,col2,col3
100,a,hyd
101,b,hyd
102,c,blore

The Remove Duplicates stage works on a key column. First, what is your key column? Also, your required output is unclear.

Thanks,
Tony
sureshreddy2009
Participant
Posts: 62
Joined: Sat Mar 07, 2009 4:59 am
Location: Chicago
Contact:

Post by sureshreddy2009 »

If your requirement is to remove every record whose key value occurs more than once, this is the logic:

Step 1: Read all the records.
Step 2: Pass them to an Aggregator stage and count the rows for each value of the key column.
Step 3: Use a Filter stage to pass only the records where count = 1.

The Aggregator stage outputs only the grouping key and the aggregated values, not all the columns, so use a Copy stage to fork the stream and a Join stage to bring the remaining columns back.
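
The steps above can be sketched outside DataStage in plain Python. This is only an illustration of the logic, not DataStage code: the function name `keep_unrepeated` is hypothetical, and using col3 as the key is an assumption, since the key column has not been confirmed in this thread.

```python
from collections import Counter

# Sample rows from the original post: (col1, col2, col3)
rows = [
    ("100", "a", "hyd"),
    ("101", "b", "hyd"),
    ("102", "c", "blore"),
]


def keep_unrepeated(rows, key_index):
    """Keep only the rows whose key value occurs exactly once."""
    # Aggregator branch: count the rows for each key value.
    counts = Counter(row[key_index] for row in rows)
    # Join + Filter: look up the count for each detail row, keep count == 1.
    return [row for row in rows if counts[row[key_index]] == 1]


# Assumption: col3 (index 2) is the key column.
result = keep_unrepeated(rows, key_index=2)
print(result)  # [('102', 'c', 'blore')]
```

Both "hyd" rows are dropped because their key occurs twice, which matches the required output in the original post.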
Suresh Reddy
ETL Developer
Research Operations

"its important to know in which direction we are moving rather than where we are"
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Sort and partition on the third column, which is declared as the "key" for the purposes of the Remove Duplicates stage.
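
For illustration only, the following Python sketch mimics what sorting on col3 and then removing duplicates does, assuming the stage retains the first row of each key group. The function name `remove_duplicates_keep_first` is hypothetical and this is not DataStage code.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the original post: (col1, col2, col3)
rows = [
    ("100", "a", "hyd"),
    ("101", "b", "hyd"),
    ("102", "c", "blore"),
]


def remove_duplicates_keep_first(rows, key_index):
    """Sort on the key, then keep the first row of each key group."""
    key = itemgetter(key_index)
    # sorted() is stable, so rows with equal keys keep their input order.
    ordered = sorted(rows, key=key)
    return [next(group) for _, group in groupby(ordered, key=key)]


deduped = remove_duplicates_keep_first(rows, key_index=2)
print(deduped)  # [('102', 'c', 'blore'), ('100', 'a', 'hyd')]
```

One row per distinct col3 value survives, so a "hyd" row is still present. That is the behaviour described in the original post, not the required output.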
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
mayura
Participant
Posts: 40
Joined: Fri Aug 01, 2008 5:58 am
Location: Mumbai

Re: remove all duplicate records

Post by mayura »


Use col3 as the key column (depending on your process requirement) and you will get the good records. Also, if you are using the Remove Duplicates stage, select the sort and unique options inside it.
Hope this is helpful.
Mayura
g_rkrish
Participant
Posts: 264
Joined: Wed Feb 08, 2006 12:06 am

Re: remove all duplicate records

Post by g_rkrish »

Will the values in the third column always be the same size? If so, you can use it as the key. But if the column contains values like blore and banglore, you can't remove those as duplicates.
RK
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Re: remove all duplicate records

Post by ray.wurlod »

Please create a written specification of how the required output is to be produced. Is it that you only want rows for which no duplicate occurs? In that case, use a fork-join design: count the rows for each value of col3 using an Aggregator stage, join the counts back to the original detail data, then pass only those rows for which the count is 1.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.