Hi,
The source will be like this:
col1,col2,col3
100,a,hyd
101,b,hyd
102,c,blore
If I use the Remove Duplicates stage, the output will be like this:
100,a,hyd
102,c,blore
But the required output is:
102,c,blore
Can anyone help?
remove all duplicate records
If your requirement is to remove every record whose key value occurs more than once, then this is the logic:
Step 1: Read all the records.
Step 2: Pass them to an Aggregator stage and count the rows for each value of the key column.
Step 3: Use a Filter stage to pass only the records where count = 1.
The Aggregator outputs only the grouping key and the aggregated columns, not all the columns, so use a Copy stage to split the stream and a Join stage to bring the count back to the full records.
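To make the steps above concrete, here is a minimal sketch of the same logic in plain Python rather than DataStage stages. The function name `keep_unique_by_key` and the choice of col3 (index 2) as the key are assumptions for illustration only; the counting step stands in for the Aggregator, and the lookup plus condition stands in for the Join and Filter.

```python
from collections import Counter

def keep_unique_by_key(rows, key_index):
    """Keep only rows whose key value occurs exactly once (hypothetical helper)."""
    # "Aggregator" branch: count how many rows share each key value.
    counts = Counter(row[key_index] for row in rows)
    # "Join" + "Filter": look up each detail row's count and keep count == 1.
    return [row for row in rows if counts[row[key_index]] == 1]

# Sample data from the original post; col3 (index 2) is assumed to be the key.
rows = [
    ("100", "a", "hyd"),
    ("101", "b", "hyd"),
    ("102", "c", "blore"),
]

result = keep_unique_by_key(rows, key_index=2)
print(result)  # [('102', 'c', 'blore')]
```

Both "hyd" rows are dropped because their key has a count of 2, which matches the required output in the question.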
Suresh Reddy
ETL Developer
Research Operations
"It's important to know in which direction we are moving rather than where we are"
Re: remove all duplicate records
ds_dwh wrote:
Hi,
The source will be like this:
col1,col2,col3
100,a,hyd
101,b,hyd
102,c,blore
If I use the Remove Duplicates stage, the output will be like this:
100,a,hyd
102,c,blore
But the required output is:
102,c,blore
Can anyone help?

Use col3 as the key column (depending on your process requirement) and you will get the good records. Also, if you are using the Remove Duplicates stage, select the sort and unique options inside it.
Hope this is helpful.
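For comparison, here is a rough Python sketch of keeping one record per key value, which is how the original post describes the Remove Duplicates output. It is not DataStage code, and the helper name `keep_first_per_key` and the use of col3 (index 2) as the key are assumptions made for the example.

```python
def keep_first_per_key(rows, key_index):
    """Keep the first row seen for each key value (hypothetical helper)."""
    seen = set()
    kept = []
    for row in rows:
        key = row[key_index]
        if key not in seen:
            # First occurrence of this key: retain the row.
            seen.add(key)
            kept.append(row)
    return kept

# Sample data from the original post; col3 (index 2) is assumed to be the key.
sample = [
    ("100", "a", "hyd"),
    ("101", "b", "hyd"),
    ("102", "c", "blore"),
]

deduped = keep_first_per_key(sample, key_index=2)
print(deduped)  # [('100', 'a', 'hyd'), ('102', 'c', 'blore')]
```

With col3 as the key, one "hyd" row still survives, so this style of de-duplication alone gives the two-row output from the question rather than the single "blore" row that was asked for.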
Mayura
Re: remove all duplicate records
ds_dwh wrote:
Hi,
100,a,hyd
101,b,hyd
102,c,blore
If I use the Remove Duplicates stage, the output will be like this:
100,a,hyd
102,c,blore
But the required output is:
102,c,blore
Can anyone help?

Will the values in the third column always be of the same size? If so, you can use it as the key. But when the column contains variants such as blore and banglore, they will not match as duplicates, so you can't remove them.
RK
Re: remove all duplicate records
ds_dwh wrote:
Hi,
The source will be like this:
col1,col2,col3
100,a,hyd
101,b,hyd
102,c,blore
If I use the Remove Duplicates stage, the output will be like this:
100,a,hyd
102,c,blore
But the required output is:
102,c,blore
Can anyone help?

Please create a written specification of how this output is to be produced. Is it that you only want rows for which no duplicate occurs? In that case, use a fork-join design: count the rows for each value of col3 using an Aggregator stage, join that count back to the original detail data, then pass only those rows for which the count is 1.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.