Page 1 of 1

Removing Duplicates

Posted: Tue Nov 22, 2011 12:19 pm
by haryedru
Hi this is my input
ID,Term
1,CONTRACT_NUMBER
1,CONTRACT_NUMBER
1,CONTRACT_NUMBER
2,Account
3,Account_Currency_Code
3,Account_Currency_Code
4,Account_Analyst

This should be my output

ID,TERM
1,CONTRACT_NUMBER
1,CONTRACT_NUMBER
1,CONTRACT_NUMBER
3,Account_Currency_Code
3,Account_Currency_Code

If a record is repeating i need that in the output.
Can anyone help me on this. I tried using stage variables in transformer but i am getting 1 record less like this

ID,TERM
1,CONTRACT_NUMBER
1,CONTRACT_NUMBER
3,Account_Currency_Code

thanks in advance

Posted: Tue Nov 22, 2011 2:18 pm
by ray.wurlod
Show us what you tried. This is readily do-able; particularly easily if the data have been processed through a Sort stage that generates a Key Change column (assuming you're using a parallel job, which is not certain). You can, of course, do the key change test using stage variables.

Use more stage variables to detect that the key change value is 0 and the other field is the same as that in the previous row.

Posted: Tue Nov 22, 2011 2:31 pm
by uegodawa
You can sort the data and passed it to a Aggregator stage by grouping ID and TERM then defined a calculated field count. Next use a transformer stage [Constraint] to pass the records where Count() > 1

Posted: Tue Nov 22, 2011 11:49 pm
by haryedru
Hi Upul and Ray sorry for giving the requirment wrong.

this is my input
ID,Term
1,CONTRACT_NUMBER
2,CONTRACT_NUMBER
3,CONTRACT_NUMBER
4,Account
5,Account_Currency_Code
6,Account_Currency_Code
7,Account_Analyst

This should be my output

ID,TERM
1,CONTRACT_NUMBER
2,CONTRACT_NUMBER
3,CONTRACT_NUMBER
5,Account_Currency_Code
6,Account_Currency_Code

@ray to your question in transformer i used
currentkey=inputlink.col1
previouskey=currentkey
stagevar3: (currentkey=previouskey)

Posted: Wed Nov 23, 2011 2:15 am
by manoj_23sakthi
Hi,

If your input is data base u can achive this in query:
with (Term1,Term2 ) as (select Term Term1 ,count (Term ) Term2 from Table group by Term2 > 1)

select ID,Term from table where Term = Term1

Else..
follow uegodawa idea...
Regards
Manoj kumar

Posted: Wed Nov 23, 2011 4:41 am
by chandra.shekhar@tcs.com
Use these variables in the Tfr
input.Col = svOldKey
If svOldKey = svNewKey then svCount1 = svCount1+1 else 1 = svCount1
svOldKey = svNewKey

And in the output use constraint as sv2 > 1
Thats it.

Posted: Wed Nov 23, 2011 8:08 am
by chulett
haryedru wrote:Hi Upul and Ray sorry for giving the requirment wrong.
So this means a simple transformer stage variable approach cannot be used. You'll need to fall back on the dreaded "fork join" design with the aggregator brach getting counts per key and then when joining back together only passing records with keys that have a count > 1.

Posted: Wed Nov 23, 2011 10:08 am
by ShaneMuir
Could you do it using a sort stage and key change column?

Sort the data coming in the required way, by the looks of it, by ID and TERM.

Set a key change column to happen on the TERM only.

On the output of the sort stage, sort the data again, but sort the keyChange column ascending also.

So at this point the data would look like

Code: Select all

ID,Term, keyChange 
3,CONTRACT_NUMBER,0 
2,CONTRACT_NUMBER,0
1,CONTRACT_NUMBER,1 
4,Account,1 
6,Account_Currency_Code,0 
5,Account_Currency_Code,1 
7,Account_Analyst,1 
In a transformer, have a couple of stage variables, one to hold the current term, and one to hold the previous term, and another to determine whether to write the record.

Code: Select all

CurrentTermSV = Input.Term
WriteRecordSV = If keyChange = 0 then @TRUE else if currentTermSV=PrevTermSV then @ TRUE else @FALSE
PrevTermSV = Input.Term

Set the output constraint to 'WriteRecordSV'

Posted: Thu Nov 24, 2011 4:37 am
by haryedru
chulett wrote:
haryedru wrote:Hi Upul and Ray sorry for giving the requirment wrong.
So this means a simple transformer stage variable approach cannot be used. You'll need to fall back on the dreaded "fork join" design with the aggregator brach getting counts per key and then when joining back together only passing records with keys that have a count > 1.
Hi Chulett

It worked thanks for the help...