Removing Duplicates
Moderators: chulett, rschirm, roy
Removing Duplicates
Hi this is my input
ID,Term
1,CONTRACT_NUMBER
1,CONTRACT_NUMBER
1,CONTRACT_NUMBER
2,Account
3,Account_Currency_Code
3,Account_Currency_Code
4,Account_Analyst
This should be my output
ID,TERM
1,CONTRACT_NUMBER
1,CONTRACT_NUMBER
1,CONTRACT_NUMBER
3,Account_Currency_Code
3,Account_Currency_Code
If a record is repeating i need that in the output.
Can anyone help me on this. I tried using stage variables in transformer but i am getting 1 record less like this
ID,TERM
1,CONTRACT_NUMBER
1,CONTRACT_NUMBER
3,Account_Currency_Code
thanks in advance
ID,Term
1,CONTRACT_NUMBER
1,CONTRACT_NUMBER
1,CONTRACT_NUMBER
2,Account
3,Account_Currency_Code
3,Account_Currency_Code
4,Account_Analyst
This should be my output
ID,TERM
1,CONTRACT_NUMBER
1,CONTRACT_NUMBER
1,CONTRACT_NUMBER
3,Account_Currency_Code
3,Account_Currency_Code
If a record is repeating i need that in the output.
Can anyone help me on this. I tried using stage variables in transformer but i am getting 1 record less like this
ID,TERM
1,CONTRACT_NUMBER
1,CONTRACT_NUMBER
3,Account_Currency_Code
thanks in advance
Hari
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Show us what you tried. This is readily do-able; particularly easily if the data have been processed through a Sort stage that generates a Key Change column (assuming you're using a parallel job, which is not certain). You can, of course, do the key change test using stage variables.
Use more stage variables to detect that the key change value is 0 and the other field is the same as that in the previous row.
Use more stage variables to detect that the key change value is 0 and the other field is the same as that in the previous row.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Hi Upul and Ray sorry for giving the requirment wrong.
this is my input
ID,Term
1,CONTRACT_NUMBER
2,CONTRACT_NUMBER
3,CONTRACT_NUMBER
4,Account
5,Account_Currency_Code
6,Account_Currency_Code
7,Account_Analyst
This should be my output
ID,TERM
1,CONTRACT_NUMBER
2,CONTRACT_NUMBER
3,CONTRACT_NUMBER
5,Account_Currency_Code
6,Account_Currency_Code
@ray to your question in transformer i used
currentkey=inputlink.col1
previouskey=currentkey
stagevar3: (currentkey=previouskey)
this is my input
ID,Term
1,CONTRACT_NUMBER
2,CONTRACT_NUMBER
3,CONTRACT_NUMBER
4,Account
5,Account_Currency_Code
6,Account_Currency_Code
7,Account_Analyst
This should be my output
ID,TERM
1,CONTRACT_NUMBER
2,CONTRACT_NUMBER
3,CONTRACT_NUMBER
5,Account_Currency_Code
6,Account_Currency_Code
@ray to your question in transformer i used
currentkey=inputlink.col1
previouskey=currentkey
stagevar3: (currentkey=previouskey)
Hari
-
- Participant
- Posts: 47
- Joined: Tue Feb 23, 2010 12:16 am
- Location: CHENNAI
-
- Premium Member
- Posts: 353
- Joined: Mon Jan 17, 2011 5:03 am
- Location: Mumbai, India
So this means a simple transformer stage variable approach cannot be used. You'll need to fall back on the dreaded "fork join" design with the aggregator brach getting counts per key and then when joining back together only passing records with keys that have a count > 1.haryedru wrote:Hi Upul and Ray sorry for giving the requirment wrong.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
Could you do it using a sort stage and key change column?
Sort the data coming in the required way, by the looks of it, by ID and TERM.
Set a key change column to happen on the TERM only.
On the output of the sort stage, sort the data again, but sort the keyChange column ascending also.
So at this point the data would look like
In a transformer, have a couple of stage variables, one to hold the current term, and one to hold the previous term, and another to determine whether to write the record.
Set the output constraint to 'WriteRecordSV'
Sort the data coming in the required way, by the looks of it, by ID and TERM.
Set a key change column to happen on the TERM only.
On the output of the sort stage, sort the data again, but sort the keyChange column ascending also.
So at this point the data would look like
Code: Select all
ID,Term, keyChange
3,CONTRACT_NUMBER,0
2,CONTRACT_NUMBER,0
1,CONTRACT_NUMBER,1
4,Account,1
6,Account_Currency_Code,0
5,Account_Currency_Code,1
7,Account_Analyst,1
Code: Select all
CurrentTermSV = Input.Term
WriteRecordSV = If keyChange = 0 then @TRUE else if currentTermSV=PrevTermSV then @ TRUE else @FALSE
PrevTermSV = Input.Term
Set the output constraint to 'WriteRecordSV'
Hi Chulettchulett wrote:So this means a simple transformer stage variable approach cannot be used. You'll need to fall back on the dreaded "fork join" design with the aggregator brach getting counts per key and then when joining back together only passing records with keys that have a count > 1.haryedru wrote:Hi Upul and Ray sorry for giving the requirment wrong.
It worked thanks for the help...
Hari