Counting duplicates
Moderators: chulett, rschirm, roy
-
- Premium Member
- Posts: 306
- Joined: Wed Jun 21, 2006 11:41 am
Counting duplicates
Hi,
I am extracting data from the mainframe(ODBC stage) and my results give me a unique combination of PLAN_ID and CONTRACT_ID sorted on the same two fields. I need to add another field that would keep count of duplicate PLAN_ID's......
PLAN_ID CONTRACT_ID NEW_COUNTER
A A 0
B A 0
B B 1
B C 2
J A 0
R B 0
R D 1
Thanks - - John
I am extracting data from the mainframe(ODBC stage) and my results give me a unique combination of PLAN_ID and CONTRACT_ID sorted on the same two fields. I need to add another field that would keep count of duplicate PLAN_ID's......
PLAN_ID CONTRACT_ID NEW_COUNTER
A A 0
B A 0
B B 1
B C 2
J A 0
R B 0
R D 1
Thanks - - John
If it is sorted on PLan_id and Contract_Id , stage them in stage variable
like
Add svCountKey to your output columns.
like
Code: Select all
Initialise svCountKey=0
svCountKey=
If Plan_id:Contract_id=svkey Then svCountKey+1 else svCountKey
svKey=plan_id:Contract_id
Kris
Where's the "Any" key?-Homer Simpson
Where's the "Any" key?-Homer Simpson
-
- Premium Member
- Posts: 306
- Joined: Wed Jun 21, 2006 11:41 am
-
- Premium Member
- Posts: 306
- Joined: Wed Jun 21, 2006 11:41 am
Yeh, There should be a seperate stage variable section which is above all of your output links on the right hand side of your transformer pane.JPalatianos wrote:Can I place it in teh derivation section??
If you dont find any of such, there should be a icon saying show or hide stage variables. Better yet try to read the Datastage server manual or click on the help button, it will be useful not only for this job but many questions you might have in the future.
Kris
Where's the "Any" key?-Homer Simpson
Where's the "Any" key?-Homer Simpson
Hi John,JPalatianos wrote:Thanks Kris......that's what I was trying to do but being a newbie I have some follow up questions.
I just figured out where I can declare the stage variables......could you give me a clue as to where I can place the code?
Sorry,but I am slowly catching on!
The stage variables are declared in a transformer stage. In the transformer stage properties you can find a option called stage propeties on top left side, here add a variable or if the stage variable option is on right side( if not in hide option) you can right click and click on append new stage variable. And place the code in the "derviation". And give this stage variable to the output column.
I came to know all this by reading DATESTAGE MANUALS. Why can not you read them too..
-
- Premium Member
- Posts: 306
- Joined: Wed Jun 21, 2006 11:41 am
According to the sample data, it seems like the duplicates are to be detected on PLAN_ID.
For that, i dont think the two columns need to be concatenated.
I added @INROWNUM = 1 in Kris's code to handle the first record or else it wont have anything to compare against.
For that, i dont think the two columns need to be concatenated.
Code: Select all
Initialise svCountKey=0
svCountKey=
If Plan_id=svkey OR @INROWNUM=1 Then svCountKey+1 else svCountKey
svKey=plan_id
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
-
- Participant
- Posts: 32
- Joined: Wed Dec 28, 2005 9:02 pm
DSguru2B wrote:According to the sample data, it seems like the duplicates are to be detected on PLAN_ID.
For that, i dont think the two columns need to be concatenated.I added @INROWNUM = 1 in Kris's code to handle the first record or else it wont have anything to compare against.Code: Select all
Initialise svCountKey=0 svCountKey= If Plan_id=svkey OR @INROWNUM=1 Then svCountKey+1 else svCountKey svKey=plan_id
I'm a newbie also... can anyone explain more detail on this part?
Initialise svCountKey=0
svCountKey=
If Plan_id=svkey OR @INROWNUM=1 Then svCountKey+1 else svCountKey
svKey=plan_id
May i know how is output look like so that i will be much more understand...
Thank you
As a newbie, you might want to check this post. ADN doesn't exist anymore but you can get the documents it discusses at Kim Duke's site. They help explain the concept.
Basically you are comparing a current value (Plan_id) to the value from the previous row (svkey) to see if it has changed. If it hasn't, increment a counter of duplicates found. When it changes, just set the value of the counter to itself - a way to do nothing essentially.
Sometimes you need to do something specific with the first row, hence the check for @INROWNUM = 1. Here, I don't see the point as the first row shouldn't be counted as a 'duplicate' as far as I can see.
It's also about properly initializing the values when you define them.
Basically you are comparing a current value (Plan_id) to the value from the previous row (svkey) to see if it has changed. If it hasn't, increment a counter of duplicates found. When it changes, just set the value of the counter to itself - a way to do nothing essentially.
Sometimes you need to do something specific with the first row, hence the check for @INROWNUM = 1. Here, I don't see the point as the first row shouldn't be counted as a 'duplicate' as far as I can see.
![Confused :?](./images/smilies/icon_confused.gif)
It's also about properly initializing the values when you define them.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers