Counting duplicates

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
JPalatianos
Premium Member
Premium Member
Posts: 306
Joined: Wed Jun 21, 2006 11:41 am

Counting duplicates

Post by JPalatianos »

Hi,
I am extracting data from the mainframe(ODBC stage) and my results give me a unique combination of PLAN_ID and CONTRACT_ID sorted on the same two fields. I need to add another field that would keep count of duplicate PLAN_ID's......

PLAN_ID CONTRACT_ID NEW_COUNTER
A A 0
B A 0
B B 1
B C 2
J A 0
R B 0
R D 1


Thanks - - John
kris007
Charter Member
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

If it is sorted on PLan_id and Contract_Id , stage them in stage variable
like

Code: Select all

Initialise svCountKey=0 
svCountKey=
If Plan_id:Contract_id=svkey Then svCountKey+1 else svCountKey
svKey=plan_id:Contract_id
Add svCountKey to your output columns.
Kris

Where's the "Any" key?-Homer Simpson
JPalatianos
Premium Member
Premium Member
Posts: 306
Joined: Wed Jun 21, 2006 11:41 am

Post by JPalatianos »

Thanks Kris......that's what I was trying to do but being a newbie I have some follow up questions.

I just figured out where I can declare the stage variables......could you give me a clue as to where I can place the code?

Sorry,but I am slowly catching on!
JPalatianos
Premium Member
Premium Member
Posts: 306
Joined: Wed Jun 21, 2006 11:41 am

Post by JPalatianos »

Can I place it in teh derivation section??
kris007
Charter Member
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

JPalatianos wrote:Can I place it in teh derivation section??
Yeh, There should be a seperate stage variable section which is above all of your output links on the right hand side of your transformer pane.
If you dont find any of such, there should be a icon saying show or hide stage variables. Better yet try to read the Datastage server manual or click on the help button, it will be useful not only for this job but many questions you might have in the future.
Kris

Where's the "Any" key?-Homer Simpson
meena
Participant
Posts: 430
Joined: Tue Sep 13, 2005 12:17 pm

Post by meena »

JPalatianos wrote:Thanks Kris......that's what I was trying to do but being a newbie I have some follow up questions.

I just figured out where I can declare the stage variables......could you give me a clue as to where I can place the code?

Sorry,but I am slowly catching on!
Hi John,
The stage variables are declared in a transformer stage. In the transformer stage properties you can find a option called stage propeties on top left side, here add a variable or if the stage variable option is on right side( if not in hide option) you can right click and click on append new stage variable. And place the code in the "derviation". And give this stage variable to the output column.
I came to know all this by reading DATESTAGE MANUALS. Why can not you read them too..
JPalatianos
Premium Member
Premium Member
Posts: 306
Joined: Wed Jun 21, 2006 11:41 am

Post by JPalatianos »

Sorry to bother everyone....I thought I had all the available documentation in front of me but it was only the Tutorial. Just found the server job developer's guide that would have answered my question.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Don't worry about it, at least you seem to be trying to figure it out on your own. Sometimes people don't even do that much.

Keep at it and if you have more questions, post them. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

According to the sample data, it seems like the duplicates are to be detected on PLAN_ID.
For that, i dont think the two columns need to be concatenated.

Code: Select all

Initialise svCountKey=0 
svCountKey=
If Plan_id=svkey OR @INROWNUM=1 Then svCountKey+1 else svCountKey 
svKey=plan_id
I added @INROWNUM = 1 in Kris's code to handle the first record or else it wont have anything to compare against.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
chowmunyee
Participant
Posts: 32
Joined: Wed Dec 28, 2005 9:02 pm

Post by chowmunyee »

DSguru2B wrote:According to the sample data, it seems like the duplicates are to be detected on PLAN_ID.
For that, i dont think the two columns need to be concatenated.

Code: Select all

Initialise svCountKey=0 
svCountKey=
If Plan_id=svkey OR @INROWNUM=1 Then svCountKey+1 else svCountKey 
svKey=plan_id
I added @INROWNUM = 1 in Kris's code to handle the first record or else it wont have anything to compare against.

I'm a newbie also... can anyone explain more detail on this part?

Initialise svCountKey=0
svCountKey=
If Plan_id=svkey OR @INROWNUM=1 Then svCountKey+1 else svCountKey
svKey=plan_id

May i know how is output look like so that i will be much more understand...


Thank you
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

As a newbie, you might want to check this post. ADN doesn't exist anymore but you can get the documents it discusses at Kim Duke's site. They help explain the concept.

Basically you are comparing a current value (Plan_id) to the value from the previous row (svkey) to see if it has changed. If it hasn't, increment a counter of duplicates found. When it changes, just set the value of the counter to itself - a way to do nothing essentially.

Sometimes you need to do something specific with the first row, hence the check for @INROWNUM = 1. Here, I don't see the point as the first row shouldn't be counted as a 'duplicate' as far as I can see. :?

It's also about properly initializing the values when you define them.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply