Counting duplicates

JPalatianos · Post by **JPalatianos** » Fri Jul 21, 2006 10:05 am

Hi,
I am extracting data from the mainframe(ODBC stage) and my results give me a unique combination of PLAN_ID and CONTRACT_ID sorted on the same two fields. I need to add another field that would keep count of duplicate PLAN_ID's......

PLAN_ID CONTRACT_ID NEW_COUNTER
A A 0
B A 0
B B 1
B C 2
J A 0
R B 0
R D 1

Thanks - - John

kris007 · Post by **kris007** » Fri Jul 21, 2006 10:17 am

If it is sorted on PLan_id and Contract_Id , stage them in stage variable
like

Code: Select all

Initialise svCountKey=0 
svCountKey=
If Plan_id:Contract_id=svkey Then svCountKey+1 else svCountKey
svKey=plan_id:Contract_id

Add svCountKey to your output columns.

JPalatianos · Post by **JPalatianos** » Fri Jul 21, 2006 10:24 am

Thanks Kris......that's what I was trying to do but being a newbie I have some follow up questions.

I just figured out where I can declare the stage variables......could you give me a clue as to where I can place the code?

Sorry,but I am slowly catching on!

JPalatianos · Post by **JPalatianos** » Fri Jul 21, 2006 10:31 am

Can I place it in teh derivation section??

kris007 · Post by **kris007** » Fri Jul 21, 2006 10:34 am

JPalatianos wrote:Can I place it in teh derivation section??

Yeh, There should be a seperate stage variable section which is above all of your output links on the right hand side of your transformer pane.
If you dont find any of such, there should be a icon saying show or hide stage variables. Better yet try to read the Datastage server manual or click on the help button, it will be useful not only for this job but many questions you might have in the future.

meena · Post by **meena** » Fri Jul 21, 2006 10:56 am

JPalatianos wrote:Thanks Kris......that's what I was trying to do but being a newbie I have some follow up questions.

I just figured out where I can declare the stage variables......could you give me a clue as to where I can place the code?

Sorry,but I am slowly catching on!

Hi John,
The stage variables are declared in a transformer stage. In the transformer stage properties you can find a option called stage propeties on top left side, here add a variable or if the stage variable option is on right side( if not in hide option) you can right click and click on append new stage variable. And place the code in the "derviation". And give this stage variable to the output column.
I came to know all this by reading DATESTAGE MANUALS. Why can not you read them too..

JPalatianos · Post by **JPalatianos** » Fri Jul 21, 2006 11:12 am

Sorry to bother everyone....I thought I had all the available documentation in front of me but it was only the Tutorial. Just found the server job developer's guide that would have answered my question.

chulett · Post by **chulett** » Fri Jul 21, 2006 12:11 pm

Don't worry about it, at least you seem to be trying to figure it out on your own. Sometimes people don't even do that much.

Keep at it and if you have more questions, post them.

DSguru2B · Post by **DSguru2B** » Fri Jul 21, 2006 8:33 pm

According to the sample data, it seems like the duplicates are to be detected on PLAN_ID.
For that, i dont think the two columns need to be concatenated.

Code: Select all

Initialise svCountKey=0 
svCountKey=
If Plan_id=svkey OR @INROWNUM=1 Then svCountKey+1 else svCountKey 
svKey=plan_id

I added @INROWNUM = 1 in Kris's code to handle the first record or else it wont have anything to compare against.

chowmunyee · Post by **chowmunyee** » Fri Aug 04, 2006 10:50 pm

DSguru2B wrote:According to the sample data, it seems like the duplicates are to be detected on PLAN_ID.
For that, i dont think the two columns need to be concatenated.
Code: Select all
Initialise svCountKey=0 
svCountKey=
If Plan_id=svkey OR @INROWNUM=1 Then svCountKey+1 else svCountKey 
svKey=plan_id
I added @INROWNUM = 1 in Kris's code to handle the first record or else it wont have anything to compare against.

I'm a newbie also... can anyone explain more detail on this part?

Initialise svCountKey=0
svCountKey=
If Plan_id=svkey OR @INROWNUM=1 Then svCountKey+1 else svCountKey
svKey=plan_id

May i know how is output look like so that i will be much more understand...

Thank you

chulett · Post by **chulett** » Sat Aug 05, 2006 6:06 am

As a newbie, you might want to check this post. ADN doesn't exist anymore but you can get the documents it discusses at Kim Duke's site. They help explain the concept.

Basically you are comparing a current value (Plan_id) to the value from the previous row (svkey) to see if it has changed. If it hasn't, increment a counter of duplicates found. When it changes, just set the value of the counter to itself - a way to do nothing essentially.

Sometimes you need to do something specific with the first row, hence the check for @INROWNUM = 1. Here, I don't see the point as the first row shouldn't be counted as a 'duplicate' as far as I can see.

It's also about properly initializing the values when you define them.