Duplicate count
Moderators: chulett, rschirm, roy
Duplicate count
Hi,
I have a requirement to get a count of duplicates in target.please find out below i/p and o/p data.
I/P data:
No Name
----------------
1 XXX
2 XXX
3 XXX
4 YYY
5 YYY
6 ZZZ
O/P Data:
No Name DuplicateCount
-----------------------------------
1 XXX 1
2 XXX 2
3 XXX 3
4 YYY 1
5 YYY 2
6 ZZZ 1
Please post logic to achieve this requirement.
Thanks in advance
Krishna.
I have a requirement to get a count of duplicates in target.please find out below i/p and o/p data.
I/P data:
No Name
----------------
1 XXX
2 XXX
3 XXX
4 YYY
5 YYY
6 ZZZ
O/P Data:
No Name DuplicateCount
-----------------------------------
1 XXX 1
2 XXX 2
3 XXX 3
4 YYY 1
5 YYY 2
6 ZZZ 1
Please post logic to achieve this requirement.
Thanks in advance
Krishna.
-
- Premium Member
- Posts: 232
- Joined: Fri Aug 04, 2006 1:20 am
- Location: Bangalore
Re: Duplicate count
Hi Krish,
Create three stage variables as suggested by Shane as follows
(sort the record first)
Stage Variables and derivations (use the same order of stage variable)
SVCurrentCount --If SVPreviousName is NULL or SVPreviousName <> DSLink.Name Then 1 Else SVPreviousCount +1
SVPreviousCount --SVCurrentCount
SVPreviousName --DSLink.Name
Mapping
No :DSLink.No
Name :DSLink.Name
Duplicatecount : SVCurrentCount
Regards
Thiru
Create three stage variables as suggested by Shane as follows
(sort the record first)
Stage Variables and derivations (use the same order of stage variable)
SVCurrentCount --If SVPreviousName is NULL or SVPreviousName <> DSLink.Name Then 1 Else SVPreviousCount +1
SVPreviousCount --SVCurrentCount
SVPreviousName --DSLink.Name
Mapping
No :DSLink.No
Name :DSLink.Name
Duplicatecount : SVCurrentCount
Regards
Thiru
krishna81 wrote:Hi,
I have a requirement to get a count of duplicates in target.please find out below i/p and o/p data.
I/P data:
No Name
----------------
1 XXX
2 XXX
3 XXX
4 YYY
5 YYY
6 ZZZ
O/P Data:
No Name DuplicateCount
-----------------------------------
1 XXX 1
2 XXX 2
3 XXX 3
4 YYY 1
5 YYY 2
6 ZZZ 1
Please post logic to achieve this requirement.
Thanks in advance
Krishna.
Thirunavukkarasu
Bangalore
Bangalore
-
- Participant
- Posts: 78
- Joined: Fri Mar 02, 2007 4:54 am
- Location: CHENNAI
Hi Guys,
Sorry for late responce,
i tried which u suggestesd above and i finally got the result.
thanks alot for posting this result and quick response.
still i have doubt about below
but earlier when i test with the reverse order of your stage variables order i got below result .
Order of stage variables:
SVPreviousName --DSLink.Name
SVPreviousCount --SVCurrentCount
SVCurrentCount --If SVPreviousName is NULL or SVPreviousName <> DSLink.Name Then 1 Else SVPreviousCount +1
No Name DuplicateCount
-----------------------------------
1 XXX 1
2 XXX 1
3 XXX 1
4 YYY 2
5 YYY 2
6 ZZZ 3
could any one Please clarify me how could we follow the order of Stage Variables.
once again thanks for previous reply.
Regards
Krishna
Sorry for late responce,
i tried which u suggestesd above and i finally got the result.
thanks alot for posting this result and quick response.
still i have doubt about below
but earlier when i test with the reverse order of your stage variables order i got below result .
Order of stage variables:
SVPreviousName --DSLink.Name
SVPreviousCount --SVCurrentCount
SVCurrentCount --If SVPreviousName is NULL or SVPreviousName <> DSLink.Name Then 1 Else SVPreviousCount +1
No Name DuplicateCount
-----------------------------------
1 XXX 1
2 XXX 1
3 XXX 1
4 YYY 2
5 YYY 2
6 ZZZ 3
could any one Please clarify me how could we follow the order of Stage Variables.
once again thanks for previous reply.
Regards
Krishna
Sorry you have lost me now Krishna. So you have got the process working? But you are just wondering why when you reverse the order of the stage variables you get a different result?krishna81 wrote:i tried which u suggestesd above and i finally got the result.
thanks alot for posting this result and quick response.
still i have doubt about below
but earlier when i test with the reverse order of your stage variables order i got below result .
...
could any one Please clarify me how could we follow the order of Stage Variables.
The order of the stage variables is very important as they are processed top to bottom for each incoming record. This means that variables lower down the list can contain information from the previous record until the current record is processed for all the variables above it.
This is why svPreviousName has to be the last one, so it will hold the last value passed and so that svCurrentCount can compare that previous value to the new incoming value. If it was first on the list you would you would be comparing the incoming record to itself.
-
- Premium Member
- Posts: 301
- Joined: Thu Jul 14, 2005 10:27 am
- Location: Melbourne, Australia
- Contact:
For completeness, here's the evaluation order of elements in a transformer (as I understand them):
J.
Code: Select all
Evaluate each stage variable's initial value
For each input row:
Evaluate each stage variable derivation value, top to bottom
For each output link:
Evaluate each column derivation value
Evaluate output constraint
Write the output record, if appropriate
Next output link
Next input row
<b>John McKeever</b>
Data Migrators
<b><a href="https://www.mettleci.com">MettleCI</a> - DevOps for DataStage</b>
<a href="http://www.datamigrators.com/"><img src="https://www.datamigrators.com/assets/im ... l.png"></a>
Data Migrators
<b><a href="https://www.mettleci.com">MettleCI</a> - DevOps for DataStage</b>
<a href="http://www.datamigrators.com/"><img src="https://www.datamigrators.com/assets/im ... l.png"></a>
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
In server there's a little more that might happen. There's an ancient and long post by me that explains all the gory detail. You left out:
- before any rows are processed execute before-stage subroutine if any
for each reference input link evaluate the key expression on each Key column and request matching rows from the link setting link variables appropriately
for each output link use Put() function to send row (if constraint expression is satisfied), detect failures and set link variables appropriately
after all rows have been processed execute after-stage subroutine if any
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
I think a easier way is to use a stage variable, e.g. svDupCnt, and routine RowProcCompareWithPreviousValue (precondition is the Names are sorted).
Derivation to svDupCnt:
This way, you don't need to worry about the order of stage variables.
shepli
Derivation to svDupCnt:
Code: Select all
If RowProcCompareWithPreviousValue(Input.Name) Then svDupCnt+1 Else 1
shepli