How to remove duplicates using LastRowInGroup?

deepa_shenoy
Participant
Posts: 95
Joined: Thu Sep 24, 2009 12:15 am
Location: India

Post by deepa_shenoy »

Hi,

How can I identify duplicates using LastRowInGroup() in the Transformer?

My input data is

ID  EFF_D       NAME  COMPANY
1   2001-01-01  ABCD  TET
1   2011-01-01  ABCD  TET
2   2001-01-01  XYZ   TS
3   1999-01-01  PQR   WRO

My output data should be

ID  EFF_D       NAME  COMPANY  DUPLICATE
1   2001-01-01  ABCD  TET      N
1   2011-01-01  ABCD  TET      Y
2   2001-01-01  XYZ   TS       N
3   1999-01-01  PQR   WRO      N

Thanks.
-D
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

What have you tried?

And LastRowInGroup() doesn't identify duplicates; it simply lets you know if you are looking at the last row in any given "group".
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

If you're keeping a count of the records in the group, you can tell when you reach the last record that it is a duplicate of the first (and of every other record in the group), but that does not give you the capacity to remove the duplicate(s). Why not use a Remove Duplicates stage?
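
To make the counting idea concrete, here is a rough simulation in Python rather than Transformer code. The function name flag_last_in_group is made up for illustration, the is_last test only imitates what LastRowInGroup(ID) would report, and the sketch assumes the rows are already sorted by ID with ID as the only key.

```python
def flag_last_in_group(rows):
    """Flag the last row of each ID group as 'Y' when the group has more than one row.

    Assumes rows are already sorted by ID (first field of each tuple).
    """
    flagged = []
    count = 0  # running count of rows seen in the current group
    for i, row in enumerate(rows):
        count += 1
        # Imitation of LastRowInGroup(ID): true when no next row exists
        # or the next row carries a different ID.
        is_last = i + 1 == len(rows) or rows[i + 1][0] != row[0]
        duplicate = "Y" if is_last and count > 1 else "N"
        flagged.append(row + (duplicate,))
        if is_last:
            count = 0  # reset for the next group
    return flagged


rows = [
    ("1", "2001-01-01", "ABCD", "TET"),
    ("1", "2011-01-01", "ABCD", "TET"),
    ("2", "2001-01-01", "XYZ", "TS"),
    ("3", "1999-01-01", "PQR", "WRO"),
]

for flagged_row in flag_last_in_group(rows):
    print(" ".join(flagged_row))
```

Note that with three or more rows in a group, only the final row gets a 'Y' under this approach; the rows in the middle stay 'N'.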
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.

Post by chulett »

Looks like they really need to identify duplicates rather than remove them.

Post by ray.wurlod »

Then all you need to do is to use stage variables to compare the current row with the previous row (assuming that they're appropriately sorted and partitioned) and set an indicator column value on the output.
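
The previous-row comparison can be sketched the same way. This is a Python simulation, not Transformer derivations: prev_id stands in for a stage variable, the function name flag_duplicates is hypothetical, and it assumes the data is sorted by ID and EFF_D (and, in a parallel job, partitioned on ID) with ID as the only key.

```python
def flag_duplicates(rows):
    """Flag a row 'Y' when its ID equals the ID of the row just before it.

    Assumes rows are already sorted by ID, then EFF_D.
    """
    prev_id = None  # plays the role of a stage variable holding the previous key
    flagged = []
    for row in rows:
        row_id = row[0]
        duplicate = "Y" if row_id == prev_id else "N"
        flagged.append(row + (duplicate,))
        # Update only after the comparison, mirroring the order in which
        # the stage variables would be evaluated.
        prev_id = row_id
    return flagged


rows = [
    ("1", "2001-01-01", "ABCD", "TET"),
    ("1", "2011-01-01", "ABCD", "TET"),
    ("2", "2001-01-01", "XYZ", "TS"),
    ("3", "1999-01-01", "PQR", "WRO"),
]

for flagged_row in flag_duplicates(rows):
    print(" ".join(flagged_row))
```

Here every row after the first in a group is flagged 'Y', so the first occurrence of each ID is the only one marked 'N'.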