Facing issue with removing duplicates


SPuneet
Participant
Posts: 28
Joined: Thu Jul 19, 2012 12:52 am


Post by SPuneet »

I have a job where the data is as follows:

ID  Seq  col1  col2  col3
_________________________
10  1    Y     Null  Null
10  2    N     Null  O
10  3    N     Null  Null
10  4    Y     Null  O
11  1    N     Null  Null
11  2    N     A     Null
11  3    Y     B     O

I have the data sorted by ID and Seq, both ascending, using a Sort operator. Now I need to retain the row with the last Seq number for each ID, i.e. the output should be:

ID  Seq  col1  col2  col3
_________________________
10  4    Y     Null  O
11  3    Y     B     O
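
To make the requirement concrete, here is a minimal Python sketch of the intended logic: sort by ID and Seq ascending, then keep the last row of each ID group. It is only an illustration of the expected result, not DataStage code; the function name `keep_last_per_id` is made up for this example, and `None` stands in for Null.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the post: (ID, Seq, col1, col2, col3); None stands for Null.
rows = [
    (10, 1, "Y", None, None),
    (10, 2, "N", None, "O"),
    (10, 3, "N", None, None),
    (10, 4, "Y", None, "O"),
    (11, 1, "N", None, None),
    (11, 2, "N", "A", None),
    (11, 3, "Y", "B", "O"),
]

def keep_last_per_id(rows):
    """Sort by (ID, Seq) ascending, then keep the last row of each ID group."""
    ordered = sorted(rows, key=itemgetter(0, 1))
    return [list(group)[-1] for _, group in groupby(ordered, key=itemgetter(0))]

result = keep_last_per_id(rows)
for row in result:
    print(row)
```

On the sample data this yields the two rows shown in the expected output above.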


I am using a Sort operator (where I sort by ID and Seq ascending) followed by a Remove Duplicates operator. There I specify the key as ID and the duplicate to retain as 'Last'.

But I am not getting the desired result. It picks some row for each ID, but not the one with the last sequence number.

I need help finding where I am going wrong.


Regards,
SPuneet
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Specify how the data are partitioned.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
SPuneet
Participant
Posts: 28
Joined: Thu Jul 19, 2012 12:52 am

Post by SPuneet »

I am using Auto partitioning throughout.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Explicitly hash partition on "ID" as early in the job as possible and see if the result changes.
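
As a rough illustration of why the partitioning matters, the Python sketch below is a toy model, not DataStage's actual partitioner. It assumes each partition runs its own sort and "retain last" step independently. Whether Auto partitioning really split IDs across partitions in the original job is not known from this thread; the helper names (`last_per_id`, `run`) and the modulo stand-in for a hash function are invented for the example.

```python
from itertools import groupby
from operator import itemgetter

# (ID, Seq) pairs from the sample data; the other columns are omitted for brevity.
rows = [(10, 1), (10, 2), (10, 3), (10, 4), (11, 1), (11, 2), (11, 3)]

def last_per_id(partition):
    """Per-partition step: sort by (ID, Seq), keep the last row of each ID."""
    ordered = sorted(partition, key=itemgetter(0, 1))
    return [list(group)[-1] for _, group in groupby(ordered, key=itemgetter(0))]

def run(rows, num_partitions, assign):
    """Distribute rows with `assign`, dedupe each partition alone, merge results."""
    partitions = [[] for _ in range(num_partitions)]
    for index, row in enumerate(rows):
        partitions[assign(index, row) % num_partitions].append(row)
    output = []
    for partition in partitions:
        output.extend(last_per_id(partition))
    return sorted(output)

# Not key-based: rows are dealt out in turn, so one ID can span partitions.
round_robin = run(rows, 2, lambda index, row: index)

# Key-based (stand-in for hash partitioning on ID): each ID stays in one partition.
hashed = run(rows, 2, lambda index, row: row[0])

print("round robin:", round_robin)
print("keyed on ID:", hashed)
```

In this model the round-robin run emits one "last" row per ID per partition, so extra and wrong rows appear, while the run keyed on ID returns only the rows with the highest Seq for each ID.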
Sagnik Mukherjee
Participant
Posts: 14
Joined: Wed Jul 25, 2012 5:29 am
Location: Mumbai

Post by Sagnik Mukherjee »

Hi,
Is it OK if you get your desired output using only a Transformer?
Please let me know.
Sagnik
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

We already have topics on that from SPuneet.

They seem to be experimenting as we keep seeing basically the same set of incoming data with different requirements and techniques posted. We've already done aggregator and transformer solutions, guess it's now time for sorting and Remove Duplicates. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers