I have a job where the data is as follows:
ID  Seq  col1  col2  col3
_________________________
10  1    Y     Null  Null
10  2    N     Null  O
10  3    N     Null  Null
10  4    Y     Null  O
11  1    N     Null  Null
11  2    N     A     Null
11  3    Y     B     O
I have the data sorted by ID and Seq, both ascending, using a Sort operator. Now I need to retain the row with the last Seq number for each ID, i.e. the output should be:
ID  Seq  col1  col2  col3
_________________________
10  4    Y     Null  O
11  3    Y     B     O
I am using a Sort operator (where I sort by ID and Seq ascending) followed by a Remove Duplicates operator. There I specify the key as ID and the Duplicate to retain as 'Last'.
But I am not getting the desired result. For each ID it picks some row, but not the one with the last sequence number.
I need help finding what I am doing wrong.
Regards,
SPuneet
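For reference, the result being asked for can be written down outside DataStage as a small Python sketch. This is only an illustration of the intended logic (sort by ID and Seq, keep the last row per ID), not how the Sort or Remove Duplicates operators are implemented; the function name `keep_last_per_id` and the tuple layout are made up for the example, and Null is represented as `None`.

```python
# Sample rows from the post: (ID, Seq, col1, col2, col3), with Null as None.
rows = [
    (10, 1, "Y", None, None),
    (10, 2, "N", None, "O"),
    (10, 3, "N", None, None),
    (10, 4, "Y", None, "O"),
    (11, 1, "N", None, None),
    (11, 2, "N", "A", None),
    (11, 3, "Y", "B", "O"),
]

def keep_last_per_id(rows):
    """Sort by (ID, Seq) and keep the final row seen for each ID."""
    last = {}
    for row in sorted(rows, key=lambda r: (r[0], r[1])):
        last[row[0]] = row  # a later Seq overwrites an earlier one
    return [last[key] for key in sorted(last)]

result = keep_last_per_id(rows)
for row in result:
    print(row)
```

With the sample data this keeps Seq 4 for ID 10 and Seq 3 for ID 11, which matches the desired output above.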
facing issue with remving duplicates
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
Explicitly hash partition on "ID" as early in the job as possible and see if the result changes.
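To see why the partitioning can matter, here is a rough Python simulation. It is a simplified model under stated assumptions, not DataStage's actual internals: two partitions, each running "sort, then keep last per ID" independently, with rows trimmed to (ID, Seq). The names `run` and `remove_duplicates_last` are hypothetical, and the ID value itself stands in for a hash function.

```python
def remove_duplicates_last(partition):
    """Within one partition: sort by (ID, Seq), keep the last row per ID."""
    last = {}
    for row in sorted(partition):
        last[row[0]] = row
    return list(last.values())

def run(rows, assign, n_partitions=2):
    """Distribute rows with assign(index, row), dedupe per partition, collect."""
    parts = [[] for _ in range(n_partitions)]
    for i, row in enumerate(rows):
        parts[assign(i, row) % n_partitions].append(row)
    out = []
    for part in parts:
        out.extend(remove_duplicates_last(part))
    return sorted(out)

# (ID, Seq) pairs from the original post.
rows = [(10, 1), (10, 2), (10, 3), (10, 4), (11, 1), (11, 2), (11, 3)]

# Round robin: rows with the same ID are spread over both partitions.
round_robin = run(rows, lambda i, row: i)

# Keyed on ID (stand-in for a hash): each ID lives in exactly one partition.
hashed = run(rows, lambda i, row: row[0])

print(round_robin)
print(hashed)
```

In this model the round-robin run returns more than one row per ID, because each partition only sees part of a group and keeps its own "last" row. Partitioning on ID keeps every group together, so only the row with the highest Seq survives. Whether this is what is happening in the job above would still need to be checked against the job's actual partitioning settings.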
-
- Participant
- Posts: 14
- Joined: Wed Jul 25, 2012 5:29 am
- Location: Mumbai
We already have topics on that from SPuneet.
They seem to be experimenting as we keep seeing basically the same set of incoming data with different requirements and techniques posted. We've already done aggregator and transformer solutions, guess it's now time for sorting and Remove Duplicates.
-craig
"You can never have too many knives" -- Logan Nine Fingers