Filter on Max Date?

bond88
Participant
Posts: 109
Joined: Mon Oct 15, 2012 10:05 am
Location: USA

Post by bond88 »

Hi,
I want to select unique records based on the max date, i.e. one record per ID, keeping the latest date. Could you please suggest the best approach to achieve the output below?

Input:

ID--Date

1---10/11/2011
1---12/08/2005
1---01/15/2012
2---02/18/2010
2---03/04/2013

Output:

ID---Date

1---01/15/2012
2---03/04/2013

The input has 27-30 million records.

Thanks,
Bhanu
bond88
Participant
Posts: 109
Joined: Mon Oct 15, 2012 10:05 am
Location: USA

Post by bond88 »

I am using a Sort stage followed by a Remove Duplicates stage to implement the above logic. Is there a better way to get this done? Suggestions, please.
Bhanu
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Not really.

You could do the filtering in a Transformer stage, using last-record-in-group detection, but a Remove Duplicates stage is entirely adequate. Either approach requires the data to be sorted by ID and by date, and partitioned by ID.
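
To illustrate the logic outside DataStage, here is a minimal Python sketch of "sort by ID and date, then keep the last record in each group". It is not DataStage code; the function name `latest_per_id` and the `SAMPLE` list are hypothetical, and the dates are assumed to be MM/DD/YYYY strings as in the example above. Note that the values are parsed into real dates before sorting, since MM/DD/YYYY strings do not sort chronologically as text.

```python
from datetime import datetime
from itertools import groupby
from operator import itemgetter

# Sample rows from the question: (ID, date as MM/DD/YYYY text).
SAMPLE = [
    (1, "10/11/2011"),
    (1, "12/08/2005"),
    (1, "01/15/2012"),
    (2, "02/18/2010"),
    (2, "03/04/2013"),
]

def latest_per_id(rows):
    """Return one (ID, date) pair per ID, keeping the latest date."""
    # Parse the text into datetime values so the sort is chronological.
    parsed = [(rid, datetime.strptime(text, "%m/%d/%Y")) for rid, text in rows]
    # Sort by ID, then by date ascending (the role of the Sort stage).
    parsed.sort(key=itemgetter(0, 1))
    result = []
    # Within each ID group the last row carries the max date
    # (the role of Remove Duplicates, or last-in-group detection).
    for rid, group in groupby(parsed, key=itemgetter(0)):
        last = None
        for last in group:
            pass
        result.append((rid, last[1].strftime("%m/%d/%Y")))
    return result

print(latest_per_id(SAMPLE))
```

In a parallel job the same idea only holds if all rows for a given ID land on the same partition, which is why the data must be partitioned by ID.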
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Aggregation. Not saying it's better but it is another method.
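
As a rough illustration of the aggregation idea (group by ID, take the maximum date), here is a minimal Python sketch. It is not DataStage code; the function name `max_date_per_id` and the `SAMPLE` list are hypothetical, and the dates are assumed to be MM/DD/YYYY strings as in the question.

```python
from datetime import date, datetime

# Sample rows from the question: (ID, date as MM/DD/YYYY text).
SAMPLE = [
    (1, "10/11/2011"),
    (1, "12/08/2005"),
    (1, "01/15/2012"),
    (2, "02/18/2010"),
    (2, "03/04/2013"),
]

def max_date_per_id(rows):
    """Return a dict mapping each ID to its maximum date."""
    best = {}
    for rid, text in rows:
        # Parse to a real date so the comparison is chronological.
        d = datetime.strptime(text, "%m/%d/%Y").date()
        # Keep a running maximum per ID; no prior sort is needed here.
        if rid not in best or d > best[rid]:
            best[rid] = d
    return best

for rid, d in sorted(max_date_per_id(SAMPLE).items()):
    print(rid, d.strftime("%m/%d/%Y"))
```

This sketch keeps only the ID and the maximum date, which matches the two-column example; if other columns of the winning row had to be carried along, the sort-and-keep-last approach would be the simpler fit.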

And what you have is not a "workaround"; it is a resolution, and I am marking it as such.
-craig

"You can never have too many knives" -- Logan Nine Fingers