Page 1 of 1

Filter on Max Date ?

Posted: Tue Jun 25, 2013 2:33 pm
by bond88
Hi,
I want to select unique records based on max date. Could you please suggest a best approach to achieve the below output?

Input:

ID--Date

1---10/11/2011
1---12/08/2005
1---01/15/2012
2---02/18/2010
2---03/04/2013

Output:

ID---Date

1---01/15/2012
2---03/04/2013

27-30 million records at input.

Thanks,

Posted: Tue Jun 25, 2013 3:30 pm
by bond88
I am using sort stage and then remove duplicate stage to implement above logic. Is there any better way to get this done? Suggestion please.

Posted: Tue Jun 25, 2013 4:52 pm
by ray.wurlod
Not really.

You could do the filtering in a Tranformer stage, using last record in group detection, but a Remove Duplicates is entirely adequate. Either approach requires data sorted by ID and by date, and partitioned by ID.

Posted: Tue Jun 25, 2013 5:59 pm
by chulett
Aggregation. Not saying it's better but it is another method.

And what you have is not a "workaround" it is a resolution and I am marking it as such.