
logic for grouping

Posted: Tue Jun 23, 2009 2:06 am
by prasson_ibm
Hi,
I have source data like this sample:
msisdn,date
750500337,6/4/2009
750500337,6/5/2009
750500337,6/6/2009
750500337,6/7/2009
750500467,6/4/2009
750500467,6/5/2009
750500467,6/6/2009
750500467,6/7/2009

and I want this output:
750500337,6/4/2009
750500467,6/4/2009
i.e. the minimum date in each msisdn group.

Please help me work out how to do this.

Posted: Tue Jun 23, 2009 2:10 am
by keshav0307
Use a Sort stage to sort on MSISDN and DATE ascending,
then remove duplicates on MSISDN, keeping the first record.
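For anyone wanting to check the logic outside DataStage, here is a minimal Python sketch of the same sort-then-keep-first idea. It assumes dates arrive as M/D/YYYY text, as in the sample; `parse_date` and `first_per_key` are hypothetical helper names, not DataStage features.

```python
from datetime import datetime
from itertools import groupby

# Sample rows from the question: (msisdn, date in M/D/YYYY form).
rows = [
    ("750500337", "6/4/2009"), ("750500337", "6/5/2009"),
    ("750500337", "6/6/2009"), ("750500337", "6/7/2009"),
    ("750500467", "6/4/2009"), ("750500467", "6/5/2009"),
    ("750500467", "6/6/2009"), ("750500467", "6/7/2009"),
]


def parse_date(text):
    # Compare real dates, not strings: as text, "6/10/2009" < "6/4/2009".
    return datetime.strptime(text, "%m/%d/%Y").date()


def first_per_key(rows):
    # Step 1 (the sort): order by msisdn, then date ascending.
    # Keys are trimmed defensively in case they carry stray spaces.
    ordered = sorted(rows, key=lambda r: (r[0].strip(), parse_date(r[1])))
    # Step 2 (remove duplicates, keep first): take the first row of each
    # run of equal msisdn values; groupby relies on the input being sorted.
    return [next(group) for _, group in groupby(ordered, key=lambda r: r[0].strip())]


for msisdn, date in first_per_key(rows):
    print(f"{msisdn},{date}")
```

Sorting on the real date rather than the raw string matters as soon as the data contains two-digit days.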

Posted: Tue Jun 23, 2009 2:21 am
by priyadarshikunal
Or use an Aggregator stage; that is what it is for.

Group on MSISDN and take the minimum of DATE. Also, don't forget to set preserve data type to true.
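The group-and-take-minimum approach can be sketched in plain Python as a single pass that keeps the smallest date per key. This is only an illustration of the grouping logic, not how the Aggregator stage is implemented; `min_date_per_key` is a hypothetical name, and the value is kept as a parsed date so the comparison is chronological rather than textual.

```python
from datetime import datetime

# Sample rows from the question: (msisdn, date in M/D/YYYY form).
rows = [
    ("750500337", "6/4/2009"), ("750500337", "6/5/2009"),
    ("750500337", "6/6/2009"), ("750500337", "6/7/2009"),
    ("750500467", "6/4/2009"), ("750500467", "6/5/2009"),
    ("750500467", "6/6/2009"), ("750500467", "6/7/2009"),
]


def min_date_per_key(rows):
    # Group on msisdn and keep the smallest date seen in each group,
    # a plain-Python stand-in for a "min" aggregation.
    smallest = {}
    for msisdn, text in rows:
        key = msisdn.strip()
        # Keep the value as a date object so comparisons are chronological.
        value = datetime.strptime(text, "%m/%d/%Y").date()
        if key not in smallest or value < smallest[key]:
            smallest[key] = value
    return smallest


for msisdn, first in sorted(min_date_per_key(rows).items()):
    # Write the date back out in M/D/YYYY form without zero padding.
    print(f"{msisdn},{first.month}/{first.day}/{first.year}")
```

Unlike the sort-based version, this needs no pre-sorted input, only enough memory for one entry per distinct msisdn.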

Posted: Tue Jun 23, 2009 5:23 am
by prasson_ibm
keshav0307 wrote:
use a sort stage,
sort on MISDN and DATE ASC.
remove duplicate on MISDN and keep the first record.
Yes, I am doing the same thing, but the target data doesn't match when the job runs on multiple nodes. :oops:

When I run it on a single node, it works fine.
Is there any way to get the correct result when I run the job on multiple nodes?
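Why the multi-node run goes wrong can be illustrated with a small simulation. This sketch assumes the rows are split across nodes without regard to msisdn (round-robin here; the thread does not say which partitioning the job actually used) and that each node sorts and removes duplicates on its own rows only. `NUM_NODES`, `round_robin`, and `sort_and_dedup` are hypothetical names.

```python
from datetime import datetime
from itertools import groupby

# Sample rows from the question: (msisdn, date in M/D/YYYY form).
rows = [
    ("750500337", "6/4/2009"), ("750500337", "6/5/2009"),
    ("750500337", "6/6/2009"), ("750500337", "6/7/2009"),
    ("750500467", "6/4/2009"), ("750500467", "6/5/2009"),
    ("750500467", "6/6/2009"), ("750500467", "6/7/2009"),
]

NUM_NODES = 2  # hypothetical two-node configuration


def round_robin(rows, n):
    # Deal rows to partitions in turn, without looking at the key.
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def sort_and_dedup(part):
    # Sort + "keep first", run independently inside one partition.
    ordered = sorted(part, key=lambda r: (r[0], datetime.strptime(r[1], "%m/%d/%Y")))
    return [next(g) for _, g in groupby(ordered, key=lambda r: r[0])]


# Each node only sees its own rows, so an msisdn can survive once per node.
output = [row for part in round_robin(rows, NUM_NODES) for row in sort_and_dedup(part)]
for msisdn, date in output:
    print(f"{msisdn},{date}")
```

With two partitions, each msisdn comes out twice: once with 6/4/2009 and once with 6/5/2009, because the node that never saw the 6/4/2009 row keeps its own local minimum.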

Posted: Tue Jun 23, 2009 5:33 am
by Sainath.Srinivasan
Hash-partition on MSISDN.

Posted: Tue Jun 23, 2009 6:17 am
by chulett
In other words, you need to ensure that all rows for a given grouping-key value land in the same partition.
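The fix can be sketched the same way: route each row by a hash of its msisdn, so every row sharing a key reaches the same partition, and the per-partition sort and dedup then give the single-node answer regardless of node count. `zlib.crc32` is only a stand-in for whatever hash function DataStage uses internally, and `hash_partition`, `sort_and_dedup`, and `run_job` are hypothetical names.

```python
import zlib
from datetime import datetime
from itertools import groupby

# Sample rows from the question: (msisdn, date in M/D/YYYY form).
rows = [
    ("750500337", "6/4/2009"), ("750500337", "6/5/2009"),
    ("750500337", "6/6/2009"), ("750500337", "6/7/2009"),
    ("750500467", "6/4/2009"), ("750500467", "6/5/2009"),
    ("750500467", "6/6/2009"), ("750500467", "6/7/2009"),
]


def hash_partition(rows, n):
    # Route each row by a hash of its msisdn, so all rows with the same key
    # land in the same partition. crc32 is a stand-in, not DataStage's hash.
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[zlib.crc32(row[0].encode("utf-8")) % n].append(row)
    return parts


def sort_and_dedup(part):
    # Per-partition Sort + "keep first", exactly as on a single node.
    ordered = sorted(part, key=lambda r: (r[0], datetime.strptime(r[1], "%m/%d/%Y")))
    return [next(g) for _, g in groupby(ordered, key=lambda r: r[0])]


def run_job(rows, n):
    # Partition, process each partition independently, then collect.
    return [row for part in hash_partition(rows, n) for row in sort_and_dedup(part)]


# The result no longer depends on how many partitions ("nodes") are used.
for n in (1, 2, 4):
    print(n, sorted(run_job(rows, n)))
```

Any deterministic function of the key would do here; what matters is that equal keys always map to the same partition.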