derive max occurrence
Posted: Sat Jun 01, 2013 1:18 am
Hi,
Have a requirement to find max occurrence value in a column for a group.
Have given the sample data below.
One approach that have used is to group by key1 key2 and value1 get the count(*) ,then pass it through remove duplicate stage to keep the max count row alone.
But since I have to find the max occurrence in 3(could be more) columns here
I would have to repeat it for all 3 and join the data back.
Looking for a simpler approach here , to be done with say some aggregation function???
INPUT
KEY1 KEY2 VALUE1 VALUE2 VALUE3
A Z 10 50 10
A Z 0 0 10
A Z 10 0 10
A Z 10 0 20
B Y 100 0 1
B Y 15 5 1
B Y 15 5 1
output
KEY1 KEY2 VALUE1 VALUE2 VALUE3
A Z 10 0 10
B Y 15 5 1
Have a requirement to find max occurrence value in a column for a group.
Have given the sample data below.
One approach that have used is to group by key1 key2 and value1 get the count(*) ,then pass it through remove duplicate stage to keep the max count row alone.
But since I have to find the max occurrence in 3(could be more) columns here
I would have to repeat it for all 3 and join the data back.
Looking for a simpler approach here , to be done with say some aggregation function???
INPUT
KEY1 KEY2 VALUE1 VALUE2 VALUE3
A Z 10 50 10
A Z 0 0 10
A Z 10 0 10
A Z 10 0 20
B Y 100 0 1
B Y 15 5 1
B Y 15 5 1
output
KEY1 KEY2 VALUE1 VALUE2 VALUE3
A Z 10 0 10
B Y 15 5 1