Distinct Count With Aggregator

snassimr · Post by **snassimr** » Tue Aug 30, 2005 1:52 am

Hi !

I need perfom the followwing task :

Column1 Column2 Column3
1 1 3
1 1 3
2 3 4
2 3 4
2 3 4

I need to get :

1 1 1
2 3 1

the last number is the number of disctinct values of column 3 group byed according column 1 and column 2.
Can I do it with aggregator ???

ray.wurlod · Post by **ray.wurlod** » Tue Aug 30, 2005 2:11 am

I don't believe there is a "count distinct" aggregate function in the server job Aggregator stage. You could, of course, group by all three and simply count col3 then pass those results through a second Aggregator stage to group by col1 and col2 and to sum the summedcol3 generated by the first Aggregator stage.

Or you could work with stage variables in a Transformer stage (and sorted input) to achieve the same result.

elavenil · Post by **elavenil** » Tue Aug 30, 2005 2:14 am

Yes. This can be done using aggregator stage but your input should be loaded first into hash file and then aggregator then do the count(3rd column) group by 1,2. While loading into hash file, define all three columns as key and read from hash file and change the keys (key would be the first two columns) and that will give you the results as expected.

HTWH.

Regards
Saravanan