Distinct Count With Aggregator

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
snassimr
Premium Member
Premium Member
Posts: 281
Joined: Tue May 17, 2005 5:27 am

Distinct Count With Aggregator

Post by snassimr »

Hi !

I need perfom the followwing task :

Column1 Column2 Column3
1 1 3
1 1 3
2 3 4
2 3 4
2 3 4

I need to get :

1 1 1
2 3 1

the last number is the number of disctinct values of column 3 group byed according column 1 and column 2.
Can I do it with aggregator ???
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I don't believe there is a "count distinct" aggregate function in the server job Aggregator stage. You could, of course, group by all three and simply count col3 then pass those results through a second Aggregator stage to group by col1 and col2 and to sum the summedcol3 generated by the first Aggregator stage.

Or you could work with stage variables in a Transformer stage (and sorted input) to achieve the same result.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
elavenil
Premium Member
Premium Member
Posts: 467
Joined: Thu Jan 31, 2002 10:20 pm
Location: Singapore

Post by elavenil »

Yes. This can be done using aggregator stage but your input should be loaded first into hash file and then aggregator then do the count(3rd column) group by 1,2. While loading into hash file, define all three columns as key and read from hash file and change the keys (key would be the first two columns) and that will give you the results as expected.

HTWH.

Regards
Saravanan
Post Reply