How do I count distinct rows in the aggregator?

vmcburney · Post by **vmcburney** » Thu Feb 10, 2005 10:16 pm

I'm struggling to find a good distinct count function within the aggregation stage. It's something I use a lot in SQL group bys.

The aggregator has a Count Rows aggregation type, but this is quite inflexible: it counts total rows and cannot count distinct values of a particular column.

When I switch to Calculation mode and choose a column to count, such as customer ID, I don't see any count or distinct count aggregation options. I see fancy schmancy functions such as percent coefficient and variance, but not the plain old distinct count.

Anyone know how to do this?

Sainath.Srinivasan · Post by **Sainath.Srinivasan** » Fri Feb 11, 2005 3:11 am

You can achieve this by using two Aggregator stages.

First one: group by your key.
Second one: count your key and group by some dummy column (say, a constant 1), or use max(@OUTROWNUM).
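The two-stage idea above can be sketched outside DataStage. The following is a minimal Python illustration, not DataStage code: the function names, the `dummy` column, and the sample rows are all hypothetical, and each function only mimics what one Aggregator stage would do.

```python
# Illustrative sketch only: plain Python standing in for two Aggregator stages.
# All names and sample data here are hypothetical.

def stage_one_group_by_key(rows, key):
    """First stage: group by the key, emitting one row per distinct key value."""
    grouped = {}
    for row in rows:
        # Add a constant dummy column so the second stage has something to group on.
        grouped.setdefault(row[key], {key: row[key], "dummy": 1})
    return list(grouped.values())

def stage_two_count(rows):
    """Second stage: group by the dummy column and count the rows in each group."""
    counts = {}
    for row in rows:
        counts[row["dummy"]] = counts.get(row["dummy"], 0) + 1
    return counts

rows = [
    {"customer_id": "C1", "amount": 10},
    {"customer_id": "C2", "amount": 20},
    {"customer_id": "C1", "amount": 30},
    {"customer_id": "C3", "amount": 40},
]

one_row_per_customer = stage_one_group_by_key(rows, "customer_id")
distinct_customers = stage_two_count(one_row_per_customer)[1]
print(distinct_customers)
```

Because the first stage already collapses duplicates, the count in the second stage is a distinct count of the key.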

vmcburney · Post by **vmcburney** » Fri Feb 11, 2005 6:11 pm

I am astonished that you cannot do a simple count aggregation at the same time as other aggregations such as max and min, and that you cannot do a distinct count at all. It seems like someone built a car and left the wheels off.

I will play around with sorts and transformers and maybe custom stages to find some way around this.

ray.wurlod · Post by **ray.wurlod** » Fri Feb 11, 2005 8:40 pm

Then you'll want to be able to do more than one!

How many databases allow multiple SELECT DISTINCT clauses in a single query? For example:

Code:

SELECT DISTINCT COLUMN_01, DISTINCT COLUMN_02 FROM TABLE;

or

Code:

SELECT COUNT(DISTINCT COLUMN_01), COUNT(DISTINCT COLUMN_02) FROM TABLENAME;

Not many. (Red Brick 6.20 and later can; I don't know of any other.)

dsxuserrio · Post by **dsxuserrio** » Sun Feb 13, 2005 8:50 pm

vmcburney, as you mentioned:
"I will play around with sorts and transformers and maybe custom stages to find some way around this."

For the distinct count alone, you can use a sort with the stable unique option.
Of course, the main aggregator stage needs all of your records, so you need one more stage just to do the count.
Thanks
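As a rough illustration of that layout, here is a minimal Python sketch, not DataStage code. It assumes the stream is split in two: one branch keeps every record for the main aggregations, and the other is sorted with duplicates dropped so that its row count is the distinct count. The function names and sample data are hypothetical.

```python
# Illustrative sketch only: plain Python standing in for a sort-unique branch
# beside the main aggregator. All names and sample data are hypothetical.

def sort_unique(rows, key):
    """Sort on the key (Python's sort is stable) and keep the first row per key value."""
    ordered = sorted(rows, key=lambda r: r[key])
    unique_rows = []
    previous = object()  # sentinel that never equals a real key value
    for row in ordered:
        if row[key] != previous:
            unique_rows.append(row)
            previous = row[key]
    return unique_rows

rows = [
    {"customer_id": "C2", "amount": 20},
    {"customer_id": "C1", "amount": 10},
    {"customer_id": "C1", "amount": 30},
    {"customer_id": "C3", "amount": 40},
]

# Main branch: needs all records for the ordinary aggregations.
total_rows = len(rows)
max_amount = max(r["amount"] for r in rows)

# Extra branch: used only for the distinct count.
distinct_customers = len(sort_unique(rows, "customer_id"))
print(total_rows, max_amount, distinct_customers)
```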

jasper · Post by **jasper** » Wed Mar 09, 2005 3:57 am

Code:
SELECT COUNT(DISTINCT COLUMN_01), COUNT(DISTINCT COLUMN_02) FROM TABLENAME

Even Oracle can do this, so it can't be hard.

I'm also wondering why this is so hard. To me it's very standard for an aggregator to have something like
select count(A),count(distinct A),count(B),count(distinct B)
If this is so hard to code in DS, they should rethink a lot about the aggregator stage.
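For readers who want to try that kind of query, here is a small sketch using Python's built-in `sqlite3` module. SQLite stands in for the database only because it needs no setup; the table `t`, its columns, and the sample rows are made up for illustration.

```python
import sqlite3

# Illustrative sketch: table name, columns, and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a TEXT, b TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("x", "p"), ("x", "q"), ("y", "q"), ("y", None)],
)

# Plain counts and distinct counts on two columns in a single query.
# COUNT(column) skips NULLs, so the NULL in b is not counted.
counts = conn.execute(
    "SELECT COUNT(a), COUNT(DISTINCT a), COUNT(b), COUNT(DISTINCT b) FROM t"
).fetchone()
conn.close()
print(counts)
```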

roy · Post by **roy** » Wed Mar 09, 2005 6:24 am

Hi,
You could order by this column (as well as the other key columns) and, in a transformer prior to the aggregator stage, set all reappearing instances to NULL. This should make your count equal to a count distinct on that column.
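A minimal Python sketch of this idea follows; it is not DataStage code. It assumes that the count applied in the aggregator skips NULLs, as SQL's COUNT(column) does, and it writes the NULLed value to a separate, hypothetical `distinct_marker` column so that an ordinary row count stays available as well. The column names and sample data are also hypothetical.

```python
# Illustrative sketch only: plain Python standing in for sort -> transformer -> aggregator.
# Assumes the aggregator's count ignores NULL (None) values. All names are hypothetical.

def null_out_repeats(rows, group_key, column):
    """Sort by group key and column, then keep the value only on its first appearance."""
    ordered = sorted(rows, key=lambda r: (r[group_key], r[column]))
    previous = None
    marked_rows = []
    for row in ordered:
        current = (row[group_key], row[column])
        marked = dict(row)
        # Repeated (group, value) pairs get None, so a null-skipping count ignores them.
        marked["distinct_marker"] = row[column] if current != previous else None
        marked_rows.append(marked)
        previous = current
    return marked_rows

def aggregate(rows, group_key):
    """Per group: count all rows, and count the non-null markers (the distinct count)."""
    result = {}
    for row in rows:
        agg = result.setdefault(row[group_key], {"count": 0, "count_distinct": 0})
        agg["count"] += 1
        if row["distinct_marker"] is not None:
            agg["count_distinct"] += 1
    return result

rows = [
    {"region": "EU", "customer_id": "C1"},
    {"region": "US", "customer_id": "C3"},
    {"region": "EU", "customer_id": "C2"},
    {"region": "EU", "customer_id": "C1"},
    {"region": "US", "customer_id": "C3"},
    {"region": "US", "customer_id": "C3"},
]

summary = aggregate(null_out_repeats(rows, "region", "customer_id"), "region")
print(summary)
```

With this arrangement the plain count and the distinct count come out of the same aggregation pass, at the cost of a sort beforehand.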

IHTH,