I'm struggling to find a good distinct count function within the aggregation stage. It's something I use a lot in SQL group bys.
The aggregator has a Count Rows aggregation type but this is quite inflexible, it counts total rows and cannot count distinct instances of particular columns.
When I switch to Calculation mode and choose a column to count, such as customer ID, I don't see any count or distinct count aggregation options. I see fancy smancy functions such as percent coefficient and variance but not the plain old distinct count.
Anyone know how to do this?
How do I count distinct rows in the aggregator?
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
- Contact:
How do I count distinct rows in the aggregator?
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
-
- Participant
- Posts: 3337
- Joined: Mon Jan 17, 2005 4:49 am
- Location: United Kingdom
-
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
- Contact:
I am astonished that you cannot do a simple count aggregation at the same time as other aggregations such as max and min and that you cannot do a distinct count at all. It seems like someone built a car and left the wheels out.
I will play around with sorts and transformers and maybe custom stages to find some way around this.
I will play around with sorts and transformers and maybe custom stages to find some way around this.
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Then you'll want to be able to do more than one!
How many databases allow multiple SELECT DISTINCT in the one query? For example or Not many. (Red Brick 6.20 and later can, I don't know any other.)
How many databases allow multiple SELECT DISTINCT in the one query? For example
Code: Select all
SELECT DISTINCT COLUMN_01, DISTINCT COLUMN_02 FROM TABLE;
Code: Select all
SELECT COUNT(DISTINCT COLUMN_01), COUNT(DISTINCT COLUMN_02) FROM TABLENAME;
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 82
- Joined: Thu Dec 02, 2004 10:27 pm
- Location: INDIA
vmcburney
As you were mentioning
"I will play around with sorts and transformers and maybe custom stages to find some way around this."
Just for count distinct you can use sort with stable unique option.
Of course in the main aggregator stage you want all your records. So just for doing a count you need one more stage.
Thanks
As you were mentioning
"I will play around with sorts and transformers and maybe custom stages to find some way around this."
Just for count distinct you can use sort with stable unique option.
Of course in the main aggregator stage you want all your records. So just for doing a count you need one more stage.
Thanks
dsxuserrio
Kannan.N
Bangalore,INDIA
Kannan.N
Bangalore,INDIA
Even oracle can do this, so it can't be hard.Code:
SELECT COUNT(DISTINCT COLUMN_01), COUNT(DISTINCT COLUMN_02) FROM TABLENAME
I'm also wondering why this is so hard, for me it's very standard in an aggregator to have something like
select count(A),count(distinct A),count(B),count(distinct B)
If this is so hard to code in DS, they should rethink a lot about the aggregator stage.
Hi,
You could order by this column (as well as other key columns) and make all reapearing instances have NULL in a transformer prior to the aggregator stage, this should make your count equal to a count distinct on that column.
IHTH,
You could order by this column (as well as other key columns) and make all reapearing instances have NULL in a transformer prior to the aggregator stage, this should make your count equal to a count distinct on that column.
IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.
Search before posting:)
Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
Time is money but when you don't have money time is all you can afford.
Search before posting:)
Join the DataStagers team effort at:
http://www.worldcommunitygrid.org