How to Count Distinct within aggregation
Posted: Mon Dec 10, 2012 4:55 pm
Requirement:
We are building a summary table with pre-aggregated data.
The table has 10 "key" columns, and we aggregate on 20 different combinations of these (e.g. on keys 1/2/3/4 and keys 1/2/5/6).
The trick is that we need to count the number of distinct items that were "rolled up" during the aggregation.
Here's a simplified version of what we want, using DB2 SQL :
Output:
I am actually struggling to think of a way to do this via DataStage, so am open to ideas.
Note that data volumes are pretty low, which might open up some more options.
A search finds some old posts such as this one: viewtopic.php?t=91232
I was wondering whether there was anything included in v8.5 that made this a bit easier.
Or should I give up trying to do this in DataStage, and just do the aggregations in DB2?
We are building a summary table with pre-aggregated data.
The table has 10 "key" columns, and we aggregate on 20 different combinations of these (e.g. on keys 1/2/3/4 and keys 1/2/5/6).
The trick is that we need to count the number of distinct items that were "rolled up" during the aggregation.
Here's a simplified version of what we want, using DB2 SQL :
Code: Select all
with temp (Round,Company,Contract,Category,Section,SubSect,Actual,Max,RegionActual,RegionMax)
as (values
(25,'Acme','BBB','Z','Groceries','Dairy',800,1000,null,null),
(25,'Acme','GGG','Z','Groceries','Snacks',300,900,null,null),
(25,'Acme','BBB','Z','Groceries','Fruit & Veg',null,null,550,1000),
(25,'Acme','CCC','Y','Groceries','Fruit & Veg',null,null,450,1000),
(25,'Acme','SSS','Y','Clothing','Mens',null,null,750,1000),
(25,'Acme','DDD','X','Clothing','Mens',null,null,850,1000),
(25,'Blade','AAA','W','Groceries','Dairy',900,1000,null,null),
(25,'Blade','AAA','W','Groceries','Snacks',400,900,null,null),
(25,'Blade','AAA','W','Groceries','Fruit & Veg',null,null,350,1000),
(25,'Blade','CCC','Y','Groceries','Fruit & Veg',null,null,250,1000),
(26,'Acme','BBB','Z','Groceries','Dairy',700,1000,null,null),
(26,'Acme','GGG','Z','Groceries','Schedule',100,800,null,null),
(26,'Acme','BBB','Z','Groceries','Fruit & Veg',null,null,650,1000),
(26,'Blade','AAA','W','Groceries','Dairy',600,1000,null,null),
(26,'Blade','AAA','W','Groceries','Snacks',200,800,null,null),
(26,'Blade','AAA','W','Groceries','Fruit & Veg',null,null,150,1000)
)
select
Round,Company,Section,
sum(Actual) as SumAct,sum(Max) as SumMax,sum(RegionActual) as SumRegAct,sum(RegionMax) as SumRegMax,
count(distinct Contract) as CountContract,count(distinct Category) as CountCat
from temp
group by Round,Company,Section;
Code: Select all
ROUND COMPANY SECTION SUMACT SUMMAX SUMREGACT SUMREGMAX COUNTCONTRACT COUNTCAT
----- ------- --------- ------ ------ --------- --------- ------------- --------
25 Acme Clothing NULL NULL 1600 2000 2 2
25 Acme Groceries 1100 1900 1000 2000 3 2
25 Blade Groceries 1300 1900 600 2000 2 2
26 Acme Groceries 800 1800 650 1000 2 1
26 Blade Groceries 800 1800 150 1000 1 1
Note that data volumes are pretty low, which might open up some more options.
A search finds some old posts such as this one: viewtopic.php?t=91232
I was wondering whether there was anything included in v8.5 that made this a bit easier.
Or should I give up trying to do this in DataStage, and just do the aggregations in DB2?