sql GROUP BY CUBE- how would you recreate this in datastage?

dohertys · Post by **dohertys** » Fri Oct 26, 2007 4:54 am

I'm producing a report based on the following sql

select key1, key2, key3, sum(value1), SomeCalculation(value2), Avg(Value3)

from MyTable
group by cube key1, key2, key3

If I instead just brought back the basic rows from the database (without the group by cube and the calculations), how would I recreate this 'group by cube' processing in Datastage?

Thanks

chulett · Post by **chulett** » Fri Oct 26, 2007 6:04 am

Me, I wouldn't and would use a BI tool for that, not an ETL tool... but that's just me.

To get help on this, first I'd suggest that you explain what 'GROUP BY CUBE' does exactly rather than expecting everyone to be familiar with it already. Or forget the sql specific syntax and explain in words your processing requirements...

dohertys · Post by **dohertys** » Fri Oct 26, 2007 6:59 am

Me, I wouldn't and would use a BI tool for that, not an ETL tool... but that's just me.

I would agree, but I've been asked to produce this data into a table so that the front end developers don't have to do anything too complicated (just make it pretty and do simple selects).

Good point about my lack of explanation too, I should try to explain that better. I was trying to save myself some typing (and some thinking).

As I understand it, the 'GROUP BY CUBE' operator in sql will perform aggregate functions for every possible combination of the key fields.

e.g.
If I have the data

Key1, Key2, Value
A, A, 1
A, B, 2
B, B, 3

and did the query
select Key1, Key2, sum(Value)
from MyTable
group by cube Key1, Key2

I would get

Key1, Key2, sum(Value)
row1 A, A, 1
row2 A, B, 2
row3 A, -, 3
row4 B, B, 3
row5 B, -, 3
row6 -, A, 1
row7 -, B, 5
row8 -, -, 6

i.e. its produced an calculation of the aggregate for every combination of the keys, and also for summarys (e.g . row 3 is the summary for all the records with Key1 = 'A' ; row5 is the summary for all records with Key1 = 'B'; row6 is the summary for all records with Key2= = 'A' ; row7 is the summary record for Key2 = 'B'; and row8 is the overall summary record).

So, it there a 'simple' way of doing this in Datastage? Or should I leave it in the sql where it belongs?

Maveric · Post by **Maveric** » Fri Oct 26, 2007 7:07 am

To my knowledge there is no way this can be done in one stage. Need two sort stages, two aggregator stages and then a filter. And then a third aggregator stage to get the last record. Looks to me like its better off in SQL.

dohertys · Post by **dohertys** » Fri Oct 26, 2007 7:11 am

Right, I'm convinced then. SQL it is.

Thanks for your responses.