sql GROUP BY CUBE- how would you recreate this in datastage?

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
dohertys
Participant
Posts: 39
Joined: Thu Oct 11, 2007 3:26 am
Location: Sheffield

sql GROUP BY CUBE- how would you recreate this in datastage?

Post by dohertys »

I'm producing a report based on the following sql

select key1, key2, key3, sum(value1), SomeCalculation(value2), Avg(Value3)

from MyTable
group by cube key1, key2, key3


If I instead just brought back the basic rows from the database (without the group by cube and the calculations), how would I recreate this 'group by cube' processing in Datastage?

Thanks
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Me, I wouldn't and would use a BI tool for that, not an ETL tool... but that's just me. :wink:

To get help on this, first I'd suggest that you explain what 'GROUP BY CUBE' does exactly rather than expecting everyone to be familiar with it already. Or forget the sql specific syntax and explain in words your processing requirements...
-craig

"You can never have too many knives" -- Logan Nine Fingers
dohertys
Participant
Posts: 39
Joined: Thu Oct 11, 2007 3:26 am
Location: Sheffield

Post by dohertys »

Me, I wouldn't and would use a BI tool for that, not an ETL tool... but that's just me.
I would agree, but I've been asked to produce this data into a table so that the front end developers don't have to do anything too complicated (just make it pretty and do simple selects).

Good point about my lack of explanation too, I should try to explain that better. I was trying to save myself some typing (and some thinking).

As I understand it, the 'GROUP BY CUBE' operator in sql will perform aggregate functions for every possible combination of the key fields.

e.g.
If I have the data

Key1, Key2, Value
A, A, 1
A, B, 2
B, B, 3

and did the query
select Key1, Key2, sum(Value)
from MyTable
group by cube Key1, Key2

I would get

Key1, Key2, sum(Value)
row1 A, A, 1
row2 A, B, 2
row3 A, -, 3
row4 B, B, 3
row5 B, -, 3
row6 -, A, 1
row7 -, B, 5
row8 -, -, 6


i.e. its produced an calculation of the aggregate for every combination of the keys, and also for summarys (e.g . row 3 is the summary for all the records with Key1 = 'A' ; row5 is the summary for all records with Key1 = 'B'; row6 is the summary for all records with Key2= = 'A' ; row7 is the summary record for Key2 = 'B'; and row8 is the overall summary record).

So, it there a 'simple' way of doing this in Datastage? Or should I leave it in the sql where it belongs?
Maveric
Participant
Posts: 388
Joined: Tue Mar 13, 2007 1:28 am

Post by Maveric »

To my knowledge there is no way this can be done in one stage. Need two sort stages, two aggregator stages and then a filter. And then a third aggregator stage to get the last record. Looks to me like its better off in SQL.
dohertys
Participant
Posts: 39
Joined: Thu Oct 11, 2007 3:26 am
Location: Sheffield

Post by dohertys »

Right, I'm convinced then. SQL it is.

Thanks for your responses.
Post Reply