Rollup Data

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
samit_9999
Participant
Posts: 20
Joined: Thu Oct 06, 2005 12:23 pm

Rollup Data

Post by samit_9999 »

Hi ,

I have a Table with such kind of info

Region Dept Emp Sal
South 10 Sam 10
South 10 Tam 20
South 20 Pam 40
North 30 Cam 50
North 40 Mam 100
North 40 Dam 50

I need information Rolled up at both Region and Dept Level
Total Salary for Region South is 70 Dept 10 = 30 Dept 20=40
Total Salary for Region North is 200 Dept 30 = 50 Dept 40 = 150

I need to generate a file through DataStage job with the following output

Region Dept Dep_Sal_Total Region_Sal_Total
South 10 30 70
South 20 40 70
North 30 50 200
North 40 150 200

Backend is TeraData.

Please help!!
Krazykoolrohit
Charter Member
Charter Member
Posts: 560
Joined: Wed Jul 13, 2005 5:36 am
Location: Ohio

Post by Krazykoolrohit »

I can suggest you two solutions:

1. write a SQL query in the ODBC stage aggregating your columns and sending the output to the new columns. something like:

select Region,Dept,Dep_Sal_Total,Region_Sal_Total

from <table>

where

Dep_Sal_Total in (select dept,Sum(Sal) from <table> group by dept)
and Region_Sal_Total in (select region,Sum(Sal) from <table> group by sal)

group by Dep_Sal_Total,Region_Sal_Total

Add Region,Dept,Dep_Sal_Total,Region_Sal_Total in the columns definition.

Use the query as a hint. I am unable to test and run the query for you right now.


2. use an aggregator stage if you want to sum on individual columns. use a transformer in the end to organize the output and drop the rows which you dont want.

For more inputs of aggregator reer to the documentation available with installation
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

do it in the sql itself

Code: Select all

select A.Region, A.Dept, Sum(A.Sal) As Dep_Sal_Total, Temp.Sal
from table A
inner join (select B.Region,sum(B.Sal)As Sal from table B group by Region) as Temp
on A.Region = Temp.Region
group by A.Region, A.Dept, Temp.Sal


table A, table B both are the same table.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Ok what just happened to my post :roll:
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Krazykoolrohit
Charter Member
Charter Member
Posts: 560
Joined: Wed Jul 13, 2005 5:36 am
Location: Ohio

Post by Krazykoolrohit »

SOS roy.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

How did you guys manage this? Now we're all stuck inside Krazykoolrohit's world... help! :lol:
-craig

"You can never have too many knives" -- Logan Nine Fingers
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

I know, tell me about it.
Roy...Need you. Roy :P
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Walter, more likely, our dear webmaster, is the one to fix this. Looks like the message board software has gone somewhat agley.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
samit_9999
Participant
Posts: 20
Joined: Thu Oct 06, 2005 12:23 pm

Post by samit_9999 »

Thank you DSGuru.

However the problem is much bigger.

I am selecting around 60 columns from some 4-5 Dimensional Views and i am almost at the rim of hitting the 64 Join Parser Limit.So i do not really wanna join to the same view once again .

Is there any option through DataStage that i can achieve the same results?
Krazykoolrohit
Charter Member
Charter Member
Posts: 560
Joined: Wed Jul 13, 2005 5:36 am
Location: Ohio

Post by Krazykoolrohit »

why dont you do all your joins at the database side and populate a final view. use datastage to extract data from there. would be much faster
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Offcourse you can. Break the query into two.
Query1:

Code: Select all

select B.Region,sum(B.Sal)As Sal from table B group by Region 
Since this query is at the Region level, it will retrieve lesser amount of rows as compared to the second query, load this into hashed file, keyed on Region.

Query2:

Code: Select all

select A.Region, A.Dept, Sum(A.Sal) As Dep_Sal_Total, Temp.Sal 
from table A 
group by A.Region, A.Dept, Temp.Sal 
Query 2 will be your source. Do a lookup to the hashed file.
A NOT(LKUPLINK.NOTFOUND) is equivalent to an inner join

But then again, if datasize is massive, go for KrazyKoolRohit's suggestion.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
samit_9999
Participant
Posts: 20
Joined: Thu Oct 06, 2005 12:23 pm

Post by samit_9999 »

Thanks to all of u
Post Reply