How to calculate running totals (cumulative) based on a col?
Moderators: chulett, rschirm, roy
How to calculate running totals (cumulative) based on a col?
Hi,
I am working on a task to calculate cumulative (running totals) of a column based on another column (code).
Example:
s.no code amount
1 10 100
2 20 150
3 10 50
4 10 150
5 20 30
Output
s.no code amount cumulative
1 10 100 100
2 20 150 160
3 10 50 150 (100+50)
4 10 150 300 (150+150)
5 20 30 190 (160+30)
I want to apply this logic on some million rows.
Please suggest me how to proceed on this.
Thank you,
I am working on a task to calculate cumulative (running totals) of a column based on another column (code).
Example:
s.no code amount
1 10 100
2 20 150
3 10 50
4 10 150
5 20 30
Output
s.no code amount cumulative
1 10 100 100
2 20 150 160
3 10 50 150 (100+50)
4 10 150 300 (150+150)
5 20 30 190 (160+30)
I want to apply this logic on some million rows.
Please suggest me how to proceed on this.
Thank you,
Bhanu
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Use stage variable to remember the earlier row and to generate the running total. Make it the same data type as the output column, which must be compatible with the data type of Amount. Initialize the stage variable to 0.
Code: Select all
svCumulative = svCumulative + svprevAmount
Last edited by ray.wurlod on Sat May 25, 2013 3:31 pm, edited 1 time in total.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Premium Member
- Posts: 376
- Joined: Sat Jan 07, 2012 12:25 pm
- Location: Piscataway
If your source is a database, I suggest you push this processing to the database. SQL OLAP functions are very efficient will provided optimal performance for the stated requirement unless of course you WANT to implement it using Datastage only.
Jerome
Data Integration Consultant at AWS
Connect With Me On LinkedIn
Life is really simple, but we insist on making it complicated.
Data Integration Consultant at AWS
Connect With Me On LinkedIn
Life is really simple, but we insist on making it complicated.
Thanks Ray,ray.wurlod wrote:Use stage variable to remember the earlier row and to generate the running total. Make it the same data type as the output column, which must be compatible with the data type of Amount. Initialize t ...
I already tried that I am getting 0's and 1's that I didn't understand. It was working fine to calculate running total of a single column with out any conditions. But in this case I want calculate running totals based on CODE (It has to calculate running totals of each CODE individually). Please take a look at the following logic and correct me if I am wrong.
Stage variable's:
TEMP -
if (DSLink21.CATEGORY='10')
Then TEMP = TEMP1+ DSLink21.BUDGET_AMOUNT
Else If (DSLink21.CATEGORY='20')
Then TEMP = TEMP2+ DSLink21.BUDGET_AMOUNT
Else TEMP = TEMP3+ DSLink21.BUDGET_AMOUNT
TEMP1-
If (DSLink21.CATEGORY='10') Then TEMP2=TEMP else TEMP22=0
TEMP2-
If (DSLink21.CATEGORY='20') Then TEMP2=TEMP else TEMP22=0
TEMP3-
If (DSLink21.CATEGORY='30') Then TEMP2=TEMP else TEMP22=0
TEMP22-
All stage variables declared as decimal and initialize as 0 and Category is varchar thats why I used quotes '' inside if condition. TEMP1, TEMP2, TEMP3 has to store the running totals of corresponding codes 10,20 and 30. TEMP variable assigned to output column "Cumulative".
Bhanu
Is it certain that you only have '10', '20' and 'other' as values for CATEGORY? If so, I'd do 3 stage variables, each initialized to 0:
Cat10 derivation IF DSLink21.CATEGORY='10' THEN Cat10+DSLink21.BUDGET_AMOUNT ELSE Cat10
Cat20 derivation IF DSLink21.CATEGORY='20' THEN Cat20+DSLink21.BUDGET_AMOUNT ELSE Cat20
CatNN derivation IF DSLink21.CATEGORY#'10' AND DSLink21.CATEGORY#'20' THEN CatNN+DSLink21.BUDGET_AMOUNT ELSE CatNN
That takes care of your running totals for the 3 categories. Now define the derivation of your Cumulative column as
IF DSLink21.CATEGORY='10' THEN Cat10 ELSE IF DSLink21.CATEGORY='20' THEN Cat20 ELSE CatNN
Cat10 derivation IF DSLink21.CATEGORY='10' THEN Cat10+DSLink21.BUDGET_AMOUNT ELSE Cat10
Cat20 derivation IF DSLink21.CATEGORY='20' THEN Cat20+DSLink21.BUDGET_AMOUNT ELSE Cat20
CatNN derivation IF DSLink21.CATEGORY#'10' AND DSLink21.CATEGORY#'20' THEN CatNN+DSLink21.BUDGET_AMOUNT ELSE CatNN
That takes care of your running totals for the 3 categories. Now define the derivation of your Cumulative column as
IF DSLink21.CATEGORY='10' THEN Cat10 ELSE IF DSLink21.CATEGORY='20' THEN Cat20 ELSE CatNN
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
-
- Participant
- Posts: 251
- Joined: Mon Jun 09, 2008 5:52 am
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Stage Variables are like this:
TEMP1 - IF DSLink21.CAT='10' THEN TEMP1+DSLink21.BUDGET_AMOUNT ELSE TEMP1
TEMP2 - IF DSLink21.CAT='20' THEN TEMP2+DSLink21.BUDGET_AMOUNT ELSE TEMP2
..................
TEMP - if (DSLink21.CATEGORY='10')
Then TEMP1
Else If (DSLink21.CATEGORY='20')
Then TEMP2
Else TEMP3
I have one more doubt if I want to sort the data by multiple columns (after one by one) can I use sort stage or can I use the order by clause in the sql query itself?
TEMP1 - IF DSLink21.CAT='10' THEN TEMP1+DSLink21.BUDGET_AMOUNT ELSE TEMP1
TEMP2 - IF DSLink21.CAT='20' THEN TEMP2+DSLink21.BUDGET_AMOUNT ELSE TEMP2
..................
TEMP - if (DSLink21.CATEGORY='10')
Then TEMP1
Else If (DSLink21.CATEGORY='20')
Then TEMP2
Else TEMP3
I have one more doubt if I want to sort the data by multiple columns (after one by one) can I use sort stage or can I use the order by clause in the sql query itself?
Bhanu
Yes.bond88 wrote:I want to sort the data by multiple columns (after one by one) can I use sort stage or can I use the order by clause in the sql query itself?
Either one can be used but for the latter you may still need to add a Sort stage set to "don't sort already sorted" or whatever it says nowadays so it doesn't re-sort it for you.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
I don't see a derivation for TEMP3.
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
There is a small problem. Finally summation (Running total) is working but it's not calculating in the order. I want to calculate running totals based on the month (accounting period column) 1 to 12. I am sorting at the source query and I tried sorting using sort stage also but it is calculating in random order and not in ascending or descending. Any suggestions please.
Thanks,
Thanks,
Bhanu