Page 1 of 2

How to calculate running totals (cumulative) based on a col?

Posted: Fri May 24, 2013 12:20 pm
by bond88
Hi,
I am working on a task to calculate cumulative (running totals) of a column based on another column (code).

Example:

s.no code amount

1 10 100
2 20 150
3 10 50
4 10 150
5 20 30


Output

s.no code amount cumulative

1 10 100 100
2 20 150 160
3 10 50 150 (100+50)
4 10 150 300 (150+150)
5 20 30 190 (160+30)

I want to apply this logic on some million rows.

Please suggest me how to proceed on this.

Thank you,

Posted: Fri May 24, 2013 4:14 pm
by ray.wurlod
Use stage variable to remember the earlier row and to generate the running total. Make it the same data type as the output column, which must be compatible with the data type of Amount. Initialize the stage variable to 0.

Code: Select all

svCumulative = svCumulative + svprevAmount

Posted: Sat May 25, 2013 12:01 pm
by jerome_rajan
If your source is a database, I suggest you push this processing to the database. SQL OLAP functions are very efficient will provided optimal performance for the stated requirement unless of course you WANT to implement it using Datastage only. :)

Posted: Tue May 28, 2013 9:52 am
by bond88
ray.wurlod wrote:Use stage variable to remember the earlier row and to generate the running total. Make it the same data type as the output column, which must be compatible with the data type of Amount. Initialize t ...
Thanks Ray,
I already tried that I am getting 0's and 1's that I didn't understand. It was working fine to calculate running total of a single column with out any conditions. But in this case I want calculate running totals based on CODE (It has to calculate running totals of each CODE individually). Please take a look at the following logic and correct me if I am wrong.


Stage variable's:

TEMP -

if (DSLink21.CATEGORY='10')
Then TEMP = TEMP1+ DSLink21.BUDGET_AMOUNT
Else If (DSLink21.CATEGORY='20')
Then TEMP = TEMP2+ DSLink21.BUDGET_AMOUNT
Else TEMP = TEMP3+ DSLink21.BUDGET_AMOUNT


TEMP1-

If (DSLink21.CATEGORY='10') Then TEMP2=TEMP else TEMP22=0

TEMP2-

If (DSLink21.CATEGORY='20') Then TEMP2=TEMP else TEMP22=0

TEMP3-

If (DSLink21.CATEGORY='30') Then TEMP2=TEMP else TEMP22=0

TEMP22-


All stage variables declared as decimal and initialize as 0 and Category is varchar thats why I used quotes '' inside if condition. TEMP1, TEMP2, TEMP3 has to store the running totals of corresponding codes 10,20 and 30. TEMP variable assigned to output column "Cumulative".

Posted: Tue May 28, 2013 10:02 am
by ArndW
Is it certain that you only have '10', '20' and 'other' as values for CATEGORY? If so, I'd do 3 stage variables, each initialized to 0:
Cat10 derivation IF DSLink21.CATEGORY='10' THEN Cat10+DSLink21.BUDGET_AMOUNT ELSE Cat10
Cat20 derivation IF DSLink21.CATEGORY='20' THEN Cat20+DSLink21.BUDGET_AMOUNT ELSE Cat20
CatNN derivation IF DSLink21.CATEGORY#'10' AND DSLink21.CATEGORY#'20' THEN CatNN+DSLink21.BUDGET_AMOUNT ELSE CatNN

That takes care of your running totals for the 3 categories. Now define the derivation of your Cumulative column as

IF DSLink21.CATEGORY='10' THEN Cat10 ELSE IF DSLink21.CATEGORY='20' THEN Cat20 ELSE CatNN

Posted: Tue May 28, 2013 11:44 am
by srinivas.g
You can use LastRowInGroup(%inputcol%) transformer fucntion. Input column is code.

Posted: Tue May 28, 2013 3:23 pm
by bond88
Hi ArndW,
I am getting all zero's in cumulative column. Could you please guide me?

Thank you,

Regards,
Lohith Sama.

Posted: Tue May 28, 2013 3:30 pm
by ray.wurlod
Please post YOUR stage variable derivations.

In particular derivation expressions must NOT contain assignment statements.

Posted: Wed May 29, 2013 11:08 am
by bond88
Stage Variables are like this:

TEMP1 - IF DSLink21.CAT='10' THEN TEMP1+DSLink21.BUDGET_AMOUNT ELSE TEMP1
TEMP2 - IF DSLink21.CAT='20' THEN TEMP2+DSLink21.BUDGET_AMOUNT ELSE TEMP2

..................

TEMP - if (DSLink21.CATEGORY='10')
Then TEMP1
Else If (DSLink21.CATEGORY='20')
Then TEMP2
Else TEMP3

I have one more doubt if I want to sort the data by multiple columns (after one by one) can I use sort stage or can I use the order by clause in the sql query itself?

Posted: Wed May 29, 2013 11:30 am
by chulett
bond88 wrote:I want to sort the data by multiple columns (after one by one) can I use sort stage or can I use the order by clause in the sql query itself?
Yes. :wink:

Either one can be used but for the latter you may still need to add a Sort stage set to "don't sort already sorted" or whatever it says nowadays so it doesn't re-sort it for you.

Posted: Thu May 30, 2013 1:10 am
by ArndW
I don't see a derivation for TEMP3.

Posted: Thu May 30, 2013 9:53 am
by bond88
Sorry ArndW,
I missed it in that post.

Thank you,

Posted: Mon Jun 03, 2013 9:16 am
by bond88
There is a small problem. Finally summation (Running total) is working but it's not calculating in the order. I want to calculate running totals based on the month (accounting period column) 1 to 12. I am sorting at the source query and I tried sorting using sort stage also but it is calculating in random order and not in ascending or descending. Any suggestions please.

Thanks,

Posted: Mon Jun 03, 2013 9:40 am
by chulett
How many nodes is the job running on and how are you partitioning the data?

Posted: Mon Jun 03, 2013 9:42 am
by bond88
Hi Chulett,
I am running it in sequential mode. I didn't enable partition reads and I selected sequential mode under transformer stage also.

Thanks,