How to calculate running totals (cumulative) based on a col?

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

bond88
Participant
Posts: 109
Joined: Mon Oct 15, 2012 10:05 am
Location: USA

How to calculate running totals (cumulative) based on a col?

Post by bond88 »

Hi,
I am working on a task to calculate cumulative (running totals) of a column based on another column (code).

Example:

s.no code amount

1 10 100
2 20 150
3 10 50
4 10 150
5 20 30


Output

s.no code amount cumulative

1 10 100 100
2 20 150 160
3 10 50 150 (100+50)
4 10 150 300 (150+150)
5 20 30 190 (160+30)

I want to apply this logic on some million rows.

Please suggest me how to proceed on this.

Thank you,
Bhanu
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Use stage variable to remember the earlier row and to generate the running total. Make it the same data type as the output column, which must be compatible with the data type of Amount. Initialize the stage variable to 0.

Code: Select all

svCumulative = svCumulative + svprevAmount
Last edited by ray.wurlod on Sat May 25, 2013 3:31 pm, edited 1 time in total.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
jerome_rajan
Premium Member
Premium Member
Posts: 376
Joined: Sat Jan 07, 2012 12:25 pm
Location: Piscataway

Post by jerome_rajan »

If your source is a database, I suggest you push this processing to the database. SQL OLAP functions are very efficient will provided optimal performance for the stated requirement unless of course you WANT to implement it using Datastage only. :)
Jerome
Data Integration Consultant at AWS
Connect With Me On LinkedIn

Life is really simple, but we insist on making it complicated.
bond88
Participant
Posts: 109
Joined: Mon Oct 15, 2012 10:05 am
Location: USA

Post by bond88 »

ray.wurlod wrote:Use stage variable to remember the earlier row and to generate the running total. Make it the same data type as the output column, which must be compatible with the data type of Amount. Initialize t ...
Thanks Ray,
I already tried that I am getting 0's and 1's that I didn't understand. It was working fine to calculate running total of a single column with out any conditions. But in this case I want calculate running totals based on CODE (It has to calculate running totals of each CODE individually). Please take a look at the following logic and correct me if I am wrong.


Stage variable's:

TEMP -

if (DSLink21.CATEGORY='10')
Then TEMP = TEMP1+ DSLink21.BUDGET_AMOUNT
Else If (DSLink21.CATEGORY='20')
Then TEMP = TEMP2+ DSLink21.BUDGET_AMOUNT
Else TEMP = TEMP3+ DSLink21.BUDGET_AMOUNT


TEMP1-

If (DSLink21.CATEGORY='10') Then TEMP2=TEMP else TEMP22=0

TEMP2-

If (DSLink21.CATEGORY='20') Then TEMP2=TEMP else TEMP22=0

TEMP3-

If (DSLink21.CATEGORY='30') Then TEMP2=TEMP else TEMP22=0

TEMP22-


All stage variables declared as decimal and initialize as 0 and Category is varchar thats why I used quotes '' inside if condition. TEMP1, TEMP2, TEMP3 has to store the running totals of corresponding codes 10,20 and 30. TEMP variable assigned to output column "Cumulative".
Bhanu
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Is it certain that you only have '10', '20' and 'other' as values for CATEGORY? If so, I'd do 3 stage variables, each initialized to 0:
Cat10 derivation IF DSLink21.CATEGORY='10' THEN Cat10+DSLink21.BUDGET_AMOUNT ELSE Cat10
Cat20 derivation IF DSLink21.CATEGORY='20' THEN Cat20+DSLink21.BUDGET_AMOUNT ELSE Cat20
CatNN derivation IF DSLink21.CATEGORY#'10' AND DSLink21.CATEGORY#'20' THEN CatNN+DSLink21.BUDGET_AMOUNT ELSE CatNN

That takes care of your running totals for the 3 categories. Now define the derivation of your Cumulative column as

IF DSLink21.CATEGORY='10' THEN Cat10 ELSE IF DSLink21.CATEGORY='20' THEN Cat20 ELSE CatNN
srinivas.g
Participant
Posts: 251
Joined: Mon Jun 09, 2008 5:52 am

Post by srinivas.g »

You can use LastRowInGroup(%inputcol%) transformer fucntion. Input column is code.
Srinu Gadipudi
bond88
Participant
Posts: 109
Joined: Mon Oct 15, 2012 10:05 am
Location: USA

Post by bond88 »

Hi ArndW,
I am getting all zero's in cumulative column. Could you please guide me?

Thank you,

Regards,
Lohith Sama.
Bhanu
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Please post YOUR stage variable derivations.

In particular derivation expressions must NOT contain assignment statements.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
bond88
Participant
Posts: 109
Joined: Mon Oct 15, 2012 10:05 am
Location: USA

Post by bond88 »

Stage Variables are like this:

TEMP1 - IF DSLink21.CAT='10' THEN TEMP1+DSLink21.BUDGET_AMOUNT ELSE TEMP1
TEMP2 - IF DSLink21.CAT='20' THEN TEMP2+DSLink21.BUDGET_AMOUNT ELSE TEMP2

..................

TEMP - if (DSLink21.CATEGORY='10')
Then TEMP1
Else If (DSLink21.CATEGORY='20')
Then TEMP2
Else TEMP3

I have one more doubt if I want to sort the data by multiple columns (after one by one) can I use sort stage or can I use the order by clause in the sql query itself?
Bhanu
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

bond88 wrote:I want to sort the data by multiple columns (after one by one) can I use sort stage or can I use the order by clause in the sql query itself?
Yes. :wink:

Either one can be used but for the latter you may still need to add a Sort stage set to "don't sort already sorted" or whatever it says nowadays so it doesn't re-sort it for you.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

I don't see a derivation for TEMP3.
bond88
Participant
Posts: 109
Joined: Mon Oct 15, 2012 10:05 am
Location: USA

Post by bond88 »

Sorry ArndW,
I missed it in that post.

Thank you,
Bhanu
bond88
Participant
Posts: 109
Joined: Mon Oct 15, 2012 10:05 am
Location: USA

Post by bond88 »

There is a small problem. Finally summation (Running total) is working but it's not calculating in the order. I want to calculate running totals based on the month (accounting period column) 1 to 12. I am sorting at the source query and I tried sorting using sort stage also but it is calculating in random order and not in ascending or descending. Any suggestions please.

Thanks,
Bhanu
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

How many nodes is the job running on and how are you partitioning the data?
-craig

"You can never have too many knives" -- Logan Nine Fingers
bond88
Participant
Posts: 109
Joined: Mon Oct 15, 2012 10:05 am
Location: USA

Post by bond88 »

Hi Chulett,
I am running it in sequential mode. I didn't enable partition reads and I selected sequential mode under transformer stage also.

Thanks,
Bhanu
Post Reply