Page 1 of 1

Aggregation/Summation

Posted: Wed Jun 12, 2013 2:24 pm
by bond88
Hi all,
Could you please help me in creating this job?

Input

Project_ID-----Year-----month-----code-----amount
1 ---------------2011----1------------10-------100
1----------------2011----1------------20-------95
1----------------2011----1------------30-------80
2----------------2011----1------------20-------500
2----------------2011----1------------30-------800

and so on up to
1----------------2011----12----------20-------70
so on....
2 ---------------2012----10------------10-------1500
2----------------2012----10------------20-------2000
1 ---------------2013----12------------10-------500
1----------------2013----12------------20-------900
1----------------2013----12------------30-------1000


Output:
Project_ID-----Year-----month-----------amount
2----------------2012----10-------------------3500
1----------------2013----12-------------------2400


Output needs to be one record for each project (latest record, year and month if that month has multiple records with different codes, it needs to be sum up in to a single value)

Thank you,

Posted: Wed Jun 12, 2013 2:31 pm
by chulett
Would this not just need an Aggregator? Group on Project ID, year, month and sum the amount.

Posted: Thu Jun 13, 2013 9:12 am
by bond88
Thanks Chulett,
You are right ! If we use aggregator stage and group by on project_id, year and month and I used Aggregation type = calculation, column for calculation = amount, sum of output column = amount_sum

But I am getting all the years and corresponding months. How could I get the latest year and month in the particular project group. For project_id I need the latest record.

Thanks,

Posted: Thu Jun 13, 2013 9:19 am
by chulett
Max(year/month) comes to mind but that won't get you the appropriate sum. Seems to me you'll need to find a way to send only those max records to the Aggregator. Sort descending with a key change column and take the first group per year, perhaps.

Posted: Thu Jun 13, 2013 9:45 am
by bond88
Sorry Chulett,
I didn't understand completely. Before feeding to aggregator stage if we use sort stage (descending) with key change column, how could I feed only the first group of each project_id to aggregator stage? Any suggestion please.

Thanks,

Posted: Thu Jun 13, 2013 9:56 am
by chulett
One way - stage variables to take the first group per project / year, hence the suggestion to sort descending.

Posted: Thu Jun 13, 2013 9:58 am
by bond88
Correct me if this approach is wrong/lengthy one.
First I can sort descending by project_id and fiscal_year along with key change column and if I filter on keychange=1 then I will get one row for each year for particular project_id and after that if I use sort stage again to sort by project_id along with key column change and then if I use filter stage to filter keychange=1 this time I will get one record (latest) per each project_id.

Posted: Thu Jun 13, 2013 1:56 pm
by prasson_ibm
Hi,
After sorting,you can use transformer.below is the logic:-

Code: Select all

Svar=If keychange=1 then Inputcol.Year else Svar
Pass this variable to output derivation of Year then aggregate the data as suggested by chullet.

Posted: Thu Jun 13, 2013 1:57 pm
by prasson_ibm
Hi,
After sorting,you can use transformer.write below logic in stage variable:-

Code: Select all

Svar=If keychange=1 then Inputcol.Year else Svar
Pass this variable to output derivation of Year then aggregate the data as suggested by chullet.

Posted: Wed Jul 10, 2013 4:08 am
by srinivas.nettalam
OR a lengthy approach...

1-Split the input to two streams.
2-Find max year and month in the output1
3- join the two outputs(Fork join) on projectid,month and year and then find sum.