Aggregation/Summation
Moderators: chulett, rschirm, roy
Aggregation/Summation
Hi all,
Could you please help me in creating this job?
Input
Project_ID-----Year-----month-----code-----amount
1 ---------------2011----1------------10-------100
1----------------2011----1------------20-------95
1----------------2011----1------------30-------80
2----------------2011----1------------20-------500
2----------------2011----1------------30-------800
and so on up to
1----------------2011----12----------20-------70
so on....
2 ---------------2012----10------------10-------1500
2----------------2012----10------------20-------2000
1 ---------------2013----12------------10-------500
1----------------2013----12------------20-------900
1----------------2013----12------------30-------1000
Output:
Project_ID-----Year-----month-----------amount
2----------------2012----10-------------------3500
1----------------2013----12-------------------2400
Output needs to be one record for each project (latest record, year and month if that month has multiple records with different codes, it needs to be sum up in to a single value)
Thank you,
Could you please help me in creating this job?
Input
Project_ID-----Year-----month-----code-----amount
1 ---------------2011----1------------10-------100
1----------------2011----1------------20-------95
1----------------2011----1------------30-------80
2----------------2011----1------------20-------500
2----------------2011----1------------30-------800
and so on up to
1----------------2011----12----------20-------70
so on....
2 ---------------2012----10------------10-------1500
2----------------2012----10------------20-------2000
1 ---------------2013----12------------10-------500
1----------------2013----12------------20-------900
1----------------2013----12------------30-------1000
Output:
Project_ID-----Year-----month-----------amount
2----------------2012----10-------------------3500
1----------------2013----12-------------------2400
Output needs to be one record for each project (latest record, year and month if that month has multiple records with different codes, it needs to be sum up in to a single value)
Thank you,
Bhanu
Thanks Chulett,
You are right ! If we use aggregator stage and group by on project_id, year and month and I used Aggregation type = calculation, column for calculation = amount, sum of output column = amount_sum
But I am getting all the years and corresponding months. How could I get the latest year and month in the particular project group. For project_id I need the latest record.
Thanks,
You are right ! If we use aggregator stage and group by on project_id, year and month and I used Aggregation type = calculation, column for calculation = amount, sum of output column = amount_sum
But I am getting all the years and corresponding months. How could I get the latest year and month in the particular project group. For project_id I need the latest record.
Thanks,
Bhanu
Max(year/month) comes to mind but that won't get you the appropriate sum. Seems to me you'll need to find a way to send only those max records to the Aggregator. Sort descending with a key change column and take the first group per year, perhaps.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
Correct me if this approach is wrong/lengthy one.
First I can sort descending by project_id and fiscal_year along with key change column and if I filter on keychange=1 then I will get one row for each year for particular project_id and after that if I use sort stage again to sort by project_id along with key column change and then if I use filter stage to filter keychange=1 this time I will get one record (latest) per each project_id.
First I can sort descending by project_id and fiscal_year along with key change column and if I filter on keychange=1 then I will get one row for each year for particular project_id and after that if I use sort stage again to sort by project_id along with key column change and then if I use filter stage to filter keychange=1 this time I will get one record (latest) per each project_id.
Bhanu
-
- Premium Member
- Posts: 536
- Joined: Thu Oct 11, 2007 1:48 am
- Location: Bangalore
Hi,
After sorting,you can use transformer.below is the logic:-
Pass this variable to output derivation of Year then aggregate the data as suggested by chullet.
After sorting,you can use transformer.below is the logic:-
Code: Select all
Svar=If keychange=1 then Inputcol.Year else Svar
Thanks
Prasoon
ETL Consultant
LinkedIn :- http://www.linkedin.com/profile/view?id ... ab_pro_top
Blog:- http://dsshar.blogspot.com/
Prasoon
ETL Consultant
LinkedIn :- http://www.linkedin.com/profile/view?id ... ab_pro_top
Blog:- http://dsshar.blogspot.com/
-
- Premium Member
- Posts: 536
- Joined: Thu Oct 11, 2007 1:48 am
- Location: Bangalore
Hi,
After sorting,you can use transformer.write below logic in stage variable:-
Pass this variable to output derivation of Year then aggregate the data as suggested by chullet.
After sorting,you can use transformer.write below logic in stage variable:-
Code: Select all
Svar=If keychange=1 then Inputcol.Year else Svar
Thanks
Prasoon
ETL Consultant
LinkedIn :- http://www.linkedin.com/profile/view?id ... ab_pro_top
Blog:- http://dsshar.blogspot.com/
Prasoon
ETL Consultant
LinkedIn :- http://www.linkedin.com/profile/view?id ... ab_pro_top
Blog:- http://dsshar.blogspot.com/
-
- Participant
- Posts: 134
- Joined: Tue Jun 15, 2010 2:10 am
- Location: Bangalore