Hi,
Iam trying to aggregate data on CYTD,PYTD,PYE. The CYTD and PYE are pretty straight as we group the aggregating columns on year and month. But how can I aggregate the PYTD which would be prior year to current month values. The meta data has year and month as separate columns. I am using a sort stage to sort the data before I aggregate. I was thinking of using the first row value of the month column when sorted in DESC, so that I would get the Latest month for the current year.
Any ideas how this can be done.
Thanks
Mack
CYTD,PYTD,PYE
Moderators: chulett, rschirm, roy
CYTD,PYTD,PYE
______________________________________
"Everytime I close the door on reality, it comes in through the windows." - Jennifer Yane
"Everytime I close the door on reality, it comes in through the windows." - Jennifer Yane
Re: CYTD,PYTD,PYE
Give us some input data and what you want as result, it will make things clearer to us.
Normally I never use sort and aggregator stages, I prefer to load data to a database and do any sorting aggregating in the database (via an input SQL executed via DataStage).
Ogmios
Normally I never use sort and aggregator stages, I prefer to load data to a database and do any sorting aggregating in the database (via an input SQL executed via DataStage).
Ogmios
In theory there's no difference between theory and practice. In practice there is.
For the Data below I would want the aggregation for all the months of 2004, and all the months of 2003, and till the max month of 2004 in 2003
The output would be
Whats the best way of doing this.
Thanks
Mack
Code: Select all
Year,Month,"Code","Code_Name",Dollar
2004,09,"AA","Code1",25.50
2004,08,"AA","Code1",25.50
2003,10,"AA","Code1",25.50
2004,07,"AA","Code1",25.50
2003,11,"AA","Code1",25.50
2003,05,"AA","Code1",25.50
2003,04,"AA","Code1",25.50
2003,12,"AA","Code1",25.75
Code: Select all
Year,Month,"Code","Code_Name",Dollar
2004,09,"AA","Code1",75.50
2003,09,"AA","Code1",51.00
2003,12,"AA","Code1",127.50
Thanks
Mack
______________________________________
"Everytime I close the door on reality, it comes in through the windows." - Jennifer Yane
"Everytime I close the door on reality, it comes in through the windows." - Jennifer Yane
In SQL only this would be something as (assuming A is your table):
or
Something as:
- Get the maximum combination of year/month
- Subtract 1 of the year
- Then do a grouping of all rows which are higher or equal than this year/month
Or something in the style of your original idea:
- Use a sort stage to sort the data descending
- Put the first (highest) year/month in a stage variable via @INROWNUM
- Skip all rows which are smaller than "maxyear-1"/maxmonth
- Probably sort again to put the data in the right order for the aggregator.
- And push this through an aggregator
For large amounts of data (probably around 100.000) the SQL version is much faster as a DataStage solution.
Ogmios
Code: Select all
select year, month, code, code_name, sum(Dollar)
from A
where year * 100 + month >=
select (integer(substr(yrmnt, 1, 4)) - 1) * 100 +
integer(substr(yrmnt, 4, 4)) as yrmth
from (
select max(digits(year) || digits(month)) as yrmth
from A
)
group by year, month, code, code_name;
Code: Select all
select year, month, code, code_name, sum(Dollar)
from A
where year * 100 + month >=
(select max(year * 100 + month) - 100 as yrmth
from A)
group by year, month, code, code_name;
Something as:
- Get the maximum combination of year/month
- Subtract 1 of the year
- Then do a grouping of all rows which are higher or equal than this year/month
Or something in the style of your original idea:
- Use a sort stage to sort the data descending
- Put the first (highest) year/month in a stage variable via @INROWNUM
- Skip all rows which are smaller than "maxyear-1"/maxmonth
- Probably sort again to put the data in the right order for the aggregator.
- And push this through an aggregator
For large amounts of data (probably around 100.000) the SQL version is much faster as a DataStage solution.
Ogmios
In theory there's no difference between theory and practice. In practice there is.
One thing important with the SQL, you have to be sure that code and code_name always are "correct".
You cannot have rows with code/code_name "AA/Code1" and "AA/Code2" as this will screw up the groupings. If the code is AA the code_name always has to be the same.
The reason why the SQL version is faster than a DataStage solution on large volumes is IO. In the DataStage version you would pull all rows from the database to DataStage, process them there and then write a summary into a database. With the SQL version only the summary data gets moved to DataStage and written. The extra IO in the DataStage solutions makes it slow with large volumes.
Ogmios
You cannot have rows with code/code_name "AA/Code1" and "AA/Code2" as this will screw up the groupings. If the code is AA the code_name always has to be the same.
The reason why the SQL version is faster than a DataStage solution on large volumes is IO. In the DataStage version you would pull all rows from the database to DataStage, process them there and then write a summary into a database. With the SQL version only the summary data gets moved to DataStage and written. The extra IO in the DataStage solutions makes it slow with large volumes.
Ogmios
In theory there's no difference between theory and practice. In practice there is.
In my case, the values for Code/Code _Name can be "AA"/"Code1" and also "AA"/"Code2" so basically they would be treated as a different row and the aggregations would be done on them as being distinct values.
Actaully the grouping would be on Year, Code and Code_Name.
Actaully the grouping would be on Year, Code and Code_Name.
______________________________________
"Everytime I close the door on reality, it comes in through the windows." - Jennifer Yane
"Everytime I close the door on reality, it comes in through the windows." - Jennifer Yane