CYTD,PYTD,PYE

dsrules · Post by **dsrules** » Mon Dec 27, 2004 2:27 pm

Hi,
Iam trying to aggregate data on CYTD,PYTD,PYE. The CYTD and PYE are pretty straight as we group the aggregating columns on year and month. But how can I aggregate the PYTD which would be prior year to current month values. The meta data has year and month as separate columns. I am using a sort stage to sort the data before I aggregate. I was thinking of using the first row value of the month column when sorted in DESC, so that I would get the Latest month for the current year.
Any ideas how this can be done.
Thanks
Mack

ogmios · Post by **ogmios** » Tue Dec 28, 2004 5:46 am

Give us some input data and what you want as result, it will make things clearer to us.

Normally I never use sort and aggregator stages, I prefer to load data to a database and do any sorting aggregating in the database (via an input SQL executed via DataStage).

Ogmios

dsrules · Post by **dsrules** » Tue Dec 28, 2004 10:07 am

For the Data below I would want the aggregation for all the months of 2004, and all the months of 2003, and till the max month of 2004 in 2003

Code: Select all

Year,Month,"Code","Code_Name",Dollar
2004,09,"AA","Code1",25.50
2004,08,"AA","Code1",25.50
2003,10,"AA","Code1",25.50
2004,07,"AA","Code1",25.50
2003,11,"AA","Code1",25.50
2003,05,"AA","Code1",25.50
2003,04,"AA","Code1",25.50
2003,12,"AA","Code1",25.75

The output would be

Code: Select all

Year,Month,"Code","Code_Name",Dollar
2004,09,"AA","Code1",75.50
2003,09,"AA","Code1",51.00
2003,12,"AA","Code1",127.50

Whats the best way of doing this.
Thanks
Mack

ogmios · Post by **ogmios** » Tue Dec 28, 2004 10:39 am

In SQL only this would be something as (assuming A is your table):

Code: Select all

select year, month, code, code_name, sum(Dollar)
from A
where year * 100 + month >=  
              select (integer(substr(yrmnt, 1, 4)) - 1) * 100 + 
                              integer(substr(yrmnt, 4, 4)) as yrmth
              from (
                        select max(digits(year) || digits(month)) as yrmth
                        from A
              )
group by year, month, code, code_name;

or

Code: Select all

select year, month, code, code_name, sum(Dollar)
from A
where year * 100 + month >=  
                        (select max(year * 100 + month) - 100 as yrmth
                        from A)
              
group by year, month, code, code_name;

Something as:
- Get the maximum combination of year/month
- Subtract 1 of the year
- Then do a grouping of all rows which are higher or equal than this year/month

Or something in the style of your original idea:
- Use a sort stage to sort the data descending
- Put the first (highest) year/month in a stage variable via @INROWNUM
- Skip all rows which are smaller than "maxyear-1"/maxmonth
- Probably sort again to put the data in the right order for the aggregator.
- And push this through an aggregator

For large amounts of data (probably around 100.000) the SQL version is much faster as a DataStage solution.

Ogmios

dsrules · Post by **dsrules** » Tue Dec 28, 2004 12:40 pm

Thanks a lot. That was what I wanted but was contemplating various solutions. The SQL is definetely an option I would be looking into now.
Mack

ogmios · Post by **ogmios** » Tue Dec 28, 2004 12:47 pm

One thing important with the SQL, you have to be sure that code and code_name always are "correct".

You cannot have rows with code/code_name "AA/Code1" and "AA/Code2" as this will screw up the groupings. If the code is AA the code_name always has to be the same.

The reason why the SQL version is faster than a DataStage solution on large volumes is IO. In the DataStage version you would pull all rows from the database to DataStage, process them there and then write a summary into a database. With the SQL version only the summary data gets moved to DataStage and written. The extra IO in the DataStage solutions makes it slow with large volumes.

Ogmios

dsrules · Post by **dsrules** » Tue Dec 28, 2004 1:13 pm

In my case, the values for Code/Code _Name can be "AA"/"Code1" and also "AA"/"Code2" so basically they would be treated as a different row and the aggregations would be done on them as being distinct values.
Actaully the grouping would be on Year, Code and Code_Name.

DSXchange

CYTD,PYTD,PYE

CYTD,PYTD,PYE

Re: CYTD,PYTD,PYE