CYTD,PYTD,PYE

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
dsrules
Premium Member
Premium Member
Posts: 76
Joined: Sun Nov 28, 2004 8:56 pm

CYTD,PYTD,PYE

Post by dsrules »

Hi,
Iam trying to aggregate data on CYTD,PYTD,PYE. The CYTD and PYE are pretty straight as we group the aggregating columns on year and month. But how can I aggregate the PYTD which would be prior year to current month values. The meta data has year and month as separate columns. I am using a sort stage to sort the data before I aggregate. I was thinking of using the first row value of the month column when sorted in DESC, so that I would get the Latest month for the current year.
Any ideas how this can be done.
Thanks
Mack
______________________________________
"Everytime I close the door on reality, it comes in through the windows." - Jennifer Yane
ogmios
Participant
Posts: 659
Joined: Tue Mar 11, 2003 3:40 pm

Re: CYTD,PYTD,PYE

Post by ogmios »

Give us some input data and what you want as result, it will make things clearer to us.

Normally I never use sort and aggregator stages, I prefer to load data to a database and do any sorting aggregating in the database (via an input SQL executed via DataStage).

Ogmios
In theory there's no difference between theory and practice. In practice there is.
dsrules
Premium Member
Premium Member
Posts: 76
Joined: Sun Nov 28, 2004 8:56 pm

Post by dsrules »

For the Data below I would want the aggregation for all the months of 2004, and all the months of 2003, and till the max month of 2004 in 2003

Code: Select all

Year,Month,"Code","Code_Name",Dollar
2004,09,"AA","Code1",25.50
2004,08,"AA","Code1",25.50
2003,10,"AA","Code1",25.50
2004,07,"AA","Code1",25.50
2003,11,"AA","Code1",25.50
2003,05,"AA","Code1",25.50
2003,04,"AA","Code1",25.50
2003,12,"AA","Code1",25.75
The output would be

Code: Select all

Year,Month,"Code","Code_Name",Dollar
2004,09,"AA","Code1",75.50
2003,09,"AA","Code1",51.00
2003,12,"AA","Code1",127.50

Whats the best way of doing this.
Thanks
Mack
______________________________________
"Everytime I close the door on reality, it comes in through the windows." - Jennifer Yane
ogmios
Participant
Posts: 659
Joined: Tue Mar 11, 2003 3:40 pm

Post by ogmios »

In SQL only this would be something as (assuming A is your table):

Code: Select all

select year, month, code, code_name, sum(Dollar)
from A
where year * 100 + month >=  
              select (integer(substr(yrmnt, 1, 4)) - 1) * 100 + 
                              integer(substr(yrmnt, 4, 4)) as yrmth
              from (
                        select max(digits(year) || digits(month)) as yrmth
                        from A
              )
group by year, month, code, code_name;
or

Code: Select all

select year, month, code, code_name, sum(Dollar)
from A
where year * 100 + month >=  
                        (select max(year * 100 + month) - 100 as yrmth
                        from A)
              
group by year, month, code, code_name;

Something as:
- Get the maximum combination of year/month
- Subtract 1 of the year
- Then do a grouping of all rows which are higher or equal than this year/month

Or something in the style of your original idea:
- Use a sort stage to sort the data descending
- Put the first (highest) year/month in a stage variable via @INROWNUM
- Skip all rows which are smaller than "maxyear-1"/maxmonth
- Probably sort again to put the data in the right order for the aggregator.
- And push this through an aggregator

For large amounts of data (probably around 100.000) the SQL version is much faster as a DataStage solution.

Ogmios
In theory there's no difference between theory and practice. In practice there is.
dsrules
Premium Member
Premium Member
Posts: 76
Joined: Sun Nov 28, 2004 8:56 pm

Post by dsrules »

Thanks a lot. That was what I wanted but was contemplating various solutions. The SQL is definetely an option I would be looking into now.
Mack
______________________________________
"Everytime I close the door on reality, it comes in through the windows." - Jennifer Yane
ogmios
Participant
Posts: 659
Joined: Tue Mar 11, 2003 3:40 pm

Post by ogmios »

One thing important with the SQL, you have to be sure that code and code_name always are "correct".

You cannot have rows with code/code_name "AA/Code1" and "AA/Code2" as this will screw up the groupings. If the code is AA the code_name always has to be the same.

The reason why the SQL version is faster than a DataStage solution on large volumes is IO. In the DataStage version you would pull all rows from the database to DataStage, process them there and then write a summary into a database. With the SQL version only the summary data gets moved to DataStage and written. The extra IO in the DataStage solutions makes it slow with large volumes.

Ogmios
In theory there's no difference between theory and practice. In practice there is.
dsrules
Premium Member
Premium Member
Posts: 76
Joined: Sun Nov 28, 2004 8:56 pm

Post by dsrules »

In my case, the values for Code/Code _Name can be "AA"/"Code1" and also "AA"/"Code2" so basically they would be treated as a different row and the aggregations would be done on them as being distinct values.
Actaully the grouping would be on Year, Code and Code_Name.
______________________________________
"Everytime I close the door on reality, it comes in through the windows." - Jennifer Yane
Post Reply