Help on aggregation logic


pdntsap
Premium Member
Posts: 107
Joined: Mon Jul 04, 2011 5:38 pm


Post by pdntsap »

Hello,

We have a requirement where we need to group the input data based on, say, 20 columns. Let the columns be C1, C2, C3...C20. After grouping, some column values within each group need to be compared with the last value of Column 20 in that group. An Aggregator stage can be used for the grouping, I believe, but I am really lost on how to retain the value of Column 20 from the last record in each group and carry it forward for further processing. Any help will be greatly appreciated.

Thanks.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

If you are grouping on all twenty columns, then won't each "group" have a single value for each column, including Column 20? Meaning there really won't be a last of several values in that group. Or by "last" do you mean previous as in the value of Column 20 from the previous group? If so, then it seems like stage variables in a following transformer could be leveraged for that task.
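As a rough illustration of the stage-variable idea outside DataStage, the Python sketch below walks rows that are already sorted on the group keys and carries the previous group's Column 20 value forward in an ordinary variable. The column names `C1` and `C20`, the output field `prev_group_value`, and the function `attach_previous_group_value` are hypothetical names chosen for the example; a real Transformer would express the same bookkeeping with stage variables rather than Python.

```python
from typing import Any, Dict, Iterable, List, Sequence

Row = Dict[str, Any]


def attach_previous_group_value(
    rows: Iterable[Row],
    key_cols: Sequence[str],
    value_col: str,
) -> List[Row]:
    """Tag each row with value_col taken from the last row of the previous group.

    Assumes rows arrive already sorted on key_cols. Rows in the first
    group get None, because no previous group exists.
    """
    out: List[Row] = []
    current_key: Any = object()   # sentinel that never equals a real key
    last_value: Any = None        # value_col of the most recent row seen
    prev_group_value: Any = None  # value_col of the previous group's last row

    for row in rows:
        key = tuple(row[c] for c in key_cols)
        if key != current_key:
            # Group boundary: the most recent row closed the previous group.
            prev_group_value = last_value
            current_key = key
        out.append({**row, "prev_group_value": prev_group_value})
        last_value = row[value_col]
    return out


if __name__ == "__main__":
    sample = [
        {"C1": "a", "C20": 1},
        {"C1": "a", "C20": 2},
        {"C1": "b", "C20": 3},
    ]
    for r in attach_previous_group_value(sample, ["C1"], "C20"):
        print(r)
```

The three variables play the role of stage variables: they are updated once per row, in order, and keep their values from one row to the next.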
-craig

"You can never have too many knives" -- Logan Nine Fingers
pdntsap
Premium Member
Posts: 107
Joined: Mon Jul 04, 2011 5:38 pm

Post by pdntsap »

Yes, Craig. Grouping would produce just one record for each group. So, I was joining (the join keys were the group keys) the output of the aggregator with the original data so that I get the original data (grouped according to the 20 columns) and the count of records in each group.

Going back and looking at the requirements, I may need to rethink my logic. I need to sort the data based on twenty columns. I then need to do some processing on the sorted rows and delete the last record from each group (if some columns in the last record satisfy certain conditions). Any suggestions on implementing the above logic?

One method might be sorting and then grouping on the 20 keys to get a count of the number of records in each group. Then join the output of the aggregator with the original data to get all the rows of the original data along with the number of rows in each group. Then use a transformer stage with a stage variable that keeps a running count of rows within the group; when this stage variable equals the row count for the group, delete the row, reset the stage variable, and repeat the logic for the next group. I have not yet implemented this logic, but am I headed in the right direction?
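To make the proposed flow concrete, here is a minimal Python sketch of the same steps: sort, count rows per group, then walk the rows with a running counter and drop the last row of a group when it meets a condition. It is only a model of the logic, not DataStage code. The names `drop_last_row_per_group`, `should_drop`, `C1`, `flag`, and `n` are hypothetical, and the dictionary lookup stands in for the join against the aggregator output.

```python
from collections import Counter
from typing import Any, Callable, Dict, Iterable, List, Sequence

Row = Dict[str, Any]


def drop_last_row_per_group(
    rows: Iterable[Row],
    key_cols: Sequence[str],
    should_drop: Callable[[Row], bool],
) -> List[Row]:
    """Remove the last row of each group when should_drop(row) is true."""

    def key_of(row: Row) -> tuple:
        return tuple(row[c] for c in key_cols)

    # Step 1: sort on the group keys. Python's sort is stable, so rows
    # with identical keys keep their original input order.
    ordered = sorted(rows, key=key_of)

    # Step 2: count rows per group (the aggregator's job in the post).
    group_sizes = Counter(key_of(r) for r in ordered)

    # Step 3: running counter per group (the stage variable in the post).
    kept: List[Row] = []
    current_key: Any = object()  # sentinel that never equals a real key
    position = 0
    for row in ordered:
        key = key_of(row)
        if key != current_key:
            current_key = key
            position = 0  # reset the counter at each group boundary
        position += 1
        is_last = position == group_sizes[key]
        if is_last and should_drop(row):
            continue  # delete the last row of this group
        kept.append(row)
    return kept


if __name__ == "__main__":
    data = [
        {"C1": "b", "flag": "X", "n": 1},
        {"C1": "a", "flag": "", "n": 2},
        {"C1": "a", "flag": "X", "n": 3},
        {"C1": "b", "flag": "", "n": 4},
    ]
    result = drop_last_row_per_group(data, ["C1"], lambda r: r["flag"] == "X")
    print([r["n"] for r in result])
```

One caveat the sketch makes visible: if the sort keys and the group keys are the same columns, the "last" row of a group is determined only by the original input order among rows with identical keys, so a column that defines ordering within the group may be needed.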

Thanks.