Aggregator, Restricting count for individual group

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Munish
Participant
Posts: 89
Joined: Sun Nov 19, 2006 10:34 pm

Aggregator, Restricting count for individual group

Post by Munish »

Hi There,
I am using aggregator to do the sum of values.
The data set looks like
Account_Key, Service_Key, Period_Key, Called_Num,Charge_Amt

I am aggregating by Charge_Amt

For Account_Key, Service_Key, Period_Key group there are more than 10000 Called_Num.
and there are several Service_Key for Account_Key and Period_Key combination.

I want to restrict to top 100 for Account_Key, Service_Key, Period_Key group.

What might be the best solution.

Thanks and regards,
Munish
MK
us1aslam1us
Charter Member
Charter Member
Posts: 822
Joined: Sat Sep 17, 2005 5:25 pm
Location: USA

Post by us1aslam1us »

Your statements are not clear but to my understanding you can sort in the ascending order on those keys and restrict them to 100 by giving a constraint in the transformer. But if you can pass some more information about what you are trying to achieve someone can help it.
I haven't failed, I've found 10,000 ways that don't work.
Thomas Alva Edison(1847-1931)
Munish
Participant
Posts: 89
Joined: Sun Nov 19, 2006 10:34 pm

Post by Munish »

Data before passing to AGGREGATOR
Acc_K Serv_K Period_K Called_N Amt
1 2 3 88888 10
1 2 3 88888 12
------so on-----------------------------
1 2 3 88889 2
1 2 3 88889 3
---------so on---------------------------
more than 10000 called_num for this group (1,2,3)

1 3 3 77777 10
1 3 3 77777 12
------so on-----------------------------
1 3 3 77779 2
1 3 3 77779 3
---------so on---------------------------
more than 10000 called_num for this group (1,3,3)


After Aggregation it looks like
Acc_K Serv_K Period_K Called_N Amt
1 2 3 88888 22
1 2 3 88889 5

---------so on---------------------------
more than 10000 called_num for this group (1,2,3)

1 3 3 77777 22
1 3 3 77779 5
---------so on---------------------------
more than 10000 called_num for this group (1,3,3)

WishList:

What I want to restrict Top 100 for each combination
(1,2,3), (1,3,3)

Should you need more information, please let me know.

Thanks,
Munish
MK
us1aslam1us
Charter Member
Charter Member
Posts: 822
Joined: Sat Sep 17, 2005 5:25 pm
Location: USA

Post by us1aslam1us »

You can approach in this way

1.Sort the data on Acc_K,Serv_K,Period_K and Amt field in the Ascending order.
2. In Transformer stage, group the data based on (Acc_K Serv_K Period_K Called_N) and select the first 100 rows for each set.
I haven't failed, I've found 10,000 ways that don't work.
Thomas Alva Edison(1847-1931)
Munish
Participant
Posts: 89
Joined: Sun Nov 19, 2006 10:34 pm

Post by Munish »

Good Thought.

Is there any property in Aggregator to take care of this.

Thanks
Munish
MK
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

No.

Whatever you do, you are going to require two passes through the data to do anything with ranks.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Munish
Participant
Posts: 89
Joined: Sun Nov 19, 2006 10:34 pm

Post by Munish »

Hi,
Could you please elabortate a bit further
1.Sort the data on Acc_K,Serv_K,Period_K and Amt field in the Ascending order.
2. In Transformer stage, group the data based on (Acc_K Serv_K Period_K Called_N) and select the first 100 rows for each set.
How do you do this in transformer.

Thanks
Munish
MK
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Sorting is easy. Either an OS level or sort stage would do.
Do a group comparison of present row with previous row. If same start a counter.
In the transformer, restrict counter <= 1000. Then pass it through the aggregator and do your rollup.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I understood the requirement to be to sort the aggregated data. This would involve stages in a different order; a sort (or Sort Merge collector) following the Aggregator stage and something (Head, Tail or Transformer stage perhaps) to filter the first or last 100 rows.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

The OP wants only first 100 enteries per group. This would require a counter in a transformer to constraint the output to the first 100 per group. Then pass it through the aggregator to do the rollup.
That was my understanding.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Sort the data, use KeyChange column for the counter. Use transformer (or Filter stage) contraint to avoid KeyChange < 100, and following by an aggregator stage.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

kumar_s wrote:contraint to avoid KeyChange < 100,
Someone forgot the '=' along with '<' :wink:
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Count starts with 0. :wink:
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Ok you got me. If the count starts from 0 then yes. Darn you Kumar :wink:
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Munish
Participant
Posts: 89
Joined: Sun Nov 19, 2006 10:34 pm

Post by Munish »

Hi All,
I am very near to solve this mystery.
as per the suggestions, what I plan to do
Sort>> Aggregate >> Sort (Group and respective amt (descending)) >> Another Sort (For Keychange)>> Filter stage (restrict Keychage Counter)

However, I am held up at
use KeyChange column for the counter
when I use 'Create Keychange Column' in Sort stage, it does not give me right result.
Can you please put some light how to get it running.

Thanks,
Munish
MK
Post Reply