simple question reg aggregator

kumar_s · Post by **kumar_s** » Fri Aug 05, 2005 8:54 am

Hi,
I could not find any simple solution for this simple task.
I just need to group on a deptno and to find a max of sal for that group.
And to fetch the correspoding record for that max(sal).
Obviously to find the max i used Aggregator (Can also be done without aggregator, But...)
Inorder to get the output from the Aggregator i need to do any one of group by function for all rest of the columns.
But i need not to....
Should i have to pass the columns that need to be Aggregated to Aggregaotor and the rest of the col to a merge to get the desired output.
May i know What is in practice......

regards
kumar

pnchowdary · Post by **pnchowdary** » Fri Aug 05, 2005 9:44 am

Hi Kumar,

Contrary to your subject for this post "simple question reg aggregator". Your second half of the post is quite confusing.

Anyway, from what I have understood about your problem, here is my take on it.

I could not find any simple solution for this simple task.
I just need to group on a deptno and to find a max of sal for that group.
And to fetch the correspoding record for that max(sal).

In the aggregator stage, group by the deptno and do a max(sal) for that group. Along, with these 2 cols, also pass the key columns to the aggregator stage and just pass them through without any tranformation.

Write the output to a hashed file and then use the key columns in the hashed file with your input data and find the input record for that max(sal) criteria.

kumar_s · Post by **kumar_s** » Fri Aug 05, 2005 11:15 am

Its my turn to get confused :D
let me put this way,

ip

Code: Select all

empno   name sal   deptno
1111    aaaa  100  10
2222    bbbb  200  20
3333    cccc  300  10
5555    dddd  500  30
4444    eeee  600   20

juz need to grouped by account number and found out max of sal
so the output should be

Code: Select all

empno   name   sal   deptno
3333      cccc    300   10
4444      eeee   600   20
5555      dddd   500   30

pass the key columns to the aggregator stage and just pass them through without any tranformation.

i dont know how to pass a col without doing any aggregation in the aggregator stage

regards
kumar

pnchowdary · Post by **pnchowdary** » Fri Aug 05, 2005 1:12 pm

Hi Kumar,

I take my statement back

. I agree with you that you cannot pass the columns in the aggregator without applying some kind of aggregate function. I was thinking about something else in my mind, when I answered your post. Sorry for the confusion.

ray.wurlod · Post by **ray.wurlod** » Fri Aug 05, 2005 6:53 pm

The best way to solve this kind of problem is to figure out how you would do it if you only had SQL.
In your case this will convince you that it can't be done in one pass. In SQL you would need a nested query.
So too in DataStage. The "nested query" populates a lookup of some kind with the maximum value and corresponding search key. The main pass compares the search key and value against the lookup containing search key and maximum value (both as keys), and imposes the rule "drop" if the lookup fails.

kumar_s · Post by **kumar_s** » Sat Aug 06, 2005 12:54 am

yes, this is better when compared with merge stage.

regards
kumar

mpouet · Post by **mpouet** » Sun Aug 07, 2005 3:54 pm

Hi,

If I understand your problem, I think you have a very better way to do this.
First step : partition by "deptno"
Second step : sort by "sal" with the sort stage (descending mode)
Last step : use the stage which permit to have unique record on a key (I don't remember its name). Choose "deptno" as key and keep the first record.

That's it.
It will be very faster.
Matthieu

mpouet · Post by **mpouet** » Sun Aug 07, 2005 3:56 pm

Euh... the name of the stage is "remove duplicate".

Have fun.
Matthieu

kumar_s · Post by **kumar_s** » Sun Aug 07, 2005 11:25 pm

Thanx mpouet,

This is indeed a fine solution.
But this is what actually in practice in our place, i thought of finding some effiecient way.

.
But atlast it shows i can leave as it is.

regards
kumar