simple question reg aggregator

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

simple question reg aggregator

Post by kumar_s »

Hi,
I could not find any simple solution for this simple task.
I just need to group on a deptno and to find a max of sal for that group.
And to fetch the correspoding record for that max(sal).
Obviously to find the max i used Aggregator (Can also be done without aggregator, But...)
Inorder to get the output from the Aggregator i need to do any one of group by function for all rest of the columns.
But i need not to....
Should i have to pass the columns that need to be Aggregated to Aggregaotor and the rest of the col to a merge to get the desired output.
May i know What is in practice......

regards
kumar
pnchowdary
Participant
Posts: 232
Joined: Sat May 07, 2005 2:49 pm
Location: USA

Post by pnchowdary »

Hi Kumar,

Contrary to your subject for this post "simple question reg aggregator". Your second half of the post is quite confusing. :wink:

Anyway, from what I have understood about your problem, here is my take on it.
I could not find any simple solution for this simple task.
I just need to group on a deptno and to find a max of sal for that group.
And to fetch the correspoding record for that max(sal).
In the aggregator stage, group by the deptno and do a max(sal) for that group. Along, with these 2 cols, also pass the key columns to the aggregator stage and just pass them through without any tranformation.

Write the output to a hashed file and then use the key columns in the hashed file with your input data and find the input record for that max(sal) criteria.
Thanks,
Naveen
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Its my turn to get confused :D
let me put this way,

ip

Code: Select all

empno   name sal   deptno
1111    aaaa  100  10
2222    bbbb  200  20
3333    cccc  300  10
5555    dddd  500  30
4444    eeee  600   20  
juz need to grouped by account number and found out max of sal
so the output should be

Code: Select all

empno   name   sal   deptno
3333      cccc    300   10
4444      eeee   600   20
5555      dddd   500   30

pass the key columns to the aggregator stage and just pass them through without any tranformation.
i dont know how to pass a col without doing any aggregation in the aggregator stage :shock:

regards
kumar
pnchowdary
Participant
Posts: 232
Joined: Sat May 07, 2005 2:49 pm
Location: USA

Post by pnchowdary »

Hi Kumar,

I take my statement back :oops: . I agree with you that you cannot pass the columns in the aggregator without applying some kind of aggregate function. I was thinking about something else in my mind, when I answered your post. Sorry for the confusion.
Thanks,
Naveen
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The best way to solve this kind of problem is to figure out how you would do it if you only had SQL.
In your case this will convince you that it can't be done in one pass. In SQL you would need a nested query.
So too in DataStage. The "nested query" populates a lookup of some kind with the maximum value and corresponding search key. The main pass compares the search key and value against the lookup containing search key and maximum value (both as keys), and imposes the rule "drop" if the lookup fails.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

yes, this is better when compared with merge stage.

regards
kumar
mpouet
Participant
Posts: 34
Joined: Mon Oct 18, 2004 10:23 am
Location: France

Post by mpouet »

Hi,

If I understand your problem, I think you have a very better way to do this.
First step : partition by "deptno"
Second step : sort by "sal" with the sort stage (descending mode)
Last step : use the stage which permit to have unique record on a key (I don't remember its name). Choose "deptno" as key and keep the first record.

That's it.
It will be very faster.
Matthieu
mpouet
Participant
Posts: 34
Joined: Mon Oct 18, 2004 10:23 am
Location: France

Post by mpouet »

Euh... the name of the stage is "remove duplicate".

Have fun.
Matthieu
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Thanx mpouet,

This is indeed a fine solution.
But this is what actually in practice in our place, i thought of finding some effiecient way. :( .
But atlast it shows i can leave as it is.


regards
kumar
Post Reply