Sum , Sort and Get Results

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
samit_9999
Participant
Posts: 20
Joined: Thu Oct 06, 2005 12:23 pm

Sum , Sort and Get Results

Post by samit_9999 »

Hi,

I have a flat file which has data in the following format

Dept Emp Sal
10 50 1000
10 50 2000
10 50 2000
10 20 3000
10 20 3000
10 20 4000

I am interested in picking up the employee , where sum(sal) is the highest.In the above case the expected result is

Dept Emp Sal
10 20 10000

This is how i am doing it
1) Aggregator Transformation -- Sum Sal group by Dept and Emp
Results
Dept Emp Sal
10 50 5000
10 20 10000
2) Sort Transformation - Sort the data based on sal
Results
Dept Emp Sal
10 20 10000
10 50 5000
3) Aggregator Transformation - Pick up the first record
Results
Dept Emp Sal
10 20 10000

Is there a better way of acheiving the same results .

Please help?

Sam
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

That's really a job for SQL, not something DataStage ought to be doing.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
samit_9999
Participant
Posts: 20
Joined: Thu Oct 06, 2005 12:23 pm

Post by samit_9999 »

I just receive a flat file which has data in the above format...so i need to carry the above operations in datastage.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Two Aggregator stages. The first generates the sum for each Dept/Emp the second generates the max of that. Sort the input file by Dept and Emp (use a filter in the Sequential File stage) and assert this sorted order in both Aggregator stages; this will make the job finish faster.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

samit_9999 wrote:I just receive a flat file which has data in the above format...so i need to carry the above operations in datastage.
No, not really. What's stopping you from loading it into a work table in your database of choice and working it over with a SQL stick?
-craig

"You can never have too many knives" -- Logan Nine Fingers
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

In such cases, i would love to build a hashed file and run sql's on it via the universe stage. For small amounts of data, its pretty smooth. :wink:
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Mandar_Deshpande
Participant
Posts: 12
Joined: Wed Mar 02, 2005 1:59 am

Post by Mandar_Deshpande »

If it ireal time example then how one employee can have more then 1 salary?
Last edited by Mandar_Deshpande on Sun Aug 06, 2006 11:24 pm, edited 1 time in total.
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Mandar_Deshpande wrote:Hey,

I dont understand why we are using aggregator in between. what I see from example is data is at emp & dept level. then why we are agregating group by emp & dept. (as it is already aggragated)
Just sort it descending by salary & get the first row

Let me know if I am wrong.

- Mandar
It is not Max(Sal) but it is Max(Sum(Sal)), so we need it.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
Post Reply