Page 1 of 1

Sum , Sort and Get Results

Posted: Thu Jul 20, 2006 6:15 pm
by samit_9999
Hi,

I have a flat file which has data in the following format

Dept Emp Sal
10 50 1000
10 50 2000
10 50 2000
10 20 3000
10 20 3000
10 20 4000

I am interested in picking up the employee , where sum(sal) is the highest.In the above case the expected result is

Dept Emp Sal
10 20 10000

This is how i am doing it
1) Aggregator Transformation -- Sum Sal group by Dept and Emp
Results
Dept Emp Sal
10 50 5000
10 20 10000
2) Sort Transformation - Sort the data based on sal
Results
Dept Emp Sal
10 20 10000
10 50 5000
3) Aggregator Transformation - Pick up the first record
Results
Dept Emp Sal
10 20 10000

Is there a better way of acheiving the same results .

Please help?

Sam

Posted: Thu Jul 20, 2006 6:29 pm
by ray.wurlod
That's really a job for SQL, not something DataStage ought to be doing.

Posted: Thu Jul 20, 2006 6:31 pm
by samit_9999
I just receive a flat file which has data in the above format...so i need to carry the above operations in datastage.

Posted: Thu Jul 20, 2006 6:37 pm
by ray.wurlod
Two Aggregator stages. The first generates the sum for each Dept/Emp the second generates the max of that. Sort the input file by Dept and Emp (use a filter in the Sequential File stage) and assert this sorted order in both Aggregator stages; this will make the job finish faster.

Posted: Thu Jul 20, 2006 7:03 pm
by chulett
samit_9999 wrote:I just receive a flat file which has data in the above format...so i need to carry the above operations in datastage.
No, not really. What's stopping you from loading it into a work table in your database of choice and working it over with a SQL stick?

Posted: Sat Jul 22, 2006 2:38 pm
by DSguru2B
In such cases, i would love to build a hashed file and run sql's on it via the universe stage. For small amounts of data, its pretty smooth. :wink:

Posted: Sun Aug 06, 2006 10:39 pm
by Mandar_Deshpande
If it ireal time example then how one employee can have more then 1 salary?

Posted: Sun Aug 06, 2006 11:10 pm
by kumar_s
Mandar_Deshpande wrote:Hey,

I dont understand why we are using aggregator in between. what I see from example is data is at emp & dept level. then why we are agregating group by emp & dept. (as it is already aggragated)
Just sort it descending by salary & get the first row

Let me know if I am wrong.

- Mandar
It is not Max(Sal) but it is Max(Sum(Sal)), so we need it.