I have to read employee data from sequenetial file. Then I need to find the 3 persons who earn the highest salary in a department of each company.(i.e. we need to group by ORGANISATION NAME,DEPARTMENT) How can i do this?
Columns in source
ORGANISATION NAME
EMP_ID
DEPARTMENT
SAL
Please note that I do not want to write this data to a temporary table and then perform a group by
HOw to group by in datastage?
Moderators: chulett, rschirm, roy
-
- Premium Member
- Posts: 62
- Joined: Thu Dec 28, 2006 11:54 pm
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
DataStage is not a reporting tool.
Use a reporting tool for this task.
Sure you could do it, with an Aggregator, a Sort and a Head stage (to suggest one method), but you really need to get your head around using the right tool for the right job. If you don't, "they" will be requiring you to use DataStage to generate all their reports.
Use a reporting tool for this task.
Sure you could do it, with an Aggregator, a Sort and a Head stage (to suggest one method), but you really need to get your head around using the right tool for the right job. If you don't, "they" will be requiring you to use DataStage to generate all their reports.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Premium Member
- Posts: 62
- Joined: Thu Dec 28, 2006 11:54 pm
-
- Premium Member
- Posts: 62
- Joined: Thu Dec 28, 2006 11:54 pm
-
- Premium Member
- Posts: 79
- Joined: Thu Mar 22, 2007 4:58 pm
- Location: USA
I'm curious too what the need is for just getting the top three by LOCATION, DEPT. It does sound like reporting, which - as said above - is done better by other tools designed for that purpose.
If you must do it in DS, there are multiple ways to go about it. For instance:
Sort by LOCATION (asc), DEPT (asc), Salary (Desc) with partitioning set on LOCATION, DEPT.
Use a transformer with a stage variable that counts up, resetting on each new LOCATION, DEPT combination. Put a constraint on your output that svCOUNT <= 3.
If you must do it in DS, there are multiple ways to go about it. For instance:
Sort by LOCATION (asc), DEPT (asc), Salary (Desc) with partitioning set on LOCATION, DEPT.
Use a transformer with a stage variable that counts up, resetting on each new LOCATION, DEPT combination. Put a constraint on your output that svCOUNT <= 3.
-
- Premium Member
- Posts: 62
- Joined: Thu Dec 28, 2006 11:54 pm
Thanks jc , I did it. Actually I had faced a similar challege before where I had to generate rowid' starting from 1 for each group. At that time I had loaded the sequential file to a TEMP Oracle table. Then I generated the ROW NUMBER's by using partition by clause. Was just just curious if we can achieve the same in datastage. I dont find normal ETL requirements that challenging anymore so just trying out new challenges in free time.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
It can easily be done in DataStage; I can think of at least three ways. But, since you like a challenge, I'll leave you to research them.
Here's another challenge - make sure your strategy works properly when the parallel job is running on more than one node.
Here's another challenge - make sure your strategy works properly when the parallel job is running on more than one node.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.