Ranking the data both from above and below

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
bapajju
Participant
Posts: 82
Joined: Wed Nov 19, 2003 11:58 pm

Ranking the data both from above and below

Post by bapajju »

Hi all,
I have to extract top 10 sales man (in terms of number of units they have sold) and bottom 10 sales man.The data is coming through a comma separated flat file. Can I get the top and bottom 10 sales man from this data file without putting the data into any temporary table???Please suggest.
girishoak
Participant
Posts: 65
Joined: Wed Oct 29, 2003 3:54 am

Post by girishoak »

Hi Bapajju,

Here is the solution, please try and let me know.
In a single job use use sequential file twise. Sort data from one file in ascending while the other in descending order. Take top 10 records from each sort stage and merge them to get the final list.
If this doesnt work, as I doubtful about getting first 10 records, consider next solution.
First sort the sequential file in ascending order on basis of no of unit sold and create a intermediate data file say temp1
Similarly sort the other file in descending order on the basis of no of unit sold and then create another intermediate data file say temp2.

At the end of job write the following command in after job sub routine call shell script.
This shell script will contain following lines

Code: Select all

head -10 temp2 > <desired data file name> // this will give top 10 
head -10 temp1 >> <desired data file name> // this will give last 10
Let me know your comments. Thanks

Girish Oak
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

You are going to have to sort the data. Without knowing how big the sequential file is, I'll take a guess that it's less than 1 million.

So, load this file into a UV/ODBC hash file.

Then, select this hash file using the UV/ODBC stage with an order-by on your numeric column and write the output to a transformer where you have a constraint of @INROWNUM <= 10 and then out to a file.

Have a second output link select this hash file using the UV/ODBC stage with an order-by descending on your numeric column and write the output to a transformer where you have a constraint of @INROWNUM <= 10 and then out to a file.

No matter what you need a temporary something, so a hash file is disposable and easy to use just like a table.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

If you decide to go for the Unix sort script then you only need to sort the data once in ascending order, you can then use the head -10 command to get the first 10 rows and the tail -10 command to get the bottom 10 rows.

The hash file approach lets you put all of the logic and code into a DataStage job which is easier to support and maintain then a Unix script. Since you are doing simple movement of data between sequential files and hash files the performance should be very good. The one thing that might make a Unix script easier is if you are going to do this type of thing on a lot of different types of files, you can then have a Unix script that receives the file name and column position and ranking row counts as passed in parameters and this one script can handle all your sort and ranking needs.
Post Reply