Ranking the data both from above and below

bapajju · Post by **bapajju** » Tue Dec 02, 2003 4:15 am

Hi all,
I have to extract the top 10 salesmen (in terms of number of units sold) and the bottom 10 salesmen. The data comes in a comma-separated flat file. Can I get the top and bottom 10 salesmen from this file without putting the data into a temporary table? Please suggest.

girishoak · Post by **girishoak** » Tue Dec 02, 2003 4:39 am

Hi Bapajju,

Here is a solution; please try it and let me know.
In a single job, use the sequential file twice. Sort the data from one copy in ascending order and from the other in descending order. Take the top 10 records from each sort stage and merge them to get the final list.
If this doesn't work (I am doubtful about getting just the first 10 records), consider the next solution.
First, sort the sequential file in ascending order on the number of units sold and create an intermediate data file, say temp1.
Similarly, sort the file in descending order on the number of units sold and create another intermediate data file, say temp2.

At the end of the job, call a shell script from the after-job subroutine.
This shell script will contain the following lines:

head -10 temp2 > <desired data file name>    # top 10 (temp2 is sorted descending)
head -10 temp1 >> <desired data file name>   # bottom 10 (temp1 is sorted ascending)

Let me know your comments. Thanks

Girish Oak

kcbland · Post by **kcbland** » Tue Dec 02, 2003 1:05 pm

You are going to have to sort the data. Without knowing how big the sequential file is, I'll take a guess that it's fewer than 1 million rows.

So, load this file into a UV/ODBC hash file.

Then, select this hash file using the UV/ODBC stage with an ascending order-by on your numeric column and write the output to a transformer where you have a constraint of @INROWNUM <= 10 and then out to a file. This gives you the bottom 10.

Have a second output link select this hash file using the UV/ODBC stage with an order-by descending on your numeric column and write the output to a transformer where you have a constraint of @INROWNUM <= 10 and then out to a file. This gives you the top 10.

No matter what, you need a temporary something, and a hash file is disposable and as easy to use as a table.

vmcburney · Post by **vmcburney** » Tue Dec 02, 2003 4:14 pm

If you decide to go for the Unix sort script, then you only need to sort the data once, in ascending order. You can then use the head -10 command to get the first 10 rows (the lowest sellers) and the tail -10 command to get the last 10 rows (the highest sellers).
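A rough sketch of the single-sort idea, in case it helps. The file name sales.csv, the name,units layout, and the output file names are all assumptions for illustration, and the sample rows are made up so the script runs on its own. It also assumes there is no header row and no quoted fields containing commas.

```shell
#!/bin/sh
# Sketch only: sales.csv, its layout (name,units) and the output names are assumptions.
# Assumes no header row and no quoted fields that contain commas.

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up sample rows (rep1,10 ... rep25,250) so the sketch is self-contained.
i=1
while [ "$i" -le 25 ]; do
    echo "rep$i,$((i * 10))"
    i=$((i + 1))
done > sales.csv

# Sort once, numerically ascending on the second comma-separated field.
sort -t, -k2,2n sales.csv > sorted.csv

head -n 10 sorted.csv > bottom10.csv   # lowest unit counts
tail -n 10 sorted.csv > top10.csv      # highest unit counts, still in ascending order
```

If the top 10 should be listed best-first, a second small sort of top10.csv in reverse order would do it, and that is much cheaper than sorting the whole file twice.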

The hash file approach lets you put all of the logic and code into a DataStage job, which is easier to support and maintain than a Unix script. Since you are doing simple movement of data between sequential files and hash files, the performance should be very good. The one thing that might make a Unix script easier is if you are going to do this type of thing on a lot of different types of files. You can then have one Unix script that receives the file name, column position, and ranking row count as parameters, and this one script can handle all your sort and ranking needs.
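Something along these lines is what I mean by a parameterized script. The function name rank_rows and the .top / .bottom output suffixes are made up for this sketch, and it assumes a comma-delimited file with no header row and no commas inside quoted fields.

```shell
#!/bin/sh
# rank_rows FILE COLUMN COUNT
# Hypothetical helper: writes FILE.bottom (lowest COUNT rows) and
# FILE.top (highest COUNT rows), ranked numerically on field COLUMN.
rank_rows() {
    file=$1
    col=$2
    count=$3

    sorted=$(mktemp) || return 1

    # One ascending numeric sort on the requested comma-separated field.
    if ! sort -t, -k"$col","$col"n "$file" > "$sorted"; then
        rm -f "$sorted"
        return 1
    fi

    head -n "$count" "$sorted" > "$file.bottom"
    tail -n "$count" "$sorted" > "$file.top"
    rm -f "$sorted"
}
```

A call such as `rank_rows sales.csv 2 10` would then rank on the second column and keep 10 rows at each end. From DataStage this could be wrapped in a script that the after-job subroutine calls with the three values as arguments.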