Posted: Mon Mar 12, 2007 8:32 pm
There are many suggestions given, with different functionalities. :roll:
Absolutely, I want the first 10% of values and not the first 10 rows.

kumar_s wrote: Bit confused again. If your file has 100 rows with columns a, b, c, do you expect the first 10 rows, or the first 10 values of a, b, c? These may not be the same as the first 10 rows.
How can the values be lined up in a single row?

DSguru2B wrote: You need to sort these values ascending, line them up in a single row, get the max value (the last value), calculate the 10 percent breaking point, and filter all the records equal to or less than that breaking point.

pravin1581 wrote: It is the top 10% of a particular column in a file.
If we are talking about the top 10% of values, then you have to follow the method I gave above. If it's the top 10% of positional values, then all you need to do is get the number of records, sort again, and take 10% of the record count. Get all records before that breaking point.
How can we get the record count before running the job? My requirement is the row count after sorting the data. The @OUTROWNUM function is not returning the row count.

chulett wrote: Get the record count before the job runs. For UNIX or MKS, 'wc -l' was suggested earlier so I didn't repost that. Pass that number in as a job parameter. After you sort, use a constraint where @OUTROWNUM <= 10% of the input rows.

pravin1581 wrote: I am facing problems with the record count also, as DSLINKINFO always returns 0. How can we restrict the data to the top 10%?

chulett wrote: In Server, get the record count and sort the data descending on the value in question. Constrain your output to the 'top 10%' of the records. ...
The quoted post itself has the required information: 'wc -l' is the suggested approach to find the row count before the run.

pravin1581 wrote: How can we get the record count before running the job? My requirement is the row count after sorting the data. The @OUTROWNUM function is not returning the row count.

chulett wrote: Get the record count before the job runs. For UNIX or MKS, 'wc -l' was suggested earlier so I didn't repost that. Pass that number in as a job parameter. After you sort, use a constraint where @OUTROWNUM <= 10% of the input rows.

pravin1581 wrote: I am facing problems with the record count also, as DSLINKINFO always returns 0. How can we restrict the data to the top 10%?
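A minimal shell sketch of the pre-job count step suggested above. The filename is hypothetical; in the actual sequence, `wc -l` would run in a command activity and the computed limit would be passed in as a job parameter for the @OUTROWNUM constraint:

```shell
# Create a small sample file for illustration (20 rows).
seq 1 20 > sample.txt

# Record count before the job runs ('wc -l', as suggested).
TOTAL=$(wc -l < sample.txt)

# 10% of the input rows, rounded up so small files still emit rows;
# this is the value the constraint @OUTROWNUM <= LIMIT would use.
LIMIT=$(( (TOTAL + 9) / 10 ))

echo "total=$TOTAL limit=$LIMIT"    # total=20 limit=2
```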
Suppose the values are 2,4,6,8,10,10,11,11,12,13. There are 8 distinct values and I want to extract the top 10% of values. The required value is 13.

kumar_s wrote: The quoted post itself has the required information: 'wc -l' is the suggested approach to find the row count before the run.

pravin1581 wrote: How can we get the record count before running the job? My requirement is the row count after sorting the data. The @OUTROWNUM function is not returning the row count.

chulett wrote: Get the record count before the job runs. For UNIX or MKS, 'wc -l' was suggested earlier so I didn't repost that. Pass that number in as a job parameter. After you sort, use a constraint where @OUTROWNUM <= 10% of the input rows.
Could you please post a sample of the input data and what you expect from it.
We want only the distinct top values. So the data needs to be sorted and made distinct.

kumar_s wrote: If your input is 2,4,6,8,10,10,11,11,12,13,2,4,6,8,10,10,11,11,12,13, do you need to get 13,13 or 12,13? That is, only distinct values, or just the top values found in the data?
Do a 'wc -l' of your file in an ExecuteCommand activity (which will give you the total record count of the file), and pass the Command.Output of the ExecuteCommand activity as a parameter to the job.
Read the file and sort it. In the Transformer, restrict the rows using @INROWNUM <= input parameter / 10.
But this logic will vary based on your reply to this.
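The two steps above can be sketched in plain shell (the data file is hypothetical; in the job, step 1 is the ExecuteCommand activity and step 2 is the Transformer constraint):

```shell
# Sample data; in the job this would be the sequential file input.
printf '%s\n' 5 1 9 3 7 2 8 4 6 10 > data.txt

# Step 1: ExecuteCommand equivalent -- 'wc -l' gives the total row
# count, which would be passed to the job as a parameter.
TOTAL=$(wc -l < data.txt)

# Step 2: sort descending, then keep only the first TOTAL/10 rows --
# the same effect as the constraint @INROWNUM <= parameter/10.
sort -rn data.txt | head -n $(( TOTAL / 10 ))   # prints 10
```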
How can we run the SQL on a file?

DSguru2B wrote: Or load it into a hashed file; the duplicates will be eliminated on their own. Use SQL to get the particular column in sorted order, then do your count and top 10 percent calculation.
Or simply load your file into a 'work' table in your database of choice.

pravin1581 wrote: How can we run the SQL on a file?