
Sort in Ascending or Descending based on parameter

Posted: Wed Jan 23, 2008 5:33 am
by Raghavendra
I have to sort the data either in ascending order or descending order based on parameter value. I tried to implement it with the following approach:

In the Transformer, create two output links. If the data is to be sorted in ascending order, enable the first link and pass the data to a Sort stage whose sort order is set to Ascending in its properties. If descending, enable the second link and pass the data to a Sort stage whose sort order is set to Descending. Funnel the two links together and write the result to the output. Run everything from the Sort stages onward in sequential mode.


Transformer ------------>Ascending
| Sort Stage --------------->
| Funnel --->Output
|---------------->Descending -------------->
Sort Stage

I believe there must be a better approach than this. Can anybody give me some input towards a more optimal solution?

Posted: Wed Jan 23, 2008 8:23 am
by ray.wurlod
Please encase your design in Code tags - what you have posted is incomprehensible. Does your design work? The direction of sort, wherever it is specified, is a sub-property that does not accept a job parameter reference.

There's no need for sequential operation; why do you believe there is?

Posted: Thu Jan 24, 2008 4:27 am
by Raghavendra
The design is as follows:
ASC_IND_PARM specifies whether to sort in ascending or descending order. If ASC_IND_PARM = 1, the data is sent to the ascending Sort stage (Sort Order set to Ascending); if ASC_IND_PARM <> 1, the data is sent to the descending Sort stage (Sort Order set to Descending). The two links are then combined by a Funnel stage. Only one link is active at a time.
The limitation of this design is that the Sort stages must be set to sequential mode and followed by a Sequence funnel, which does not disturb the sort order.

Code:

Transformer ---------------->Sort Stage ( Properties set to Ascending )
 |                               |
 |                               |
 |                               |
\/                               \/
Sort Stage  ------------------>Funnel Stage -------> Outputfile
( Properties 
set to Descending)
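
As a rough illustration outside DataStage, the routing above amounts to choosing the sort direction from a single flag. The sketch below is plain Python, not a DataStage job; only the name ASC_IND_PARM comes from the design, while `sort_by_param`, the `id` column, and the sample rows are hypothetical.

```python
def sort_by_param(rows, asc_ind_parm, key="id"):
    """Sort ascending when asc_ind_parm == 1, descending for any other value.

    Mirrors the two-link design: only one direction is ever active per run.
    """
    descending = asc_ind_parm != 1
    return sorted(rows, key=lambda row: row[key], reverse=descending)


# Hypothetical sample rows, for illustration only.
rows = [{"id": 3}, {"id": 1}, {"id": 2}]
ascending_rows = sort_by_param(rows, 1)
descending_rows = sort_by_param(rows, 0)
```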

Posted: Thu Jan 24, 2008 6:36 am
by Cr.Cezon
Hello Raghavendra,

I don't know if my response is stupid,

but why don't you use a sequence that, depending on ASC_IND_PARM, launches one job or the other (the only difference between the jobs being the asc/desc direction of the sort)?

You will gain in execution time but lose by duplicating the code.

regards,
Cristina.

Posted: Thu Jan 24, 2008 6:46 am
by Raghavendra
We cannot use sequences in our implementation. We are running these jobs in a grid environment (a shared-nothing environment in which nodes are allocated at run time), so our development guidelines do not allow sequences.

Posted: Thu Jan 24, 2008 7:39 am
by ray.wurlod
You still haven't answered my question about why you believe you need sequential execution.

Posted: Thu Jan 24, 2008 11:47 pm
by Raghavendra
A Continuous Funnel combines the records of the input data in no guaranteed order. With a Sort Funnel we can sort either ascending or descending but not both (the sort order accepts only one fixed value). So I have used a Sequence Funnel to collect the data.

If you use parallel sort stages before sequence funnel, the data will be sorted within a node but not across the nodes.

In that case the Sequence Funnel will just collect the data from the sorts and pass it to the output, and the data will not be in sorted order. That is why the Sort stages are set to sequential mode in this implementation.
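
To make the concern concrete, here is a minimal Python sketch, not DataStage itself. It assumes round-robin partitioning across nodes and models the Sequence Funnel as simple concatenation of each node's sorted output; the function names and sample data are hypothetical.

```python
def round_robin_partition(values, num_nodes):
    """Deal values out to nodes in turn (assumed partitioning method)."""
    partitions = [[] for _ in range(num_nodes)]
    for index, value in enumerate(values):
        partitions[index % num_nodes].append(value)
    return partitions


def sort_then_sequence_funnel(values, num_nodes):
    """Sort each partition independently, then read the partitions one after another."""
    sorted_parts = [sorted(part) for part in round_robin_partition(values, num_nodes)]
    return [value for part in sorted_parts for value in part]


data = [5, 1, 4, 2, 3, 6]
parallel_result = sort_then_sequence_funnel(data, 2)    # sorted per node only
sequential_result = sort_then_sequence_funnel(data, 1)  # one node, fully sorted
print(parallel_result)    # [3, 4, 5, 1, 2, 6]
print(sequential_result)  # [1, 2, 3, 4, 5, 6]
```

With two nodes each partition is in order, but the combined output is not, which is the behaviour described above.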

Posted: Mon Jan 28, 2008 7:45 am
by Raghavendra
This design works fine when the number of rows is small. In my case I will have a maximum of 20,000 rows, so I believe this implementation is OK.
For better performance (improving the run time), I would still like input from the experts here.

Posted: Mon Jan 28, 2008 9:56 pm
by ray.wurlod
If your data are key-partitioned on the sort keys (hash or modulus algorithm), every distinct value will occur on one and only one partition, so that your entire data set is sorted no matter how many degrees of parallelism you use.
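
The key-locality property can be sketched in a few lines of Python. This is an illustration under assumptions, not DataStage behaviour: it uses a simple modulus on integer keys, and `modulus_partition` and the sample keys are hypothetical. It shows only that each distinct key lands on exactly one partition and that each partition can then be sorted independently; how the partitions are collected afterwards is not modelled here.

```python
def modulus_partition(keys, num_partitions):
    """Assign each integer key to partition (key mod num_partitions)."""
    partitions = [[] for _ in range(num_partitions)]
    for key in keys:
        partitions[key % num_partitions].append(key)
    return partitions


keys = [7, 2, 7, 4, 2, 9, 4, 7]
sorted_partitions = [sorted(part) for part in modulus_partition(keys, 3)]

# For every distinct key, list the partitions that contain it.
partitions_per_key = {
    key: [i for i, part in enumerate(sorted_partitions) if key in part]
    for key in set(keys)
}
```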