sort stage

Posted: Wed Sep 22, 2004 7:49 am
by sonia jacob
Hi,

I use SORT STAGE to sort the sequential file that is generated by my job. Just this stage itself takes 1 to 2 minutes. I did not get any improvement in performance even after giving a temporary directory (not sure whether it does improve performance in the first place).

Any suggestions?

thanks,
Sonia

Re: sort stage

Posted: Wed Sep 22, 2004 8:18 am
by raju_chvr
Sonia,

What else are you doing in this job, and how many rows is this stage processing in the 1-2 minutes you mentioned?

Posted: Wed Sep 22, 2004 8:31 am
by sonia jacob
I have a simple transform and a sort in the container. The log shows that processing happens pretty fast until the SORT. I process a couple of thousand records.

Posted: Wed Sep 22, 2004 8:56 am
by raju_chvr
How many columns are you sorting on in the Sort specifications? This does matter.

Did you change any default settings in the Sort stage, like "Maximum number of rows in Virtual Memory"?

Posted: Wed Sep 22, 2004 9:14 am
by sonia jacob
3 sort columns, and everything else has default values.

Re: sort stage

Posted: Wed Sep 22, 2004 9:33 am
by chulett
sonia jacob wrote: I use SORT STAGE to sort the sequential file that is generated by my job. Just this stage itself takes 1 to 2 minutes. I did not get any improvement in performance even after giving a temporary directory (not sure whether it does improve performance in the first place).
The temporary directory isn't meant to improve performance, it just tells the stage where to put the temporary files it generates. Important note - if you leave that field blank, it will default to the current project. This may be less of an issue on Windows, but on UNIX I always make sure our jobs that use the Sort stage set this to "/tmp" or something else outside of the project. I've seen too many large sorts or runaway jobs fill up the disk with temp files and if you fill up your Project partition, that is what's known as a Very Bad Thing. :wink:

On the performance side, the Sort stage isn't the speediest, so one or two minutes ain't all that bad. The kind of things that would improve it are faster disks or faster processors or possibly more memory, other than that... :?
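
To illustrate why a temporary directory affects disk usage rather than speed, here is a minimal Python sketch of a generic external merge sort. It is not how the Sort stage is implemented (nothing in this thread says how it works internally); the names `external_sort`, `tmp_dir` and `max_rows_in_memory` are hypothetical. Sorted runs that do not fit in memory are spilled to `tmp_dir` and then merged, so the directory only decides where those run files land.

```python
import heapq
import os
import tempfile


def external_sort(lines, tmp_dir, max_rows_in_memory=1000):
    """Sort text lines, spilling sorted runs to files inside tmp_dir."""
    run_paths = []
    buffer = []

    def spill():
        # Write the buffered rows as one sorted run file in tmp_dir.
        fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".run")
        with os.fdopen(fd, "w") as f:
            f.writelines(row + "\n" for row in sorted(buffer))
        run_paths.append(path)
        buffer.clear()

    for line in lines:
        buffer.append(line.rstrip("\n"))
        if len(buffer) >= max_rows_in_memory:
            spill()
    if buffer:
        spill()

    def rows(f):
        # Yield rows without the trailing newline so the merge compares
        # them the same way the in-memory sort did.
        for row in f:
            yield row.rstrip("\n")

    files = [open(p) for p in run_paths]
    try:
        # The merge reads one row per run at a time; the result is collected
        # into a list here only to keep the example small.
        return list(heapq.merge(*(rows(f) for f in files)))
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)
```

With a large input and a small memory limit, the run files together take roughly as much space as the input itself, which is why pointing `tmp_dir` at a partition that can afford to fill up matters.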

Posted: Wed Sep 22, 2004 9:40 am
by tonystark622
One other thing, and it's been recommended in this forum before. The UNIX sort command is a _LOT_ faster than the sort stage, especially with lots of records. If you have more than a few thousand rows, it might be faster to write the data to a seq file and sort it with the UNIX sort command.

Good Luck,
Tony
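
For anyone who wants to try the "land it in a flat file and sort it outside the Sort stage" idea without a UNIX box, here is a minimal Python sketch. It is a stand-in for the UNIX sort command, not a replacement tested against DataStage. The comma delimiter, the three key column positions and the function name `sort_delimited` are assumptions for illustration.

```python
import csv
import io
from operator import itemgetter


def sort_delimited(text, key_columns=(0, 1, 2), delimiter=","):
    """Sort delimited text on the given column positions (string order)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    # Keys are compared as strings, left to right across the key columns.
    rows.sort(key=itemgetter(*key_columns))
    out = io.StringIO()
    csv.writer(out, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    print(sort_delimited("b,2,x\na,9,y\na,1,z\n"), end="")
```

Note that the keys are compared as text, so a numeric key column would need converting before sorting to get numeric order.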

Posted: Wed Sep 22, 2004 9:53 am
by chulett
Good advice, but Sonia is on an NT server. :wink:

Posted: Wed Sep 22, 2004 10:20 am
by tonystark622
'S what I get for replying in a hurry. Yeah, what I meant was the Windows SORT command (open a command window and type 'help sort'). (I have no idea whether the Windows sort command would be faster than the Sort stage or not.... I was just trying to save face :oops: )

tony

Posted: Wed Sep 22, 2004 3:27 pm
by ariear
DOS sort is a lot faster than the DS Sort stage. Not only that, the DS Sort stage fails when sorting huge data (like 4GB).
...But DOS sort can't sort case-sensitively, so if you're using a DS Aggregator stage after DOS sort and mark the sort order, it will fail when it encounters characters it considers unsorted. Try using GNU sort or buy Syncsort.
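
The case-sensitivity problem can be shown in a few lines of Python. This is only an illustration of the ordering mismatch, assuming the downstream stage checks keys in plain case-sensitive character order; the sample keys and the helper `is_sorted` are made up.

```python
def is_sorted(rows):
    """True if rows are in non-decreasing case-sensitive order."""
    return all(a <= b for a, b in zip(rows, rows[1:]))


keys = ["apple", "Banana", "cherry", "Apple", "banana"]

# Case-insensitive order, as a sort that ignores case would produce.
case_insensitive = sorted(keys, key=str.lower)

# Case-sensitive order: uppercase letters sort before lowercase ones.
case_sensitive = sorted(keys)

if __name__ == "__main__":
    print(case_insensitive, is_sorted(case_insensitive))
    print(case_sensitive, is_sorted(case_sensitive))
```

The case-insensitive result places "apple" before "Apple", which a case-sensitive check treats as out of order, so a consumer that trusts the declared sort order would reject it.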

Posted: Sun Sep 26, 2004 11:29 am
by roy
Hi,
If 1-2 minutes is out of the question performance-wise, I just wanted to point out that there is SFU (Services for UNIX), and probably others like it, which you can install on a Windows platform to make UNIX-flavour OS commands available there, enabling you to use the UNIX sort command :)
This is an alternative to the expensive CoSort and the like, in case they are beyond your budget.

IHTH,