sort stage
-
- Participant
- Posts: 122
- Joined: Mon Jul 05, 2004 1:33 pm
- Location: MA
sort stage
Hi,
I use the Sort stage to sort the sequential file that is generated by my job. This stage alone takes 1 to 2 minutes. Specifying a temporary directory did not improve performance (I am not sure whether it is meant to in the first place).
Any suggestions?
thanks,
Sonia
Re: sort stage
Sonia,
What else are you doing in this job, and how many rows does this stage process in the 1-2 minutes you mentioned?
Re: sort stage
sonia jacob wrote: I use SORT STAGE to sort the sequential file that is generated by my job. Just this stage itself takes 1 to 2 minutes. I did not get any improvement in performance even after giving a temporary directory (not sure whether it does improve performance in the first place).
The temporary directory isn't meant to improve performance; it just tells the stage where to put the temporary files it generates. Important note: if you leave that field blank, it will default to the current project. This may be less of an issue on Windows, but on UNIX I always make sure our jobs that use the Sort stage set this to "/tmp" or something else outside of the project. I've seen too many large sorts or runaway jobs fill up the disk with temp files, and if you fill up your Project partition, that is what's known as a Very Bad Thing.
On the performance side, the Sort stage isn't the speediest, so one or two minutes ain't all that bad. The kind of things that would improve it are faster disks or faster processors or possibly more memory, other than that...
-craig
"You can never have too many knives" -- Logan Nine Fingers
-
- Premium Member
- Posts: 483
- Joined: Thu Jun 12, 2003 4:47 pm
- Location: St. Louis, Missouri USA
One other thing, and it's been recommended in this forum before: the UNIX sort command is a _LOT_ faster than the Sort stage, especially with lots of records. If you have more than a few thousand rows, it might be faster to write the data to a sequential file and sort it with the UNIX sort command.
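For anyone who wants to try this, here is a minimal sketch in Python of handing the file to the external sort command. It assumes a UNIX-like system whose sort accepts -t, -k, -T and -o (GNU sort does); the function name, the comma delimiter and the file paths are made up for illustration and are not from any DataStage job. The -T option points the sort's temporary files at a directory you choose.

```python
import os
import subprocess


def sort_file(src, dst, tmp_dir, delimiter=",", key_field=1):
    """Sort a delimited text file with the external `sort` command.

    All names and defaults here are illustrative assumptions.
    """
    # LC_ALL=C asks sort for plain byte-wise (case-sensitive) ordering.
    env = dict(os.environ, LC_ALL="C")
    cmd = [
        "sort",
        "-t", delimiter,                   # field separator
        "-k", f"{key_field},{key_field}",  # sort on this one field
        "-T", tmp_dir,                     # directory for sort's temp files
        "-o", dst,                         # write the result to this file
        src,
    ]
    # check=True raises CalledProcessError if sort exits with an error.
    subprocess.run(cmd, check=True, env=env)
```

A call might look like sort_file("/data/out.csv", "/data/out.sorted.csv", "/tmp"), with paths adjusted to your own layout. The same command line could equally be run from a shell script before or after the job.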
Good Luck,
Tony
DOS sort is a lot faster than the DS Sort stage. Not just that: the DS Sort stage fails when sorting huge data (like 4GB).
...But DOS sort can't sort case-sensitively, so if you use a DS Aggregator stage after a DOS sort and mark the sort order, it will fail when it encounters unsorted characters. Try using GNU sort or buy Syncsort.
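To show why the two orderings clash, here is a small Python sketch. It only illustrates the general mismatch between case-insensitive and case-sensitive order; it does not reproduce how the Aggregator stage checks its input, and the helper name and sample values are made up.

```python
def is_sorted(values, key=None):
    """Return True if values are in non-decreasing order under key."""
    keys = [key(v) if key else v for v in values]
    return all(a <= b for a, b in zip(keys, keys[1:]))


names = ["cherry", "Banana", "apple"]

# Case-insensitive order, the kind a case-blind sort produces.
case_insensitive = sorted(names, key=str.lower)

# Case-sensitive order: uppercase letters come before lowercase ones.
case_sensitive = sorted(names)

# A consumer that expects case-sensitive order sees the
# case-insensitive output as out of sequence.
looks_sorted_to_consumer = is_sorted(case_insensitive)
```

Here case_insensitive starts with "apple" while case_sensitive starts with "Banana", so a check that assumes case-sensitive order rejects the first list.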
Hi,
If 1-2 minutes is out of the question performance-wise, I just wanted to point out that there is SFU (Services for UNIX), and probably others like it, which you can install on a Windows platform to make UNIX-flavoured OS commands available there.
This is an alternative to the expensive CoSort and the like, in case they are beyond your budget.
IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.
Search before posting:)
Join the DataStagers team effort at:
http://www.worldcommunitygrid.org