sort stage

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

sonia jacob
Participant
Posts: 122
Joined: Mon Jul 05, 2004 1:33 pm
Location: MA

sort stage

Post by sonia jacob »

Hi,

I use SORT STAGE to sort the sequential file that is generated by my job. Just this stage itself takes 1 to 2 minutes. I did not get any improvement in performance even after giving a temporary directory (not sure whether it does improve performance in the first place).

Any suggestions?

thanks,
Sonia
raju_chvr
Premium Member
Posts: 165
Joined: Sat Sep 27, 2003 9:19 am
Location: USA

Re: sort stage

Post by raju_chvr »

Sonia,

What else are you doing in this job, and how many rows does this stage process in the 1-2 minutes you mentioned?
sonia jacob

Post by sonia jacob »

I have a simple transform and a sort in the container. The log shows that the processing happens pretty fast up to the SORT. I process a couple of thousand records.
raju_chvr

Post by raju_chvr »

On how many columns are you sorting the data in the Sort specifications? This does matter.

Did you change any default settings in the Sort stage, like "Maximum number of rows in Virtual Memory"?
sonia jacob

Post by sonia jacob »

3 sort columns, and everything else has default values.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: sort stage

Post by chulett »

sonia jacob wrote: I use SORT STAGE to sort the sequential file that is generated by my job. Just this stage itself takes 1 to 2 minutes. I did not get any improvement in performance even after giving a temporary directory (not sure whether it does improve performance in the first place).

The temporary directory isn't meant to improve performance; it just tells the stage where to put the temporary files it generates. Important note: if you leave that field blank, it will default to the current project. This may be less of an issue on Windows, but on UNIX I always make sure our jobs that use the Sort stage set this to "/tmp" or something else outside of the project. I've seen too many large sorts or runaway jobs fill up the disk with temp files, and if you fill up your Project partition, that is what's known as a Very Bad Thing.

On the performance side, the Sort stage isn't the speediest, so one or two minutes ain't all that bad. The kinds of things that would improve it are faster disks, faster processors, or possibly more memory; other than that...
-craig

"You can never have too many knives" -- Logan Nine Fingers
tonystark622
Premium Member
Posts: 483
Joined: Thu Jun 12, 2003 4:47 pm
Location: St. Louis, Missouri USA

Post by tonystark622 »

One other thing, and it's been recommended in this forum before: the UNIX sort command is a LOT faster than the Sort stage, especially with lots of records. If you have more than a few thousand rows, it might be faster to write the data to a sequential file and sort it with the UNIX sort command.
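As a rough sketch of that approach (not something taken from this thread): once the job has written its sequential file, a small Python wrapper could hand it to an external UNIX-style sort. The function names, file names, and comma delimiter are hypothetical, and the three key columns simply mirror the three sort columns mentioned earlier. It assumes a sort that accepts the -t, -k, -T and -o options (GNU sort does; the Windows SORT command uses a different option syntax).

```python
import os
import shutil
import subprocess


def build_sort_command(input_path, output_path, key_columns,
                       delimiter=",", temp_dir=None):
    """Build the argument list for an external UNIX-style sort.

    key_columns are 1-based field numbers, in priority order.
    """
    cmd = ["sort", "-t", delimiter]
    for col in key_columns:
        # "-k N,N" restricts the key to field N only.
        cmd += ["-k", f"{col},{col}"]
    if temp_dir:
        # Keep the sort's temporary files out of the project directory.
        cmd += ["-T", temp_dir]
    cmd += ["-o", output_path, input_path]
    return cmd


def sort_file(input_path, output_path, key_columns,
              delimiter=",", temp_dir=None):
    """Run the external sort; raises if it is missing or fails."""
    if shutil.which("sort") is None:
        raise RuntimeError("no external sort command found on PATH")
    # LC_ALL=C asks for plain byte-order (case-sensitive) comparisons.
    env = dict(os.environ, LC_ALL="C")
    cmd = build_sort_command(input_path, output_path, key_columns,
                             delimiter, temp_dir)
    subprocess.run(cmd, check=True, env=env)
```

A call might look like sort_file("job_output.txt", "job_output_sorted.txt", [1, 2, 3], temp_dir="/tmp"), where both file names are placeholders. Whether this beats the Sort stage for a couple of thousand rows is something to measure rather than assume.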

Good Luck,
Tony
chulett

Post by chulett »

Good advice, but Sonia is on an NT server.
-craig

"You can never have too many knives" -- Logan Nine Fingers
tonystark622

Post by tonystark622 »

'S what I get for replying in a hurry. Yeah, what I meant was the Windows SORT command (open a command window and type 'help sort'). (I have no idea whether the Windows sort command would be faster than the Sort stage or not... I was just trying to save face.)

tony
ariear
Participant
Posts: 237
Joined: Thu Dec 26, 2002 2:19 pm

Post by ariear »

DOS sort is a lot faster than the DS Sort stage. Not just that: the DS sort fails when sorting huge data (like 4GB).
But DOS sort can't do a case-sensitive sort, so if you use a DS Aggregator stage after the DOS sort and mark the input as sorted, it will fail when it encounters out-of-order characters. Try using GNU sort, or buy SyncSort.
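To illustrate the case-sensitivity problem, here is a small Python sketch with made-up keys. It only models the ordering mismatch; the is_sorted check is a stand-in for any downstream step that expects case-sensitive order, not the Aggregator stage's actual logic.

```python
def is_sorted(values):
    """True if every value is >= the one before it (case-sensitive)."""
    return all(a <= b for a, b in zip(values, values[1:]))


keys = ["b", "A", "a", "B"]  # hypothetical key values

# Order produced by a sort that ignores case.
case_insensitive = sorted(keys, key=str.lower)

# Order produced by a case-sensitive (code-point) sort.
case_sensitive = sorted(keys)

print(case_insensitive, is_sorted(case_insensitive))
# ['A', 'a', 'b', 'B'] False
print(case_sensitive, is_sorted(case_sensitive))
# ['A', 'B', 'a', 'b'] True
```

The case-insensitive result puts "b" before "B", which a case-sensitive comparison treats as out of order, so a consumer that trusts the data to be sorted would trip at that point.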
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
If 1-2 minutes is out of the question performance-wise, I just wanted to point out that there is SFU (Services for UNIX), and probably others like it, which you can install on a Windows platform to make UNIX-flavour OS commands available there, enabling you to use UNIX OS commands.
This is an alternative to the expensive CoSort and the like, in case they are beyond your budget.

IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org