
Maximizing resources utilized for sorting

Posted: Tue Sep 25, 2007 11:57 pm
by timsmith_s
I have reviewed several threads regarding scratch space issues; however, I was hoping that someone might summarize the best approach for maximizing resources used for sorting (tsort)?

That is, if I am in a situation where I have a very large file to sort: if I use an explicit Sort stage and specify the memory, I can only go up to just under 1 GB - otherwise I get an mmap error. I have also tried setting the TSORT environment variable, but I understand that this is essentially the same thing, except that the value is then set globally rather than at the stage level.

In the end, I am just trying to get a checklist of things I can set to get the job to complete - not necessarily fast, just complete - without having to allocate large scratch space filesystems.

Re: Maximizing resources utilized for sorting

Posted: Wed Sep 26, 2007 2:18 am
by felixyong
This becomes a global setting within a job if you specify it as part of the Job Parameters:
$APT_TSORT_STRESS_BLOCKSIZE = [size in MB]

When the memory buffer is filled, sort uses temporary disk space in the following order (a sample configuration file showing the "sort" pool is sketched after this list):
Scratch disks in the $APT_CONFIG_FILE "sort" named disk pool
Scratch disks in the $APT_CONFIG_FILE default disk pool
The default directory specified by $TMPDIR
The UNIX /tmp directory
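
For illustration, here is a minimal sketch of a one-node $APT_CONFIG_FILE with a dedicated "sort" scratch pool. The hostname and the /data/ds, /scratch/sort and /scratch paths are placeholders, not recommendations:

  {
    node "node1"
    {
      fastname "your_host"
      pools ""
      resource disk "/data/ds" {pools ""}
      resource scratchdisk "/scratch/sort" {pools "sort"}
      resource scratchdisk "/scratch" {pools ""}
    }
  }

Scratch disks tagged with the "sort" pool are tried first by tsort; the entry in the unnamed default pool is the fallback before $TMPDIR and /tmp.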

The other parameters you can play with are the BUFFER settings, which likewise control how much is held in memory before writing to disk, using the same spill order as sort (see the example after this list):
$APT_BUFFER_MAXIMUM_MEMORY
$APT_BUFFER_FREE_RUN
$APT_BUFFER_DISK_WRITE_INCREMENT
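
As an illustration (the values below are placeholders to show the units, not tuning advice; the first and third are in bytes, the second is a fraction of the maximum), these could be exported in dsenv or set as job parameters:

  export APT_BUFFER_MAXIMUM_MEMORY=50331648       # 48 MB per buffer operator; default is 3 MB
  export APT_BUFFER_FREE_RUN=1.0                  # allow the whole buffer to fill before throttling
  export APT_BUFFER_DISK_WRITE_INCREMENT=4194304  # spill to scratch in 4 MB chunks; default is 1 MB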

It comes down to what we are trying to achieve. Throwing all available resources at the job to get the best performance sounds right, but it is not necessarily always true in practice.

timsmith_s wrote:I have reviewed several threads regarding scratch space issues; however, I was hoping that someone might summarize the best approach for maximizing resources used for sorting (tsort)?

Posted: Wed Sep 26, 2007 12:16 pm
by timsmith_s
Great feedback - thank you.

I understand about $APT_TSORT_STRESS_BLOCKSIZE. Or rather, I understand it's the memory setting, but is it the memory setting per node? For instance, say I have 4 GB of RAM per node: it doesn't appear that DSEE is burning up the RAM before it starts hitting the scratch partitions during a sort operation. Maybe this is a two-part question that I should defer to another thread.
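
If the block size is applied per partition (my assumption here, not confirmed behaviour), the back-of-the-envelope arithmetic would be:

  total sort memory ~ number of partitions x blocksize
  e.g. 4 nodes x 512 MB = 2 GB

which would explain spilling to scratch long before a 4 GB-per-node machine looks busy.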

Posted: Thu Sep 27, 2007 12:17 am
by Prakashs
Adjusting DataStage heap space may allow you to sort larger files.

Posted: Tue Oct 02, 2007 9:54 am
by timsmith_s
How is the heap space modified? Do you mean the process heap space, say at the UNIX OS level?
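
If it is the latter, I am guessing it would be something along the lines of raising the per-process limits before starting the engine (assuming a ksh/bash shell and that the limits are not hard-capped by the admin):

  ulimit -a            # show the current per-process limits
  ulimit -d unlimited  # raise the data segment (heap) size limit

but I would appreciate confirmation before I go down that path.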