
Tsort merger aborting: Scratch space full Error

Posted: Thu Sep 09, 2010 7:13 am
by somu_june
Hi,


I have a parallel job that runs fine when I don't sort the data, but when I do sort the data the job aborts with the following error:


APT_CombinedOperatorController,1: Fatal Error: Tsort merger aborting: Scratch space full


I checked the space on the filesystem that holds the scratch directory:
$ df -P /opt/IBM/InformationServer/Server/Scratch
Filesystem 512-blocks Used Available Capacity Mounted on
/dev/rx/dsk/root/IBMvol 40536164 19359184 21176980 48% /opt/IBM

It shows the filesystem is only 48% used (about 52%, roughly 10 GB, still free), yet my job still aborts. Below is the default configuration file I'm using:



main_program: APT configuration file: /opt/IBM/InformationServer/Server/Configurations/default.apt
{
    node "node1"
    {
        fastname "etldev01"
        pools ""
        resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
        resource scratchdisk "/opt/IBM/InformationServer/Server/Scratch" {pools ""}
    }
    node "node2"
    {
        fastname "etldev01"
        pools ""
        resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
        resource scratchdisk "/opt/IBM/InformationServer/Server/Scratch" {pools ""}
    }
    node "node3"
    {
        fastname "etldev01"
        pools ""
        resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
        resource scratchdisk "/opt/IBM/InformationServer/Server/Scratch" {pools ""}
    }
    node "node4"
    {
        fastname "etldev01"
        pools ""
        resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
        resource scratchdisk "/opt/IBM/InformationServer/Server/Scratch" {pools ""}
    }
}



I thought the problem was disk space, but df still shows more than half of the filesystem free. Any help would be appreciated.


Thanks,
Somaraju

Posted: Thu Sep 09, 2010 7:32 am
by chulett
You're still out of space there. You would need to monitor that while the job runs to see the problem.
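One way to do that monitoring is a small polling loop run alongside the job. This is a sketch, not a DataStage tool: it assumes Python 3 is available on the engine host, and `watch_free_space` and its parameters are hypothetical names. It uses the standard library's `shutil.disk_usage` to sample the filesystem that holds a given path and keeps the lowest free-space value it saw.

```python
import shutil
import time


def watch_free_space(path, interval_s=5.0, samples=None):
    """Poll free space on the filesystem holding `path`.

    Returns the lowest free value seen, in bytes (None if samples == 0).
    With samples=None it runs until interrupted with Ctrl-C.
    """
    lowest = None
    count = 0
    try:
        while samples is None or count < samples:
            usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
            if lowest is None or usage.free < lowest:
                lowest = usage.free
            pct_used = 100.0 * usage.used / usage.total
            print(f"{time.strftime('%H:%M:%S')} "
                  f"free={usage.free / 2**30:.2f} GiB used={pct_used:.0f}%")
            count += 1
            # Skip the final sleep when a fixed number of samples was requested.
            if samples is None or count < samples:
                time.sleep(interval_s)
    except KeyboardInterrupt:
        pass  # Ctrl-C stops polling; still report the low-water mark
    return lowest
```

For example, start `watch_free_space("/opt/IBM/InformationServer/Server/Scratch", interval_s=2)` just before running the job and stop it with Ctrl-C after the abort. A low-water mark near zero would confirm that the sort filled the filesystem during the run, even though df looks healthy afterwards.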

Posted: Thu Sep 09, 2010 8:07 am
by somu_june
Thanks chulett

Posted: Thu Sep 09, 2010 3:06 pm
by mhester
Craig is correct - kind of :)

There is space there, as you can clearly see from the df command, but when your job runs the sort operation consumes whatever disk space is available. When the job aborts, the temporary sort files are removed, which makes it look as though you have plenty of space.
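The df numbers can be checked by hand: `df -P` reports sizes in 512-byte blocks, and its Capacity column is the percentage of space *used*, not the percentage available. A minimal Python sketch (the function name `parse_df_p_line` is illustrative) that converts the line from the first post:

```python
def parse_df_p_line(line):
    """Parse one data line of `df -P` output (sizes are 512-byte blocks)."""
    fields = line.split()
    # Columns: Filesystem, 512-blocks, Used, Available, Capacity, Mounted on
    filesystem, total, used, avail, capacity = fields[:5]
    mount = " ".join(fields[5:])  # mount point may contain spaces
    return {
        "filesystem": filesystem,
        "total_bytes": int(total) * 512,
        "used_bytes": int(used) * 512,
        "free_bytes": int(avail) * 512,
        "capacity_used_pct": int(capacity.rstrip("%")),  # percentage USED
        "mounted_on": mount,
    }


line = "/dev/rx/dsk/root/IBMvol 40536164 19359184 21176980 48% /opt/IBM"
info = parse_df_p_line(line)
print(f"free: {info['free_bytes'] / 2**30:.1f} GiB, "
      f"used: {info['capacity_used_pct']}%, mounted on {info['mounted_on']}")
# prints: free: 10.1 GiB, used: 48%, mounted on /opt/IBM
```

So the sort has roughly 10 GiB to work with, and all four nodes in the posted configuration write their scratch files to that same filesystem.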

Also, I am not sure that your current resource disk and scratch disk locations are the best idea. These should probably be on a mount that is separate from the engine install (your df output shows Scratch sitting on the /opt/IBM filesystem).
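A quick way to see how a configuration spreads scratch space is to list each node's `resource scratchdisk` entries. The sketch below is a simple line-based parser written for the layout shown in this thread (one entry per line); it is not a full APT configuration parser, and the helper names are hypothetical. Applied to two nodes from the original configuration (node3 and node4 are identical), it shows every node sharing one scratch directory.

```python
import re
from collections import defaultdict

NODE_RE = re.compile(r'^\s*node\s+"([^"]+)"')
SCRATCH_RE = re.compile(r'^\s*resource\s+scratchdisk\s+"([^"]+)"')


def scratch_paths_by_node(apt_config):
    """Return {node_name: [scratchdisk paths]} from APT configuration text."""
    nodes = {}
    current = None
    for line in apt_config.splitlines():
        m = NODE_RE.match(line)
        if m:
            current = m.group(1)
            nodes[current] = []
            continue
        m = SCRATCH_RE.match(line)
        if m and current is not None:
            nodes[current].append(m.group(1))
    return nodes


def shared_scratch(nodes):
    """Return {path: [nodes using it]} for paths used by more than one node."""
    users = defaultdict(list)
    for node, paths in nodes.items():
        for path in paths:
            users[path].append(node)
    return {p: ns for p, ns in users.items() if len(ns) > 1}


config = """{
node "node1"
{
fastname "etldev01"
pools ""
resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
resource scratchdisk "/opt/IBM/InformationServer/Server/Scratch" {pools ""}
}
node "node2"
{
fastname "etldev01"
pools ""
resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
resource scratchdisk "/opt/IBM/InformationServer/Server/Scratch" {pools ""}
}
}"""

print(shared_scratch(scratch_paths_by_node(config)))
# prints: {'/opt/IBM/InformationServer/Server/Scratch': ['node1', 'node2']}
```

This only compares path strings. Two different paths can still sit on the same filesystem, which you could check by comparing `os.stat(path).st_dev` for each directory.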

Another thing you might do to prevent an abort due to space issues is to create a configuration like the following:


{
    node "node1"
    {
        fastname "etldev01"
        pools ""
        resource disk "/opt/Resource_Disk_1" {pools ""}
        resource disk "/opt/Resource_Disk_Overflow" {pools ""}
        resource scratchdisk "/opt/Resource_Scratch_1" {pools ""}
        resource scratchdisk "/opt/Resource_Scratch_2" {pools ""}
    }
    node "node2"
    {
        fastname "etldev01"
        pools ""
        resource disk "/opt/Resource_Disk_2" {pools ""}
        resource disk "/opt/Resource_Disk_Overflow" {pools ""}
        resource scratchdisk "/opt/Resource_Scratch_1" {pools ""}
        resource scratchdisk "/opt/Resource_Scratch_2" {pools ""}
    }
}

The resource disks will be used in order of entry, meaning that when Resource_Disk_1 fills up, Resource_Disk_Overflow is there to help out.
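A toy model of that ordering, to make the behavior concrete. It only models the description above (the first listed disk with room wins); it is not the parallel engine's actual allocation code, and `place_in_order` and the disk sizes (made-up units) are hypothetical.

```python
def place_in_order(file_sizes, capacities):
    """Toy model: put each file on the first disk, in listed order, with room left.

    `capacities` is an ordered list of (disk_name, free_space) pairs.
    Returns one disk name per file; raises RuntimeError if nothing fits.
    """
    free = dict(capacities)  # copy, so the caller's list is not modified
    placement = []
    for size in file_sizes:
        for name, _ in capacities:  # always try disks in configuration order
            if free[name] >= size:
                free[name] -= size
                placement.append(name)
                break
        else:
            raise RuntimeError("all listed disks are full")  # analogous to the abort
    return placement


disks = [("Resource_Disk_1", 100), ("Resource_Disk_Overflow", 100)]
print(place_in_order([40, 40, 40], disks))
# prints: ['Resource_Disk_1', 'Resource_Disk_1', 'Resource_Disk_Overflow']
```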

Resource_Scratch_1 and Resource_Scratch_2 are used in a round-robin manner, and since there are two of them and they should be on separate controllers, you should see improved performance, especially if the disks are RAID 0 (striped).
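A toy model of round-robin use of two scratch disks, with made-up file sizes. It illustrates the description above rather than the engine's implementation, and `place_round_robin` is a hypothetical name.

```python
from itertools import cycle


def place_round_robin(file_sizes, scratch_disks):
    """Toy model: hand successive temp files to scratch disks in turn.

    Returns {disk: total size written} so the spread across disks is visible.
    """
    totals = {disk: 0 for disk in scratch_disks}
    # zip stops when file_sizes runs out; cycle repeats the disk list forever.
    for size, disk in zip(file_sizes, cycle(scratch_disks)):
        totals[disk] += size
    return totals


print(place_round_robin([10] * 5, ["Resource_Scratch_1", "Resource_Scratch_2"]))
# prints: {'Resource_Scratch_1': 30, 'Resource_Scratch_2': 20}
```

Because consecutive temporary files alternate between disks, both the space used and the I/O load are split across the two scratch locations instead of piling onto one.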

Posted: Thu Sep 09, 2010 7:17 pm
by chulett
Hey, there's no "kind of" correct there, Mr Hester. It's just plain old correct correct. :wink:

Posted: Fri Sep 10, 2010 8:02 am
by somu_june
Thanks, Chulett and Hester, for helping me out and for helping me understand the configuration file more clearly and correctly.