Hi,
I have a parallel job that runs fine when I don't sort the data, but as soon as I add a sort the job aborts with the error below:
APT_CombinedOperatorController,1: Fatal Error: Tsort merger aborting: Scratch space full
I checked the space on the scratch filesystem:
$ df -P /opt/IBM/InformationServer/Server/Scratch
Filesystem 512-blocks Used Available Capacity Mounted on
/dev/rx/dsk/root/IBMvol 40536164 19359184 21176980 48% /opt/IBM
The Capacity column shows only 48% used, so there should be plenty of room, yet my job still aborts. Below is the default configuration file I'm using:
main_program: APT configuration file: /opt/IBM/InformationServer/Server/Configurations/default.apt
{
node "node1"
{
fastname "etldev01"
pools ""
resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
resource scratchdisk "/opt/IBM/InformationServer/Server/Scratch" {pools ""}
}
node "node2"
{
fastname "etldev01"
pools ""
resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
resource scratchdisk "/opt/IBM/InformationServer/Server/Scratch" {pools ""}
}
node "node3"
{
fastname "etldev01"
pools ""
resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
resource scratchdisk "/opt/IBM/InformationServer/Server/Scratch" {pools ""}
}
node "node4"
{
fastname "etldev01"
pools ""
resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
resource scratchdisk "/opt/IBM/InformationServer/Server/Scratch" {pools ""}
}
}
I thought the problem was disk space, but more than half the filesystem is still free. Any help is appreciated.
Thanks,
Somaraju
Tsort merger aborting: Scratch space full Error
Craig is correct - kind of
There is space there, as you can clearly see from the df output, but while your job runs the sort operation consumes whatever space is available. When the job aborts, the temporary sort files are removed, which is why it looks like you still have space afterwards.
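One way to see this for yourself is to sample free space in the scratch filesystem while the sort is running, so you can watch it drain. This is a sketch: the default path is taken from the default.apt above, and the helper name monitor_scratch is my own invention.

```shell
#!/bin/sh
# monitor_scratch DIR SAMPLES INTERVAL
# Print free space in DIR's filesystem SAMPLES times, INTERVAL seconds apart.
monitor_scratch() {
  dir="${1:-/opt/IBM/InformationServer/Server/Scratch}"   # path assumed from default.apt
  samples="${2:-5}"
  interval="${3:-5}"
  i=0
  while [ "$i" -lt "$samples" ]; do
    # df -P gives one-line POSIX output; column 4 = available blocks,
    # column 5 = capacity used, column 6 = mount point
    df -P "$dir" | awk -v t="$(date +%H:%M:%S)" \
      'NR==2 {print t, $6, "free:", $4, "blocks,", $5, "used"}'
    sleep "$interval"
    i=$((i + 1))
  done
}

# Example: 12 samples, 5 seconds apart, while the job is running:
# monitor_scratch /opt/IBM/InformationServer/Server/Scratch 12 5
```

If the free-block count collapses toward zero just before the abort, the sort really is exhausting scratch, regardless of what df shows after the job dies.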
Also, your current resource disk and scratchdisk locations are probably not the best idea. They should be on a mount that is separate from the engine install.
Another thing you might want to do to prevent an abort due to space issues would be to create a configuration like the following -
The resource disks will be used in order of entry, meaning that when Resource_Disk_1 fills up, Resource_Disk_Overflow is there to help out.
Resource_Scratch_1 and _2 are used in round-robin fashion, and since there are two of them, ideally on separate controllers, you should see improved performance, especially if the disks are RAID 0 (striped).
{
node "node1"
{
fastname "etldev01"
pools ""
resource disk "/opt/Resource_Disk_1" {pools ""}
resource disk "/opt/Resource_Disk_Overflow" {pools ""}
resource scratchdisk "/opt/Resource_Scratch_1" {pools ""}
resource scratchdisk "/opt/Resource_Scratch_2" {pools ""}
}
node "node2"
{
fastname "etldev01"
pools ""
resource disk "/opt/Resource_Disk_2" {pools ""}
resource disk "/opt/Resource_Disk_Overflow" {pools ""}
resource scratchdisk "/opt/Resource_Scratch_1" {pools ""}
resource scratchdisk "/opt/Resource_Scratch_2" {pools ""}
}
}
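To have a job pick up the new file, point $APT_CONFIG_FILE at it. The filename twonode_overflow.apt below is just an example name I chose; save the file wherever you keep your configurations.

```shell
# Example filename is an assumption; use your own name and location.
export APT_CONFIG_FILE=/opt/IBM/InformationServer/Server/Configurations/twonode_overflow.apt

# Quick sanity check before running the job:
echo "Using config: $APT_CONFIG_FILE"
```

You can also set APT_CONFIG_FILE per job as a job parameter rather than globally, which lets you test the new layout on one job before switching everything over.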
Mike Hester
mhester@petra-ps.com