
APT_CONFIG_FILE

Posted: Fri Sep 18, 2009 12:10 pm
by arun_im4u
Hello,

One of our jobs failed during a join because the scratch space defined on one of the nodes filled up. The other nodes did not even come close to filling up. If I define additional scratch file systems on the node that failed, will the scratch usage be spread across all of those file systems and so keep this one node from filling up?

Ex:
Current Configuration:

node "n1" {
pools ""
fastname "fastone"
resource scratchdisk "/fs1/ds/scratch" {}
resource disk "/fs1/ds/disk" {}
}

Proposed:

node "n1" {
pools ""
fastname "fastone"
resource scratchdisk "/fs1/ds/scratch" {}
resource scratchdisk "/fs2/ds/scratch" {}
resource scratchdisk "/fs3/ds/scratch" {}
resource scratchdisk "/fs4/ds/scratch" {}
resource disk "/fs1/ds/disk" {}
resource disk "/fs2/ds/disk" {}
resource disk "/fs3/ds/disk" {}
resource disk "/fs4/ds/disk" {}
}
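If you have several nodes or file systems to add, a small script can generate the node entry instead of typing it by hand. This is a minimal sketch using a hypothetical helper (`node_stanza` is not part of any product); it only reproduces the layout shown above, one scratchdisk and one disk resource per file system, and assumes the `/fsN/ds/scratch` and `/fsN/ds/disk` naming convention.

```python
def node_stanza(name, fastname, filesystems, pools=""):
    """Build one node entry for a parallel configuration file.

    Hypothetical helper: assumes every file system holds ds/scratch
    and ds/disk directories, as in the proposed configuration.
    """
    lines = [f'node "{name}" {{', f'    pools "{pools}"', f'    fastname "{fastname}"']
    # One scratchdisk resource per file system
    for fs in filesystems:
        lines.append(f'    resource scratchdisk "{fs}/ds/scratch" {{}}')
    # One disk resource per file system
    for fs in filesystems:
        lines.append(f'    resource disk "{fs}/ds/disk" {{}}')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(node_stanza("n1", "fastone", ["/fs1", "/fs2", "/fs3", "/fs4"]))
```

The output still has to be pasted inside the outer braces of your existing configuration file alongside the other node entries.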

Thanks

Posted: Fri Sep 18, 2009 3:28 pm
by ray.wurlod
That node will still fill up before the others, but not catastrophically: the job will not fail until all four scratch file systems on it have filled.
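A toy simulation shows why extra scratch file systems delay the failure rather than remove the imbalance. It assumes (an assumption, not documented engine behavior) that spill files are placed round-robin across a node's scratch disks, skipping any that are full, and that the job aborts only when a file fits on none of them. `place_spills` is a hypothetical name.

```python
def place_spills(capacities_gb, spill_sizes_gb):
    """Place spill files on a node's scratch disks round-robin, skipping full disks.

    Returns (files_placed, free_space_per_disk). Stops at the first file
    that fits on no disk, which is where the job would abort.
    """
    free = list(capacities_gb)
    next_disk = 0
    placed = 0
    for size in spill_sizes_gb:
        for attempt in range(len(free)):
            i = (next_disk + attempt) % len(free)
            if free[i] >= size:
                free[i] -= size
                next_disk = (i + 1) % len(free)
                placed += 1
                break
        else:
            # No scratch disk on this node had room for the file
            return placed, free
    return placed, free


if __name__ == "__main__":
    spills = [2] * 150  # 150 spill files of 2 GB each (300 GB total)
    print(place_spills([100], spills))      # one 100 GB scratch disk
    print(place_spills([100] * 4, spills))  # four 100 GB scratch disks
```

Under these assumptions, a single 100 GB scratch disk accepts 50 of the files before failing, while four 100 GB disks accept all 150. The skewed node still uses far more space than its neighbours; it simply has more room before it runs out.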

Posted: Mon Sep 21, 2009 8:53 am
by sjfearnside
As I understand it, the strategy you propose does not take full advantage of your parallel engine. You may want to review your partitioning strategy so that your data is distributed more evenly across the nodes defined in the configuration file. That may resolve the space problem on your scratch disks.
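A quick way to check for this kind of imbalance is to count rows per partition for the join key. The sketch assumes integer keys and uses key modulo partition count as a stand-in for the engine's real hash partitioner; `partition_sizes` and `skew_ratio` are hypothetical names. A heavily repeated key value always lands in the same partition, so that partition's node does most of the sorting and spilling no matter how many scratch disks it has.

```python
from collections import Counter


def partition_sizes(keys, num_partitions):
    """Rows per partition when each row goes to partition (key mod num_partitions).

    Integer modulo is only a stand-in for the engine's hash function.
    """
    counts = Counter(k % num_partitions for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def skew_ratio(sizes):
    """Largest partition divided by the average partition; 1.0 means perfectly even."""
    return max(sizes) / (sum(sizes) / len(sizes))


if __name__ == "__main__":
    keys = [7] * 700 + list(range(300))  # one hot key value plus 300 distinct keys
    sizes = partition_sizes(keys, 4)
    print(sizes)              # [75, 75, 75, 775]
    print(skew_ratio(sizes))  # 3.1
```

A ratio well above 1.0 points at the key distribution rather than at disk space. Keep in mind that a join generally needs its inputs partitioned on the join key so that matching rows meet, so simply switching to round-robin partitioning is not a fix for a join.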