Hello,
One of our jobs failed during a join because the scratch space defined on one of the nodes filled up; the rest of the nodes did not come close to filling. If I define additional scratch file systems for that node, will the engine spread scratch usage across all of the file systems and so avoid filling up that one node?
Ex:
Current Configuration:
node "n1" {
    pools ""
    fastname "fastone"
    resource scratchdisk "/fs1/ds/scratch" {}
    resource disk "/fs1/ds/disk" {}
}
Proposed:
node "n1" {
    pools ""
    fastname "fastone"
    resource scratchdisk "/fs1/ds/scratch" {}
    resource scratchdisk "/fs2/ds/scratch" {}
    resource scratchdisk "/fs3/ds/scratch" {}
    resource scratchdisk "/fs4/ds/scratch" {}
    resource disk "/fs1/ds/disk" {}
    resource disk "/fs2/ds/disk" {}
    resource disk "/fs3/ds/disk" {}
    resource disk "/fs4/ds/disk" {}
}
Thanks
APT_CONFIG_FILE
As I understand it, the strategy you propose does not take full advantage of the parallel engine. Look at your partitioning strategy instead: distributing your data more evenly among the nodes defined in the configuration file may resolve the space problem on your scratch disk allocation, since skewed partitioning is what drives one node's scratch usage far above the others'.
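To illustrate the point about partitioning skew, here is a minimal sketch (plain Python, purely illustrative, not DataStage code): a hash partitioner sends every row with the same join key to the same node, so a single "hot" key can pile data, and therefore scratch usage, onto one node, while a round-robin partitioner spreads rows evenly regardless of key values. The key name `"k"` and the row counts are made up for the example.

```python
from collections import Counter

def hash_partition(rows, num_nodes, key):
    """Count rows per node when each row is routed by hashing its join key."""
    counts = Counter()
    for row in rows:
        counts[hash(row[key]) % num_nodes] += 1
    return counts

def round_robin_partition(rows, num_nodes):
    """Count rows per node when rows are dealt out in strict rotation."""
    counts = Counter()
    for i, _ in enumerate(rows):
        counts[i % num_nodes] += 1
    return counts

# Skewed data: 900 of 1000 rows share one join key, so hash
# partitioning piles them all onto a single node.
rows = [{"k": "hot"}] * 900 + [{"k": str(i)} for i in range(100)]

print(sorted(hash_partition(rows, 4, "k").values(), reverse=True))
print(sorted(round_robin_partition(rows, 4).values(), reverse=True))
```

With hash partitioning, whichever node receives the "hot" key gets at least 900 of the 1000 rows; round-robin gives each of the four nodes exactly 250. The same effect in a real parallel job is why a join on a skewed key can exhaust scratch space on one node while the others stay nearly empty.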