Hello,
One of our jobs failed during a join because the scratch space defined on one of the nodes filled up; the rest of the nodes did not come close to filling. If I define additional scratch file systems for that node, will the engine spread scratch usage across all of the file systems and so avoid filling up that one node?
Ex:
Current Configuration:
node "n1" {
    pools ""
    fastname "fastone"
    resource scratchdisk "/fs1/ds/scratch" {}
    resource disk "/fs1/ds/disk" {}
}
Proposed:
node "n1" {
    pools ""
    fastname "fastone"
    resource scratchdisk "/fs1/ds/scratch" {}
    resource scratchdisk "/fs2/ds/scratch" {}
    resource scratchdisk "/fs3/ds/scratch" {}
    resource scratchdisk "/fs4/ds/scratch" {}
    resource disk "/fs1/ds/disk" {}
    resource disk "/fs2/ds/disk" {}
    resource disk "/fs3/ds/disk" {}
    resource disk "/fs4/ds/disk" {}
}
Thanks
APT_CONFIG_FILE
As I understand it, the strategy you propose does not take full advantage of the parallel engine. Look at your partitioning strategy instead: distributing your data more evenly among the nodes defined in the configuration file may resolve the space problem on your scratch disk allocation, since skewed partitioning is what drives one node's scratch usage far above the others'.
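To illustrate the point about partitioning skew, here is a minimal sketch (plain Python, purely illustrative, not DataStage code): a hash partitioner sends every row with the same join key to the same node, so a single "hot" key can pile data, and therefore scratch usage, onto one node, while a round-robin partitioner spreads rows evenly regardless of key values. The key name `"k"` and the row counts are made up for the example.

```python
from collections import Counter

def hash_partition(rows, num_nodes, key):
    """Count rows per node when each row is routed by hashing its join key."""
    counts = Counter()
    for row in rows:
        counts[hash(row[key]) % num_nodes] += 1
    return counts

def round_robin_partition(rows, num_nodes):
    """Count rows per node when rows are dealt out in strict rotation."""
    counts = Counter()
    for i, _ in enumerate(rows):
        counts[i % num_nodes] += 1
    return counts

# Skewed data: 900 of 1000 rows share one join key, so hash
# partitioning piles them all onto a single node.
rows = [{"k": "hot"}] * 900 + [{"k": str(i)} for i in range(100)]

print(sorted(hash_partition(rows, 4, "k").values(), reverse=True))
print(sorted(round_robin_partition(rows, 4).values(), reverse=True))
```

With hash partitioning, whichever node receives the "hot" key gets at least 900 of the 1000 rows; round-robin gives each of the four nodes exactly 250. The same effect in a real parallel job is why a join on a skewed key can exhaust scratch space on one node while the others stay nearly empty.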