I am running a job that uses about five server shared containers, and we are in the process of upgrading from 6.0 to 7.5. On version 7.5, the job fails when it is run with a 6-node config file but runs fine with a config file of fewer than 6 nodes. On version 6.0, it runs fine with 6 nodes on the same box. Could anyone suggest a possible solution to this issue?
Thanks.
Job fails running on 6 node but runs fine with < 6 node c
Netboyks,
Assuming your config file is in order (i.e. other jobs run flawlessly with it), you might be overloading some system resource with all the extra processes that the extra node triggers.
Please post the error messages you are getting; that way we might be able to pinpoint where and why the job is failing, and most likely you can reconfigure so that it works with more nodes.
BTW, how many CPUs does this system have?
Arnd,
Thanks for your response. It is an 8-CPU box with 130 GB of swap. The following is the error from the log:
CaptureDate,4: Unable to open project <project name> - 149.
CaptureDate,4 is a server shared container.
The job fails even when it is the only one running. Any suggestions, please?
Thanks
Netboyks