Jobs aborting when invoked by sequence and run on 1 node

Posted: Tue Feb 21, 2012 1:59 am
by Jayanto
Hello,

I have 5 parallel jobs that are invoked sequentially by a Sequence. Run independently, they all complete without any problem. But when they are invoked by the Sequence, one of them aborts.

A different job aborts each time, with the error message "ORCHESTRATE step execution terminating due to SIGINT".

This happens when I run the Sequence on a single (1) node. Running it on the default number of nodes (32) isn't causing any issues. But that is not the standard practice, and it will reduce job performance.

I searched for this error in other threads on DSXchange. Some suggested setting the environment variables 'APT_MONITOR_SIZE' and 'APT_MONITOR_TIME', but doing that isn't helping. :(

Posted: Tue Feb 21, 2012 7:25 am
by Jayanto
Any tips? :(

Posted: Tue Feb 21, 2012 8:04 am
by Jayanto
Another update: the job that aborts each time is first giving the fatal error "Issuing abort after 50 warnings logged." But I am running the job with NoLimits for warnings.

Also, the warnings issued above are null-handling related. Could this be a reason? :?

Posted: Tue Feb 21, 2012 8:07 am
by chulett
Jayanto wrote: is first giving the fatal error "Issuing abort after 50 warnings logged." But I am running the job, with NoLimits for warnings....
Apparently not.

Why not fix the jobs so they don't log those warnings?

Posted: Tue Feb 21, 2012 10:32 am
by Jayanto
@Craig: I removed the null-handling warnings, but the job is still aborting with the SIGINT error specified above. :( Any other leads?

Another thing I tried, which might help narrow this down: there are 4 jobs within the Sequence, namely Job1-Job2-Job3-Job4.

Job 1 -- Runs fine on both 1 node and the default (32) nodes
Job 2 -- Runs fine only on the default (32) nodes
Job 3 -- Runs only on 1 node
Job 4 -- Runs only on 1 node

Any help? :?

Posted: Tue Feb 21, 2012 11:08 am
by DSguru2B
Make sure you are passing the right config file to all the jobs.
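
One way to double-check this is to count the node entries in the configuration file each job actually receives. Below is a minimal Python sketch, assuming the usual parallel configuration file layout in which each logical node is declared as node "name" on its own line. The helper name node_names, the sample host name, and the sample paths are hypothetical, and the sketch does not handle comments in the file.

```python
import re

# Matches a node declaration at the start of a line, e.g.:  node "node1"
# (assumed layout; "fastname" lines do not match because of the leading anchor)
NODE_DECL = re.compile(r'^[ \t]*node[ \t]+"([^"]+)"', re.MULTILINE)


def node_names(config_text):
    """Return the logical node names declared in a configuration file's text."""
    return NODE_DECL.findall(config_text)


# Hypothetical one-node configuration used only for illustration.
SAMPLE_ONE_NODE = '''
{
    node "node1"
    {
        fastname "etl_host"
        pools ""
        resource disk "/data/ds/disk" {pools ""}
        resource scratchdisk "/data/ds/scratch" {pools ""}
    }
}
'''

if __name__ == "__main__":
    names = node_names(SAMPLE_ONE_NODE)
    print(len(names), names)  # prints: 1 ['node1']
```

Running the same check against the file each job in the Sequence is given should show the same node count for every job.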

Posted: Tue Feb 21, 2012 12:03 pm
by Jayanto
@DSguru2B: Yes, I rechecked. I am passing the same, correct configuration file to all the jobs. :?

Posted: Wed Feb 22, 2012 4:52 am
by Jayanto
Hi all, I am using a workaround for the time being. :(

I am running one of the jobs on the default (32) number of nodes, and the rest on 1 node. Currently they are all working fine.

But any further update on how to run them all on a single node would be extremely helpful. :)

Posted: Wed Feb 22, 2012 8:07 am
by DSguru2B
At this point, with all the obvious reasons discarded, get in touch with IBM.

Posted: Fri Mar 23, 2012 2:16 pm
by pk7
I have discovered that if a job has too many warnings (more than 50?), a signal is sent to interrupt the job. I had the same problem, and once I reduced the number of messages (null-handling messages in my case) the problem went away.
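
To find which jobs are getting close to that limit, the warnings can be tallied per job from exported log text. This is only a rough Python sketch: the tab-separated "job, severity, message" line format is a made-up assumption for illustration and not the actual DataStage log layout, and warning_counts and jobs_at_limit are hypothetical helper names. The limit of 50 comes from the abort message quoted earlier in this thread.

```python
from collections import Counter

WARNING_LIMIT = 50  # from "Issuing abort after 50 warnings logged."


def warning_counts(lines):
    """Count WARNING entries per job.

    Each line is assumed (hypothetically) to look like:
        job_name<TAB>severity<TAB>message
    Lines that do not have three fields are ignored.
    """
    counts = Counter()
    for line in lines:
        parts = line.rstrip("\n").split("\t", 2)
        if len(parts) == 3 and parts[1].strip().upper() == "WARNING":
            counts[parts[0].strip()] += 1
    return counts


def jobs_at_limit(lines, limit=WARNING_LIMIT):
    """Return the jobs whose warning count has reached the limit, sorted by name."""
    return sorted(job for job, n in warning_counts(lines).items() if n >= limit)


if __name__ == "__main__":
    sample = ["Job2\tWARNING\tnull handling"] * 50 + ["Job1\tINFO\tstarted"]
    print(jobs_at_limit(sample))  # prints: ['Job2']
```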