
Configuration File not getting generated

Posted: Wed Feb 04, 2015 7:45 am
by skathaitrooney
Hey guys,

In our parallel job, we are using LSF as the resource manager.
When we ran our job, the dynamic configuration file was not generated.
The job log shows APT_CONFIG_FILE= (blank), so the job aborted.
This seems to happen randomly: when we re-ran the job, the config file was generated and the job completed fine.


ERROR:
<Dynamic_grid.sh>Error: Job error, submit error. invoking GenConfig... Please check Java is available on the compute nodes

We went through the IBM technote: http://www-01.ibm.com/support/docview.w ... wg21673569

According to this technote, a javacore file containing information about the JVM threads running at that time should be generated in our project directory.
Unfortunately, it was not generated in our project directory or anywhere else in our environment.

Can anyone help us locate the javacore file for this issue so that we can debug further?
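A minimal Python sketch for hunting down javacore files, assuming the IBM JVM's default file naming (javacore.<date>.<time>.<pid>.<seq>.txt) and assuming Python is available on the nodes involved. Since the error message points at the compute nodes, it may be worth running it there as well as on the head node. IBM JVMs usually write javacores to the process's working directory and may fall back to a temporary directory such as /tmp, but treat those locations as assumptions for your installation and pass whichever directories you want searched on the command line.

```python
import fnmatch
import os
import sys

# Assumption: IBM JVMs name dumps javacore.<date>.<time>.<pid>.<seq>.txt by
# default, so this glob should match them.
JAVACORE_PATTERN = "javacore*.txt"


def find_javacores(roots):
    """Walk each root directory and return javacore file paths, newest first."""
    hits = []
    for root in roots:
        # os.walk silently skips directories it cannot read.
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in fnmatch.filter(filenames, JAVACORE_PATTERN):
                path = os.path.join(dirpath, name)
                try:
                    hits.append((os.path.getmtime(path), path))
                except OSError:
                    # File vanished or is unreadable; skip it.
                    continue
    hits.sort(reverse=True)
    return [path for _mtime, path in hits]


if __name__ == "__main__":
    # Directories to search come from the command line, e.g. the project
    # directory, the engine's working directory and /tmp (site-specific).
    for path in find_javacores(sys.argv[1:]):
        print(path)
```

Sorting by modification time puts the javacore closest to the failed run at the top, which makes it easier to match against the job log timestamp.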

Posted: Wed Feb 04, 2015 9:38 am
by PaulVL
A couple of things first:

Ensure that LSF is configured properly and that the user ID you are using can submit jobs to the grid. Try the test.sh script in $GRIDHOME.

Make sure that the APT_GRID_xxx values are properly set in your project: APT_GRID_CONFIG, APT_GRID_ENABLE, APT_GRID_QUEUE, APT_GRID_COMPUTENODE and APT_GRID_PARTITIONS (the rest can be blank if you wish).
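As a quick sanity check, a Python sketch like the one below reports which of those variables are unset or blank. It assumes the project values are exported into the environment the script runs in (i.e. the same environment the job sees); it only checks that each value is non-blank, not that the queue or node names are valid for your LSF setup.

```python
import os

# The APT_GRID_* variables listed above; the remaining ones may be blank.
REQUIRED_GRID_VARS = (
    "APT_GRID_CONFIG",
    "APT_GRID_ENABLE",
    "APT_GRID_QUEUE",
    "APT_GRID_COMPUTENODE",
    "APT_GRID_PARTITIONS",
)


def missing_grid_vars(env=None):
    """Return the required APT_GRID_* names that are unset or blank in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_GRID_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    # Assumption: the project values are visible as environment variables here.
    missing = missing_grid_vars()
    if missing:
        print("Unset or blank:", ", ".join(missing))
    else:
        print("All required APT_GRID_* variables are set.")
```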

Make sure that the default template for your grid apt file is accessible by the user id running the job.

Make sure that the JOBDIR defined in your grid global values file (or project values file) exists on both the head node and the compute nodes.

Make sure the user ID you are using can write to that path (set the sticky bit on the path for the dstage group).
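A hedged Python sketch for checking a JOBDIR path: it reports whether the path exists, whether the current user can write to it, and whether it is group-writable with the sticky bit set. The function name check_jobdir and the exact set of checks are illustrative assumptions, not part of the grid toolkit. Run it as the job's user ID on both the head node and the compute nodes, passing the JOBDIR value from your grid global values (or project values) file.

```python
import os
import stat
import sys


def check_jobdir(path):
    """Return a list of problems found with a job directory (empty if none)."""
    if not os.path.isdir(path):
        return [f"{path} does not exist or is not a directory"]
    problems = []
    # os.access uses the real UID/GID of the process running this script.
    if not os.access(path, os.W_OK | os.X_OK):
        problems.append(f"{path} is not writable by the current user")
    mode = os.stat(path).st_mode
    # Assumption: "sticky bit for the dstage group" means a group-writable
    # directory with the sticky bit set.
    if not mode & stat.S_IWGRP:
        problems.append(f"{path} is not group-writable")
    if not mode & stat.S_ISVTX:
        problems.append(f"{path} does not have the sticky bit set")
    return problems


if __name__ == "__main__":
    for jobdir in sys.argv[1:]:
        issues = check_jobdir(jobdir)
        print(jobdir, "OK" if not issues else "; ".join(issues))
```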

The engine binaries path needs to be available on the compute nodes, and so do the projects, etc.

My guess: you don't have JOBDIR set correctly for the user ID you are using.