
Job aborted: Not enough space on node frs-diis

Posted: Tue May 11, 2010 7:12 am
by xiaying
Hi,
I migrated a job from one project to another; both projects are located on the same server. For convenience, I will call the job before migration JOB1 and the job after migration JOB2. JOB2 aborted with the following errors:

Tfm_Separate,1: Fatal Error: APT_Communicator::shmemInitBuffers: createMapFile (/opt/IBM/InformationServer/Server/Scratch/tmp/apTvS30412230412230356f36) failed: Not enough space on node frs-diis
The most likely reasons for this are a system-wide limit on the number of mmap'ed shared memory segments or that you have specified scratch directories or a TMPDIR that is not local to this node (e. g. is an NFS file system).


Lkp_COUNTRY.Lnk_NonCountry_Sort,3: Fatal Error: Unable to initialize communication channel on frs-diis. This is typically caused by a configuration problem. Examples of typical problems include:
1) The temporary directory, identified by $TMPDIR and/or the scratch disks in your ORCHESTRATE configuration, is located on a non-local file system (e. g. mounted over NFS).
2) The temporary directory is located on a file system with insufficient space.
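The two causes listed in the message above (too little space in the temporary directory, or a directory where a mapped file cannot be created) can be checked by hand. The following is a minimal sketch, not what APT_Communicator does internally: it reports the free space in the directory named by TMPDIR (falling back to the system default) and then tries to create and memory-map a scratch file there. The helper names and the 8 MiB test size are hypothetical choices for illustration.

```python
import mmap
import os
import tempfile


def free_bytes(path):
    """Bytes available to an unprivileged user on the file system holding path."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize


def can_map_file(directory, size):
    """Try to create a file of `size` bytes in `directory` and memory-map it.

    Returns (True, "ok") on success, or (False, reason) when the operating
    system refuses, for example with ENOSPC or ENOMEM.
    """
    try:
        with tempfile.TemporaryFile(dir=directory) as f:
            f.truncate(size)                      # reserve the file length
            with mmap.mmap(f.fileno(), size):     # map it into this process
                return True, "ok"
    except OSError as exc:
        return False, str(exc)


# Assumption: TMPDIR points at the directory the job uses for its temp files.
tmp_dir = os.environ.get("TMPDIR", tempfile.gettempdir())
test_size = 8 * 1024 * 1024  # hypothetical test size, 8 MiB

print(f"{tmp_dir}: {free_bytes(tmp_dir) / 2**20:.0f} MiB free")
ok, detail = can_map_file(tmp_dir, test_size)
print(f"mmap of {test_size} bytes: {'succeeded' if ok else 'failed: ' + detail}")
```

Run it as the same user that runs the job, with TMPDIR set to the directory from the error message. If the free space is ample but the mapping still fails, the cause is more likely a memory or mapping limit than disk space.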


df
Filesystem 512-blocks Free %Used Iused %Iused Mounted on
/dev/fslv05 41943040 9064408 79% 128436 12% /opt/IBM/InformationServer
/dev/fslv04 41943040 41933800 1% 25 1% /opt/IBM/InformationServer/Server/Datasets
/dev/dsscratch 146800640 144146992 2% 34 1% /opt/IBM/InformationServer/Server/Scratch
/dev/fslv07 6291456 6134952 3% 942 1% /var/mqm
/dev/fslv08 16777216 13496472 20% 46 1% /home/db2inst1/frs11/logs
/dev/fslv09 83886080 59641720 29% 1215 1% /files
/dev/fslv06 52428800 52420048 1% 10 1% /home/db2inst1/frs11/tempspace
/dev/fslv10 16777216 13496472 20% 46 1% /home/db2inst1/frs12/logs
/dev/fslv11 52428800 52420048 1% 10 1% /home/db2inst1/frs12/tempspace
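Because this df output is in 512-byte blocks, the free-space figures are easy to misread. A small sketch that converts three of the rows above to GiB (the row values are copied from the listing; the function name is hypothetical):

```python
BLOCK_SIZE = 512  # this df output reports 512-byte blocks

# (mount point, total blocks, free blocks), copied from the df listing
DF_ROWS = [
    ("/opt/IBM/InformationServer", 41943040, 9064408),
    ("/opt/IBM/InformationServer/Server/Datasets", 41943040, 41933800),
    ("/opt/IBM/InformationServer/Server/Scratch", 146800640, 144146992),
]


def blocks_to_gib(blocks, block_size=BLOCK_SIZE):
    """Convert a block count to GiB (2**30 bytes)."""
    return blocks * block_size / 2**30


for mount, total, free in DF_ROWS:
    print(f"{mount}: {blocks_to_gib(free):.1f} GiB free "
          f"of {blocks_to_gib(total):.1f} GiB")
```

By this arithmetic the Scratch file system has roughly 68.7 GiB free out of 70 GiB, which matches the 2% used figure in the listing.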


ulimit -a
core file size (blocks) 1048575
data seg size (kbytes) unlimited
file size (blocks) unlimited
max memory size (kbytes) unlimited
open files unlimited
pipe size (512 bytes) 64
stack size (kbytes) unlimited
cpu time (seconds) unlimited
max user processes 500
virtual memory (kbytes) unlimited

However, JOB1 runs normally.
Can anyone help me?

Posted: Tue May 11, 2010 7:18 am
by chulett
Gack, edit your post and get rid of the blue font, that's unreadable. :?

Posted: Tue May 11, 2010 8:44 am
by nagarjuna
The job aborted due to lack of space. Please note that after a job aborts or completes, all the scratch space related to that job is cleaned up.

Posted: Tue May 11, 2010 9:25 am
by chulett
Right, not sure what advice you are looking for here other than either add more space, run fewer jobs at the same time or make them more efficient in their scratch space utilization, if at all possible.

Posted: Wed May 12, 2010 8:58 am
by xiaying
nagarjuna wrote: The job aborted due to lack of space. Please note that after a job aborts or completes, all the scratch space related to that job is cleaned up.

Thanks for your reply, but I have cleared out the scratch directory and it did not help; the job still aborts. I saw one solution for a "not enough space" error (viewtopic.php?t=104125).
It is similar to this problem, but I think the differences outweigh the similarities.

That problem is "KP,1: Could not map table file "/path/xx/lookuptable.20061006.drmm0ac (size 552711192 bytes)": Not enough space Error finalizing / saving table /path/xx/ds_temp/dynLUT143294a5578e98 "

My problem is "APT_CombinedOperatorController(0),0: Fatal Error: APT_Communicator::shmemInitBuffers: createMapFile (/opt/IBM/InformationServer/Server/Scratch/tmp/apTvS340870340870c6f1f519) failed: Not enough space on node frs-diis"

Can anybody help resolve this problem, or tell me the difference between the two errors?

Posted: Wed May 12, 2010 9:13 am
by chulett
Are you on a 64-bit or 32-bit server? Also, set $APT_DISABLE_COMBINATION to True and rerun to see exactly where the error message is coming from.

Posted: Sat Apr 23, 2011 2:09 pm
by D0n1117
I had the same problem, with the same error, on a lookup stage with 5 sequential files (4 as lookups). There was plenty of scratch disk; space was not the issue, despite what the error indicates. All partitioning was set to Auto. I changed the partitioning to "Same" for the input and "Entire" for the lookups, and now it works. Explicitly set your partitions on your input and see if that works.
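For readers unfamiliar with the two settings named above, here is a toy sketch of what they mean. It is not DataStage code and does not explain why the change avoided the error. "Same" leaves input rows in the partitions they already occupy, and "Entire" gives every partition a full copy of the reference data. All names and sample rows are hypothetical, and round-robin is used only as a contrast case.

```python
def partition_entire(rows, n):
    """Entire: every partition receives a full copy of the rows."""
    return [list(rows) for _ in range(n)]


def partition_round_robin(rows, n):
    """Contrast case: rows are dealt out one by one across partitions."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def lookup(input_parts, ref_parts):
    """Each partition looks up its input keys only in its own reference slice."""
    matched = []
    for inp, ref in zip(input_parts, ref_parts):
        table = dict(ref)
        matched.extend((key, table[key]) for key in inp if key in table)
    return matched


# Hypothetical data: input already split over 2 partitions ("Same" keeps it so).
input_parts = [["DE", "US"], ["FR", "CN"]]
reference = [("CN", "China"), ("DE", "Germany"),
             ("FR", "France"), ("US", "United States")]

with_entire = lookup(input_parts, partition_entire(reference, 2))
with_round_robin = lookup(input_parts, partition_round_robin(reference, 2))

print("Entire reference:     ", len(with_entire), "matches")
print("Round-robin reference:", len(with_round_robin), "matches")
```

With a full copy of the reference in every partition, each input key finds its match regardless of where it sits. When the reference is split without regard to the key, rows can miss their match because the matching reference row landed in a different partition.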