Fatal Error: Fork failed: Not enough space

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

sanjay
Premium Member
Posts: 203
Joined: Fri Apr 23, 2004 2:22 am

Fatal Error: Fork failed: Not enough space

Post by sanjay »

Hi All,

We are using DataStage EE v7.5.1a on HP-UX 11i (11.23) with Oracle 10g (10.1.0.4) in an Itanium environment for a data warehousing project. The machine, an rx7620, has 4 CPUs, 4 GB RAM, and 4 GB of swap space.

We have a SEQUENCER calling two DataStage jobs in parallel. Both of these jobs read the same sequential file and transform it according to business rules.

During execution of this SEQUENCER, we are getting the following error:

"cs}}},0: Fatal Error: Fork failed: Not enough space
Failure during execution of operator logic.
node_node2: Player 15 terminated unexpectedly."

and one of the parallel jobs configured in the SEQUENCER aborts.

The config file "default.apt" has two nodes, with "resource disk" (/opt/app/Ascential/DataStage/Datasets) and "scratchdisk" (/opt/app/Ascential/DataStage/Scratch) configured (the same locations for both nodes). There is 2.9 GB of free space available in these resource and scratch disk locations.

The ulimit settings for the UNIX user under which we installed DataStage are as below:

$ ulimit -a
time(seconds) unlimited
file(blocks) unlimited
data(kbytes) 4292870144
stack(kbytes) 131072
memory(kbytes) unlimited
coredump(blocks) 4194303

If we change the SEQUENCER to run the DS jobs one after the other (sequentially), then we don't get this error and the jobs complete successfully.

But it would be better if we could run them in parallel to cut the execution time roughly in half (as both jobs read the same source file and do their transformations).

Please share your experience if you have faced any such issue, and whether any resource needs to be increased, such as tuning kernel parameters ....
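For reference, swap usage while the jobs run can be checked like this (a sketch; swapinfo is the standard HP-UX tool, and the kctune query assumes 11i v2):

# Show total swap usage, in MB
swapinfo -tam
# List the per-process size tunables (e.g. maxdsiz) on 11i v2
kctune | grep -i siz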

Thanks,
Srini
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
First thing I'd try is differentiating the locations of the 2 nodes in the config file; the optimal configuration is not on the same disk, or at least not in the same directories (though files should have node ID separation).

Then you might want to check the TMPDIR & TEMP env vars to see where they point and how big that location is; if they are not defined, check your /tmp size.
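For example, a rough check (bdf is the HP-UX flavour of df):

# See where the temp variables point (they may be unset)
echo "TMPDIR=${TMPDIR:-unset}"
echo "TEMP=${TEMP:-unset}"
# If they are unset, check the size and free space of the default /tmp
bdf /tmp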

Did you get the ulimit values from running a DS job that invokes the 'ulimit -a' command? (You should have.)
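For example, a quick sketch using the standard ExecSH before-job subroutine; the output lands in the job log and reflects the limits the PX players actually inherit:

# Set the job's before-job subroutine to ExecSH with this command:
ulimit -a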

Let us know what you come up with and we'll follow up.

IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
s_r_chandru
Charter Member
Posts: 13
Joined: Tue Apr 08, 2003 9:51 pm
Location: Australia

Re: Fatal Error: Fork failed: Not enough space

Post by s_r_chandru »

Hi Srini,

Can you explain your job design for both of the jobs?

My guess is that the error thrown relates to the buffer. Also, can you check the dump score and let us know its outcome?
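In case it helps, the score dump is switched on with a standard environment variable (set it at the project or job level in Administrator):

# Enable the Orchestrate score dump; the score (operators, nodes,
# players, buffer insertions) then shows up in the Director log
APT_DUMP_SCORE=1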

-Chandru,
"You see things and Say Why? I dream of Things and Say Why Not?"-Bernaud Shaw
bcarlson
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

How long do the jobs run? On a Unix machine, you can use the 'df' command to show how much space is available on all of your mounted disks. Try running this WHILE your jobs are running; it may reveal where the space is getting used.
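For example, something along these lines (a sketch; adjust the interval and mount points to your own layout):

# Poll free space every 30 seconds while the jobs run
while true
do
    date
    df -k /tmp /opt/app/Ascential/DataStage/Scratch /opt/app/Ascential/DataStage/Datasets
    sleep 30
done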

On our system, temp space, scratch space, and data space are all mounted on separate disks, so if space fills up, it is easy to determine the culprit.

Disclaimer: Obviously, if everything is getting written to the same disk, this may not be much help...
sanjay
Premium Member
Posts: 203
Joined: Fri Apr 23, 2004 2:22 am

Post by sanjay »

Hi All,

Thanks for your responses.

We have configured 2 nodes in the configuration file (default.apt), attached herewith for your reference, and specified different directories for the two nodes (for both the Scratch and Datasets locations). That disk has around 2.9 GB of free space.

----- start of default.apt ----------
{
  node "node1"
  {
    fastname "hpsgnp1b"
    pools ""
    resource disk "/opt/app/Ascential/DataStage/Datasets" {pools ""}
    resource scratchdisk "/opt/app/Ascential/DataStage/Scratch" {pools ""}
  }
  node "node2"
  {
    fastname "hpsgnp1b"
    pools ""
    resource disk "/opt/app/Ascential/DataStage/Datasets1" {pools ""}
    resource scratchdisk "/opt/app/Ascential/DataStage/Scratch1" {pools ""}
  }
}
----------- end of default.apt --------------

Also attached herewith are the environment settings that we captured (from DataStage Director) when the job aborted.

--------- start of env settings ------------------
Environment variable settings:
_=/usr/bin/nohup
SENDMAIL_SERVER=1
DDFA=0
SNMP_MASTER_START=1
MROUTED_ARGS=
SHLIB_PATH=/opt/app/Ascential/DataStage/PXEngine/lib:/opt/app/Ascential/DataStage/branded_odbc/lib:/opt/app/Ascential/DataStage/DSEngine/lib:/opt/app/Ascential/DataStage/DSEngine/uvdlls:/opt/app/Ascential/DataStage/DSEngine/java/jre/lib/IA64W/hotspot:/opt/app/Ascential/DataStage/DSEngine/java/jre/lib/IA64W:/opt/app/oracle/product/10.1/lib:/usr/lib:/usr/lib/Motif1.2
APT_ORCHHOME=/opt/app/Ascential/DataStage/PXEngine
XNTPD_ARGS=
PATH=/hedw/DataStage_Project/HEDW/wrapped:/hedw/DataStage_Project/HEDW/buildop:/hedw/DataStage_Project/HEDW/RT_BP42.O:/opt/app/Ascential/DataStage/DSCAPIOp:/opt/app/Ascential/DataStage/RTIOperators:/opt/app/Ascential/DataStage/DSParallel:/opt/app/Ascential/DataStage/PXEngine/user_osh_wrappers:/opt/app/Ascential/DataStage/PXEngine/osh_wrappers:/opt/app/Ascential/DataStage/DSEngine/bin:/opt/app/Ascential/DataStage/PXEngine/bin:/opt/app/oracle/product/10.1/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:.:/opt/langtools/bin:/usr/bin/X11:/usr/contrib/bin/X11:/opt/aCC/bin
DHCPV6CLNTD_ARGS=
XNTPD=0
SNMP_TRAPDEST_START=1
SAVEPATH=/sbin
LD_PRELOAD=/opt/app/Ascential/DataStage/DSEngine/java/jre/lib/IA64W/hotspot/libjvm.so
INETD_ARGS=
MROUTED=0
SNMP_MIB2_START=1
ERASE=^H
ORACLE_SID=hedw
INIT_STATE=3
SENDMAIL_SENDONLY=0
RWHOD=0
DHCPV6D=0
DSHOME=/opt/app/Ascential/DataStage/DSEngine
SNMP_HPUNIX_START=1
ORA_NLS33=/opt/app/oracle/product/10.1/ocommon/nls/admin/data
PRE_U95=true
ODBCINI=/opt/app/Ascential/DataStage/DSEngine/.odbc.ini
HOME=/
DHCPV6SRVRD_ARGS=
SENDMAIL_RECVONLY=0
TERM=
ORACLE_HOME=/opt/app/oracle/product/10.1
PWD=/opt/app/Ascential/DataStage/DSEngine
TZ=SST-8
INETD=1
SENDMAIL_SERVER_NAME=
NTPDATE_SERVER=
UDTHOME=/opt/app/Ascential/DataStage/ud41
UDTBIN=/opt/app/Ascential/DataStage/ud41/bin
LOGNAME=etladmin
DS_USERNO=-24635
WHO=HEDW
BELL=^G
FLAVOR=-1
DSIPC_OPEN_TIMEOUT=30
APT_BUFFER_MAXIMUM_MEMORY=6145728
APT_CONFIG_FILE=/opt/app/Ascential/DataStage/Configurations/default.apt
APT_MONITOR_MINTIME=10
DS_OPERATOR_BUILDOP_DIR=buildop
DS_OPERATOR_WRAPPED_DIR=wrapped
APT_DUMP_SCORE=1
LD_LIBRARY_PATH=/opt/app/Ascential/DataStage/PXEngine/java/jre/lib/IA64W/server:/opt/app/Ascential/DataStage/PXEngine/java/jre/lib/IA64W:/hedw/DataStage_Project/HEDW/RT_BP42.O:/opt/app/Ascential/DataStage/DSCAPIOp:/opt/app/Ascential/DataStage/RTIOperators:/opt/app/Ascential/DataStage/DSParallel:/opt/app/Ascential/DataStage/PXEngine/user_lib:/opt/app/Ascential/DataStage/PXEngine/lib:/hedw/DataStage_Project/HEDW/buildop:/usr/lib:/usr/lib/hpux64:/lib
TMPDIR=/tmp
DS_ENABLE_RESERVED_CHAR_CONVERT=0
DS_TDM_PIPE_OPEN_TIMEOUT=720
DS_TDM_TRACE_SUBROUTINE_CALLS=0
APT_COMPILEOPT=+DD64 -O -c -ext -z +Z
APT_COMPILER=/opt/aCC/bin/aCC
APT_LINKER=/opt/aCC/bin/aCC
APT_LINKOPT=+DD64 -b -Wl,+s -Wl,+vnocompatwarnings
srcpassword=LDH@1:VH>93=0OEI<:
stgpassword=LDH@1:VH>93=0OEI<:
tgtpassword=LDH@1:VH>93=0OEI<:
OSH_STDOUT_MSG=1
APT_ERROR_CONFIGURATION=severity, !vseverity, !jobid, moduleid, errorIndex, timestamp, !ipaddr, !nodeplayer, !nodename, opid, message
APT_OPERATOR_REGISTRY_PATH=/hedw/DataStage_Project/HEDW/buildop
----------- end of env settings -----------------

Please help us resolve this, and let us know if any settings need to be changed in the environment variables / buffering ....

There is an environment variable "TMPDIR" in DS Administrator, which points to /tmp; that location has around 3.5 GB of free space.

I am quite sure that we have enough free space in all mount points/file systems on the UNIX machine.

Thanks,
Srini
SPI
Participant
Posts: 5
Joined: Wed Oct 05, 2005 1:34 am
Location: Bordeaux

Re: Fatal Error: Fork failed: Not enough space

Post by SPI »

Hi,
I don't know the design of your two jobs, but I can tell you that I have already had this type of message with a job that used 5 Join stages. I had not sorted the input data of those stages and had left the partitioning property at the default. Once the data was sorted and partitioned, the job ran perfectly.
Finally, if I were you I would not point TMPDIR at the machine's /tmp directory, even if space there seems sufficient, because nobody can be sure of what goes on in that directory unless you are alone on the machine. Note that on version 7 you should check that the TMPDIR setting is really taken into account; there is a known bug in that version.
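For example (the directory name here is only illustrative; pick a dedicated filesystem with enough headroom):

# In dsenv or via the Administrator, point PX temp space away from /tmp
TMPDIR=/opt/app/Ascential/DataStage/tmp
export TMPDIR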
SPI