Fatal : node_node1: Player 2 terminated unexpectedly.

mctny · Post by **mctny** » Thu Jun 01, 2006 7:48 am

Hello everyone,

I was wondering if anyone knows about this error I got while running a scheduled job at night. The error is not descriptive; it says
"node_node1: Player 2 terminated unexpectedly." The next error is
main_program: Unexpected termination by Unix signal 9(SIGKILL)

Any comments?

ashwin141 · Post by **ashwin141** » Thu Jun 01, 2006 8:13 am

Hi Cetin

I faced something similar, though I am not sure of the exact reason for this error. It may have something to do with the disk space (resource and scratch).
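As a rough way to rule disk space in or out, a small script can report the free space on each directory. This is only a sketch: the `"."` path in the example is a placeholder, and you would substitute the resource and scratch disk locations from your own parallel configuration file.

```python
import shutil


def free_space_report(paths):
    """Return {path: (free_gib, percent_free)} for each directory given."""
    report = {}
    for path in paths:
        usage = shutil.disk_usage(path)
        free_gib = usage.free / 1024**3
        # Guard against pseudo file systems that report a total of zero.
        percent_free = 100.0 * usage.free / usage.total if usage.total else 0.0
        report[path] = (free_gib, percent_free)
    return report


if __name__ == "__main__":
    # Placeholder path: replace with your resource and scratch disk directories.
    for path, (free_gib, pct) in free_space_report(["."]).items():
        print(f"{path}: {free_gib:.1f} GiB free ({pct:.0f}%)")
```

Running something like this shortly before the nightly job starts (and again after a failure) would show whether the disks were close to full at the time.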

Regards
Ashwin

kris007 · Post by **kris007** » Thu Jun 01, 2006 9:00 am

You got that error because someone tried to kill the Unix processes with kill -9. Never try to kill a process using kill -9. It's not good practice.
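To illustrate what "terminated by Unix signal 9" means from the parent's side, here is a minimal sketch that starts a child process, sends it SIGKILL, and inspects the exit status. It is a generic POSIX demonstration, not DataStage code; the helper names `run_and_kill` and `describe` are made up for the example. Note that any sufficiently privileged process can send this signal, so the message alone does not prove that a person typed `kill -9`.

```python
import signal
import subprocess
import sys


def run_and_kill():
    """Start a sleeping child process, send it SIGKILL, return its exit status."""
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    child.send_signal(signal.SIGKILL)  # same effect as `kill -9 <pid>`
    return child.wait()


def describe(returncode):
    """Turn a subprocess return code into a readable message."""
    if returncode < 0:
        # On POSIX, a negative return code means "killed by signal -returncode".
        signum = -returncode
        return f"terminated by signal {signum} ({signal.Signals(signum).name})"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    print(describe(run_and_kill()))
```

The child gets no chance to clean up or log anything, which is why the parent can only report that it "terminated unexpectedly".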

DSguru2B · Post by **DSguru2B** » Thu Jun 01, 2006 9:27 am

Was it a DB2 database that you were loading to?

kris007 · Post by **kris007** » Thu Jun 01, 2006 9:41 am

Is there anything we need to look out for if it is a DB2 database? Just curious.

kumar_s · Post by **kumar_s** » Thu Jun 01, 2006 10:44 am

Perhaps DSguru2B got the same error while loading a table.

mctny - What was the load on your server when you got this error?
What is the volume of data you are working with?
I take it this happens at random, am I right?
Did anyone issue a kill -9 command from Unix?
What are the values of APT_MONITOR_SIZE and APT_MONITOR_TIME in your Administrator environment settings?
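If you cannot get into Administrator, one hedged alternative is to list every `APT_` variable visible to a process. The sketch below assumes it is launched from a shell or job environment where those variables have actually been exported; settings that exist only in Administrator will not show up otherwise. The function name `apt_settings` is just for illustration.

```python
import os


def apt_settings(environ=None, prefix="APT_"):
    """Return the variables whose names start with `prefix`, sorted by name."""
    env = os.environ if environ is None else environ
    return {name: value for name, value in sorted(env.items()) if name.startswith(prefix)}


if __name__ == "__main__":
    found = apt_settings()
    if not found:
        print("No APT_ variables are set in this environment.")
    for name, value in found.items():
        print(f"{name}={value}")
```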

nivaskvs · Post by **nivaskvs** » Thu Jun 01, 2006 10:52 am

Because it's the PX version running on AIX Unix, you are getting this error. There is a patch that fixes this issue permanently. You can contact the Ascential Support team to get this patch. A quick fix would be to restart the job.

mctny · Post by **mctny** » Thu Jun 01, 2006 1:54 pm

nivaskvs wrote: Because it's the PX version running on AIX Unix, you are getting this error. There is a patch that fixes this issue permanently. You can contact the Ascential Support team to get this patch. A quick fix would be to restart the job.

Thank you guys for all the comments.
I don't think anyone could have issued a kill -9 command. It is a nightly job, and no one would have access to the Unix boxes at night.
Yes, that job runs successfully every night.
I tend to agree that it could be scratch disk space, but I am very new and I don't know how to fix that problem. I don't know how to check those APT parameter values either.
Answers to the other questions:
It is an Oracle database we are trying to load data into, not DB2. We are using DataStage Enterprise Edition 7.5.1.A.

Thanks again

yakiku · Post by **yakiku** » Mon Aug 21, 2006 11:44 pm

Hi nivaskvs:

Do you have any reference name for this patch? We are seeing the same problem with our PX jobs. Job would fail without any apparent reason but aborts with this error:

main_program: Unexpected termination by Unix signal 9(SIGKILL).

Upon rerunning the job, it executes fine.

kumar_s · Post by **kumar_s** » Tue Aug 22, 2006 2:48 am

yakiku wrote:Hi nivaskvs:

Do you have any reference name for this patch? We are seeing the same problem with our PX jobs. Job would fail without any apparent reason but aborts with this error:

main_program: Unexpected termination by Unix signal 9(SIGKILL).

Upon rerunning the job, it executes fine.

What is the load on the server when you get this error? Have you tried the option suggested with MONITOR?

mali_aydin · Post by **mali_aydin** » Tue Aug 22, 2006 5:00 am

Hi Cetin,
Empty files or null values are one cause of this problem. You have to add extra checks that handle null data or empty files.

MAli

yakiku · Post by **yakiku** » Tue Aug 22, 2006 8:33 am

kumar: The system was at its lowest level of load at the time of the error.

MAli: The same job was rerun a minute after the failure without changing code, data, or parameters, and it ran fine. So that does not seem logical.

samba · Post by **samba** » Tue Aug 22, 2006 8:52 am

I also faced this problem before.
I got the exact same problem a couple of months back.
We increased the buffer size (I don't know myself how to increase the buffer size),
and after that we never faced the problem again.

Thanks

kumar_s · Post by **kumar_s** » Tue Aug 22, 2006 7:04 pm

Check in Administrator for the buffer settings at project level.
yakiku - How many jobs were being called in parallel? How many stages are in each job?

yakiku · Post by **yakiku** » Thu Aug 24, 2006 4:09 pm

There was only one job running at the time of this error, and there were 12 stages in total in the job (Sequential files, Filter, Join, Funnel, Lookup, Transformers and TD API).

These are the buffering variables at project level:

APT_BUFFERING_POLICY Automatic buffering
APT_BUFFER_DISK_WRITE_INCREMENT 1048576
APT_BUFFER_FREE_RUN 0.5
APT_BUFFER_MAXIMUM_MEMORY 3145728
APT_BUFFER_MAXIMUM_TIMEOUT 1
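
To make those numbers easier to read, here is a quick conversion. It assumes, as I understand the parallel engine documentation, that the memory and disk write values are in bytes and that APT_BUFFER_FREE_RUN is a fraction of APT_BUFFER_MAXIMUM_MEMORY; please verify that against the documentation for your version.

```python
# Values copied from the project-level settings listed above.
settings = {
    "APT_BUFFER_DISK_WRITE_INCREMENT": 1048576,
    "APT_BUFFER_MAXIMUM_MEMORY": 3145728,
    "APT_BUFFER_FREE_RUN": 0.5,
}


def to_mib(num_bytes):
    """Convert a byte count to mebibytes."""
    return num_bytes / (1024 * 1024)


# Assumption: free run is expressed as a fraction of the maximum memory buffer.
free_run_bytes = int(
    settings["APT_BUFFER_FREE_RUN"] * settings["APT_BUFFER_MAXIMUM_MEMORY"]
)

if __name__ == "__main__":
    print(f"disk write increment: {to_mib(settings['APT_BUFFER_DISK_WRITE_INCREMENT'])} MiB")
    print(f"maximum memory:       {to_mib(settings['APT_BUFFER_MAXIMUM_MEMORY'])} MiB")
    print(f"free run threshold:   {to_mib(free_run_bytes)} MiB")
```

So under those assumptions each buffer can hold about 3 MiB in memory and writes to disk in 1 MiB increments.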