Fatal : node_node1: Player 2 terminated unexpectedly.

Post questions here related to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.


mctny
Charter Member
Posts: 166
Joined: Thu Feb 02, 2006 6:55 am


Post by mctny »

Hello everyone,

I was wondering if anyone knows about an error I got while running a scheduled job at night. The error message is not very descriptive; it says
"node_node1: Player 2 terminated unexpectedly." The next error is
main_program: Unexpected termination by Unix signal 9(SIGKILL)

Any comments?
Thanks,
Chad
__________________________________________________________________
"There are three kinds of people in this world; Ones who know how to count and the others who don't know how to count !"
ashwin141
Participant
Posts: 95
Joined: Wed Aug 24, 2005 2:26 am
Location: London, UK

Post by ashwin141 »

Hi Cetin

I have faced something similar, though I am not sure of the exact cause of this error. It may have something to do with space on the resource and scratch disks.
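To test the disk-space theory, a first step is to see how much room is left on the resource and scratch disks. The sketch below is a hypothetical helper, not a DataStage utility: it assumes the parallel configuration file is the one named by APT_CONFIG_FILE and that its disks are declared with entries of the form `resource disk "/path"` and `resource scratchdisk "/path"`. The 1 GiB warning threshold is an arbitrary placeholder.

```python
import os
import re
import shutil

# Matches configuration-file entries such as:
#   resource disk "/ds/data" {pools ""}
#   resource scratchdisk "/ds/scratch" {pools ""}
RESOURCE_RE = re.compile(r'resource\s+(disk|scratchdisk)\s+"([^"]+)"')

MIN_FREE_BYTES = 1024 ** 3  # hypothetical threshold: warn below 1 GiB free


def resource_dirs(config_text):
    """Return (kind, path) pairs for the disk and scratchdisk entries in the text."""
    return RESOURCE_RE.findall(config_text)


def low_space(paths, min_free_bytes):
    """Return (path, free_bytes) for every path whose filesystem has too little room."""
    short = []
    for path in paths:
        free = shutil.disk_usage(path).free  # bytes available on that filesystem
        if free < min_free_bytes:
            short.append((path, free))
    return short


if __name__ == "__main__":
    config_file = os.environ.get("APT_CONFIG_FILE")
    if not config_file:
        print("APT_CONFIG_FILE is not set in this shell")
    else:
        with open(config_file) as fh:
            entries = resource_dirs(fh.read())
        # A directory may be listed for several nodes; check each once and
        # skip directories that do not exist on this host.
        paths = sorted({p for _kind, p in entries if os.path.isdir(p)})
        for path, free in low_space(paths, MIN_FREE_BYTES):
            print(f"WARNING: {path} has only {free / 1024 ** 2:.0f} MiB free")
```

A one-off check like this only shows the state at the moment it runs; for a nightly failure it would have to run (or be logged) around the time the job runs.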

Regards
Ashwin
kris007
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

You got that error because someone killed the Unix processes with kill -9. Never kill a process with kill -9; it's not good practice.
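From the parent process's side, a child killed with kill -9 looks like the sketch below. This is a generic POSIX illustration in Python, not DataStage code: Python's subprocess module reports termination by a signal as a negative return code, so SIGKILL shows up as -9, the same condition the job log reports as "Unexpected termination by Unix signal 9(SIGKILL)". Because SIGKILL cannot be caught or ignored, the killed process gets no chance to clean up. Note that the signal can be sent by any process with sufficient privileges, not only by someone typing kill -9.

```python
import signal
import subprocess
import sys


def describe_exit(cmd):
    """Run cmd and describe how it ended.

    On POSIX systems subprocess reports termination by a signal as a negative
    return code, so a child killed with SIGKILL (kill -9) gives returncode -9.
    """
    rc = subprocess.run(cmd).returncode
    if rc < 0:
        return f"terminated by signal {-rc} ({signal.Signals(-rc).name})"
    return f"exited with status {rc}"


# A child that sends SIGKILL to itself. SIGKILL cannot be caught, blocked or
# ignored, so no handler or cleanup code in the child ever runs.
SELF_KILL = [sys.executable, "-c",
             "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]

if __name__ == "__main__":
    print(describe_exit(SELF_KILL))                    # terminated by signal 9 (SIGKILL)
    print(describe_exit([sys.executable, "-c", "pass"]))  # exited with status 0
```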
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Was it a DB2 database that you were loading into?
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
kris007
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

Is there anything we need to look out for if it is a DB2 database? Just curious.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Perhaps DSguru2B has seen the same error while loading a table.

mctny - What was the load on your server when you got this error?
What is the volume of the data you are working on?
I assume this is random, am I right?
Has anyone issued a kill -9 command from Unix?
What are the values of APT_MONITOR_SIZE and APT_MONITOR_TIME in your Administrator environment variable settings?
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
nivaskvs
Participant
Posts: 4
Joined: Sat Jan 22, 2005 12:19 pm

Post by nivaskvs »

Because it's the PX version running on AIX Unix, you are getting this error. There is a patch that fixes this issue permanently; you can contact the Ascential Support team to get it. A quick fix would be to restart the job.
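If you rely on the restart workaround while waiting for the patch, the rerun can be automated. The sketch below is a generic Python retry wrapper, not an Ascential tool: the command passed in (whatever your scheduler runs to start the job) and the is_transient test are assumptions to adapt. It retries only failures classified as transient, so genuine errors still stop the schedule. If the SIGKILL hits the job's processes rather than the command you launch, is_transient has to inspect whatever status that command reports instead of looking for -9.

```python
import subprocess
import sys
import time


def run_with_retry(cmd, is_transient, retries=1, delay_s=60.0):
    """Run cmd, rerunning it up to `retries` extra times on transient failures.

    Returns (final_return_code, attempts_made). A return code of 0, or one that
    is_transient() rejects, ends the loop immediately.
    """
    for attempt in range(retries + 1):
        rc = subprocess.run(cmd).returncode
        if rc == 0 or not is_transient(rc) or attempt == retries:
            return rc, attempt + 1
        time.sleep(delay_s)  # pause before restarting


def killed_by_sigkill(rc):
    """Treat only 'killed by signal 9' (subprocess reports -9 on POSIX) as transient."""
    return rc == -9


if __name__ == "__main__":
    # Hypothetical stand-in for the real job command.
    job = [sys.executable, "-c", "print('job ran')"]
    rc, attempts = run_with_retry(job, killed_by_sigkill, retries=1, delay_s=5)
    print(f"return code {rc} after {attempts} attempt(s)")
```

A restart only hides the symptom; it is worth logging every retry so the frequency of the SIGKILL failures stays visible.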
mctny
Charter Member
Posts: 166
Joined: Thu Feb 02, 2006 6:55 am

Post by mctny »

nivaskvs wrote: Because it's the PX version running on AIX Unix, you are getting this error. There is a patch that fixes this issue permanently; you can contact the Ascential Support team to get it. A quick fix would be to restart the job.
Thank you all for the comments.
I don't think anyone could have issued a kill -9 command; it is a nightly job, and no one would have access to the Unix boxes at night.
Yes, it seems random; that job normally runs successfully every night.
I tend to agree that it could be scratch disk space, but I am very new and don't know how to fix that problem. I don't know how to check those APT parameter values either.
Answers to the other questions:
It is an Oracle database we are loading data into, not DB2. We are using DataStage Enterprise Edition 7.5.1.A.

Thanks again,
Chad
yakiku
Premium Member
Posts: 23
Joined: Thu May 13, 2004 7:14 am

Post by yakiku »

Hi nivaskvs,

Do you have a reference name for this patch? We are seeing the same problem with our PX jobs. A job fails for no apparent reason and aborts with this error:

main_program: Unexpected termination by Unix signal 9(SIGKILL).

When we rerun the job, it executes fine.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

yakiku wrote: Hi nivaskvs,

Do you have a reference name for this patch? We are seeing the same problem with our PX jobs. A job fails for no apparent reason and aborts with this error:

main_program: Unexpected termination by Unix signal 9(SIGKILL).

When we rerun the job, it executes fine.
What is the load on the server when you get this error? Have you tried the APT_MONITOR settings mentioned earlier?
mali_aydin
Charter Member
Posts: 6
Joined: Fri May 05, 2006 6:57 am

Post by mali_aydin »

Hi Cetin,
Empty files or null values are one cause of this problem. You have to add extra controls that handle null data or empty files.

MAli
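One way to add the kind of control MAli describes is to check the input files before the job starts and stop early with a clear message. The sketch below is a hypothetical pre-run guard in Python, run from the scheduler before the job, not a DataStage feature; the file names are made up. It only covers missing or zero-byte files; null handling inside the job would still be done in the stages themselves.

```python
import tempfile
from pathlib import Path


def check_inputs(paths):
    """Return a list of problems with the input files; an empty list means OK."""
    problems = []
    for p in map(Path, paths):
        if not p.is_file():
            problems.append(f"{p}: missing")
        elif p.stat().st_size == 0:
            problems.append(f"{p}: empty")
    return problems


if __name__ == "__main__":
    # Demonstration with throwaway files; in practice pass the job's real input paths.
    workdir = Path(tempfile.mkdtemp())
    (workdir / "customers.txt").write_text("1,Smith\n")
    (workdir / "orders.txt").write_text("")
    issues = check_inputs([workdir / "customers.txt",
                           workdir / "orders.txt",
                           workdir / "products.txt"])
    for issue in issues:
        print("SKIP JOB:", issue)
```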
yakiku
Premium Member
Posts: 23
Joined: Thu May 13, 2004 7:14 am

Post by yakiku »

kumar: The system was at its lowest load at the time of the error.

MAli: The same job was rerun a minute after the failure, without changing code, data, or parameters, and it ran fine. That explanation does not seem logical.
samba
Premium Member
Posts: 62
Joined: Wed Dec 07, 2005 11:44 am

Post by samba »

I have faced the same problem before; I got exactly the same error a couple of months back.
We increased the buffer size (I don't know exactly how that was done),
and after that we never faced the problem again.

Thanks
samba
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Check the buffer settings at the project level in Administrator.
yakiku - How many jobs were running in parallel? How many stages were in each job?
yakiku
Premium Member
Posts: 23
Joined: Thu May 13, 2004 7:14 am

Post by yakiku »

There was only one job running at the time of this error, and the job had 12 stages in total (Sequential File, Filter, Join, Funnel, Lookup, Transformer, and Teradata API).

These are the buffering variables at the project level:

APT_BUFFERING_POLICY Automatic buffering
APT_BUFFER_DISK_WRITE_INCREMENT 1048576
APT_BUFFER_FREE_RUN 0.5
APT_BUFFER_MAXIMUM_MEMORY 3145728
APT_BUFFER_MAXIMUM_TIMEOUT 1
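To put those numbers in perspective, the sketch below does the arithmetic. It assumes the meanings given in IBM's parallel job documentation: APT_BUFFER_MAXIMUM_MEMORY is the in-memory limit of each buffer in bytes, APT_BUFFER_FREE_RUN is the fraction of that limit a buffer fills before it stops accepting data freely and starts writing some of its contents to disk, and APT_BUFFER_DISK_WRITE_INCREMENT is the size of the blocks moved to and from disk. The link and partition counts passed to worst_case_bytes are hypothetical; the result is a rough upper bound, not a measurement.

```python
MIB = 1024 * 1024

# Project-level buffering values posted above.
SETTINGS = {
    "APT_BUFFER_DISK_WRITE_INCREMENT": 1048576,  # bytes per block moved to/from disk
    "APT_BUFFER_FREE_RUN": 0.5,                  # fraction of the per-buffer maximum
    "APT_BUFFER_MAXIMUM_MEMORY": 3145728,        # bytes of memory per buffer
    "APT_BUFFER_MAXIMUM_TIMEOUT": 1,             # seconds
}


def describe(settings):
    """Convert the raw values into sizes that are easier to reason about."""
    max_mem = settings["APT_BUFFER_MAXIMUM_MEMORY"]
    return {
        "max_memory_mib": max_mem / MIB,
        # Data accepted freely before the buffer starts resisting and spilling.
        "free_run_bytes": int(max_mem * settings["APT_BUFFER_FREE_RUN"]),
        "disk_block_mib": settings["APT_BUFFER_DISK_WRITE_INCREMENT"] / MIB,
    }


def worst_case_bytes(settings, buffered_links, partitions):
    """Rough upper bound if every buffer on every partition filled to its maximum."""
    return settings["APT_BUFFER_MAXIMUM_MEMORY"] * buffered_links * partitions


if __name__ == "__main__":
    print(describe(SETTINGS))
    # Hypothetical job shape: 10 buffered links running on 2 partitions.
    print(worst_case_bytes(SETTINGS, buffered_links=10, partitions=2) / MIB, "MiB")
```

With these values each buffer holds at most 3 MiB and accepts data freely only until it is half full (1,572,864 bytes); raising APT_BUFFER_MAXIMUM_MEMORY scales every one of these figures proportionally.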