ETL job aborts

Posted: Thu Mar 03, 2005 6:44 am
by varshanswamy
node_node1: Player 5 terminated unexpectedly.
main_program: Unexpected termination by Unix signal 14(SIGALRM)
main_program: Step execution finished with status = FAILED.

I would like to know what could be causing these jobs to fail.

Posted: Thu Mar 03, 2005 7:00 am
by ArndW
Hello varshanswamy,

You will need to supply quite a lot more information before someone here can help you! This is like calling your support line, saying "my program won't run", and expecting to get help.

Overall questions - ... actually, so many I don't really know where I should start asking. But I'll give it a shot:

1. Does the job abort before processing anything?
2. Have you turned on or used any of the tracing or debugging options?
3. Are you writing to tables? Are you reading from DB tables?
4. Have you looked into the director log file details?
5. Have you looked into the UNIX logfiles?

Any information at all that might narrow down a cause will be helpful (firstly to yourself and then perhaps to those here in the forum). Put yourself in the shoes of someone reading your message: what questions would you ask? That is the type of information that even the experts need to help solve a problem.

Posted: Thu Mar 03, 2005 2:13 pm
by ray.wurlod
Can you identify the processing node associated with Player 5? (The metaphor here is an orchestra; there is a conductor, section leaders and players.) If you can, then you can narrow your search to configuration and other factors associated with that processing node, such as whether all required components are installed there.

Unfortunately, SIGALRM can be caused by almost anything. It's defined as "alarm clock timeout", which only tells you that some kind of timeout, perhaps while waiting for some event, has occurred.
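To illustrate what the log line is reporting, here is a minimal sketch, assuming a POSIX-style shell whose `kill -l` accepts a signal number. It looks up the name of signal 14 and shows that a process receiving that signal without handling it is terminated, which is what "Unexpected termination by Unix signal 14" describes. It says nothing about what set the timer in the first place.

```shell
# Look up the name of signal number 14 (printed without the SIG prefix).
sig_name=$(kill -l 14)
echo "signal 14 is SIG${sig_name}"

# A child shell sends SIGALRM to itself and does not handle it, so it is
# terminated. The parent shell reports that as exit status 128 + 14.
status=0
sh -c 'kill -ALRM $$' || status=$?
echo "child exit status: ${status}"
```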

Posted: Thu Mar 03, 2005 4:09 pm
by s1kaasam
Hi
You haven't specified which Unix OS you are using. If it is SunOS, there is a patch that has to be installed on the server and added to LD_LIBRARY_PATH in the dsenv file. If you confirm that it is SunOS, I can post the details so that you can check whether the server has the required patch and then include it in your dsenv file. This has been a bug with SunOS.

Posted: Thu Apr 21, 2005 9:12 am
by urahul
Hello,
I am using SunOS (SunOS edwps405 5.8 Generic_117350-21 sun4u sparc SUNW,Sun-Fire-15000)
and facing the same problem.
Could you please advise me on the patch you mentioned and any relevant details?

Thanks very much
Rahul

Posted: Thu Apr 21, 2005 3:16 pm
by s1kaasam
Hi
The change you need to make is to the dsenv file on the SunOS box. Log in as the DataStage admin and view your dsenv file. In the file you will find LD_LIBRARY_PATH; you need to add '/usr/lib/lwp' (lightweight processes) to it.

But you need to make sure that /usr/lib/lwp has been installed and exists on your box.

After you have added it, you need to stop and start the DataStage services so that the change is picked up. I hope you know the steps to stop and start the DataStage services.
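The dsenv change described above could look something like the following. This is only a sketch, assuming dsenv is sourced by a Bourne-compatible shell; `prepend_lib_dir` is a made-up helper name, not something DataStage provides. It puts the directory at the front of the list on the assumption that the runtime linker searches LD_LIBRARY_PATH from left to right; the post does not say where in the list the entry belongs.

```shell
# prepend_lib_dir is a hypothetical helper, not part of DataStage.
# It prepends a directory to LD_LIBRARY_PATH only if the directory exists
# and is not already listed.
prepend_lib_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "warning: $dir not found, LD_LIBRARY_PATH unchanged" >&2
        return 1
    fi
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$dir:"*) ;;  # already present, nothing to do
        *) LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

# Keep going even if the directory is missing, so the rest of dsenv still loads.
prepend_lib_dir /usr/lib/lwp || true
```

Checking for the directory first covers the point about making sure /usr/lib/lwp actually exists before relying on it.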

Any questions please let me know.

Posted: Wed Jul 13, 2005 7:31 am
by Amos.Rosmarin
Hi,

I'm facing the same problem with signal 14.

Adding /usr/lib/lwp to LD_LIBRARY_PATH did not help,
but I think it's good to have it in the library path anyway.

Waiting for a few minutes and then re-running the job usually works.
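If the wait-and-re-run workaround has to be scripted, a rough sketch follows. `retry_after_wait` is a hypothetical helper and `./run_job.sh` is a placeholder for whatever command normally starts the job; neither comes from DataStage. It only works around the symptom and does nothing about the cause of the timeout.

```shell
# retry_after_wait MAX_ATTEMPTS WAIT_SECONDS COMMAND [ARGS...]
# Hypothetical helper: run COMMAND; if it fails, wait and try again,
# giving up after MAX_ATTEMPTS attempts.
retry_after_wait() {
    max_attempts=$1
    wait_seconds=$2
    shift 2
    attempt=1
    while :; do
        "$@" && return 0
        [ "$attempt" -ge "$max_attempts" ] && return 1
        echo "attempt $attempt failed; waiting ${wait_seconds}s before retrying" >&2
        sleep "$wait_seconds"
        attempt=$((attempt + 1))
    done
}

# Example (placeholder command): up to 3 attempts, 5 minutes apart.
# retry_after_wait 3 300 ./run_job.sh
```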

More ideas will be appreciated

Cheers,
Amos

Posted: Wed Jul 13, 2005 9:32 am
by roy
We are currently trying to see if adding swap space helps.
I'll post an update if the issue recurs.