
Unix Signal 14 (SIGALRM)

Posted: Wed Feb 09, 2005 6:35 am
by RSchibi
I was running a job (which has run successfully before) and it aborted with the message 'Unexpected termination by Unix signal 14(SIGALRM)'.

Does anyone know what this means?

I re-ran the job this morning (no changes - just submitted it) and it ran to completion without any problem.

Thanks!

Posted: Wed Feb 09, 2005 6:54 am
by ArndW
It could be that the alarm signal was raised by a broken pipe; that can happen if one process on a reader-writer connection aborts or doesn't respond within a given time limit. Was your machine unusually busy when the alarm was raised? Check your particular UNIX man pages for the description of SIGALRM or alarm(2) for detailed information.
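
To see the mechanism ArndW describes, here is a minimal Python sketch of alarm(2) at work: a timer is set, SIGALRM arrives one second later, and an installed handler catches it instead of the default action killing the process. (This is an illustration of the signal itself, not of anything DataStage does internally.)

```python
import signal
import time

caught = []

# Handler invoked when the alarm timer expires.
def on_alarm(signum, frame):
    caught.append(signum)  # record that SIGALRM was delivered

signal.signal(signal.SIGALRM, on_alarm)  # install the handler
signal.alarm(1)                          # ask the kernel for SIGALRM in 1 second
time.sleep(2)                            # the signal arrives mid-sleep
print("caught signals:", caught)
```

On Linux, SIGALRM is signal number 14, which matches the "Unix signal 14" in the original error message.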

Posted: Wed Feb 09, 2005 4:14 pm
by ray.wurlod
Signals are the operating system's way to notify itself, and any executing process, that something has occurred. Some signals are innocuous, others indicate trouble.

You will read on the forum about SIGSEGV (segmentation violation - an attempt to access memory that doesn't exist or is not in your address space).

SIGALRM is another example. Being a general-purpose alarm, it can be caused by lots of different things. Talk to your UNIX Administrator to learn more. It may be something completely outside DataStage, such as a temporary power failure.

DataStage has its own signal handlers. When SIGALRM is detected, the DataStage signal handler decides "it's probably not safe to continue" and aborts the job.
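
Ray's point about signal handlers can be seen in miniature: a process that installs no handler for SIGALRM is simply terminated by the signal's default action, which is why an engine with its own handler must decide for itself whether to continue or abort. A small Python sketch (the child process here is hypothetical, purely for illustration):

```python
import signal
import subprocess
import sys

# The child sets a 1-second alarm but installs NO handler, so the
# default action for SIGALRM (terminate the process) applies.
child_code = "import signal, time; signal.alarm(1); time.sleep(5)"
result = subprocess.run([sys.executable, "-c", child_code])

# On Unix, a negative return code -N means the child died from signal N.
print("child return code:", result.returncode)
```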

Posted: Wed Feb 09, 2005 5:06 pm
by aartlett
We got these signals (SIGSEGV and SIGALRM) when we recently remediated some existing, working jobs. The errors only occurred on the development box, not the production box.

After much discussion with Ascential and some searching here, I found the problem was the size of the file system allocated to data sets. When I checked after the aborts, all looked OK. But when I did a rough calculation of the space required by a 1.5 GB file going through 10 transformations (heaps), I figured that could be where the problem was. I changed the config to put data sets in a new spot and, bingo, it worked without a hitch ... 3 weeks of development lost, but hey, I didn't need sleep ... plenty of coffee in my dripolator.
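
aartlett's back-of-the-envelope calculation can be written out explicitly. The 1.5 GB file size and 10 transformations come from the post; the worst-case assumption that every stage may hold its own on-disk copy of the data is mine, not a DataStage guarantee:

```python
# Rough worst-case estimate of data set / scratch space needed when a
# file passes through several stages, assuming (for illustration) that
# each stage may materialise its own copy of the data on disk.
dataset_gb = 1.5        # input file size from the post
transformations = 10    # number of stages from the post

worst_case_gb = dataset_gb * transformations
print(f"worst-case scratch usage: {worst_case_gb} GB")
```

Fifteen gigabytes is easy to exceed on a development box's file system even when a quick `df` after the abort looks fine, since the scratch files are cleaned up when the job dies.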

Posted: Wed Feb 09, 2005 9:22 pm
by trokosz
If this is a PX job with a Server stage in it, the time-based job monitor is likely the problem. There is a section in the 7.5 release notes about it. You should configure that specific job to use row-based monitoring by setting BOTH of the following variables: APT_MONITOR_TIME=5 and APT_MONITOR_SIZE=10000. The monitor will then update every 10000 rows. You can change APT_MONITOR_SIZE to any integer value > 1 as necessary, but don't set it too small or it starts to affect performance. APT_MONITOR_TIME must always remain set to 5.
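
One way to apply this (an assumption about your setup - these can also be set per-project or per-job in the Administrator/Designer) is to export the variables in the environment the job runs under, e.g. in dsenv:

```shell
# Switch the PX job monitor from time-based to row-based updates,
# per the release-notes workaround described above.
export APT_MONITOR_TIME=5       # must remain 5 per the post
export APT_MONITOR_SIZE=10000   # monitor updates every 10000 rows
```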

Posted: Mon Apr 25, 2005 8:24 am
by paulhill20
[quote="trokosz"]If this is a PX job with a Server stage in it, the time-based job monitor is likely the problem. There is a section in the 7.5 release notes about it. You should configure that specific job to use row-based monitoring by setting BOTH of the following variables: APT_MONITOR_TIME=5 and APT_MONITOR_SIZE=10000. The monitor will then update every 10000 rows. You can change APT_MONITOR_SIZE to any integer value > 1 as necessary, but don't set it too small or it starts to affect performance. APT_MONITOR_TIME must always remain set to 5.[/quote]

Why must APT_MONITOR_TIME always be set to 5?

Posted: Tue Apr 26, 2005 7:00 am
by legendkiller
These errors may also occur when there is a lot of parallel processing going on on the server.