Unix Signal 14(SIGALRM)

RSchibi
Participant
Posts: 8
Joined: Mon Apr 19, 2004 6:13 am

Post by RSchibi »

I was running a job (which has run successfully before) & it aborted with the message 'Unexpected termination by Unix signal 14(SIGALRM)'.

Does anyone know what this means?

I re-ran the job this morning (no changes, I just resubmitted it) and it ran to completion without any problem.

Thanks!
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

It could be that the alarm signal was raised by a broken pipe; that can happen if one process on the reader-writer connection aborts or doesn't respond within a given time limit. Was your machine unusually busy when the alarm was raised? Check the man pages on your particular UNIX system for SIGALRM or alarm(2) for detailed information.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Signals are the operating system's way to notify itself, and any executing process, that something has occurred. Some signals are innocuous, others indicate trouble.

You will read on the forum about SIGSEGV (segmentation violation: an attempt to access memory that doesn't exist or is not in your address space).

SIGALRM is another example. Being a general-purpose alarm, it can be caused by lots of different things. Talk to your UNIX Administrator to learn more. It may be something completely outside DataStage, such as a temporary power failure.

DataStage has its own signal handlers. When SIGALRM is detected, the DataStage signal handler decides "it's probably not safe to continue" and aborts the job.
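
To make the mechanism concrete, here is a minimal Python sketch of a process that arms a timer, receives SIGALRM, and gives up on its work. It is only an illustration of how SIGALRM handlers behave in general, not DataStage's actual handler; the names `AlarmFired`, `on_alarm` and `run_with_alarm` are made up for this example. Without any handler installed, the default action for SIGALRM is to terminate the process, which is what a message like "Unexpected termination by Unix signal 14" describes.

```python
import signal
import time


class AlarmFired(Exception):
    """Raised by the handler when SIGALRM arrives (hypothetical name)."""


def on_alarm(signum, frame):
    # signum is signal.SIGALRM here (14 on Linux and most Unix systems).
    raise AlarmFired(f"received signal {signum}")


def run_with_alarm(seconds, work):
    """Run work(); report 'aborted' if SIGALRM fires before it finishes."""
    previous = signal.signal(signal.SIGALRM, on_alarm)
    # ITIMER_REAL counts wall-clock time and delivers SIGALRM on expiry.
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        work()
        return "completed"
    except AlarmFired:
        return "aborted"
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending timer
        signal.signal(signal.SIGALRM, previous)  # restore the old handler


if __name__ == "__main__":
    print(run_with_alarm(0.1, lambda: time.sleep(1)))  # slow work
    print(run_with_alarm(1.0, lambda: None))           # fast work
```

This only works on Unix and only from the main thread, since that is where Python delivers signals.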
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
aartlett
Charter Member
Posts: 152
Joined: Fri Apr 23, 2004 6:44 pm
Location: Australia

Post by aartlett »

We got these signals (SIGSEGV and SIGALRM) when we recently remediated some existing, working jobs. The errors only occurred on the dev box, not the prod box.

After much discussion with Ascential and looking here, I found the problem was the size of the file system allocated to data sets. When I checked after the aborts, everything looked OK. When I did a rough calculation of the space required by a 1.5 GB file going through 10 transformations (heaps), I figured that could be where the problem was. Changed the config to put data sets in a new spot and bingo, it worked without a hitch ... 3 weeks of development lost, but hey, I didn't need sleep ... plenty of coffee in my dripolator.
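
A rough calculation of that kind can be sketched in a few lines of Python. This assumes the worst case, where every transformation writes out a full copy of the data; real jobs may need more or less, and the function names, the 20% headroom and the path "." are all placeholders for illustration rather than anything from the job in question.

```python
import shutil


def scratch_needed_gb(input_gb, stages, copies_per_stage=1.0):
    """Worst-case estimate: each stage materialises a full copy of the input."""
    return input_gb * stages * copies_per_stage


def fits(required_gb, free_gb, headroom=0.2):
    """True if the estimate plus a safety margin fits in the free space."""
    return required_gb * (1 + headroom) <= free_gb


if __name__ == "__main__":
    needed = scratch_needed_gb(1.5, 10)  # 1.5 GB through 10 transformations
    # "." stands in for the data set directory named in your config file.
    free_gb = shutil.disk_usage(".").free / 1024**3
    print(f"need about {needed:.1f} GB, {free_gb:.1f} GB free, "
          f"fits: {fits(needed, free_gb)}")
```

Checking free space after an abort can be misleading, because the temporary files are usually cleaned up by then; the estimate has to be compared against what is free while the job runs.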
trokosz
Premium Member
Posts: 188
Joined: Thu Sep 16, 2004 6:38 pm

Post by trokosz »

If this is a PX job with a server stage in it, the time-based job monitor is likely the problem. There is a section in the 7.5 release notes about it. You should configure that specific job to use row-based monitoring by setting BOTH the following variables: APT_MONITOR_TIME=5 and APT_MONITOR_SIZE=10000. This will update every 10000 rows. You can change APT_MONITOR_SIZE to any integer value > 1 as necessary, don't go too small or it starts to affect performance. APT_MONITOR_TIME must always remain set to 5.
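
As a sketch of what "setting BOTH variables" looks like, the following Python builds an environment with the two values from the post and hands it to a child process. How the variables actually reach a DataStage job (job parameters, project defaults, dsenv) is not shown and depends on your installation; the helper name `row_based_monitor_env` is hypothetical, and the child process here is just a stand-in that prints what it received.

```python
import os
import subprocess
import sys


def row_based_monitor_env(size=10000, base=None):
    """Return an environment with row-based monitoring variables set.

    APT_MONITOR_TIME is pinned to 5 and APT_MONITOR_SIZE must be an
    integer greater than 1, following the advice in the post above.
    """
    if not isinstance(size, int) or size <= 1:
        raise ValueError("APT_MONITOR_SIZE must be an integer > 1")
    env = dict(os.environ if base is None else base)
    env["APT_MONITOR_TIME"] = "5"
    env["APT_MONITOR_SIZE"] = str(size)
    return env


if __name__ == "__main__":
    # Stand-in child process: prints the two variables it inherited.
    child = [
        sys.executable,
        "-c",
        "import os; print(os.environ['APT_MONITOR_TIME'], "
        "os.environ['APT_MONITOR_SIZE'])",
    ]
    subprocess.run(child, env=row_based_monitor_env(), check=True)
```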
paulhill20
Participant
Posts: 11
Joined: Tue Jun 22, 2004 1:06 pm

Post by paulhill20 »

[quote="trokosz"]If this is a PX job with a server stage in it, the time-based job monitor is likely the problem. There is a section in the 7.5 release notes about it. You should configure that specific job to use row-based monitoring by setting BOTH the following variables: APT_MONITOR_TIME=5 and APT_MONITOR_SIZE=10000. This will update every 10000 rows. You can change APT_MONITOR_SIZE to any integer value > 1 as necessary, don't go too small or it starts to affect performance. APT_MONITOR_TIME must always remain set to 5.[/quote]

Why must APT_MONITOR_TIME always be set to 5?
legendkiller
Participant
Posts: 60
Joined: Sun Nov 21, 2004 2:24 am

Post by legendkiller »

This error may also occur when there is a lot of parallel processing going on on the server.