Mutex errors

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

ShaneMuir
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Mutex errors

Post by ShaneMuir »

Hello everybody,

I have searched around the forums for information about 'timeout waiting for mutex' and found some useful threads, but nothing which clears up my problem, which is as follows:

I am running a very simple job from a flat file which uses a couple of hash file lookups - when I run one version it works perfectly. However, when I release the job and run it, I keep getting 'timeout waiting for mutex' errors.

It appears that it has something to do with the Hash file lookups but why does it only happen in the released version of the job? How can this be cleared up? It has been suggested that it could be caused by the speed of the CPU amongst other things. Should I set the timeout period higher or lower?

Any help you can provide me on this would be most appreciated.

Regards
Shane Muir
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

What kinds of things are you doing in your job, what stages are you using? IPC? Do you have Row Buffering enabled? What operating system? When you release this job, do you migrate it to another (production) server or is it running on the same server as the unreleased job?
ShaneMuir wrote:Should I set the timeout period higher or lower?
Higher. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
ShaneMuir
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

Thanks for the quick response there Craig,

In response to your questions

1. What kind of things am I doing in the job - it's all very simple (or at least I thought it was):
  a. Flat file comes in
  b. Transform adds a couple of extra fields based on input data
  c. Rather large transform - 3 hash files which are simple mappings for country codes, currency codes and account groupings. Also in this transform stage are approx 17 output streams.
  d. Each stream after that produces a separate line for a flat file
  e. Collector stage to collect all 17 output streams
  f. Flat file output
2. Row buffering is enabled, as is the Inter process option.
3. Unix OS
4. Actually, the released job is on the same server as the unreleased job

I set the timeout higher but still get multiple mutex errors (generally about one for each output stream).

Thanks again
Shane
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

ShaneMuir wrote:3. Unix OS
Actually, I meant which specific O/S... HP/UX? It seems to get more than its fair share of mutex errors. :?
I set the timeout higher but still get the multiple mutex errors
What value do you currently have it set to?
-craig

"You can never have too many knives" -- Logan Nine Fingers
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

There are spin tries and spin wait settings in the DataStage config file, but I've yet to find a description of how these should be set to avoid the mutex problem. Your link collector is almost certainly the trigger for the mutex problem; if you can redesign your job to not use one, you should be okay. 17 output streams is a bit unusual - is it possible to redesign your job to have a transformer followed by a pivot stage? The transformer would do all the lookups and append the extra columns; the pivot would break each row into up to 17 rows based on the values in the extra columns.
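For reference, on Server Edition those tunables live in the engine's uvconfig file. A sketch of what the entries might look like - the SPINTRIES/SPINWAIT names match what is mentioned later in this thread, but the values below are placeholders, not recommendations; check your release's defaults:

```
# $DSHOME/uvconfig (illustrative excerpt only)
SPINTRIES 100    # retries against a spinning semaphore before timing out
SPINWAIT  10     # wait between retries
```

On the UniVerse-based engine a uvconfig change typically has to be regenerated (bin/uvregen) and the engine restarted before it takes effect.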
ShaneMuir
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

Craig:
In answer to your question, the operating system is HP-UX 11i, and the timeout was originally set at 10 secs; I moved it up to 300 secs. The thing is that the timeout errors occur after about 6 seconds anyway.

Vincent:
Unfortunately I don't think the Pivot stage will work, as each of the output streams has a different data structure (e.g. one has 36 fields, the next has 91, etc.). It's almost like XML but just different enough to be annoying.

Thanks again guys for all your input on this.
tonystark622
Premium Member
Posts: 483
Joined: Thu Jun 12, 2003 4:47 pm
Location: St. Louis, Missouri USA

Post by tonystark622 »

I can't remember if the Link Collector requires Row Buffering and/or Inter Process to be enabled. Have you tried disabling Row Buffering?

Tony
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Link Collector requires row buffering enabled. :cry:

(You can know this because of the call to ipcopen() in the error message - ipc is "inter process communication".) Also, it's in the manual. :wink:
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
tonystark622
Premium Member
Posts: 483
Joined: Thu Jun 12, 2003 4:47 pm
Location: St. Louis, Missouri USA

Post by tonystark622 »

I thought it might, but couldn't find it when I looked in the manual. Ah well, at least I was trying to help.

Tony
ShaneMuir
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

Hi again everyone

Just to let you know, we still haven't found the cause of the problem. As it turns out, it was not an isolated problem: another part of the project on a separate site had the same issue. So at present we have migrated only the job executable to production, and it works fine. Will get back to you when we find a permanent solution.

Thanks again for all your help
Shane Muir
jeredleo
Participant
Posts: 74
Joined: Thu Jun 19, 2003 8:49 am

Post by jeredleo »

Just curious which version of DS you have - 7.x? I saw that you use Link Collector and then continue on to indicate that your 17 output files have different layouts. I didn't think you could use Link Collector to collect multiple layouts?

I also know that we upgraded to 7.5 a couple of months ago and had problems where the keys set up on the input files all have to match, as well as on your output file. This wasn't a problem in earlier releases of DS, as far as having some inputs with a key identified and some without. However, with the 7.5 release we ran into problems where, depending on how it was set up, our jobs would abend and garbage up the output stream, or in other cases would actually drop records. Just a heads up.

Now in regards to the mutex errors: we had problems adjusting the performance tab on the job. If you have it set to get project defaults, then when moving to production and 're-compiling' the job, I would have to assume it would pull in your production project's defaults. Have your DS Admin verify that the project defaults are the same between the two projects. Just something to look at.

JB
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

MUTEX locks - also called "smart semaphores" - live in the operating system.

Instead of waiting, asleep, on a regular semaphore and waking when notified, the idea of a smart semaphore is to wait, retrying, so that you can wake faster.

Which is good in theory.

As machines became faster, the limit on retries before abandoning the wait could be achieved more quickly, which is what causes the errors. By increasing the allowed number of retries (SPINTRIES) or the time for which one is prepared to wait (SPINWAIT), you should be able to reduce the frequency of these errors.
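The retry-then-give-up behaviour described here can be sketched in a few lines of Python. This is illustrative only - DataStage's actual mutex handling lives inside the engine, and the spin_tries/spin_wait names below simply mirror the SPINTRIES/SPINWAIT tunables:

```python
import threading
import time

def spin_acquire(lock, spin_tries=100, spin_wait=0.01):
    """Illustrative spin-then-give-up acquire: retry the lock
    spin_tries times, sleeping spin_wait seconds between tries,
    then report failure -- roughly the behaviour behind a
    'timeout waiting for mutex' error."""
    for _ in range(spin_tries):
        if lock.acquire(blocking=False):
            return True
        time.sleep(spin_wait)
    return False  # retries exhausted: the "timeout" case

lock = threading.Lock()
lock.acquire()  # some other "process" holds the resource
print(spin_acquire(lock, spin_tries=5, spin_wait=0.001))  # False: timed out
lock.release()
print(spin_acquire(lock))  # True: acquired on the first try
```

On a faster machine each failed try costs less wall-clock time, so a fixed retry count is exhausted sooner - which is why raising the retry count (or wait) reduces the error frequency without fixing the underlying contention.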

But they will still occur if you have to wait too long for a resource.

In DataStage this might, for example, indicate a badly tuned set of lock tables, extensive use of directory-type files (which only have "group 1"), or just waiting too long for some resource that is governed by use of a semaphore for single threading, such as accessing the T30FILE table or the disk cache file/free chains.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ShaneMuir
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

Hi again

In response to the question about the link collector with 17 different inputs, I suppose I should have mentioned that each stream, although different, is transformed into a single concatenated text field so that the rows are the same length and can be fed into a link collector before being sent to a single text output.
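As an aside, that flattening trick is easy to illustrate outside DataStage. A minimal Python sketch with hypothetical field values - in the real job this would be transformer derivations rather than Python:

```python
def flatten(fields, width=120, sep="|"):
    """Join a variable-layout record into one delimited text field,
    padded (or truncated) to a fixed width so every stream's rows
    look identical to the downstream collector."""
    row = sep.join(str(f) for f in fields)
    return row.ljust(width)[:width]

stream_a = flatten(["GB", "GBP", "ACCT001"])          # 3-field layout
stream_b = flatten(["US", "USD", "ACCT002", "X", 9])  # 5-field layout
print(len(stream_a) == len(stream_b))  # True: both rows are exactly 120 chars
```

Once every stream emits the same single fixed-width column, the collector only ever sees one layout, regardless of how many fields each source row started with.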

I'm also checking now with the Admin to see if the server settings are the same.

SM