Time Out waiting for Mutex Error


trojan
Participant
Posts: 3
Joined: Wed Mar 28, 2007 1:44 am

Post by trojan »

A job failed with an error "ds_ipcgetnext - timeout waiting for mutex".

Our operating system is Sun Solaris and we use DataStage Server Edition 7.5.

This job is scheduled to run within a particular time window, and it fails only during that window. If we run it manually outside that window, it runs without problems. Suspecting a database issue (many processes running simultaneously within that time window), we contacted our DBAs, but they said that the database activity is normal. We also tried various performance improvement options.

We raised the performance parameters, buffer size and timeout, to the maximum possible (buffer size 1024 m and timeout 600 sec), but the job still failed.

We also tried changing the configuration file parameters spintries and spinsleep, without any success.

Job details: the job includes a Link Collector that collects the data from 4 stored procedures and passes it to a Transformer. The Transformer then calls a stored procedure. The job fails on this link to the stored procedure.

If anyone has faced the same error or knows the resolution, please share.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Welcome aboard. :D

You are not the first to have encountered this problem. It seems from what you have tried that you have already searched the forum for possible answers. It might be useful to place before-stage and after-stage subroutines that execute timing points either side of the Transformer stage, so that you can prove that the delay is in the downstream SP.
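
The timing-point idea can be sketched outside DataStage. The Python below is only an illustration, not DS Basic and not a DataStage API: `timing_point` and `slow_stored_procedure` are hypothetical names, and the sleep stands in for a slow downstream call. It shows how recording elapsed time on either side of a step makes it clear which step is responsible for the delay.

```python
import time
from contextlib import contextmanager


@contextmanager
def timing_point(label, log):
    """Append (label, elapsed_seconds) to log when the wrapped step ends."""
    start = time.monotonic()
    try:
        yield
    finally:
        log.append((label, time.monotonic() - start))


def slow_stored_procedure():
    # Hypothetical stand-in for the downstream stored procedure call.
    time.sleep(0.2)


timings = []

with timing_point("transformer", timings):
    pass  # the row transformation would happen here

with timing_point("downstream_sp", timings):
    slow_stored_procedure()

# The step with the largest elapsed time is the one to investigate.
slowest = max(timings, key=lambda entry: entry[1])
print(slowest[0])
```

In a real Server job the equivalent would be before-stage and after-stage subroutines writing timestamps to the job log.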
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
trojan
Participant
Posts: 3
Joined: Wed Mar 28, 2007 1:44 am

Post by trojan »

Any other way of getting ideas? :(
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Any way other than what? Asking here? :?

You could try searching the forums here as 'mutex errors' are not all that uncommon and have been discussed a number of times. You could call your Support provider, make them earn the money you pay them.

Or wait for someone else to answer. Anyone can. Some of us check the site many times a day, most don't - so you may just need to wait for the right person to show up.

I personally don't have any experience with the error so can't offer any advice other than what's already here.
-craig

"You can never have too many knives" -- Logan Nine Fingers
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Junk the Link Collector, write each output link to separate sequential files, concatenate them together, then load into your target table. No more mutex errors.
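
A minimal sketch of the concatenation step, assuming each output link has already been written to its own sequential file. The file names (`link1.txt` and so on) and the `concatenate` helper are hypothetical; on UNIX a plain `cat` in an after-job command would do the same thing.

```python
import shutil
import tempfile
from pathlib import Path


def concatenate(parts, target):
    """Append each part file to target, in the order given."""
    with open(target, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)


# Demo: create four hypothetical per-link output files in a temp directory.
workdir = Path(tempfile.mkdtemp())
parts = []
for i in range(1, 5):
    part = workdir / f"link{i}.txt"
    part.write_text(f"row from link {i}\n")
    parts.append(part)

# Combine them into one file that a later step can load into the target.
combined = workdir / "combined.txt"
concatenate(parts, combined)
```

Because each link writes to its own file, no two processes share a buffer, so nothing has to wait on another process.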
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I should have been more specific - I don't have issues of this nature because I don't use the "problematic" stages like the Link Collector. I do as Ken suggests - separate output files and a post-processing concatenation - this is quick, easy to implement and problem free.
trojan
Participant
Posts: 3
Joined: Wed Mar 28, 2007 1:44 am

Post by trojan »

Yeah, I agree with your solution. But the problem is that the job is in production and we cannot modify it.

We started facing this problem during the last 3 weeks. Prior to that, the job had been working fine for more than a year.

So, other than design changes, are there any performance parameters that we can modify? :?:

Thanks a lot for your responses though. :)
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Ask the DBA why the SP is taking longer to respond than it formerly did. There may be a solution at that end.

Ask the UNIX sys admin whether the overall load has increased since when it was working.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

The job uses a stage that has a built-in timeout. That means on occasion the job will time out. That means it is unstable. Just because it ran for a year doesn't mean it won't time out. Even if it mysteriously starts working again, that doesn't mean it's stable. It can fail again in the future. The only 100% guaranteed solution is to STOP using this stage. Otherwise, it's just a gamble each time you use the LC that it will work.
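
Why a built-in timeout makes a job load-dependent can be shown with an analogy. The sketch below uses Python's standard `queue.Queue` as a stand-in for a bounded buffer between two processes; it is not how DataStage implements its buffers, and `try_send` is a hypothetical helper. The sender succeeds while the receiver keeps up and times out as soon as the receiver stalls for longer than the timeout.

```python
import queue


def try_send(buffer, row, timeout):
    """Try to hand a row downstream; return False if the wait timed out."""
    try:
        buffer.put(row, timeout=timeout)
        return True
    except queue.Full:
        return False


# A one-slot buffer, so it fills after a single row.
buffer = queue.Queue(maxsize=1)

first = try_send(buffer, "row 1", timeout=0.05)   # buffer has room
second = try_send(buffer, "row 2", timeout=0.05)  # nobody has drained it yet
buffer.get()                                      # the receiver catches up
third = try_send(buffer, "row 3", timeout=0.05)   # room again

print(first, second, third)
```

Raising the timeout only moves the threshold: a slow enough receiver will still trip it.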
jpr196
Participant
Posts: 65
Joined: Tue Sep 26, 2006 1:49 pm
Location: Virginia

Post by jpr196 »

On my current engagement, we had this same error and tried many of the suggested solutions. However, we finally discovered that the cause was that we were running out of room in our tablespace. So you may want to check with the DBAs that enough space is allocated, if you haven't already.