ds_ipcgetnext() timeout waiting for mutex

rohitagarwal15 · Post by **rohitagarwal15** » Tue Mar 27, 2012 5:11 am

ArndW wrote: sd_ds - I went to search and typed in "ds_ipcgetnext" and got 38 different threads as a result. Most of those looked quite informative.

I am also facing a similar error: "ds_ipcgetnext() timeout waiting for mutex".
We have already set the environment variable "DS_IPCPUT_OLD_TIMEOUT_BEHAVIOR" to 1, but we still get this error.
I searched the forum and found something about tuning the uvconfig parameters SPINTRIES and SPINSLEEP.
Can anyone explain how I can tune these parameters so that we do not face this error again?
For now, we have recompiled all the jobs and they are running fine.

chulett · Post by **chulett** » Tue Mar 27, 2012 7:23 am

Split from this older topic so you can control your own destiny.

First off, confirm for us you are still talking about a Server job on a Windows server as those "SPIN" variables depend entirely on your operating system, from what I recall. Also let us know what DataStage version you are running.

rohitagarwal15 · Post by **rohitagarwal15** » Mon Apr 02, 2012 2:59 am

DataStage version is 8.1 with Fix Pack 2.
Operating system is AIX 6.1.

chulett · Post by **chulett** » Mon Apr 02, 2012 6:40 am

Server or Parallel job? What does your job design look like?

rohitagarwal15 · Post by **rohitagarwal15** » Wed Apr 11, 2012 3:40 am

chulett wrote: Server or Parallel job? What does your job design look like?

It's a Parallel job. We are using a shared container in the job; apart from that, we are using Transformer, Copy, Join and Funnel stages.

ray.wurlod · Post by **ray.wurlod** » Wed Apr 11, 2012 4:35 pm

Ignore everything in the error message after "timeout". Everything else is about the mechanism.

The problem is in the inter-process communication (ipc) function that gets the next buffer-ful of data. It has exceeded its wait interval.

Why? There are a number of possible reasons, but they usually hover around the fact that the server or network between servers is overloaded and/or not fast enough.
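To make the mechanism concrete, here is a toy sketch in Python. It is not DataStage's actual code, and the names `producer`, `get_next` and `run` are made up for illustration. It only shows the general pattern: a consumer waits a fixed interval for the next buffer, and if the producer is too slow (for example because the machine is overloaded), the wait expires and the consumer gives up.

```python
import queue
import threading
import time


def producer(buffers: queue.Queue, delay_s: float) -> None:
    """Simulate an upstream process that takes delay_s seconds to deliver a buffer."""
    time.sleep(delay_s)
    buffers.put(b"rows")


def get_next(buffers: queue.Queue, timeout_s: float) -> bytes:
    """Wait for the next buffer, giving up after timeout_s seconds."""
    try:
        return buffers.get(timeout=timeout_s)
    except queue.Empty:
        # The wait interval was exceeded before any data arrived.
        raise TimeoutError("timeout waiting for next buffer") from None


def run(delay_s: float, timeout_s: float) -> str:
    """Run one producer/consumer exchange and report how it ended."""
    buffers = queue.Queue(maxsize=1)
    worker = threading.Thread(target=producer, args=(buffers, delay_s))
    worker.start()
    try:
        get_next(buffers, timeout_s)
        return "ok"
    except TimeoutError:
        return "timeout"
    finally:
        worker.join()
```

With a fast producer, `run(0.0, 1.0)` completes normally. With a producer slower than the wait interval, such as `run(0.5, 0.05)`, the consumer times out. Raising the timeout only hides the symptom if the producer is slow for a reason that persists.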

rohitagarwal15 · Post by **rohitagarwal15** » Fri Apr 13, 2012 2:07 am

Thanks, Ray, for the guidance and information.
Meanwhile I have spoken to the network team, and they responded that there is no network issue. Also, when the job aborted, we asked our Unix admin to check server utilization, and they said everything was fine.

kandyshandy · Post by **kandyshandy** » Fri Apr 13, 2012 2:33 am

Do you get this error when you rerun the job?

qt_ky · Post by **qt_ky** » Fri Apr 13, 2012 9:38 pm

In DataStage Administrator:

Have you tried increasing the inter process timeout setting in the project properties from the default of 10 seconds to the maximum of 600 seconds?

Have you tried increasing the project's Parallel, Operator-specific DSIPC_OPEN_TIMEOUT environment variable from the default of 30?

kandyshandy · Post by **kandyshandy** » Sun Apr 15, 2012 8:48 pm

Try whatever others suggested here. If no luck, contact IBM.

In 2009, when we migrated our jobs from 7.5 to 8.0.1 and eventually 8.1, we got this error. I don't remember exactly how it was fixed, but I think we got a patch from IBM to fix it.

But keep in mind that those were the early days of version 8.

rohitagarwal15 · Post by **rohitagarwal15** » Mon Apr 16, 2012 3:55 am

I have increased the value of DSIPC_OPEN_TIMEOUT to 300, and I also added the parameter DS_IPCPUT_OLD_TIMEOUT_BEHAVIOR with its value set to 1. After doing this my job ran fine, but the next day it aborted again. When I recompile the job and rerun it, it runs fine. It still aborts these days, but not very frequently.

ray.wurlod · Post by **ray.wurlod** » Mon Apr 16, 2012 2:51 pm

What happens if you reset the aborted job, rather than recompile?

qt_ky · Post by **qt_ky** » Mon Apr 16, 2012 7:04 pm

Also, what inter process timeout setting values have you tried? Those are set per project in Administrator.

Have you opened any support case after trying all the setting changes?

rohitagarwal15 · Post by **rohitagarwal15** » Mon Apr 16, 2012 11:46 pm

ray.wurlod wrote: What happens if you reset the aborted job, rather than recompile? ...

If I reset the job, it aborts again, but if I recompile it, it runs fine.

kandyshandy · Post by **kandyshandy** » Tue Apr 17, 2012 1:13 am

Rohit, even if you revert those two things (the Administrator setting and the environment variable), your job should run fine sometimes and abort sometimes. That's why I asked you initially whether you get this error when you rerun the job. Check the IBM site to see whether this is a known issue.
