19 jobs failed with ds_ipcgetnext
Hi, sorry about the title - I didn't want this dismissed with the usual "timeout waiting for mutex" answer! I know about row buffers etc...
Anyway, here's the problem. On Friday, one job, a pull from a SQL Server db, failed with "timeout waiting for mutex". I reset and reran it, and it failed pretty much straight away, with other jobs running. I reran it on Saturday morning, again with other jobs running, and it completed successfully.
Tonight, the same job, along with 18 others (in the same category), failed with mutex errors. I have other jobs in other categories running to completion without issue. Just the jobs in this particular category failed - all from the same source, and all at pretty much the same time.
As well as the ds_ipcput() error, I'm also getting a ds_ipcgetnext() error thereafter. Most of the server jobs have this error, and most have it in a CInterProcess stage.
Nothing has changed: no upgrades to DS or its jobs. I can't speak for the source system, however.
Any help very much appreciated!!!
Thanks in advance
Paul
Re: 19 jobs failed with ds_ipcgetnext
Just a couple of general questions:
1) Have these jobs been running for a while without any issues?
2) Though there are no changes on the DS side, were any changes made to the network / SQL Server / OS?
Thanks
Ram
----------------------------------
Revealing your ignorance is fine, because you get a chance to learn.
This is an example of a case where you may benefit from slightly increasing the buffer timeout value.
It's still related to total load on the machine, but if you can allow the IPC buffers a bit more grace time, you *should* get fewer timeouts.
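As a rough illustration of why a longer timeout tolerates load spikes, here is a minimal Python sketch of a bounded buffer between a writer and a reader. It is only an analogy, not how DataStage implements its IPC stage: `run_pipeline`, `consumer_delay`, `put_timeout` and `buffer_rows` are made-up names, and Python's `queue.Queue` stands in for the IPC buffer.

```python
import queue
import threading
import time


def run_pipeline(rows, consumer_delay, put_timeout, buffer_rows=2):
    """Push rows through a small bounded buffer.

    Returns (consumed_rows, error), where error is None on success.
    All names here are hypothetical; this is an analogy only.
    """
    buf = queue.Queue(maxsize=buffer_rows)
    consumed = []
    stop = object()  # sentinel telling the reader to finish

    def reader():
        while True:
            item = buf.get()
            if item is stop:
                return
            time.sleep(consumer_delay)  # simulate a reader starved of CPU
            consumed.append(item)

    worker = threading.Thread(target=reader, daemon=True)
    worker.start()

    error = None
    try:
        for row in rows:
            # Blocks while the buffer is full; gives up after put_timeout.
            buf.put(row, timeout=put_timeout)
    except queue.Full:
        error = "timeout waiting for buffer space"
    finally:
        buf.put(stop)  # waits for space, then lets the reader exit
        worker.join()
    return consumed, error


if __name__ == "__main__":
    rows = list(range(6))
    # Slow reader, short timeout: the writer gives up.
    print(run_pipeline(rows, consumer_delay=0.3, put_timeout=0.05)[1])
    # Same slow reader, longer timeout: the run completes, just slowly.
    print(run_pipeline(rows, consumer_delay=0.3, put_timeout=2.0)[1])
```

In this sketch the longer timeout does not make rows move any faster; it only gives a slow reader more time to free up buffer space before the writer gives up.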
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Re: 19 jobs failed with ds_ipcgetnext
SURA wrote:
Just a couple of general questions:
1) Have these jobs been running for a while without any issues?
2) Though there are no changes on the DS side, were any changes made to the network / SQL Server / OS?
Yes, these jobs have been running in 8.5 since I upgraded in April. No previous issues until Friday's single job failure, and now all these 19.
We've had no network or o/s changes - not sure about the DB. I wasn't informed of any changes.
All the IPC stages are at defaults:
Buffer: 128 KB
Timeout: 10 secs
Yes, it looks as though they are hitting the 10 secs and erroring.
I can up all of them - but I don't understand why they're failing now. The category has 195 jobs, and 19 failed with this error.
Re: 19 jobs failed with ds_ipcgetnext
You mentioned all 19 were hitting the same DB. Are these the only ones to hit that DB? If so, your culprit is almost certainly a change on the DB front. How long do the queries take to run in a DB client? What database is it?
Is there any load on either the DataStage server or the DB server when trying to run the jobs? How many of these are you running at one time?
I have probably 7 or so jobs running simultaneously into the same database. The only ones causing a problem are the IPC jobs.
Sorry, I'm starting to understand this a little more.
I got our Unix administrator to report the process activity over the period. There was a massive CPU spike at the time the jobs started to go wrong. The data didn't show load (how many processes were waiting), but I suspect, given the utilisation of the 4 CPUs, that this is where the problem is.
I am going to up the project's timeout parameter to 20 seconds... I have some questions which don't look to have been answered here...
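Since the utilisation figures alone did not show how many processes were waiting, one way to capture that on a Unix host is to sample the load average and compare it with the number of CPUs. The following is a minimal Python sketch under stated assumptions: the function names and the `threshold` of 1.0 runnable process per CPU are hypothetical choices, and `os.getloadavg()` is only available on Unix-like systems.

```python
import os


def load_per_cpu(load_avg=None, cpu_count=None):
    """Return the 1-minute load average divided by the CPU count.

    Arguments default to live values from the host (Unix only).
    """
    if load_avg is None:
        load_avg = os.getloadavg()[0]  # 1-minute average
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    return load_avg / cpu_count


def is_saturated(load_avg=None, cpu_count=None, threshold=1.0):
    """True when more work is queued than the CPUs can run at once.

    The threshold is an assumed rule of thumb, not a DataStage setting.
    """
    return load_per_cpu(load_avg, cpu_count) > threshold


if __name__ == "__main__":
    # Example with the 4 CPUs mentioned above and an assumed load of 8.0.
    print(load_per_cpu(8.0, 4), is_saturated(8.0, 4))
```

Logging this value every few seconds during the batch window would show whether the timeouts line up with periods when processes are queuing for CPU.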
We saw problems with IPC stages whenever the column metadata (datatype, length, display) defined in the stage didn't match what was coming from the source. Correcting the metadata helped but never resolved all of our problems. In the end, we removed the IPC stages whenever a job would fail with this error.
I don't see any mention of what you upgraded from or what database you're using, but we experienced lots of strange errors when upgrading from 7.5.2 to 8.1. You may want to look at what patches you had installed on your previous version and see if something similar needs to be applied to 8.5. We struggled for a year before discovering a patch that needed to be applied to our new environment.
PaulS wrote:
I am going to up the projects timeout parameter to 20 seconds... I have some questions which looked to not be answered here...
Nope; to me, you are pushing the issue back, not solving it. As I understand it, if the load is delayed by network traffic or for any other reason, you will face this issue again.
I am not sure about your job design.
1) Writing the data to a file and then using a separate load job could resolve it (95%).
2) Replace the IPC with a file.
Thanks
Ram
----------------------------------
Revealing your ignorance is fine, because you get a chance to learn.
We also upgraded from 7.5.2. I hit mutex errors in 8.5 in a job using a link partitioner/collector. This is the first time I've seen it in an IPC.
From the mass of documents I've read, it appears IPC stages are more trouble than they are worth. Unfortunately my category has 195 jobs, each with two IPCs. I'm not about to rewrite them all.
I've been looking at the sequencer and, instead of upping the timeouts, I'm going to resequence the calling of the jobs. I have 7 strands running simultaneously. The whole sequence takes 25 mins, and a couple of the strands complete in 10 mins. I'll combine them and take some of the weight off the early period of heavy utilisation. There is some scope to smooth it out further if needed.
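The effect of resequencing can be pictured as capping how many strands run at once. Below is a minimal Python sketch of that idea, not a DataStage sequencer: `run_strands`, `make_strand` and `max_parallel` are hypothetical names, and each strand is just a callable standing in for a chain of jobs.

```python
from concurrent.futures import ThreadPoolExecutor
import threading
import time


def run_strands(strands, max_parallel):
    """Run callables with at most max_parallel active at a time.

    Returns (results_in_order, peak_concurrency). Hypothetical helper.
    """
    lock = threading.Lock()
    active = 0
    peak = 0

    def tracked(strand):
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        try:
            return strand()
        finally:
            with lock:
                active -= 1

    # The pool size is the cap on simultaneous strands.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        results = list(pool.map(tracked, strands))
    return results, peak


def make_strand(name, seconds):
    """Build a stand-in strand that sleeps instead of running jobs."""
    def strand():
        time.sleep(seconds)
        return name
    return strand


if __name__ == "__main__":
    strands = [make_strand(f"strand-{i}", 0.05) for i in range(7)]
    results, peak = run_strands(strands, max_parallel=4)
    print(results, peak)
```

Lowering the cap trades a longer overall run for a lower peak load, which is the same trade-off as combining the short strands in the sequence.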
Thanks for everyone's help here - very much appreciated!
Paul