Error writing to pipe
Hello,
I am using DB2 V8 on AIX with DataStage 7.5.
When I try to LOAD a large volume of rows (more than 5 million) into my DB2 table, I get this error: "Error writing to pipe".
I suspect a problem with the partition LOCK.
I tried using a .DS and a .SEQ file, but I got the same error.
Can anyone help me?
Thanks!
I am curious: why do you suspect a DB/2 partition lock when you get the same error message writing to a dataset or sequential file?
Does the error message refer to a stage or a player number? Pipes are used extensively for interprocess communication, so you would need to narrow it down. An error writing to a pipe means that the process reading from the other end has either died (although the error message is then usually different) or is too slow. Your error looks like a pipe timeout issue, but this shouldn't happen when writing to a sequential file. Also, do the timeout and abort happen at the same row each time, or at approximately the same runtime?
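The two failure modes described above (the reader has died, or the reader is too slow) can be reproduced with plain operating-system pipes. The Python sketch below assumes a POSIX system and is only an analogy: it does not use DataStage's own interprocess code, and the function names are hypothetical. The first function writes after the read end has been closed; the second fills a pipe that nobody is draining, which is the point where a blocking writer would stall and a timeout could eventually fire.

```python
import os


def write_after_reader_exits() -> str:
    """Write to a pipe whose read end has already been closed."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # simulate the downstream reader process dying
    try:
        os.write(write_fd, b"row data\n")
        return "written"
    except BrokenPipeError:
        # Python ignores SIGPIPE by default, so the failure surfaces as EPIPE.
        return "broken pipe"
    finally:
        os.close(write_fd)


def write_until_pipe_full() -> int:
    """Fill a pipe that nobody reads; return how many bytes fit in its buffer."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)  # fail instead of waiting for the reader
    written = 0
    chunk = b"x" * 4096
    try:
        while True:
            written += os.write(write_fd, chunk)
    except BlockingIOError:
        # The kernel buffer is full: a blocking writer would now wait
        # until the (slow) reader drains it.
        return written
    finally:
        os.close(read_fd)
        os.close(write_fd)
```

The buffer size reported by the second function varies by operating system, which is one reason a slow reader shows up as a stall rather than an immediate error.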
Camaj,
Change the job to write to a dataset or sequential file and tell us exactly what happens, i.e. the full error message. Does it happen at the same row number each time? Which stage is giving the error? All of this can be checked without going into the details of the Orchestrate mechanism.
In a case like this I usually remove components of the job bit by bit until the error goes away, then concentrate on the last change I made. Do you have lookups or transformations? Does the error persist when you remove these steps?
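The elimination approach can be written down as a simple loop. In this sketch `job_fails` is a hypothetical callback standing in for "edit the job, rerun it and check the log", and the stage names are made up. It removes one optional stage (lookup, transformation, etc.) at a time and reports the stage whose removal made the error disappear, which is the change to concentrate on.

```python
from typing import Callable, List, Optional, Sequence


def find_culprit_stage(
    stages: Sequence[str],
    job_fails: Callable[[List[str]], bool],
) -> Optional[str]:
    """Drop optional stages one at a time until the failure disappears.

    Returns the stage whose removal made the error go away, or None if the
    job does not fail at all or still fails with every optional stage removed.
    """
    current = list(stages)
    if not job_fails(current):
        return None  # nothing to diagnose
    while current:
        removed = current.pop()  # hypothetical: rerun the job without this stage
        if not job_fails(current):
            return removed
    return None


# Usage with a fake predicate: the job "fails" while the lookup is present.
culprit = find_culprit_stage(
    ["Transformer_Clean", "Lookup_Customer", "Transformer_Derive"],
    lambda remaining: "Lookup_Customer" in remaining,
)
```

When stages interact, the result only says where to start looking, not necessarily what the root cause is.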
OK, so the error goes away when you don't load to DB/2. What happens if you change the stage to upsert? Is your scratch space filling up? Could you look at your log file and post the actual error message, and also look at a couple of entries before and after it for warnings or other text that might help narrow down the cause? Run just a couple of thousand rows through (put a constraint @INROWNUM < 5000 in a Transformer stage) and see whether the data actually gets written to DB/2. Also try sorting your incoming data stream differently: could the problem be related to what you are trying to write, i.e. bad data of some type?
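For reference, the @INROWNUM < 5000 constraint suggested above lets through the first 4999 input rows. The Python sketch below mimics that filter, assuming @INROWNUM is a 1-based counter over a single input stream; in a parallel job the count may be kept per partition, in which case more rows get through in total. The function name is hypothetical.

```python
from typing import Iterable, Iterator, TypeVar

Row = TypeVar("Row")


def inrownum_constraint(rows: Iterable[Row], limit: int = 5000) -> Iterator[Row]:
    """Yield only rows whose 1-based input row number is below `limit`,
    mirroring a Transformer constraint of @INROWNUM < limit."""
    for inrownum, row in enumerate(rows, start=1):
        if inrownum >= limit:
            # Every later row number also fails the constraint, so the sketch
            # stops early; a real stage would still read the rest of its input.
            break
        yield row


sample = list(inrownum_constraint(range(10_000)))
```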
The answer to your problem isn't obvious from your error description, so you will need to do some more diagnosis and reporting in order to narrow it down.
The Load functionality will buffer data - can you watch your temporary and scratch areas to see if they fill up during the big run?
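One way to watch those areas during the big run is to poll the free space of each filesystem. This Python sketch uses only `shutil.disk_usage`; the function names, the 90% threshold and the 30-second interval are illustrative assumptions, and you would pass in your own directories, such as the scratchdisk entries from the configuration file named by APT_CONFIG_FILE and the temporary directories used by the load.

```python
import shutil
import time
from typing import Dict, Iterable


def usage_report(paths: Iterable[str], threshold_pct: float = 90.0) -> Dict[str, dict]:
    """Return used percentage, free bytes and an alert flag for each path."""
    report: Dict[str, dict] = {}
    for path in paths:
        usage = shutil.disk_usage(path)  # stats for the filesystem holding `path`
        pct = 100.0 * usage.used / usage.total if usage.total else 0.0
        report[path] = {
            "used_pct": round(pct, 1),
            "free_bytes": usage.free,
            "alert": pct >= threshold_pct,
        }
    return report


def watch(paths: Iterable[str], interval_s: float = 30.0, rounds: int = 10) -> None:
    """Print one usage line per path every `interval_s` seconds."""
    paths = list(paths)
    for i in range(rounds):
        for path, info in usage_report(paths).items():
            flag = "  <-- nearly full" if info["alert"] else ""
            print(f"{path}: {info['used_pct']}% used, {info['free_bytes']} bytes free{flag}")
        if i < rounds - 1:
            time.sleep(interval_s)
```

Run `watch([...])` in a separate session while the job loads; if one path climbs towards 100% just before the abort, that area is the likely bottleneck.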
I don't know what else it might be - if nothing shows up in your DB/2 logs then I think it is time to contact Ascential/IBM support.
You can change the monitoring settings through the $APT environment variables, but I am not quite sure what extra information would come out of this. Perhaps Kumar could explain.
Hi,
Size-based monitoring always takes precedence over time-based monitoring.
The broken pipe errors were also due to the JobMon process; this is caused by the monitoring frequency.
It is recommended to set APT_MONITOR_TIME to some value, say 5.
To override this, APT_MONITOR_SIZE can be assigned a huge value, say a million.
Also check your swap space.
regards
kumar
Let me clarify a few points in what kumar_s said.
Size-based monitoring always takes precedence over time-based monitoring.
-->This is incorrect. As it comes straight out of the box, it is just the opposite: time-based monitoring is the default and, unless you change it, is the preferred method.
The broken pipe errors were also due to the JobMon process; this is caused by the monitoring frequency.
-->This is probably correct, but without more information and troubleshooting it is hard to tell for sure. There could be an issue with DB2, but it's more likely that modifying the JobMon settings will make the issue disappear.
With that said, here's the skinny on JobMon.
The default value for APT_MONITOR_TIME is 5; if no value is present for APT_MONITOR_SIZE, the engine uses time-based monitoring. There have been issues with time-based monitoring, so a recommended approach is to set APT_MONITOR_SIZE to some large value (such as 50000 or 100000). This forces the engine to use row-based monitoring and reduces how often the engine performs JobMon checks. This only works, however, if APT_MONITOR_TIME is left at its default value of 5; if any other value is set in APT_MONITOR_TIME, time-based monitoring is used. The only other alternative is to turn monitoring off by setting APT_NO_JOBMON=1. Some patches exist that fix issues with monitoring (check with Ascential Support to see whether one applies in your particular case).
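The precedence rules in the paragraph above can be summarised as a small decision function. This Python sketch encodes them exactly as the post describes them (it is not taken from the engine's source) and treats `env` as a plain dictionary of environment variable values. The helper `approx_size_updates` is hypothetical and gives only a rough count of size-based monitoring updates for a single stream of rows, to show why a large APT_MONITOR_SIZE means fewer JobMon checks.

```python
from typing import Mapping, Tuple


def monitoring_mode(env: Mapping[str, str]) -> Tuple[str, int]:
    """Return (mode, interval) following the rules described in this thread.

    mode is "off", "time" (interval in seconds) or "size" (interval in rows).
    """
    if env.get("APT_NO_JOBMON") == "1":
        return ("off", 0)
    monitor_time = int(env.get("APT_MONITOR_TIME") or "5")  # default is 5
    monitor_size = env.get("APT_MONITOR_SIZE")
    if monitor_time != 5 or not monitor_size:
        # A non-default time, or no size at all, keeps time-based monitoring.
        return ("time", monitor_time)
    return ("size", int(monitor_size))


def approx_size_updates(rows: int, env: Mapping[str, str]) -> int:
    """Rough number of monitoring updates for `rows` rows (size-based only)."""
    mode, interval = monitoring_mode(env)
    if mode != "size":
        raise ValueError("only meaningful for size-based monitoring")
    return rows // interval
```

For example, with APT_MONITOR_SIZE=100000 a stream of 5 million rows would trigger about 50 updates under this model, and `monitoring_mode(dict(os.environ))` shows which mode the current environment selects.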