DataStage Abort

Posted: Wed Apr 09, 2008 7:49 am
by Raftsman
Fun times again....

Config: Parallel running on one node

The job consists of two Oracle Enterprise inputs going into the Difference stage and then into the Switch stage. Depending on the values, records are inserted or updated.

We receive this message sporadically. It's never on the same program. We have many programs in a sequence using the same logic with different inputs. Once it aborts and we kick off the sequence again, it completes successfully. The message below doesn't tell us too much.

To start debugging this, where would we begin? I think even IBM support is going to have problems helping us....

Thanks

cs}}}(0),0: Fatal Error: Caught unknown exception in parallel process: terminating.

Determine_Difference,0: Fatal Error: Unable to initialize communication channel on SDVHQOKDS. This is typically caused by a configuration problem. Examples of typical problems include:
1) The temporary directory, identified by $TMPDIR and/or the scratch disks in your ORCHESTRATE configuration, is located on a non-local file system (e. g. mounted over NFS).
2) The temporary directory is located on a file system with insufficient space.

node_node1: Player 5 terminated unexpectedly.
main_program: APT_PMsectionLeader(1, node1), player 5 - Unexpected exit status 1.

Re: DataStage Abort

Posted: Wed Apr 09, 2008 7:58 am
by sud
Theoretically speaking, if IBM has problems telling you what the problem pertains to, then it is a BIG problem. It sounds like you do have ready access to IBM support. Instead of procrastinating and drawing unnecessary conclusions, why don't you start the communication process immediately?

Posted: Wed Apr 09, 2008 8:05 am
by Raftsman
If you haven't dealt with them before, they require all information up front. If a sequence consists of 100 jobs, they need it all. This process can take weeks. We don't have that time, so I poll the forum for help first and, at the same time, start the IBM support process.

Sometimes this problem has already occurred with someone else and they have a resolution.

Thanks

Posted: Wed Apr 09, 2008 3:11 pm
by ray.wurlod
A good place to begin would be to check whether the examples mentioned in the error message apply in your case: the temporary directory is non-local, or it does not have enough free space at the time the error occurs.
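Both checks can be scripted so they can be run at the moment a job aborts. What follows is only a minimal sketch, assuming a Linux engine host (it reads /proc/mounts) and Python 3. The helper names, the 1 GiB threshold and the list of network file system types are illustrative assumptions, not anything DataStage provides. It checks $TMPDIR by default; scratch disk paths from your configuration file have to be passed in by hand.

```python
import os
import shutil
import sys
import tempfile

# File system types treated as "non-local" (an illustrative, incomplete list).
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "smbfs"}


def free_gib(path):
    """Return the free space, in GiB, on the file system holding path."""
    return shutil.disk_usage(path).free / 2**30


def fs_type(path, mounts_file="/proc/mounts"):
    """Return the type of the file system mounted over path (Linux only).

    Returns None if the mount table cannot be read.
    """
    target = os.path.realpath(path) + "/"
    best_mount, best_type = "", None
    try:
        with open(mounts_file) as fh:
            for line in fh:
                fields = line.split()
                if len(fields) < 3:
                    continue
                mount_point, fstype = fields[1], fields[2]
                prefix = mount_point.rstrip("/") + "/"
                # Keep the longest matching mount point; on a tie the later
                # entry wins because it shadows the earlier one.
                if target.startswith(prefix) and len(mount_point) >= len(best_mount):
                    best_mount, best_type = mount_point, fstype
    except OSError:
        return None
    return best_type


def check_dir(path, min_free_gib=1.0):
    """Return a list of warning strings for one temp or scratch directory."""
    if not os.path.isdir(path):
        return [f"{path}: not a directory"]
    warnings = []
    kind = fs_type(path)
    if kind in NETWORK_FS_TYPES:
        warnings.append(f"{path}: on a non-local file system ({kind})")
    free = free_gib(path)
    if free < min_free_gib:
        warnings.append(f"{path}: only {free:.2f} GiB free")
    return warnings


if __name__ == "__main__":
    # Paths given on the command line, else $TMPDIR (or the system default).
    paths = sys.argv[1:] or [os.environ.get("TMPDIR", tempfile.gettempdir())]
    for p in paths:
        for warning in check_dir(p):
            print(warning)
```

Since the failure is sporadic, a single run on an idle system proves little; the space check is only meaningful if it is run while the sequence is executing or right after an abort.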

Posted: Thu Apr 10, 2008 9:09 am
by Raftsman
Yes, we did verify the space and this wasn't the issue. We have opened a ticket with IBM and they pointed to swap space, which is also not the case. I have heard rumblings about the Oracle Enterprise stage and its issues, but we can't be sure. I have a feeling this is going to be a long and painful process trying to determine what is wrong.

I will keep you posted

Posted: Sat Apr 23, 2011 2:08 pm
by D0n1117
I had the same problem with the same error with a Lookup stage with 5 sequential files (4 as lookups). There was plenty of scratch disk, so space was not the issue, despite what the error message suggests. All partitioning was set to Auto. I changed the partitioning to "Same" for the input and "Entire" for the lookups and now it works. Explicitly set your partitions on your input and see if that works.
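For anyone unsure what those two settings mean, here is a toy model in plain Python. It is not DataStage code and it does not explain why the error went away; the function names and the two-partition layout are made up for illustration. "Same" leaves the input rows in whatever partitions they arrived in, and "Entire" gives every partition a full copy of the reference rows, so a lookup in any partition can see every key.

```python
def hash_partition(rows, key, n):
    """Split rows into n partitions by key value modulo n."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[row[key] % n].append(row)
    return parts


def entire_partition(rows, n):
    """Give each of the n partitions a full copy of the rows."""
    return [list(rows) for _ in range(n)]


def lookup(stream_parts, ref_parts, key):
    """Look up each stream row in the reference rows of its own partition."""
    out = []
    for stream, ref in zip(stream_parts, ref_parts):
        index = {r[key]: r for r in ref}
        for row in stream:
            match = index.get(row[key])
            out.append((row[key], match["name"] if match else None))
    return sorted(out)


# Input rows already sitting in two partitions; "Same" keeps them there.
stream_parts = [[{"id": 0}, {"id": 1}, {"id": 2}],
                [{"id": 3}, {"id": 4}, {"id": 5}]]
reference = [{"id": i, "name": f"n{i}"} for i in range(6)]

hashed = lookup(stream_parts, hash_partition(reference, "id", 2), "id")
entire = lookup(stream_parts, entire_partition(reference, 2), "id")

print([k for k, name in hashed if name is None])  # keys with no match
print([k for k, name in entire if name is None])  # keys with no match
```

In this toy the hash-partitioned reference misses some keys because they sit in the other partition, while the "Entire" copy matches every row.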