DataStage Abort

Raftsman
Premium Member
Posts: 335
Joined: Thu May 26, 2005 8:56 am
Location: Ottawa, Canada

Post by Raftsman »

Fun times again....

Config: Parallel running on one node

The job consists of two Oracle Enterprise inputs going into the Difference stage and then into the Switch stage. Depending on the values, records are inserted or updated.

We receive this message sporadically. It's never on the same program. We have many programs in a sequence using the same logic with different inputs. Once a job aborts and we kick off the sequence again, it completes successfully. The message below doesn't tell us much.

Where would we begin debugging this? I think even IBM support is going to have problems helping us....

Thanks

cs}}}(0),0: Fatal Error: Caught unknown exception in parallel process: terminating.

Determine_Difference,0: Fatal Error: Unable to initialize communication channel on SDVHQOKDS. This is typically caused by a configuration problem. Examples of typical problems include:
1) The temporary directory, identified by $TMPDIR and/or the scratch disks in your ORCHESTRATE configuration, is located on a non-local file system (e. g. mounted over NFS).
2) The temporary directory is located on a file system with insufficient space.

node_node1: Player 5 terminated unexpectedly.
main_program: APT_PMsectionLeader(1, node1), player 5 - Unexpected exit status 1.
Jim Stewart
sud
Premium Member
Posts: 366
Joined: Fri Dec 02, 2005 5:00 am
Location: Here I Am

Re: DataStage Abort

Post by sud »

Theoretically speaking, if IBM has problems telling you what the problem pertains to, then it is a BIG problem. It sounds like you do have ready access to IBM support. Instead of procrastinating and drawing unnecessary conclusions, why don't you start the communication process immediately?
It took me fifteen years to discover I had no talent for ETL, but I couldn't give it up because by that time I was too famous.
Raftsman
Premium Member
Posts: 335
Joined: Thu May 26, 2005 8:56 am
Location: Ottawa, Canada

Post by Raftsman »

If you haven't dealt with them before, they require all information up front. If a sequence consists of 100 jobs, they need it all. This process can take weeks. We don't have that kind of time, so I poll the forum for help first and start the IBM support process at the same time.

Sometimes a problem like this has already occurred for someone else and they have a resolution.

Thanks
Jim Stewart
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

A good place to begin would be to look at whether the examples mentioned in the error message apply in your case: whether the temporary directory is non-local, or does not have enough free space at the time the error occurs.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
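
For anyone who wants to script the check suggested above, here is a minimal sketch, not an IBM tool. It assumes a Linux server (mount information read from /proc/mounts), assumes the parallel configuration file is the one named by $APT_CONFIG_FILE, and uses a made-up 1 GiB free-space threshold. The function names are hypothetical. Because the abort is intermittent, it would need to run at the time of the failure (for example from cron or just before the sequence starts) to be useful.

```python
import os
import re
import shutil

# File system types treated as non-local (an assumption; extend as needed).
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "smbfs"}


def scratch_disks(config_text):
    """Return the scratchdisk paths named in parallel configuration file text."""
    return re.findall(r'resource\s+scratchdisk\s+"([^"]+)"', config_text)


def filesystem_type(path, mounts_text):
    """Return the type of the deepest mount containing path.

    mounts_text is expected in /proc/mounts format:
    device, mount point, file system type, options, ...
    """
    path = os.path.realpath(path)
    best_mount, best_type = "", "unknown"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix) and len(mount_point) > len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


def check_dir(path, min_free_bytes, mounts_text):
    """Return a list of problems found for one directory (empty if none)."""
    problems = []
    fs_type = filesystem_type(path, mounts_text)
    if fs_type in NETWORK_FS_TYPES:
        problems.append(f"{path}: non-local file system ({fs_type})")
    free = shutil.disk_usage(path).free
    if free < min_free_bytes:
        problems.append(f"{path}: only {free} bytes free")
    return problems


if __name__ == "__main__":
    min_free = 1024 ** 3  # hypothetical threshold: 1 GiB
    with open("/proc/mounts") as f:
        mounts = f.read()
    dirs = [os.environ.get("TMPDIR", "/tmp")]
    config_path = os.environ.get("APT_CONFIG_FILE")
    if config_path:
        with open(config_path) as f:
            dirs += scratch_disks(f.read())
    for d in dirs:
        for problem in check_dir(d, min_free, mounts):
            print(problem)
```

If this prints nothing while the job is failing, the two causes listed in the error message are probably not the culprit.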
Raftsman
Premium Member
Posts: 335
Joined: Thu May 26, 2005 8:56 am
Location: Ottawa, Canada

Post by Raftsman »

Yes, we did verify the space and it wasn't the issue. We have opened a ticket with IBM and they pointed to swap space, which is also not the case. I have heard rumblings about issues with the Oracle Enterprise stage, but we can't be sure. I have a feeling this is going to be a long and painful process trying to determine what is wrong.

I will keep you posted
Jim Stewart
D0n1117
Premium Member
Posts: 11
Joined: Sun Dec 19, 2010 1:49 pm
Location: VA

Post by D0n1117 »

I had the same problem, with the same error, on a Lookup stage with 5 sequential files (4 as lookups). There was plenty of scratch disk, so space was not the issue despite what the error suggests. All partitioning was set to Auto. I changed the partitioning to "Same" for the input and "Entire" for the lookups, and now it works. Explicitly set the partitioning on your inputs and see if that works.
Don
DataStage Developer
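
For readers unfamiliar with the two settings mentioned above, here is a toy Python model of what they mean. It is an illustration only, not DataStage code: "Same" leaves the incoming partitions untouched, and "Entire" gives every partition a full copy of the data. The round-robin function is included only for contrast, and all names and sample data are hypothetical. It does not explain why explicit partitioning cleared the abort.

```python
def round_robin(rows, n):
    """Deal rows out to n partitions in turn."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def entire(rows, n):
    """'Entire': every partition receives a full copy of the rows."""
    return [list(rows) for _ in range(n)]


def same(parts):
    """'Same': keep the partitioning produced by the previous stage."""
    return parts


def lookup(input_parts, ref_parts):
    """Per-partition lookup: each partition only sees its own reference rows."""
    matched, missed = [], []
    for inp, ref in zip(input_parts, ref_parts):
        table = dict(ref)
        for key in inp:
            if key in table:
                matched.append((key, table[key]))
            else:
                missed.append(key)
    return matched, missed


keys = [1, 2, 3, 4]
reference = [(4, "d"), (1, "a"), (2, "b"), (3, "c")]

input_parts = same(round_robin(keys, 2))

# Reference split across partitions: keys can land away from their matches.
matched_rr, missed_rr = lookup(input_parts, round_robin(reference, 2))

# Reference copied to every partition: every key finds its match.
matched_entire, missed_entire = lookup(input_parts, entire(reference, 2))

print("round robin reference, missed:", missed_rr)
print("entire reference, missed:", missed_entire)
```

In this toy data the split reference misses every key, while the "Entire" reference misses none.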