Problem with failed jobs: suspended processes...

xjonny
Participant
Posts: 16
Joined: Tue Oct 03, 2006 2:06 am

Post by xjonny »

Hello all!

System:
SuSE Linux 10, DS 7.5x2, compiler: gcc 3.3.3

When a parallel job aborts, its processes remain on the system and do not stop.
For example, the source DB was inactive during a job run, so the job aborted.
I tried to recompile the job, but DS says "it is blocked".

When I list processes with "top" or "ps awx", I still see DS processes (phantom and so on).

The job contains "DB2 CLI" stages (3 sources, 1 destination), 2 joins, and a shared container (I am trying it out while writing new functions in C).

DS is configured to use 4 nodes (SMP).

When I tried to stop all of the job's processes, the job was still inactive, so I had to stop and start DS (uv -admin ...). It is unusable :(. Please help!

What should I do to prevent this?
IT happens...
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

xjonny,

you have posted too many problems at once, and it is now hard to tell what is cause and what is effect. You have tried to bring up DataStage while there were still processes up and running, which is why the engine has not started correctly. After bringing down the engine you need to ensure that no connections remain active.

Look at this thread for information, although there are many more threads out there dealing with this subject.

Once you have DS back up and running, I suggest you use the "truss -p {pid}" command to see what those suspended processes are actually doing and then post the output on this thread.
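
To find the PIDs to inspect, something like the following might help. It is only a rough sketch, not a DataStage tool: it parses the output of `ps -eo pid,ppid,stat,args` and keeps the rows whose command line contains a given substring. The function names and the patterns "phantom" and "osh" are assumptions for illustration; adjust them to whatever your `ps` listing actually shows. On Linux, `strace -p {pid}` is the usual counterpart of `truss -p {pid}`.

```python
import subprocess

# Columns requested from ps; "args" comes last so it can contain spaces.
PS_COMMAND = ["ps", "-eo", "pid,ppid,stat,args"]


def parse_ps(ps_output):
    """Turn `ps -eo pid,ppid,stat,args` output into a list of dicts."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 3)  # split into at most 4 fields
        if len(parts) < 4:
            continue  # ignore blank or malformed lines
        pid, ppid, stat, args = parts
        rows.append({"pid": int(pid), "ppid": int(ppid),
                     "stat": stat, "args": args})
    return rows


def find_leftovers(rows, patterns):
    """Keep rows whose command line contains any of the given substrings.

    This is a naive substring match, so it can also pick up unrelated
    processes; check the list by eye before acting on it.
    """
    return [r for r in rows if any(p in r["args"] for p in patterns)]


def main():
    out = subprocess.run(PS_COMMAND, capture_output=True,
                         text=True, check=True).stdout
    # "phantom" and "osh" are assumed patterns, not confirmed process names.
    for row in find_leftovers(parse_ps(out), ("phantom", "osh")):
        print(row["pid"], row["ppid"], row["stat"], row["args"])


if __name__ == "__main__":
    main()
```

Run it after a job has aborted; each PID it prints is a candidate for the trace command above. The STAT column shows the process state as reported by `ps` (for example `T` for stopped).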
xjonny
Participant
Posts: 16
Joined: Tue Oct 03, 2006 2:06 am

Post by xjonny »

I mean: "if a parallel job fails (sic!!! it fails!!! and I can see this from its status!), its processes stay active. I see them and they are not going to stop...".
With all the awful results...

What can I do? There was no such problem with Server Edition.

Do you understand? Any ideas?
IT happens...
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

I do understand. Did you check to see what these processes are doing using the truss command as I suggested earlier?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

My #1 idea is to create jobs that do not fail.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.