unable to use job after abort

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
jasper
Participant
Posts: 111
Joined: Mon May 06, 2002 1:25 am
Location: Belgium

unable to use job after abort

Post by jasper »

Hi,
We just started production on our DataStage-based DWH and, of course, on the first day DataStage starts acting up. We have a job that runs 60 times a day. The first 30 runs went fine, but then it gave the error:
Transform,1: dspipe_wait(12252): Writer(12279) process has terminated.
and aborted. After resetting and rerunning, it always gives the following errors:
Transform,0: Unable to run job - -2.
Transform,0: Operator terminated abnormally: runLocally did not return APT_StatusOk
CALLDETAIL03.#1.Transform.InputCDR-Input.InputCDR: ds_ipcgetnext - timeout waiting for mutex.
If we open a monitor while the job is not running, it still shows this transform as running.

Things I've already tried:
- Recompiling: impossible, it always says it cannot get exclusive access.
- Reimporting the job from test: impossible, same error.
- Clearing the status file: that runs, but doesn't help.
- Releasing locks through the Administrator command ds.tools: it cannot find any locks, so it also can't release them.
- Checked on the Unix machine for the process IDs found in ds.tools; they no longer exist.

Can someone give me a clue on this? It's only our most important fact table.
gh_amitava
Participant
Posts: 75
Joined: Tue May 13, 2003 4:14 am
Location: California
Contact:

Post by gh_amitava »

Hi,

You have to kill the process at the OS level and restart DataStage; there is no other option. Your error message also shows a mutex error. A mutex error typically appears when a Basic Transformer is used in a PX job. Ascential recommends not using a Basic Transformer in a PX job; use a PX Transformer instead.
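
For example, on a Unix install the cleanup looks roughly like this (a sketch only: the PID is a placeholder taken from your ps output, and it assumes the engine is controlled with $DSHOME/bin/uv -admin, so adjust for your environment):

# find leftover DataStage processes for the job (phantoms and PX osh operators)
ps -ef | grep -E 'phantom|osh' | grep -v grep

# kill the orphaned process (replace 12279 with the PID reported by ps)
kill -9 12279

# stop and restart the DataStage engine (run as the DataStage admin user)
cd $DSHOME
bin/uv -admin -stop
bin/uv -admin -start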

Regards
Amitava
nkumar_home
Participant
Posts: 19
Joined: Fri Apr 02, 2004 10:13 am

Post by nkumar_home »

Have you tried 'Cleanup Resources' from Director for the job? Normally that clears out all the processes. If the lock on the job is still not released, you can try Clear Status from Director. This should return the job to compiled status.

If the above does not work, make sure no phantom processes exist (see below); if they do, kill them. Also make sure no dsr or netstat connections are open. You can free stale shared memory segments with 'ipcrm -m id'.
*****************************
echo "==== Checking for phantom processes "
ps -eaf | grep pha

echo "==== Checking ps for connections "
ps -eaf | grep dsr

echo ""
echo "==== Checking netstat for connections "
netstat -a | grep dsr

echo ""
echo "==== Checking shared memory (dae) (use ipcrm -m 999)"
ipcs | grep dae

echo ""
echo "==== Checking shared memory (dstage) (use ipcrm -m 999)"
ipcs | grep dstage
**************************************
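
For example, if the checks above turn up a leftover phantom process or a stale shared memory segment, the cleanup looks roughly like this (the PID and segment id are placeholders; take the real values from the ps and ipcs output):

# kill the leftover phantom process reported by ps
kill 12345          # escalate to kill -9 12345 only if it refuses to die

# remove the stale shared memory segment reported by ipcs
ipcrm -m 999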

If the problem persists, try stopping and restarting DataStage, or rebooting the machine, in that order.
Ananda
Participant
Posts: 29
Joined: Mon Sep 20, 2004 12:05 am

Post by Ananda »

Hi,

I am facing the same mutex issue in a server job.

Error found:

SBL_SMP_CR_SLX_Extract_HashData..TRN_SPLIT.HASH_note: ds_ipcgetnext() - timeout waiting for mutex

The job stays in a running state. I can always stop the process through Unix, but I would like to know the reason.

I am using a Basic Transformer in a Server job.

If anyone has any pointers, please let me know.

Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Search the forum for SPINTRIES and SPINSLEEP. These are configuration parameters that affect how mutex locks operate.
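
For example (a sketch only, assuming a Unix install where these are tunables in the engine's uvconfig file; check the current values with your administrator before changing anything):

# check the current settings
cd $DSHOME
grep -E 'SPINTRIES|SPINSLEEP' uvconfig

# after editing the values, stop the engine, regenerate the configuration and restart
bin/uv -admin -stop
bin/uvregen
bin/uv -admin -start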
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Gaurav Pamecha
Participant
Posts: 2
Joined: Mon Nov 19, 2007 3:35 am

Phantom 25546

Post by Gaurav Pamecha »

I was getting the problem:
Unable to run job - -2.

I had cleaned up all the resources and also done a Clear Status for the job, bringing it back to compiled status. When I then ran the job, I got a warning:
DataStage Job 1 Phantom 25546
Program "DSD.StageRun": Line 301, Square root of a negative number.
DataStage Phantom Finished

It generates the XML file, but when I try to open the file it throws an error.

Please help me find a solution to this.
Thanks,
Gaurav
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Welcome aboard. :D

Hijacking threads with unrelated questions is frowned upon here.

Can you please post your question as a new thread?

This will assist future searchers.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply