
unable to use job after abort

Posted: Fri Oct 29, 2004 1:31 am
by jasper
Hi,
We just started production on our DataStage-based data warehouse, and of course DataStage started acting up on the first day. We have a job that runs 60 times a day. The first 30 runs went OK, but then it gave the error:
Transform,1: dspipe_wait(12252): Writer(12279) process has terminated.
and aborted. After resetting and rerunning, it always gives the following errors:
Transform,0: Unable to run job - -2.
Transform,0: Operator terminated abnormally: runLocally did not return APT_StatusOk
CALLDETAIL03.#1.Transform.InputCDR-Input.InputCDR: ds_ipcgetnext - timeout waiting for mutex.
If we open a monitor when the job is not running, it shows this transform as running.

Things I've already tried:
- Recompiling: impossible, it always gives "cannot get exclusive access".
- Reimporting the job from test: impossible, same error.
- Clearing the status file: runs, but doesn't help.
- Through the Administrator command ds.tools: it cannot find any locks, so it can't release them either.
- Checked on the Unix machine for the process IDs found in ds.tools: they no longer exist.

Can someone give us a clue on this? It's only our most important fact table.
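
For anyone repeating the last check in the list above, here is a minimal sketch of testing whether a set of process IDs still exists. The function names pid_alive and report_pids are made up for illustration and are not DataStage commands; the PIDs would be the ones reported by ds.tools. It assumes you run it as root or as the user that owns the processes, since kill -0 also fails on processes you are not allowed to signal.

```shell
#!/bin/sh
# pid_alive is a hypothetical helper, not part of DataStage.
# "kill -0" sends no signal; it only checks that the PID exists
# and that the current user is allowed to signal it.
pid_alive() {
    kill -0 "$1" 2>/dev/null
}

# Print "<pid> alive" or "<pid> gone" for every PID passed in.
report_pids() {
    for pid in "$@"; do
        if pid_alive "$pid"; then
            echo "$pid alive"
        else
            echo "$pid gone"
        fi
    done
}

# Example call with the two PIDs from the error message above:
#   report_pids 12252 12279
```

A PID reported as gone while the monitor still shows the stage running matches the situation described in this post: the process is dead but its status was never cleaned up.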

Posted: Tue Nov 02, 2004 12:52 pm
by gh_amitava
Hi,

You have to kill the process from the OS and restart DataStage; there is no other option. However, your error message shows a mutex error. A mutex error message appears if you have used a Basic Transformer in your PX job. Ascential recommends not using a Basic Transformer in a PX job; you can use a PX Transformer instead.

Regards
Amitava

Posted: Tue Nov 02, 2004 3:54 pm
by nkumar_home
Have you tried 'Cleanup Resources' from Director for the job? Normally that clears out all the processes. If the lock on the job is still not released, you can try Clear Status from Director. This should return the job to compiled status.

If the above does not work, make sure no phantom processes exist (see below). If they do, kill them. Also make sure no dsr connections or netstat connections are open. You can free shared memory segments with 'ipcrm -m id'.
*****************************
ps -eaf | grep pha

echo "==== Checking ps for connections "
ps -eaf | grep dsr

echo ""
echo "==== Checking netstat for connections "
netstat -a | grep dsr

echo ""
echo "==== Checking shared memory (dae) (use ipcrm -m 999)"
ipcs | grep dae

echo ""
echo "==== Checking shared memory (dstage) (use ipcrm -m 999)"
ipcs | grep dstage
**************************************

If the problem persists, try stopping and restarting dstage, then a reboot, in that order.
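
The shared memory step above can be sketched as a dry run that only prints the ipcrm commands instead of executing them, so they can be reviewed first. The function name print_ipcrm_cmds is hypothetical. The column that holds the segment ID differs between platforms (on Linux "ipcs -m" prints it in the second column), so it is passed in as an argument rather than assumed.

```shell
#!/bin/sh
# print_ipcrm_cmds PATTERN ID_COLUMN
# Hypothetical helper: reads ipcs output on stdin, keeps the lines
# matching PATTERN, and prints one "ipcrm -m <id>" line per match.
# Nothing is removed; the output is meant to be reviewed by hand.
print_ipcrm_cmds() {
    grep -- "$1" | awk -v col="$2" '{ print "ipcrm -m " $col }'
}

# Example calls (column 2 is an assumption that holds for Linux ipcs):
#   ipcs -m | print_ipcrm_cmds dae 2
#   ipcs -m | print_ipcrm_cmds dstage 2
```

Only run the printed commands once you are sure no DataStage job is still attached to those segments.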

Posted: Mon Jul 03, 2006 1:05 am
by Ananda
Hi,

I am facing the same mutex issue in a server job.

Error found.

SBL_SMP_CR_SLX_Extract_HashData..TRN_SPLIT.HASH_note: ds_ipcgetnext() - timeout waiting for mutex

The job is in a running state. I can always stop the process through Unix, but I would like to know the reason.

I am using a Basic Transformer in a Server job.

If you have any pointers, please let me know.

Thanks

Posted: Mon Jul 03, 2006 6:16 am
by ray.wurlod
Search the forum for SPINTRIES and SPINSLEEP. These are configuration parameters that affect mutex locks' operation.

Phantom 25546

Posted: Wed Apr 23, 2008 3:03 am
by Gaurav Pamecha
I was getting the problem:
Unable to run job - -2.

I cleaned up all the resources and also did a clear status for the job, which brought the job to compiled status. Then I ran the job, but I am getting a warning in the job:
DataStage Job 1 Phantom 25546
Program "DSD.StageRun": Line 301, Square root of a negative number.
DataStage Phantom Finished

It is generating the XML file, but when I try to open the file it throws an error.

Please help me find a solution to it.

Posted: Wed Apr 23, 2008 3:34 am
by ray.wurlod
Welcome aboard. :D

Hijacking threads with unrelated questions is frowned upon here.

Can you please post your question as a new thread?

This will assist future searchers.