Unable to Re-Start DSengine

Posted: Wed May 16, 2007 4:34 pm
by har
Hello,
Multiple projects were locked, so we bounced the server, but the daemon is not running. We stopped the server again, waited 10 minutes and started it, and the daemon is still not running.
When I ran uv -admin -start I got this response:
DataStage Engine 7.5.1.2 instance "ade" has been brought up.
Deadlock Daemon has been started
Starting JobMonApp
JobMonApp has been started.

When I check for dsrpcd using this command,
netstat | grep dsrpcd, nothing comes up.
Any idea what's going on with the server?

Thanks,
Har

Posted: Wed May 16, 2007 5:11 pm
by nick.bond
Try this:

Code:

netstat -a | grep dsrpc
Do you find sessions with a status of FIN_WAIT_2?

If there are, these are sessions that did not complete the closing handshake correctly.

You need to disconnect these before you can restart the server. To get rid of these connections, I believe you can run:

Code:

$DSHOME/bin/uv -admin -clearsockets
I have never used this method, though; I have always disconnected with UNIX commands (I didn't know about the above before), but you need root to do it that way. I will dig them out if you can't get this to work.

The other method is to restart the client machines responsible.
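
As a rough sketch of the netstat check above, the following counts the connections that mention dsrpc by TCP state, so leftover FIN_WAIT_2 or CLOSE_WAIT sessions stand out. The function name summarize_states and the default pattern are only illustrative, and it assumes the state is the last field on each line, as in typical netstat output.

```shell
#!/bin/sh
# Summarise TCP states for connections that mention the dsrpc service
# (or its numeric port). Reads netstat-style output on stdin, so it can
# be tried on saved output as well as live output.
# NOTE: summarize_states is an illustrative name, not a DataStage tool.
summarize_states() {
    pattern="${1:-dsrpc}"
    awk -v pat="$pattern" '
        index($0, pat) { count[$NF]++ }   # last field is assumed to be the state
        END { for (s in count) print s, count[s] }
    ' | sort
}

# Typical use on the server:
#   netstat -a  | summarize_states dsrpc
#   netstat -an | summarize_states 31538
```

If only a LISTEN line shows up, there are no stale sessions for that pattern; if no LISTEN line shows up at all, dsrpcd is probably not listening.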

Posted: Wed May 16, 2007 5:45 pm
by lstsaur
Har,
That means the port, 31538, is in CLOSE_WAIT status.
Issue lsof -P |grep 31538 to find the PID; kill this PID and then restart your DS server.

Posted: Wed May 16, 2007 5:46 pm
by har
I found CLOSE_WAIT connections with netstat -a | grep dsrpc:

tcp4 0 0 localhost.dsrpc localhost.58693 CLOSE_WAIT

Are these CLOSE_WAIT connections stopping the daemon?

Har

Posted: Wed May 16, 2007 5:53 pm
by har
Can you please tell me the exact syntax of the command to get the PID for CLOSE_WAIT?

Thanks,
Hari

Posted: Wed May 16, 2007 7:35 pm
by nick.bond
Determine which client machine had the connection from the IP address and restart it. That should clear the TCP connections.

Posted: Wed May 16, 2007 8:26 pm
by ray.wurlod
Search the forum for the correct procedure for shutting down. You might also search for debugging the dsrpcd daemon.

Posted: Thu May 17, 2007 12:55 am
by lstsaur
Hari,
The command is:
lsof -P |grep 31538 (this will tell you the pid of CLOSE_WAIT)

then kill that pid and restart the server.
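
A small sketch of that step, in case the raw grep output is hard to read: it pulls the unique PIDs out of lsof output for one port. The helper name pids_for_port is made up for illustration, it assumes lsof's default column layout (PID in the second column), and 31538 is just the port quoted in this thread.

```shell
#!/bin/sh
# Print the unique PIDs from lsof output (read on stdin) whose lines
# mention ":<port>" followed by a non-digit or end of line.
# NOTE: pids_for_port is an illustrative helper name; the PID is assumed
# to be column 2, as in lsof's default output.
pids_for_port() {
    port="$1"
    awk -v p="$port" '$0 ~ (":" p "([^0-9]|$)") { print $2 }' | sort -un
}

# Typical use (as root, so that all processes are visible):
#   lsof -P | pids_for_port 31538
# Check what each PID is (for example with ps -fp <pid>) before killing it.
```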

Posted: Thu May 17, 2007 12:44 pm
by har
After rebooting the UNIX box, the daemon runs fine.
Anyway, I appreciate your help, guys!

har

Posted: Thu May 17, 2007 5:46 pm
by ray.wurlod
Do you feel that that would be a popular - even acceptable - "solution" in a production environment?

I don't.

Posted: Thu May 17, 2007 10:14 pm
by lstsaur
I gave this guy the command to solve the problem twice.
In my shop there is no way to reboot the box for this kind of problem.

Posted: Fri May 18, 2007 11:33 am
by har
It's a QA environment, not a PRD environment.
I did try the lsof command, but the AIX machine didn't recognize it.
I forgot to mention this lsof error in my request.
So finally we decided to reboot, and it worked.

Posted: Fri May 18, 2007 8:33 pm
by clj242021
har wrote:
It's a QA environment, not a PRD environment.
I did try the lsof command, but the AIX machine didn't recognize it.
I forgot to mention this lsof error in my request.
So finally we decided to reboot, and it worked.

lsof is not a standard AIX command; you need to install an extra package.
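
If installing lsof is not an option, one alternative often described for AIX is to take the socket's PCB address from netstat -Aan and hand it to rmsock, which reports the owning process when the socket is still held (and removes the socket when nothing holds it). The sketch below only does the first half. The helper name pcb_for_port is made up, and the column layout is an assumption, so check it against your own netstat -Aan output first.

```shell
#!/bin/sh
# Print the first column (assumed to be the PCB address in AIX
# `netstat -Aan` output) for TCP lines whose local or foreign address
# ends in ".<port>".
# NOTE: pcb_for_port is an illustrative helper name. Assumed columns:
#   PCB/ADDR  Proto  Recv-Q  Send-Q  Local-Address  Foreign-Address  (state)
pcb_for_port() {
    port="$1"
    awk -v p="$port" '
        $2 ~ /^tcp/ && ($5 ~ ("\\." p "$") || $6 ~ ("\\." p "$")) { print $1 }
    '
}

# Possible use on AIX, as root:
#   netstat -Aan | pcb_for_port 31538
#   rmsock <address printed above> tcpcb
```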

Posted: Fri May 18, 2007 10:12 pm
by chulett
It's open source, from what I recall.

Posted: Fri Jun 29, 2007 8:41 am
by ivannavi
nick.bond says:
...but you need root to do it with unix ...
... I have just tried clearsockets for the first time. It doesn't work:

1) I do need to be root. While not root I get:
Permission denied. Root/super-user privileges are required.
clearsockets aborted.
2) While root it doesn't complain, but it also does nothing, because I can still see this "FIN_WAIT_2" as before!

Now I hope to see a "Me too" post! :evil: