Unable to Re-Start DSengine

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

har
Participant
Posts: 118
Joined: Tue Feb 17, 2004 6:23 pm
Location: cincinnati


Post by har »

Hello,
Multiple projects were locked, so we bounced the server. But the dsrpcd daemon is not running. We stopped the server again, waited 10 minutes and started it again, but the daemon is still not running.
When I run uv -admin -start, I get this response:
DataStage Engine 7.5.1.2 instance "ade" has been brought up.
Deadlock Daemon has been started
Starting JobMonApp
JobMonApp has been started.

When I check for dsrpcd using this command:
netstat | grep dsrpcd
nothing comes up.
Any idea what's going on with the server?
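A note on the check above: without -a, netstat normally omits listening (server) sockets, and the service name that shows up for this port elsewhere in this thread is dsrpc, not dsrpcd, so `netstat | grep dsrpcd` can come back empty either way. A minimal sketch of a listening check, assuming netstat -a output on stdin and the dsrpc port 31538 mentioned in this thread; the function name dsrpc_listening is hypothetical:

```shell
# Hypothetical helper: succeeds if netstat -a output (on stdin) shows a
# socket in LISTEN state for the dsrpc service, matched either by the
# service name ("*.dsrpc") or by the numeric port ("*.31538").
dsrpc_listening() {
  grep -E 'dsrpc|[.:]31538[[:space:]]' | grep -q 'LISTEN'
}

# Usage on the server:
#   netstat -a | dsrpc_listening && echo "dsrpcd is listening" || echo "dsrpcd is NOT listening"
```

Checking `ps -ef | grep dsrpcd` as well tells you whether the process exists at all, as opposed to existing but not listening.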

Thanks,
Har
nick.bond
Charter Member
Posts: 230
Joined: Thu Jan 15, 2004 12:00 pm
Location: London

Post by nick.bond »

Try this:

Code:

netstat -a | grep dsrpc
Do you find sessions with a status of FIN_WAIT_2?
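The check above can be narrowed to just the half-closed sessions. A minimal sketch, assuming netstat -a output on stdin where the local address shows the dsrpc service name; the helper name stuck_dsrpc is hypothetical. It also matches CLOSE_WAIT, the other half-closed state, which is what turns up later in this thread:

```shell
# Hypothetical helper: print dsrpc connections stuck in a half-closed
# TCP state (CLOSE_WAIT or FIN_WAIT_2) from netstat -a output on stdin.
stuck_dsrpc() {
  grep 'dsrpc' | grep -E 'CLOSE_WAIT|FIN_WAIT_2'
}

# Usage on the server:
#   netstat -a | stuck_dsrpc
#   netstat -a | stuck_dsrpc | wc -l    # just count them
```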

If so, these are sessions that did not complete the closing handshake correctly.

You need to disconnect these before you can restart the server. To get rid of these connections, I believe you can run:

Code:

$DSHOME/bin/uv -admin -clearsockets
I have never used this method myself; I have always disconnected them with Unix commands (I didn't know about the above before), but you need root to do it that way. I will dig those commands out if you can't get this to work.

The other method is to restart the client machines responsible.
Regards,

Nick.
lstsaur
Participant
Posts: 1139
Joined: Thu Oct 21, 2004 9:59 pm

Post by lstsaur »

Har,
That means a connection on the dsrpc port, 31538, is in CLOSE_WAIT status.
Issue lsof -P | grep 31538 to find the PID, kill that PID, and then restart your DS server.
har
Participant
Posts: 118
Joined: Tue Feb 17, 2004 6:23 pm
Location: cincinnati

Post by har »

I found CLOSE_WAITs with netstat -a | grep dsrpc:

tcp4 0 0 localhost.dsrpc localhost.58693 CLOSE_WAIT

Are these CLOSE_WAIT connections stopping the daemon?

Har
har
Participant
Posts: 118
Joined: Tue Feb 17, 2004 6:23 pm
Location: cincinnati

Post by har »

Can you please tell me the exact syntax of the command to get the PID for the CLOSE_WAIT?

Thanks,
Hari
nick.bond
Charter Member
Posts: 230
Joined: Thu Jan 15, 2004 12:00 pm
Location: London

Post by nick.bond »

Determine which client machine had the connection from the IP address and restart it. That should clear the TCP connections.
Regards,

Nick.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Search the forum for the correct procedure for shutting down. You might also search for debugging the dsrpcd daemon.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
lstsaur
Participant
Posts: 1139
Joined: Thu Oct 21, 2004 9:59 pm

Post by lstsaur »

Hari,
The command is:
lsof -P | grep 31538 (this will tell you the PID of the process holding the CLOSE_WAIT connection)

Then kill that PID and restart the server.
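A sketch of the same lookup that prints only the PIDs, so they can be reviewed before anything is killed. CLOSE_WAIT means the remote end has closed the connection but the local process has not yet closed its socket, which is why ending that process clears it. This assumes standard lsof output on stdin (PID in the second column, state in parentheses at the end of the line, numeric ports because of -P); the helper name close_wait_pids is hypothetical:

```shell
# Hypothetical helper: print the unique PIDs that lsof reports for
# CLOSE_WAIT sockets on a port (default 31538, the dsrpc port above).
close_wait_pids() {
  port="${1:-31538}"
  # index() does a plain substring match on ":<port>"; the regex keeps
  # only lines whose state is "(CLOSE_WAIT)". The header line never matches.
  awk -v p=":$port" 'index($0, p) && /\(CLOSE_WAIT\)/ { print $2 }' | sort -u
}

# Usage (as root, and check what each PID is before killing it):
#   lsof -P | close_wait_pids 31538
#   ps -fp <pid>
#   kill <pid>
```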
har
Participant
Posts: 118
Joined: Tue Feb 17, 2004 6:23 pm
Location: cincinnati

Post by har »

After rebooting the UNIX box, the daemon runs fine.
Anyway, I appreciate your help, guys!

har
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Do you feel that that would be a popular - even acceptable - "solution" in a production environment?

I don't.
lstsaur
Participant
Posts: 1139
Joined: Thu Oct 21, 2004 9:59 pm

Post by lstsaur »

I gave this guy the command to solve the problem twice.
In my shop, there is no way we would reboot the box for this kind of problem.
har
Participant
Posts: 118
Joined: Tue Feb 17, 2004 6:23 pm
Location: cincinnati

Post by har »

It's a QA environment, not a PRD environment.
And I tried the lsof command, but the AIX machine didn't recognize it.
I forgot to mention this lsof error in my request.
So finally we decided to reboot, and it worked.
Har
clj242021
Participant
Posts: 33
Joined: Mon Feb 12, 2007 2:54 am

Post by clj242021 »

har wrote: It's a QA environment, not a PRD environment.
And I tried the lsof command, but the AIX machine didn't recognize it.
I forgot to mention this lsof error in my request.
So finally we decided to reboot, and it worked.
lsof is not a standard AIX command; you need to install an extra package.
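On AIX without lsof, one alternative (an assumption to verify on your AIX level; nobody in this thread tested it) is `netstat -Aan`, which adds the protocol control block address as the first column, combined with `rmsock <address> tcpcb`, which reports the process holding that socket. Be aware that rmsock also removes the socket if no process holds it, so it is not a read-only lookup. A sketch of the parsing step only, with a hypothetical helper name close_wait_pcbs:

```shell
# Hypothetical helper: from "netstat -Aan" output on stdin, print the
# PCB address (first column) of each CLOSE_WAIT socket whose address
# ends in ".<port>" (default 31538). With -n, addresses look like
# "127.0.0.1.31538", so ".<port> " followed by a space is matched.
close_wait_pcbs() {
  port="${1:-31538}"
  awk -v p=".$port" '$NF == "CLOSE_WAIT" && index($0, p " ") { print $1 }'
}

# Usage on AIX, as root (assumption: rmsock prints the holding PID, or
# removes the socket if no process holds it):
#   netstat -Aan | close_wait_pcbs 31538 |
#     while read -r addr; do rmsock "$addr" tcpcb; done
```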
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

It's open source, from what I recall.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ivannavi
Premium Member
Posts: 120
Joined: Mon Mar 07, 2005 9:49 am
Location: Croatia

Post by ivannavi »

nick.bond says:
...but you need root to do it with unix ...
I have just tried clearsockets for the first time, and it doesn't work:

1) I do need to be root. When not root, I get:
Permission denied. Root/super-user privileges are required.
clearsockets aborted.
2) As root it doesn't complain, but it also does nothing: I can still see the same FIN_WAIT_2 sessions as before!

Now I hope to see a "Me too" post!