RPC daemon is not running (81016)

kalyanvinnakota
Participant
Posts: 48
Joined: Thu May 05, 2005 9:24 pm

Post by kalyanvinnakota »

Hi All,

I restarted the DS server. When I checked for client connections and running jobs, one job was hanging. As before, I was not able to clean it up using DS.TOOLS, so I killed the process manually.

After that I stopped and started the server.

Now I get this error.

Failed to connect to host: oppt.in.ibm.com, project: UV
(The connection was refused or the RPC daemon is not running (81016))

I checked the DS docs, which say that to restart the dsrpc daemon manually you should stop and start the server.

I have tried this several times, but I see the same error.

Could anyone help us urgently, as we are now unable to connect?

Thanks a ton
Regards,
Kalyan
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Have you searched for 81016 on the forum? You might also search for ways to debug why the RPC daemon may not be starting ("-d9" will help, so might "netstat", so might "BOMBED").
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kalyanvinnakota
Participant
Posts: 48
Joined: Thu May 05, 2005 9:24 pm

Post by kalyanvinnakota »

I am following the advice quoted below, but it is not working.


DataStage Client to UNIX Server Connections
If you cannot connect from a DataStage client to a UNIX server, check that
the dsrpcd daemon is running. The dsrpcd daemon is started when the
DataStage server is installed, and should start automatically when you
reboot. If the daemon has stopped for some reason, restart it with the
following command:
dshome/bin/uv -admin -start
dshome is the DataStage server engine home directory.



Regards,
Kalyan
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

That is incomplete advice.

After starting DataStage you should check that the dsrpcd process is running.

Code:

ps -ef | grep dsrpcd | grep -v grep

If it is not, then you must find out why. Usually this involves the netstat command, to determine whether any connected DataStage processes are preventing dsrpcd from binding to port number 31538.
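
As a rough sketch of that netstat check, the filter below keeps only the lines that mention the dsrpc service name or port 31538. The function name `dsrpc_connections` is made up for illustration, and the pattern assumes addresses are printed as `host.port` or `host:port`, which may not hold on every platform.

```shell
# Hypothetical helper: print only the netstat lines whose local or foreign
# address ends in the dsrpc service name or the numeric port 31538.
# It reads netstat output on stdin, so it can be fed from either
# "netstat" (service names) or "netstat -an" (numeric ports).
dsrpc_connections() {
    grep -E '[.:](dsrpc|31538)([[:space:]]|$)'
}
```

It could be used as `netstat -an | dsrpc_connections`; any line left in a state such as CLOSE_WAIT is a candidate for what is holding the port.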

You can also test this theory, provided you have superuser access, by starting dsrpcd manually and capturing debugging information. Assuming you have the DataStage bin directory in your path, and are in the "home" directory:

Code:

nohup dsrpcd -d9 > /tmp/dsrpcd.log 2>&1 &

Typically you get a four-line log indicating the problem within a few seconds of executing the command.

Search the forum for techniques for disconnecting the sleeping connections.
kalyanvinnakota
Participant
Posts: 48
Joined: Thu May 05, 2005 9:24 pm

Post by kalyanvinnakota »

Hi Ray,

Thanks a ton for the reply.

I have done what you suggested, and this is the output.


>nohup dsrpcd -d9 > /tmp/dsrpcd.log 2>&1 &
[1] 123012

dsrpcd.log ->
RPCPID=123012 - 11:29:25 - uvrpc_debugflag=9 (Debugging level)
RPCPID=123012 - 11:29:25 - In rpc_init()
RPCPID=123012 - 11:29:25 - bind bombed errno=67
RPCPID=123012 - 11:29:25 - listen failed


Have you ever seen this error? Let me know; in the meantime, I will search on this error.

Thanks and Regards,
Kalyan
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

It's exactly what I described earlier. Now search for netstat and how to free those processes so that dsrpcd can bind to port number 31538.
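
For readers following along, here is a minimal sketch of pulling the errno out of a debug log like the one posted above. The function names are hypothetical, and the mapping of 67 to "address already in use" is an assumption that applies to AIX; other platforms number errno values differently, so check the local errno.h.

```shell
# Hypothetical helper: extract the errno value from the first
# "bind bombed errno=N" line of a dsrpcd debug log read from stdin.
# Prints nothing if the log contains no such line.
bind_errno() {
    sed -n 's/.*bind bombed errno=\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: turn the errno into a hint.
# ASSUMPTION: 67 is EADDRINUSE on AIX only; this is not portable.
explain_bind_errno() {
    case "$1" in
        67) echo "errno 67: address already in use (AIX numbering assumed)" ;;
        "") echo "no bind failure found in log" ;;
        *)  echo "errno $1: look it up in this platform's errno.h" ;;
    esac
}
```

With the log from the `-d9` run, this might be called as `explain_bind_errno "$(bind_errno < /tmp/dsrpcd.log)"`.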
pneumalin
Premium Member
Posts: 125
Joined: Sat May 07, 2005 6:32 am

Post by pneumalin »

Kalyan,
if you issue -stop while there are live client connections to the server, dsrpcd cannot be brought up properly when -start is run. Make sure you close all client connections before doing -stop.

To get dsrpcd running again in this situation, Ray may have a better idea using the netstat information, and I would like to know how to do it too. However, if you are desperate and have no problem rebooting the UNIX box, a reboot will reset everything back to normal, which means dsrpcd will run again.
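
One way to act on the "close clients before -stop" advice is a small pre-stop check, sketched below. The function names are invented for illustration, and the awk test assumes the usual netstat column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State), which should be confirmed on the target system.

```shell
# Hypothetical helper: count ESTABLISHED connections whose local address
# is the dsrpc service (or port 31538) in netstat output read from stdin.
# ASSUMPTION: the local address is field 4 and the state is the last field.
established_dsrpc_count() {
    awk '$NF == "ESTABLISHED" && $4 ~ /[.:](dsrpc|31538)$/ { n++ }
         END { print n + 0 }'
}

# Hypothetical helper: succeed only when no client is still connected.
safe_to_stop() {
    [ "$(established_dsrpc_count)" -eq 0 ]
}
```

For example, `netstat | safe_to_stop && dshome/bin/uv -admin -stop` would run the stop only when no client connection is reported, with dshome standing for the engine home directory as in the documentation quoted earlier.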

Hopefully it helps!

Pneuma.
kalyanvinnakota
Participant
Posts: 48
Joined: Thu May 05, 2005 9:24 pm

Post by kalyanvinnakota »

Hi Ray

The netstat command gave the following output.

>netstat| grep dsr
tcp4 0 0 oppt.in.ibm.com.dsrpc vtammine.in.ibm..2816 CLOSE_WAIT


Do we have to kill the above process?

One more thing: when you said "free those processes", which processes are you referring to?

I am not sure about the ones mentioned below, as there are no process identifiers with which to identify and kill them.

RPCPID=123012 - 11:29:25 - uvrpc_debugflag=9 (Debugging level)
RPCPID=123012 - 11:29:25 - In rpc_init()
RPCPID=123012 - 11:29:25 - bind bombed errno=67
RPCPID=123012 - 11:29:25 - listen failed


Could you please clarify this for me?

Thanks a Lot
Regards,
Kalyan
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Before killing the process, you might like to check (with an lsof command) that it is a DataStage process. Also use ipcs -m | grep ade to make sure that no DataStage shared memory segments (and hence no DataStage processes) remain after you've shut DataStage down; then a restart should be fine.
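
The ipcs check can be wrapped so that it gives a yes/no answer, as in the sketch below. The function names are hypothetical, and matching on the bare string "ade" simply mirrors the command above; it can also match unrelated text such as an owner name containing those letters, so treat a hit as a prompt to look closer, not as proof.

```shell
# Hypothetical helper: print the "ipcs -m" lines (read from stdin) that
# contain "ade", mirroring the grep suggested in the thread.
# "|| true" keeps the exit status at 0 when nothing matches.
datastage_shm_segments() {
    grep -i 'ade' || true
}

# Hypothetical helper: succeed only when no matching segment remains.
clean_shutdown() {
    [ -z "$(datastage_shm_segments)" ]
}
```

After shutting DataStage down, `ipcs -m | clean_shutdown && echo "no segments left"` would report whether the way is clear for a restart.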

Did you search? There are 65 hits on netstat alone. If you search for netstat and lsof (all terms) you will narrow to a small number of useful hits.
kalyanvinnakota
Participant
Posts: 48
Joined: Thu May 05, 2005 9:24 pm

Post by kalyanvinnakota »

Hi Ray,

Thanks for all your notes, and sorry for the trouble caused!

We have now brought down the DataStage server.

These are the checks we did while the server was down.

We used the netstat | grep dsr command, and it gives the following result:

>netstat|grep dsr
tcp4 0 0 oppt.in.ibm.com.dsrpc vtammine.in.ibm..2816 CLOSE_WAIT


We also used the ipcs -m | grep ade command, but nothing is displayed.

Also, there is no process associated with CLOSE_WAIT:

ps -ef|grep CLOSE_WAIT
dsadm 128910 123754 0 12:41:51 pts/1 0:00 grep CLOSE_WAIT


As it is not associated with a process, how do we kill it? We can see this CLOSE_WAIT only in the netstat output, not in the ps output.

Thanks and Regards,
kalyan
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The quickest solution is to stop and restart UNIX. That MUST unbind the port (given your employer I'm assuming you're not on a Tru64 cluster).

Otherwise it's a long and tedious process to identify the process associated with the CLOSE_WAIT (or FIN_WAIT or FIN_WAIT2) state so that you can remove it.

I don't have access to DataStage at the moment (doing something else), so all of my replies have been from memory. That's why I've been pushing you to search. For example, you may need to check (using the ndd command) what the default TCP timeout is on the system.
kalyanvinnakota
Participant
Posts: 48
Joined: Thu May 05, 2005 9:24 pm

Post by kalyanvinnakota »

Hi Ray,

We rebooted our AIX box and the problem was resolved.

We are able to connect to our DS project.

As for the earlier problem of the sequencer that could not be deleted or compiled after we physically removed the RT_CONFIGXXX file: when we tried to open this sequencer, it reported that it could not be found, and on refresh it was removed.

So we were able to rename the backed-up sequencer to the current sequencer name, and it compiled successfully.

As of now, the problems seem to be resolved. :D

I want to personally thank you and the others for the help provided.

Have a Great day ahead.
Thanks and Regards,
Kalyan