RPC Daemon is not running Datastage Version 6

Posted: Mon Dec 01, 2003 5:54 am
by anupam
Hi All,

I have installed DataStage Version 6 on Solaris. The log file is not showing any errors.

I am getting this error when trying to connect to the server - "The connection was refused or the RPC Daemon is not running".

The dsrpc - 31538/tcp entry is there in /etc/services. I rebooted the server after the successful installation, and I also brought the DSEngine down and back up. But I am unable to see the uvrpc entry when I execute netstat -a | grep uvrpc.

Please suggest what I should do to resolve this problem ASAP.

Posted: Mon Dec 01, 2003 6:12 am
by ray.wurlod
The RPC daemon at this release runs as dsrpcd, so your netstat command should be filtered for dsrpc rather than uvrpc.
This is to enable the UniVerse RPC daemon (which uses port number 31438) and the DataStage RPC daemon (which uses port number 31538) to co-exist.
The daemon can be started manually if you are logged in as root and attached to the DataStage Engine directory (that is, it is your current directory). That directory contains a bin directory, which contains the dsrpcd executable.
To start it in normal mode:

nohup bin/dsrpcd > /dev/null 2>&1 &
To start it in full debug mode:

nohup bin/dsrpcd -d9 > /tmp/dsrpcd.log 2>&1 &
Once you have figured out in debug mode why it isn't starting, stop it and re-start it in normal mode.
It may be that it is not able to bind a socket (the error message is something like "bind bombed"). Then you have to diagnose why the socket is not available (netstat, perhaps).
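To pick out the interesting lines once the debug log has been written, something like the following may help. It is only a sketch: the function name scan_dsrpcd_log is made up for illustration, and the patterns "bombed" and "failed" are guesses based on the message quoted above, not a documented list of dsrpcd messages. It assumes a POSIX shell.

```shell
#!/bin/sh
# Hypothetical helper: print the lines of a dsrpcd debug log that look like
# failures. The patterns are assumptions, not an official message list.
scan_dsrpcd_log() {
    log=${1:-/tmp/dsrpcd.log}
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 2
    fi
    found=1
    while IFS= read -r line; do
        case $line in
            *bombed*|*failed*) echo "$line"; found=0 ;;
        esac
    done < "$log"
    # Returns 0 if at least one suspicious line was printed, 1 otherwise.
    return $found
}
```

Called as scan_dsrpcd_log /tmp/dsrpcd.log, it prints only the matching lines; the full log is still worth reading for context.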

Posted: Mon Dec 01, 2003 7:13 am
by anupam
Ray,

It does not give me any information. I am not able to conclude anything.
The output keeps varying when I execute the command
nohup bin/dsrpcd > /dev/null 2>&1 &
1. Output
[1] 3901
2. Output
[2] 4012
[1] Done nohup bin/dsrpcd >/dev/null 2>&1

3. Output
[3] 4065
[2] Done nohup bin/dsrpcd >/dev/null 2>&1

Kindly help me

Posted: Tue Dec 02, 2003 9:18 am
by ray.wurlod
You seem to have missed the -d9 for debugging mode.

Posted: Tue Dec 02, 2003 9:29 am
by anupam
Ray,

I have executed this command
"nohup bin/dsrpcd -d9 > /tmp/dsrpcd.log 2>&1 &"
and the output of this is
[1] 13527

Now I am not able to understand what I should conclude from this.

Posted: Tue Dec 02, 2003 9:34 am
by ray.wurlod
Is that what's on screen, or what's in /tmp/dsrpcd.log?

If - as I suspect - it's what's on screen, it means simply that the command has started as a background process with process id (PID) of 13527.

What's in the file /tmp/dsrpcd.log? It's a text file; you can read it with cat or more.
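A quick way to tell whether that background process is still alive is kill -0, which sends no signal and only tests whether the PID can be signalled. A minimal sketch, assuming a POSIX shell; the function names is_alive and report are invented for illustration:

```shell
#!/bin/sh
# kill -0 delivers no signal; it succeeds only if the PID exists and the
# current user is allowed to signal it.
is_alive() {
    kill -0 "$1" 2>/dev/null
}

# Print a one-line status for the given PID.
report() {
    if is_alive "$1"; then
        echo "PID $1 is still running"
    else
        echo "PID $1 has exited"
    fi
}
```

For example, report 13527. If it says the process has exited, the daemon stopped shortly after starting and the reason should be in /tmp/dsrpcd.log. If dsrpcd detaches from the shell by forking (an assumption, not confirmed here), the PID printed by the shell will not be the daemon's, and ps -ef is the more reliable check.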

Posted: Tue Dec 02, 2003 11:02 am
by Teej
anupam wrote:Ray,

I have executed this command
"nohup bin/dsrpcd -d9 > /tmp/dsrpcd.log 2>&1 &"
and the output of this is
[1] 13527

Now I am not able to understand what I should conclude from this.

Do a "ps -ef | grep dsrpcd"

Is it running under root?

-T.J.

Posted: Tue Dec 02, 2003 11:19 pm
by anupam
Hi T.J,

Yes, dsrpcd is running under root.

Posted: Tue Dec 02, 2003 11:21 pm
by anupam
Ray,

The dsrpcd.log under tmp shows

RPCPID=7512 - 10:35:53 - uvrpc_debugflag=9 (Debugging level)
RPCPID=7512 - 10:35:53 - In rpc_init()
RPCPID=7512 - 10:35:53 - bind bombed errno=125
RPCPID=7512 - 10:35:53 - listen failed

Please suggest.

Posted: Wed Dec 03, 2003 6:54 am
by ray.wurlod
"Bind bombed" means that the process was unable to bind the socket it expected to be able to bind. Probably this is because of a process that had it earlier (an earlier invocation of the RPC daemon that crashed, perhaps) not releasing the socket.
You can verify this with netstat. However, the exact mechanics of releasing the socket vary from UNIX to UNIX, and it's not something I keep in my head! :)
Contact your support provider. This issue has been seen before, so there will be something in the (internal Ascential) knowledge base about it. At worst, rebooting the UNIX machine is guaranteed to clear all sockets.
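On Solaris, errno 125 should correspond to EADDRINUSE ("Address already in use"); that is from memory of the Solaris errno list, so confirm it in /usr/include/sys/errno.h. To see what is holding the port, it can help to filter netstat -an by port number rather than by service name, so the result does not depend on the name lookup. A minimal sketch, assuming a POSIX shell and awk; port_lines is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: read "netstat -an" output on stdin and print the
# lines in which any address field ends in the given port. Solaris writes
# addresses as host.port and Linux as host:port, so both are accepted.
port_lines() {
    awk -v port="$1" '
        {
            for (i = 1; i <= NF; i++) {
                n = split($i, parts, /[.:]/)
                if (n > 1 && parts[n] == port) { print; next }
            }
        }'
}

# Usage: netstat -an | port_lines 31538
```

A LISTEN line means something already owns the port. Lines in other states (TIME_WAIT, CLOSE_WAIT and so on) are connections left over from an earlier daemon or client; whether those alone are enough to block dsrpcd from binding depends on how it opens its socket, which is not documented in this thread.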

Posted: Wed Dec 03, 2003 8:12 am
by mhester
Anupam,

Please verify that no other application has decided to connect via port 31538. DataStage does not like to share this port, and you will receive connection errors when this happens even though dsrpcd seems to be up and running. There are many packages out there that will grab all available ports in a range when they start. You might need to enlist the help of a sys admin to determine whether this condition exists on your system.

We had this very thing happen and received the same errors that you are describing.

Regards,

Michael Hester

Posted: Wed Dec 03, 2003 11:41 am
by Teej
ray.wurlod wrote: You can verify this with netstat. However, the exact mechanics of releasing the socket vary from UNIX to UNIX, and it's not something I keep in my head! :)
For Tru64:

netstat -a |grep dsrpc
There should be ONLY a LISTEN process present. One should not shut down DataStage if there are other processes, unless you do not care whether something may go wrong.

Can also do a:

lsof -i |grep dsrpc
To identify processes to kill. :)

-T.J.

Posted: Tue Dec 09, 2003 6:17 am
by anupam
Hi Ray,

Sorry for replying so late. There is only one listener, and that is dsrpc. The socket is available to DataStage only.

Please suggest what I should do to solve this problem. I have been unable to diagnose it so far. I rebooted the server, but the problem still exists.

Posted: Tue Dec 09, 2003 6:30 am
by ray.wurlod
When you say "rebooted the server" do you mean "DataStage services" or the UNIX machine itself?
There's no way a tied socket can remain so through a UNIX re-boot!
Did you try starting the dsrpcd daemon in debug mode after the re-boot? If so, what messages were in the log this time?
Have you read this thread?

Posted: Tue Dec 09, 2003 6:52 am
by anupam
Hi Ray,

I am getting this error now:

RPCPID=2259 - 18:08:23 - uvrpc_debugflag=9 (Debugging level)
RPCPID=2259 - 18:08:23 - In rpc_init()
RPCPID=2259 - 18:08:23 - get service by name bombed errno=0
RPCPID=2259 - 18:08:23 - listen failed
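
The "get service by name bombed" message suggests the daemon could not resolve its service name to a port, which would point at the /etc/services entry rather than at the socket. The sketch below checks whether a name resolves in a services-style file. It assumes the daemon looks up the name dsrpc (taken from the entry mentioned earlier in this thread, not from dsrpcd documentation); find_service is a hypothetical helper, and it only reads the local file, so it says nothing about NIS or other sources configured in nsswitch.conf.

```shell
#!/bin/sh
# Hypothetical helper: look up a service name in a services-style file and
# print its "port/protocol" field. A line counts only if its second field
# is port/tcp or port/udp and the name is the first field or a later alias.
find_service() {
    name=$1
    file=${2:-/etc/services}
    awk -v name="$name" '
        { sub(/#.*/, "") }                      # strip trailing comments
        NF >= 2 && $2 ~ /^[0-9]+\/(tcp|udp)$/ {
            for (i = 1; i <= NF; i++) {
                if (i != 2 && $i == name) { print $2; found = 1; exit }
            }
        }
        END { exit !found }' "$file"            # exit 1 when nothing matched
}
```

If the entry is well formed, find_service dsrpc should print 31538/tcp. A line is recognised only when the name is followed directly by port/protocol, for example dsrpc 31538/tcp; an entry with anything else between the name and the port (such as a stray hyphen) would not be found by this check, and probably not by the daemon either.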