NodeAgents.sh start fails

manuel.gomez
Premium Member
Posts: 291
Joined: Wed Sep 26, 2007 11:23 am
Location: Madrid, Spain

Post by manuel.gomez »

Hello all,

I am having problems restarting the DataStage agents.

We had a crash on our server, and we had to switch it off and carry out some hardware work on it.

After the server was restarted, we tried to restart the DataStage services.
DB2 and WAS are running perfectly, but we cannot restart the DataStage agents.

This is my output:
./NodeAgents.sh start
Starting LoggingAgent...
The LoggingAgent.sh process stopped unexpectedly.
I checked the logs, and this is the only thing I could find:
cat logging-agent.err
2009-05-27 11:25:35,700 ERROR [ISF-LOGGING-AGENT] [0006] Could not start LoggingAgent. Exiting because there is something wrong with the LoggingAgent.

com.ascential.acs.logging.agent.LoggingAgentInternalException: Internal exception in LoggingAgent.
at com.ascential.acs.logging.agent.LoggingAgentSocketImpl.<init>(LoggingAgentSocketImpl.java:60)
at com.ascential.acs.logging.agent.LoggingAgentSocketImpl.main(LoggingAgentSocketImpl.java:110)
Caused by: java.net.BindException: Address already in use (errno:226)
at java.net.PlainSocketImpl.socketBind(Native Method)
at java.net.PlainSocketImpl.bind(PlainSocketImpl.java:331)
at java.net.ServerSocket.bind(ServerSocket.java:324)
at java.net.ServerSocket.<init>(ServerSocket.java:186)
at java.net.ServerSocket.<init>(ServerSocket.java:97)
at com.ascential.acs.logging.agent.LoggingAgentSocketImpl.<init>(LoggingAgentSocketImpl.java:58)
... 1 more
cat logging-agent.out
2009-05-27 11:25:35,562 INFO [ISF-LOGGING-AGENT] [0020] Authenticated.
2009-05-27 11:25:35,585 INFO [ISF-LOGGING-AGENT] [0026] Batch events set to "true".
2009-05-27 11:25:35,587 INFO [ISF-LOGGING-AGENT] [0033] Number of events to batch set to "100".
2009-05-27 11:25:35,588 INFO [ISF-LOGGING-AGENT] [0027] Length of time to wait before flushing batched events set to "60000".
2009-05-27 11:25:35,588 INFO [ISF-LOGGING-AGENT] [0029] Cache config set to "true".
2009-05-27 11:25:35,588 INFO [ISF-LOGGING-AGENT] [0030] Length of time to wait before refreshing cache set to "60000".
2009-05-27 11:25:35,588 INFO [ISF-LOGGING-AGENT] [0028] Buffer events if LoggingService cannot be reached set to "true".
2009-05-27 11:25:35,588 INFO [ISF-LOGGING-AGENT] [0034] Number of events to buffer set to "1000".
2009-05-27 11:25:35,588 INFO [ISF-LOGGING-AGENT] [0035] Length of time to wait before trying to reconnect with the LoggingService set to "5000".
In the same directory (/ASBNode/bin), the file LoggingAgent.out is left completely empty.

Do you have any idea what could be happening?

Thanks very much
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Manuel,

I found that messages get written to all sorts of files at V8 and the message that one is looking for is seldom in the file where you might expect it. What I did was to get rid of all log files before a restart and, after the error occurs, to search them all for error messages.
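
The search described above could be sketched as a small shell helper. This is only an illustration: the function name find_error_logs is made up here and is not part of DataStage, and the directory you pass in is wherever your agent logs live (later in the thread, /ASBNode/bin is mentioned).

```shell
#!/bin/sh
# Hypothetical helper (not part of DataStage): list every file directly
# under a directory that contains the string ERROR.
find_error_logs() {
    log_dir="$1"
    # -l prints only the names of matching files; complaints about
    # subdirectories or unreadable files are discarded.
    grep -l 'ERROR' "$log_dir"/* 2>/dev/null
}

# Example call, assuming the logs are in the current directory:
# find_error_logs .
```

Running it right after a failed start, with the old logs cleared beforehand, should narrow the list to files written by that attempt.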

Can you start/stop the DataStage server (.../bin/uv -admin -start) or does that cause an error as well?

What about the other program - does that work (I can't recall the name, but the one where you need to enter a userid and password), does that start?
manuel.gomez
Premium Member
Posts: 291
Joined: Wed Sep 26, 2007 11:23 am
Location: Madrid, Spain

Post by manuel.gomez »

ArndW wrote:Can you start/stop the DataStage server (.../bin/uv -admin -start) or does that cause an error as well?
Yes, DataStage stops and starts perfectly.
Actually, with the admin user, I can log on to the server!
ArndW wrote:What about the other program - does that work (I can't recall the name, but the one where you need to enter a userid and password), does that start?
I don't know which program you are talking about.
I started DB2 and WAS (startServer.sh) with no issues.
Then I tried NodeAgents.sh and it failed (as described above).
Finally, DataStage could be started (uv -admin -start).

Thanks for your help!
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

I was referring to the "StartServer" command; I missed that in your original post.

I guess it boils down to the
Caused by: java.net.BindException: Address already in use (errno:226)
line in the log. Could it be that there are still processes out there that might be blocking that socket?
manuel.gomez
Premium Member
Posts: 291
Joined: Wed Sep 26, 2007 11:23 am
Location: Madrid, Spain

Post by manuel.gomez »

I don't really know. Could you please tell me whether there is a way to locate such a process?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

I would need to look around as well. Perhaps start with "netstat -a | grep ds" to see if anything suspicious shows up. Are you using "dsadm" or "dsrun" or some similar userid and does "ps -ef | grep {userid}" show any pids?
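
If the port the LoggingAgent binds to is known, the netstat output can be narrowed to listeners on that port. A rough sketch follows; the function name listeners_on_port and its port argument are placeholders, since nothing in this thread states which port is involved, and the exact netstat column layout varies between platforms.

```shell
#!/bin/sh
# Hypothetical helper: read `netstat -an` output on stdin and print the
# LISTEN lines where some address field ends in ".PORT" or ":PORT".
# Both separators are accepted because netstat formats differ by platform.
listeners_on_port() {
    port="$1"
    awk -v p="$port" '/LISTEN/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ ("[.:]" p "$")) { print; next }
    }'
}

# Example call; replace <port> with the port your agent is configured for:
# netstat -an | listeners_on_port <port>
```

A matching LISTEN line means something already holds the port, which is consistent with the "Address already in use" error. Where lsof is installed, lsof -i TCP:<port> can then show the owning process.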
manuel.gomez
Premium Member
Posts: 291
Joined: Wed Sep 26, 2007 11:23 am
Location: Madrid, Spain

Post by manuel.gomez »

This is what I found:
netstat -a | grep ds
tcp 0 0 esxha553.dsrpc 10.225.139.8.3810 ESTABLISHED
tcp 0 0 esxha553.dsrpc 10.225.139.74.2320 ESTABLISHED
tcp 0 0 esxha553.dsrpc 10.225.139.8.3692 ESTABLISHED
tcp 0 0 *.dsrpc *.* LISTEN
tcp 0 0 esxha553.dsrpc 10.225.139.188.4370 ESTABLISHED
tcp 0 0 esxha553.dsrpc 10.225.139.188.4380 ESTABLISHED
tcp 0 0 esxha553.dsrpc 10.225.139.74.2328 ESTABLISHED
tcp 0 0 esxha553.dsrpc 10.225.139.161.2954 ESTABLISHED

ps -ef | grep 114 (114 is the user ID of dsadm)

dsadm 21222 21143 0 16:48:08 ? 0:00 sshd: dsadm@pts/13
root 21143 1248 0 16:48:02 ? 0:00 sshd: dsadm [priv]
dsadm 21941 21224 0 16:52:15 pts/13 0:00 grep 114

Does this tell you anything?
Thanks a lot
shareeman
Charter Member
Posts: 39
Joined: Tue Sep 23, 2003 6:09 am

Post by shareeman »

Hi Manuel
Did you get to the bottom of this issue?
Kindly update the status on this one, as I've got the same issue, bizarrely enough on only one of our servers whilst the rest are all hunky dory!

Many Thanks
shareeman
Charter Member
Posts: 39
Joined: Tue Sep 23, 2003 6:09 am

Post by shareeman »

This was a user-related issue. The ASB Agent had been started by one user and stopped by another. The stop didn't really work, so the agent needed to be killed and restarted by the original user/owner.
Not sure if this was the same reason for Manuel's problem though.
manuel.gomez
Premium Member
Posts: 291
Joined: Wed Sep 26, 2007 11:23 am
Location: Madrid, Spain

Post by manuel.gomez »

shareeman wrote:Not sure if this was the same reason for Manuel's problem though.
Exactly the same!
8)
mosd
Participant
Posts: 9
Joined: Wed Jun 24, 2009 3:23 am

Post by mosd »

I have the same problem. I tried grepping for the agent and it is not running, so I can't kill it and start it as the owner:
ps -ef | grep ASB
shareeman
Charter Member
Posts: 39
Joined: Tue Sep 23, 2003 6:09 am

Post by shareeman »

Try ps -ef | grep agent

That should show two processes, loggingAgent and AgentImpl. You should then be able to use the process IDs to kill them.

Let us know if that works
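
As a rough sketch of that check, the helper below prints the owner and PID of anything in the process list that looks like one of the two agents. The function name agent_procs is invented for this example, and the match patterns are taken only from the process names mentioned in this thread, so they may need adjusting on your system.

```shell
#!/bin/sh
# Hypothetical helper: read `ps -ef` output on stdin and print
# "owner pid" for each line that looks like an agent process.
# Matching is case-insensitive; lines for awk itself are skipped so the
# helper does not report its own process.
agent_procs() {
    awk 'tolower($0) ~ /loggingagent|agentimpl/ && $0 !~ /awk/ { print $1, $2 }'
}

# Example:
# ps -ef | agent_procs
# Then, as the listed owner (or root), stop each leftover process:
# kill <pid>
```

The owner column also shows which user originally started the agents, which is the user that should run NodeAgents.sh start again.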
mosd
Participant
Posts: 9
Joined: Wed Jun 24, 2009 3:23 am

Post by mosd »

The problem is resolved.
The process was not running at all. I'm not sure exactly what resolved the problem, but here is what I did:

1. I exported this variable because the Oracle stage needs it, according to the article at this link:
export BEQUEATH_DETACH=YES
http://publib.boulder.ibm.com/infocente ... dskpw.html
2. Then I deleted the log files, i.e. logging-agent.err and logging-agent.out.

Then I restarted the engine, and after that, when I started the agent, it worked.
asyafrudin
Participant
Posts: 16
Joined: Thu Oct 21, 2010 1:40 am
Location: Indonesia

Post by asyafrudin »

shareeman wrote:Try ps -ef | grep agent

That should show two processes, loggingAgent and AgentImpl. You should then be able to use the process IDs to kill them.

Let us know if that works
Exactly what I need to solve my problem.

Though I'm still wondering why the two processes showed up in the first place. :?
Perfection is not about making no mistakes. Perfection is about fixing your mistakes.