Our DataStage server has stopped responding to client connections. When I attempt to connect to the server, it hangs on the login screen indefinitely.
This is a GRID server that is fairly new (just development projects/jobs).
I can telnet/ssh to the head node and other nodes in the grid.
I can telnet to the various ports that are required for the server connection.
Upon searching the various threads, I found that the RPC daemon is something to look for.
I did the following
netstat -a | grep dsrpc
and got a LISTEN status.
I did a ps -ef | grep dsrpc and it yielded an output line for dsrpcd.
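The two checks above can be wrapped in a small shell sketch. The function names are hypothetical, and the sketch assumes the daemon shows up as `dsrpcd` in `ps -ef` and the listener shows the service name `dsrpc` in `netstat -a`, as described in this post. It only reports what it finds; a listening daemon does not by itself rule out a hang.

```shell
#!/bin/sh
# Hypothetical helpers. Each one reads command output on stdin,
# so it can be tried against saved output as well as a live server.

# Succeeds if the ps listing contains a dsrpcd process
# (lines belonging to the grep command itself are ignored).
has_dsrpcd_process() {
    grep 'dsrpcd' | grep -v 'grep' >/dev/null
}

# Succeeds if the netstat listing shows a dsrpc socket in LISTEN state.
# Assumption: netstat prints the service name "dsrpc" rather than a port number.
has_dsrpc_listener() {
    grep 'dsrpc' | grep 'LISTEN' >/dev/null
}

# Usage on the server (not executed here):
#   ps -ef     | has_dsrpcd_process && echo "dsrpcd process found"
#   netstat -a | has_dsrpc_listener && echo "dsrpc port is listening"
```

If `netstat` is run with `-n`, the service name is replaced by the port number, so the `dsrpc` pattern would need to be changed to whatever port the installation uses.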
I have submitted this to the Admin group for resolution, but thought I might ask the gurus here to see if there might be some common items to watch for.
Is there a list of services that one should look for?
What else would cause such a serious condition?
Thanks for your time and input
Troubleshooting unresponsive server
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Ordinarily, in the case of a hang, I'd suggest looking at server name resolution, but the fact that telnet is OK pre-empts that suggestion. Can you try leaving one of the hung connection attempts for, say, at least ten minutes to see whether a timeout error occurs? That may contain a useful diagnostic error code. While it is hanging you might like to try a netstat command on the server.
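As a rough sketch of the netstat suggestion, the output captured during a hung login could be summarized by connection state. The function name is hypothetical, and the sketch assumes that `netstat -a` labels the sockets with the service name `dsrpc` and prints the TCP state as the last column. Which states would be significant is not established in this thread; the summary only makes the output easier to read.

```shell
#!/bin/sh
# Hypothetical helper: reads netstat output on stdin and prints one line
# per TCP state seen on dsrpc sockets, with a count, sorted by state name.
# Assumption: the state (LISTEN, ESTABLISHED, ...) is the last field.
summarize_dsrpc_states() {
    grep 'dsrpc' |
        awk '{ count[$NF]++ } END { for (s in count) print s, count[s] }' |
        sort
}

# Usage on the server while a client login is hanging (not executed here):
#   netstat -a | summarize_dsrpc_states
```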
The next step would be to start dsrpcd with logging, but let's not go there quite yet.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
VCInDSX,
First, go to the grid_enabled directory (on the head node) and run the test.sh script to verify that your grid environment is working. If the job runs successfully, you will see information such as the number of nodes (compute node names) and partitions. Of course, if the job fails, then you know the "grid" is not even set up correctly, let alone able to run DS.
Let me know the result; I will let you know what's next to check.
Hi Ray & lstsaur,
Thanks for your invaluable time and input. This had a premature ending today. The admin folks restarted the DS Server last night (this is a new Dev server and not much impact) which "solved" the issue. I am not sure if we might have lost an opportunity to find some issue that might still be lingering in the background.
However, to answer your questions.
The clients connect to the head node - in our setup.
I left one login attempt hanging for as long as it took. It did not return for 2 hours, and I had to kill that instance. This was before I posted my query in this forum. By the time I came back here, the restart had completed. I will use this tip the next time, if there is one.
As for test.sh, just for the sake of my own learning, I executed it and was able to see the output of the test job that it ran, including all the Peek outputs. I will try this out if we run into the same issue again. Thanks for the tip.
Thanks again for your help
-V