Mike
The scripts are in /etc; exactly where depends on which version of UNIX and which version of DataStage you have. In older versions of DataStage the scripts are the same as in Universe. I am doing all this from memory, as I do not have access to a UNIX box to check, but it looks something like this. The file has "uv.rc" in the name and is usually in /etc/rc2.d. This shell script is executed as UNIX boots or shuts down. To run it manually, add "stop" as an argument, as in uv.rc stop. Usually these scripts do not have execute permission, so you need to fix that first. You can edit the script to see what it does: it kills all Universe processes and removes their associated shared memory segments. You need to be root to restart DataStage. I think in newer versions of DataStage this script is called "ds.rc", but I am not sure. Good luck.
Thanks Kim.
Kim Duke
DwNav - ETL Navigator
www.Duke-Consulting.com
Shutdown
One thing to keep in mind is that the 'rc' scripts in /etc need to be run by 'root'. This can be problematic outside of server startup / shutdown situations, as the people responsible for DS may not have root privileges.
Version 6 (or maybe even 5.1+) introduced the concept of a 'dsadm' user to mitigate this exact problem. The new syntax for starting and stopping the server is as follows (snipped from the pdf docs):
Stopping and Restarting the Server Engine
From time to time you may need to stop or restart the DataStage server engine manually, for example, when you wish to shut down the physical server.
A script called uv is provided for these purposes.
To stop the server engine, use:
# dshome/bin/uv -admin -stop
This shuts down the server engine and frees any resources held by server engine processes.
To restart the server engine, use:
# dshome/bin/uv -admin -start
This ensures that all the server engine processes are started correctly. You should leave some time between stopping and restarting. A minimum of 30 seconds is recommended.
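To make the "leave some time between stopping and restarting" advice concrete, here is a minimal sketch of a wrapper that runs the two documented commands with a pause in between. The function name restart_engine, its arguments, and the way the DataStage home directory is passed in are my own assumptions for illustration, not something from the docs; only the uv -admin -stop / -start calls and the 30 seconds come from the text above.

```shell
#!/bin/sh
# Hypothetical helper: stop the server engine, pause, then start it again.
#   $1 = DataStage home directory (the "dshome" in the docs)
#   $2 = seconds to wait between stop and start (defaults to the documented 30)
restart_engine() {
    dshome=$1
    delay=${2:-30}

    # Give up if the stop fails, rather than starting on top of a live engine.
    "$dshome/bin/uv" -admin -stop || return 1

    # Give the engine time to release its resources before restarting.
    sleep "$delay"

    "$dshome/bin/uv" -admin -start
}
```

You would call it as something like restart_engine "$DSHOME", run as root or dsadm, assuming DSHOME is set in your environment.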
-craig
I was curious as we've been running DataStage on a Compaq Tru64 TruCluster environment since at least version 4.
We've had the DS server failover between nodes, both by failures and manually and haven't had a problem with port hangs... for whatever that is worth. [:)]
Some points of consideration for your script:
Since the cluster scripts run as root in our environment, we stuck with the 'S99ds.rc' script for startup and shutdown. Not sure why I mention that. [:I] You'll be fine with the 'admin' route, I would think.
We also save the crontab for our DataStage jobs during shutdown and reload them during failover so we don't lose our scheduling information on the new node. This is easy for us as all jobs are scheduled under a single user.
"netstat -a" can be quite slow. We used to use it as a 'status check' in our cluster scripts, but it took too long to be practical. I also believe that with 5.2+ you should be grepping for 'dsrpc' rather than 'uvrpc'.
In your CONNECTIONS check, IMHO you should change your command to:
for SESSIONS in `ps -ef | grep dscs | grep -v grep | cut -b 9-15`
This will remove the actual grep command from the output, which otherwise gets caught up in it.
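As a rough sketch of the same filtering idea, the function below takes "ps -ef" output on standard input and prints the PIDs of matching processes, dropping the grep line itself. The name pids_from_ps is made up for this example, and it picks the PID by column (field 2) with awk instead of the byte offsets used above, on the assumption that your ps -ef puts the PID in the second column.

```shell
#!/bin/sh
# Hypothetical helper: print the PIDs of processes whose ps line mentions $1.
# Reads `ps -ef` output on stdin so it can be tried on saved output too.
pids_from_ps() {
    # First grep keeps matching lines, second drops the grep process itself,
    # awk prints the PID (assumed to be the second column of ps -ef).
    grep "$1" | grep -v grep | awk '{print $2}'
}
```

In the loop it would look something like: for SESSIONS in `ps -ef | pids_from_ps dscs`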
I'm not sure your checking will really be necessary, but since it is all happening post-shutdown, I doubt it can be classified as 'dangerous', or there is any chance of corruption.
-craig
Mike
kill -9 is definitely not the recommended way and can corrupt Universe files. It almost always leaves locks on files. kill -15 is the recommended way to kill a process.
Kim Duke
DwNav - ETL Navigator
www.Duke-Consulting.com
Hmmm.. missed that one. "kill -9" (SIGKILL) is harsh and as Kim notes should only be used as a last resort, as it cannot be caught and can leave files open. Signal 15 (SIGTERM) tells it to shut things down before it dies, so is more 'graceful'.
I've seen people do the 15, wait a moment and then loop back thru with 9s on the stubborn, undead ones.
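That approach might look roughly like the sketch below. The function name term_then_kill and the grace-period argument are my own inventions for illustration; how long to wait is a judgment call for your site, not a documented value.

```shell
#!/bin/sh
# Hypothetical helper: ask each PID to exit with SIGTERM, wait, then
# SIGKILL whatever is still around.
#   $1 = grace period in seconds, remaining arguments = PIDs
term_then_kill() {
    grace=$1
    shift

    # First pass: the polite request (signal 15).
    for pid in "$@"; do
        kill -15 "$pid" 2>/dev/null
    done

    sleep "$grace"

    # Second pass: kill -0 sends no signal, it only tests whether the
    # process still exists; survivors get signal 9 as a last resort.
    for pid in "$@"; do
        if kill -0 "$pid" 2>/dev/null; then
            kill -9 "$pid" 2>/dev/null
        fi
    done
}
```

For example, term_then_kill 10 $PIDS would give every process ten seconds to clean up before the 9s go out.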
-craig
Take a look at the ndd command too. It gives information on the status of these ports. For example:
ndd /dev/tcp
And you can change the timeout on the ports, which may help to reduce the delay. From memory the default (on Solaris) is 40000 msec.
Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518