Testing a stop in the DS services

gpbarsky
Participant
Posts: 160
Joined: Tue May 06, 2003 8:20 pm
Location: Argentina

Post by gpbarsky »

Hi.

I made the following test:

1) Start my jobs so that they check whether there are transactions to process.
2) Stop DS services. The jobs continue executing. On the server there are several tasks named "uvsh.exe" (I guess these tasks are the running jobs).
3) Test the functionality of the jobs: they work fine, even if the services are down.
4) Stop the server scheduler: this did not affect the test.
5) With the jobs running, the DS administrator kills the uvsh.exe tasks.
6) The jobs are no longer working, but they still show a "Running" status.

Summary: the running jobs are not affected by a stop of DS services. They continue running because of the uvsh.exe tasks. But if the server goes down, the uvsh.exe tasks are killed and the status of the running jobs is not updated. The connection between the DS engine and the server is broken.

The question is: how can you repair such a situation?

What I want is an automatic startup procedure: if the server goes down, then when it comes back up the procedure is triggered automatically and starts all the jobs that I need to have running.

I hope this is clear.

Thanks in advance.


Guillermo P. Barsky
Buenos Aires - Argentina
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Because you specify uvsh.exe, I know that you are on a Windows platform. On this platform, the DataStage services are set up, by default, for automatic start when the system is restarted.

Job and stage status is updated into the RT_STATUSxx tables by running jobs. It is entirely reasonable that, if you kill those processes, the status is not updated. I assume you kill them with Task Manager, or a kill -9 if you have this command available. This does not give the process any opportunity to "clean up". As well as not updating status, it can also leave locks set, files open, and other undesirable situations.

You should shut down DataStage jobs with a Stop request (from Director or via dsjob -stop) BEFORE shutting down services.

As for automatic restart of jobs, you can achieve this by using the dsjob command from a regular Windows startup script. Simply put the command (or a short cut to it, or to a BAT file that runs it) into the Startup folder.
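A rough sketch of what such a startup script might do, written in Python rather than as a BAT file purely for illustration. It only assembles and launches `dsjob -run` commands; the project name, job names and parameter names are hypothetical, and it assumes `dsjob` is on the PATH of the account that runs the script.

```python
import subprocess

def build_run_command(project, job, params=None, mode="NORMAL"):
    """Assemble a dsjob -run command line as a list of arguments."""
    cmd = ["dsjob", "-run", "-mode", mode]
    # One -param name=value pair per job parameter.
    for name, value in (params or {}).items():
        cmd += ["-param", f"{name}={value}"]
    cmd += [project, job]
    return cmd

def start_jobs(project, jobs, runner=subprocess.call):
    """Launch every job; returns {job name: exit code from dsjob}."""
    return {
        job: runner(build_run_command(project, job, params))
        for job, params in jobs.items()
    }

if __name__ == "__main__":
    # Hypothetical project, job and parameter names, printed rather than run.
    for job, params in {"PollTransactions": {"SrcDir": "C:/in"}}.items():
        print(" ".join(build_run_command("MyProject", job, params)))
```

A BAT file in the Startup folder would contain the same command lines directly. This does nothing to check that the jobs are actually in a runnable state.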

It's up to you, and beyond the scope of this reply, to guarantee that the jobs are, in fact, restartable, and that there exists no corruption in the repository as a result of your unorthodox shutdown procedures.

Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518
gpbarsky
Participant
Posts: 160
Joined: Tue May 06, 2003 8:20 pm
Location: Argentina

Post by gpbarsky »

Ray:

Let me explain that the test was not a procedure. It was just a test in our development environment, in order to prevent the same situation in the production environment.

You know that the "stop" button in the Director screen sometimes works and sometimes does not.

It's a good idea to put something in the startup. I was trying to use the dsjob command. I had some problems with parameters.

Do you have an example of how to use it ?

Thanks for your comments.


Guillermo P. Barsky
Buenos Aires - Argentina
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

You have done some due diligence and seen some of the issues that come up. Sometimes a job stream has its job components in an unusable state that prevents the job stream from executing properly. There is no mass-cleanup facility for all of the jobs within a project. Because a job has the responsibility of maintaining its status tables (the ones Director queries), when the job tragically dies without a chance of updating those tables, the status is never cleaned up. There is the cleanup job item in the Director pull-down menu, but doing this on N jobs is time consuming, not to mention that the ARE YOU REAAALLY SURE message gets irritating after 10 jobs, and back to comical after 30 jobs. This is why a mass-compile utility comes in really handy at that time.

Points to ponder:
1. Stopping the services won't necessarily kill all executing jobs. There are times jobs are busy doing something and just won't die.
2. Stopping the scheduler has NO effect on jobs. On UNIX it's cron; on Wintel it's AT. cron and AT have no control over the tasks they have already launched.
3. I don't know how you test the functionality of jobs if the services are down. Your clients cannot be connected to the server. The most you could see is output results to the targets.
4. Mass-repair of errant-state jobs is easily accomplished with the many mass-compile utilities and tools available. Shameless plug for Compile-All.
5. Never stop DS services without: (1) closing all active clients, and (2) stopping all jobs or waiting for them to finish.


Kenneth Bland
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Guillermo,

The dsjob command can take as many -param name=value options as you need to give it. However, it's rather more difficult using dsjob to ensure that jobs are in a runnable state. You can use the -jobinfo option, and parse the result to determine the exit status. But, because of what you did to "break" DataStage, this is likely to show as RUNNING, since it's obtained from the same place as the Director client gets the information. What do you do now? If the BAT file is only ever executed from the Startup folder, you can assume that the job is not, in fact, running, run it in RESET mode then run it in NORMAL mode. But, before DataStage will let you do that, you would need to change the status value in appropriate records in the RT_STATUSxx file for that particular job. Gets messy, doesn't it? (It's even worse if you're using multiple instance jobs!)
In short, follow the advice in point 5 at the end of the previous post.
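
For what it's worth, the check-then-reset logic described above might be sketched like this in Python. It is an illustration only: it assumes the `dsjob -jobinfo` output contains a line of the form `Job Status : RUNNING (0)` (verify against your own version), the project and job names are hypothetical, and it does not attempt to repair a stale RUNNING status in RT_STATUSxx.

```python
import re

def parse_job_status(jobinfo_output):
    """Pull (status name, status code) out of dsjob -jobinfo text.

    Assumes a line such as 'Job Status : RUN OK (1)'; returns None
    if no such line is found.
    """
    match = re.search(
        r"^Job Status\s*:\s*(.+?)\s*\((\d+)\)\s*$",
        jobinfo_output,
        re.MULTILINE,
    )
    if match is None:
        return None
    return match.group(1), int(match.group(2))

def plan_startup(project, job, status_name):
    """Return the dsjob command lines to issue at startup, in order.

    If the job still shows RUNNING, nothing is returned: the stale
    status has to be cleaned up first, which is outside this sketch.
    """
    if status_name == "RUNNING":
        return []
    return [
        ["dsjob", "-run", "-mode", "RESET", project, job],
        ["dsjob", "-run", "-mode", "NORMAL", project, job],
    ]

if __name__ == "__main__":
    # Hypothetical captured output from: dsjob -jobinfo MyProject MyJob
    sample = "Job Status\t: RUN FAILED (3)\nJob Controller\t: not available\n"
    name, code = parse_job_status(sample)
    for cmd in plan_startup("MyProject", "MyJob", name):
        print(" ".join(cmd))
```

In a real script the RESET run would need to complete before the NORMAL run is issued, and a multiple-instance job would need this repeated per invocation.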


Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518