The job is running and I can't stop it

Posted: Sat Apr 16, 2005 4:36 am
by alraaayeq
Hi,


I am wondering how to "force kill" a running job?


I ran a job that loops until it finds the kill tag file (using the waitforfile function). The problem is that each pass through the loop waits for X seconds, and by mistake the code was configured to run for 60 hours!!!


Err, I can't wait that long. I tried 'ps -ef' so that I could use "kill -9", but I could not find the job in the list. Clicking the stop button in DataStage Director does nothing, and releasing locks with DS.TOOLS from DataStage Administrator did not do anything either (unless I did it wrong!).



What can I do? Did I miss anything (except that I ran it for 60 hours ;) )?

Posted: Sat Apr 16, 2005 9:55 am
by ArndW
Try to avoid kill -9 whenever possible so that you don't leave locks lying around. You should be able to find your process in ps -ef; if not, it might really be gone, and all you have to do is reset the process flags in the Administrator.

Posted: Sat Apr 16, 2005 10:10 am
by Sainath.Srinivasan
Set the next job in the sequence to 'Not Compiled' status and create the seq file, so that the 'wait-for-file' succeeds and the next activity fails, causing the sequence to abort by itself.

Posted: Sat Apr 16, 2005 11:36 pm
by alraaayeq
ArndW wrote:Try to avoid kill -9 whenever possible so that you don't leave locks lying around. You should be able to find your process in ps -ef; if not, it might really be gone, and all you have to do is reset the process flags in the Administrator.
Resetting the process flags is new to me and I could not find any hints about it. Can you please give me more details?

Posted: Sat Apr 16, 2005 11:41 pm
by alraaayeq
Sainath.Srinivasan wrote:Set the next job in the sequence to 'Not Compiled' status and create the seq file, so that the 'wait-for-file' succeeds and the next activity fails, causing the sequence to abort by itself.
Yes, what you said is true, but since the job is SLEEPING (using the sleep command), the 60 hours will run to completion even if I create the kill tag or put the child job in a non-runnable state.

Posted: Sun Apr 17, 2005 4:51 pm
by ray.wurlod
You've just discovered why it's not best practice to use long periods of SLEEP.

Re-code the job so that it wakes up occasionally to determine whether any signals or other notifications have been received.

Posted: Sun Apr 17, 2005 11:46 pm
by alraaayeq
ray.wurlod wrote:You've just discovered why it's not best practice to use long periods of SLEEP.

Re-code the job so that it wakes up occasionally to determine whether any signals or other notifications have been received.
Yes, I will be in trouble if I use long periods of sleep. But this is the only way I can schedule the job to run every 2 hours (without using a 3rd-party tool)!

So I can conclude that "force killing" is not something that applies to running Ascential DS jobs!

Posted: Mon Apr 18, 2005 2:45 am
by Sainath.Srinivasan
You can do the following:
Wait-for-file for 60 mins -> if found then exit else continue
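
As a rough illustration of that idea outside DataStage, here is a minimal shell sketch of a wait that keeps waking up to look for the stop file instead of sleeping through the whole period. The function name wait_for_file, its arguments and the paths in the comments are made up for this example; it is not the DataStage wait-for-file activity itself.

```shell
#!/bin/sh
# Hypothetical helper (not a DataStage command): wait up to $2 seconds for
# file $1 to appear, checking every $3 seconds (default 5).
# Returns 0 if the file was found, 1 if the wait timed out.
wait_for_file() {
    file=$1
    timeout=$2
    poll=${3:-5}
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        [ -e "$file" ] && return 0
        sleep "$poll"
        waited=$((waited + poll))
    done
    # One last check after the final short sleep.
    [ -e "$file" ]
}

# Example use (assumed names): run the work every 2 hours until the stop
# file shows up, looking for it once a minute.
#   until wait_for_file /tmp/stop.tag 7200 60; do run_the_job; done
```

Because each individual sleep is short, creating the stop file ends the loop within one polling interval rather than after the full wait.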

Posted: Mon Apr 18, 2005 4:01 am
by alraaayeq
Sainath.Srinivasan wrote:You can do the following:
Wait-for-file for 60 mins -> if found then exit else continue

That is a better idea :shock:. Using SLEEP is bugging me. Would you believe that sleeping 3600 seconds sometimes takes less than 20 minutes, and sometimes takes an unpredictable amount of time? It is really weird behaviour.



BTW, the main thing I am trying to figure out is how to kill a task, no matter what it is doing, without needing to restart the server. :?: :?:

Posted: Mon Apr 18, 2005 4:16 am
by Sainath.Srinivasan
Sleep and Nap use the computer cycles to calculate the time. You can trace unix sleep.

If you intend to kill or release a process, you will need access to something like DS.TOOLS or dssh, access to the dsadm user, and good knowledge of who is doing what on the system. It is not recommended if you cannot trace your process in full.

Posted: Mon Apr 18, 2005 5:34 am
by ray.wurlod
There's a second form of the SLEEP statement that sleeps until a particular time. Note the following code fragment.

Code:

* Wake every five minutes to check whether there are any notifications.
StartDate = Date()
StartTime = Int(Time())   ; * whole seconds since midnight

Loop
   GoSub CheckNotifications
   If Notified Then Exit
   * Re-read the clock on every pass so the wake-up time keeps moving.
   NextTime = Int(Time()) + 300
   If NextTime >= 86400 Then NextTime -= 86400   ; * wrap past midnight
   Sleep Oconv(NextTime, "MT:")  ; * SLEEP hh:mm
   * Seconds since the start, allowing for a change of date.
   Elapsed = (Date() - StartDate) * 86400 + Int(Time()) - StartTime
* Exit loop when two hours have elapsed.
While Elapsed < 7200
Repeat

Posted: Mon Apr 18, 2005 7:15 am
by alraaayeq
Sainath.Srinivasan wrote:Sleep and Nap use the computer cycles to calculate the time. You can trace unix sleep.

If you intend to kill or release a process, you will need access to something like DS.TOOLS or dssh, access to the dsadm user, and good knowledge of who is doing what on the system. It is not recommended if you cannot trace your process in full.
I don't know why DS.TOOLS did not help me. I wonder if I used it wrong :?
Can you explain more? Which of the lock-releasing and job-killing features in DS.TOOLS (that I am not aware of!) are useful for force killing a running job?

Posted: Mon Apr 18, 2005 2:19 pm
by newtier
You should be able to kill the specific process. Suppose your job name is: mySleepJob

Issue the command: ps -ef | grep mySleepJob (or any portion of the string)

If the job is actually running, you should see the process id (on the left) and the parent's process id (DataStage engine probably) on the right.

Suppose the process id is: 2329392

Issue the command: kill 2329392

Likewise, you can go into the DataStage "universe" (and let's not start the semantic argument over again that it's not really universe) and find the processes for the "user". Then you can use UniVerse commands to end the processes for that user.

Posted: Mon Apr 18, 2005 6:05 pm
by ray.wurlod
The "end process" from Director and from DS.TOOLS, and the "UniVerse" command (MASTER OFF) all send a signal (kill -15).

If the process is not using any CPU cycles (for example if it is sleeping) it will not be able to service this signal. This is why "log out process" sometimes appears not to work.

You can wait for a while (say five minutes) to see whether the process wakes and is able to process the signal. Most folks, however, demand instant gratification, and resort to an immediate assumption that it's not going to work, then reach immediately for a larger hammer (kill -9), then wonder why they have to clean up locks, open files and other resources that the non-ignorable signal did not give the process opportunity to clean up gracefully.
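
For an ordinary Unix process, the "small hammer first" order described above can be sketched in shell roughly as follows. The function name stop_gracefully and its arguments are made up for this example, and it assumes you already have the right process id and permission to signal it. It says nothing about what DataStage itself still has to clean up afterwards.

```shell
#!/bin/sh
# Hypothetical helper: send SIGTERM (kill -15) to process $1, wait up to
# $2 seconds (default 300, i.e. five minutes) for it to exit, and fall
# back to SIGKILL (kill -9) only if it is still there.
# Returns 0 if the process exited after SIGTERM, 1 if SIGKILL was needed,
# 2 if the process could not be signalled at all.
stop_gracefully() {
    pid=$1
    grace=${2:-300}
    # Fails if there is no such process or we may not signal it.
    kill -TERM "$pid" 2>/dev/null || return 2
    waited=0
    # kill -0 sends no signal; it only tests whether the pid still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$grace" ]; then
            echo "PID $pid still running after ${grace}s, sending SIGKILL" >&2
            kill -KILL "$pid" 2>/dev/null
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

A return code of 1 is the case where locks and other resources may be left behind and need cleaning up by hand.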

Posted: Tue Apr 19, 2005 1:17 pm
by alraaayeq
ray.wurlod wrote:....... Most folks, however, demand instant gratification, and resort to an immediate assumption that it's not going to work, then reach immediately for a larger hammer (kill -9), then wonder why they have to clean up locks,......
Yah, larger hammer :twisted:

I did it, with the help of the biggest hammer, which is restarting the daemon..... :cry:

I met an expert with 4+ years of experience. He was surprised that the process was still writing log entries (in Director) even though the job was using no resources. Also, you could reset the job and run it again, but if you tried to recompile it, a message popped up saying "this job might be monitored" or something like that...


That's what I call "frustration".