
Stray processes created by DataStage

Posted: Fri Sep 10, 2010 1:06 pm
by vivekgadwal
Gurus,

Of late, processes created during job runs have been getting stuck/hung. The problem is that they are chewing up a lot of CPU, and a lot of locks are being created that never get released. I have an earlier post about this problem.

Just today, we had a stray process that was eating up 97% of the CPU. It was traced back to a DataStage Director session that hung while trying to look for a job, so the person went into the web console and disconnected the session. The stray process, as seen in Linux, is as follows:

Code: Select all

iisexecp 12766 12765  0 08:27 ?        00:00:00 dsapi_slave 8 7 0
iisexecp 12897 12766 97 08:29 ?        05:58:10 SH -c at -l
This brings me to my questions.
>> What is this "SH -c at..."?
>> Why is it running when we are just browsing for a job? This particular "SH..." process starts even when we are running jobs, too.
>> Why is it still running when the session is terminated?

P.S.: I do not mean to create multiple posts for this, so please feel free to tell me if my question falls into the same realm as that post, so I can keep everything under one post. :)

Looking forward to your replies... :D

Posted: Fri Sep 10, 2010 6:07 pm
by ray.wurlod
The "at -l" command suggests that the Director was open in schedule view, and the user probably got impatient waiting for the result of this command. dsapi_slave is the server-side agent process for the Director client connection, and is therefore the parent process of the at command.
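The parent/child relationship can be checked from ps itself. A minimal sketch, using the dsapi_slave PID 12766 from the listing above (substitute your own PID; column 3 of the ps -ef output is the parent PID):

```shell
# List every process whose parent is the dsapi_slave connection process.
# 12766 is the PID from the listing above; replace it with your own.
ps -ef | awk -v ppid=12766 '$3 == ppid'
```

Any "sh -c at -l" that shows up here belongs to that Director connection.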

Posted: Sat Sep 11, 2010 9:59 am
by vivekgadwal
Ray - You are right on. That is exactly what happened with this particular process. However, it ran for 15 minutes without returning anything, so he killed/terminated it.

:?: But my confusion/question is: why did it keep running even after he terminated that connection from the Web console? Also, this is happening a lot even when jobs run. It is becoming a huge problem; at one point it even locked all of us out of logging in to the Director in Production. We had to wait until the whole process finished (which took a few hours) before we could log in and monitor anything.

These stray processes have become the norm now, and the Linux admins contact us every time it happens, which is almost daily. We have rebooted the machine several times in the hope of avoiding them, but in vain.
:?: How do we avoid these in the future or, more practically, reduce how often they occur?

Posted: Sat Sep 11, 2010 5:56 pm
by ray.wurlod
Stay on the case with your official support provider.

Meanwhile, do you have the DataStage deadlock daemon running? It has a secondary task of cleaning up orphaned processes. (Note that processes waiting on database responses are not orphaned.)

Posted: Sun Sep 12, 2010 6:43 am
by vivekgadwal
ray.wurlod wrote:...do you have the DataStage deadlock daemon running?
Thanks, Ray, for the response. I do not know anything about the "dslockd" process. I am unsure whether it should run continuously or only during job execution. Can you provide some more information about this daemon? Also, is there any harm in having it run at any time?

Posted: Sun Sep 12, 2010 6:44 am
by vivekgadwal
Forgot to mention one more thing; I ran:

Code: Select all

ps -ef | grep dslockd
and it returned nothing. But I ran this when no jobs were running.

Posted: Sun Sep 12, 2010 7:24 am
by chulett
vivekgadwal wrote:I do not know anything about the "dslockd" process.
Well, it seems you know enough about the deadlock daemon to know that the actual process name is 'dslockd'. :wink:

When enabled, it runs 'all the time': mostly sleeping, but waking every 'x' seconds (a configurable parameter) to check for locks. Seeing as your grep returned nothing, it must not be enabled. A search here should turn up the syntax needed to enable it, something I don't recall off the top of my head.

Posted: Sun Sep 12, 2010 2:43 pm
by ray.wurlod
There's a file in the DSEngine directory called dsdlockd.config. In it you can configure the daemon to start automatically when the engine starts.

Posted: Sun Sep 12, 2010 4:03 pm
by vivekgadwal
chulett wrote: Well, it seems you know enough about the deadlock daemon to know that the actual process name is 'dslockd'. :wink:
:lol: I did a quick little search before I posted and so it appears that I know the name of it, but in reality that was the first time I saw this name. :)

I opened up the file "dsdlockd.config". This is how it looks right now:

Code: Select all

start=0
timer=900
res=0
log=
What do these entries mean? How should I change them? Since this is production, I would like to be extra sure about it. Can you please help me with that?

Also, a question here: this file is the same in Prod and in the other box (we occasionally use it for testing too), and it seems there is not as much of this problem there. How should I interpret this?

Thanks.

Posted: Sun Sep 12, 2010 4:11 pm
by chulett
The simplest answer: change "start" to a "1" for true rather than the zero that means false. The timer means it will check every 900 seconds, i.e. every 15 minutes. I'm not sure about "res" (resident?), but "log", I assume, lets you override the default log file name/location.
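For clarity, this is presumably what the enabled file would look like; only the start flag changes, and the other lines keep their defaults:

```
start=1
timer=900
res=0
log=
```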

As to the test versus production question, it's all about your access. Less access there means fewer incidents of people doing silly things like terminating connections, which equates to less need for a process like the deadlock daemon.

Posted: Sun Sep 12, 2010 5:01 pm
by ray.wurlod
To complete what Craig surmised: "res" stands for "resolution", a coded value for how deadlocks should be resolved in the UniVerse database (kill the process with the newest-begun transaction, kill the process holding the fewest locks in a transaction, or kill one of the deadlocked processes at random). It is not really relevant for DataStage.

The log parameter allows you to specify a non-default location for the dsdlockd.log file (its default location is in $DSHOME, and that's usually OK).

Incidentally, it's dsdlockd (not dslockd) so you might want to try your ps command again.
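A small aside on the ps check itself: a plain "grep dsdlockd" can also match the grep command's own entry in the process list, which muddies the result. The usual bracket trick avoids that, since the pattern matches the daemon's name but not the grep's own command line:

```shell
# The [d] character class stops grep from matching its own entry in the ps output
ps -ef | grep '[d]sdlockd'
```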

Posted: Sun Sep 12, 2010 5:47 pm
by chulett
Ah, yes... thanks for the clarification on the exact spelling of the daemon and on what "res" stands for. I remember it now, but it has been a while, so the details were not forthcoming. I never saw a need to change that setting and honestly never realized it wasn't relevant for DataStage.

Posted: Mon Sep 13, 2010 12:15 pm
by vivekgadwal
Thank you very much, Ray and Craig, for your responses.
ray.wurlod wrote:Incidentally, it's dsdlockd (not dslockd) so you might want to try your ps command again.
My understanding from Craig's post is that if the "start" value is '0', then the "dsdlockd" process is not enabled. Am I right in understanding that?

I ran ps again just in case and did not find the process running (no jobs were running at the time). Another question: whenever we reboot the server, do we have to come in and manually set the values in the "dsdlockd.config" file?

[As an FYI, our Linux admin did a detailed analysis and check on his end of how the server is configured and found no issues there. So, he says the onus is on us (the DataStage team) to get the configuration right.]

Posted: Mon Sep 13, 2010 12:36 pm
by chulett
Ray just meant that you missed a "d" in the name you grepped for, so correct the spelling and try again to be certain.

Posted: Mon Sep 13, 2010 1:24 pm
by vivekgadwal
Got it. I ran ps and did not find it in the prod, development, or test environments. I am in the process of trying to convince people to make this change, as I cannot implement it in Prod directly. The thing is, I am not sure whether enabling dsdlockd in Prod will have repercussions elsewhere in our processing.

The curious thing here is that nothing is different between the test/devl/prod environments, yet the jobs ran a lot slower in prod than on the other boxes. That is another part that confuses me. Anyway, thanks for your help, and I will keep you all posted.