Stray processes created by DataStage

vivekgadwal
Premium Member
Posts: 457
Joined: Tue Sep 25, 2007 4:05 pm

Post by vivekgadwal »

Gurus,

Of late, processes created during job runs have been getting stuck or hung. The problem is that they chew up a lot of CPU. A lot of locks are also being created and never released. I have an earlier post about this problem.

Just today, we had a stray process eating up 97% of CPU. It was traced back to a DataStage Director session that hung while looking for a job, after which the user went into the web console and disconnected the session. This is how the stray process looks in Linux:

iisexecp 12766 12765  0 08:27 ?        00:00:00 dsapi_slave 8 7 0
iisexecp 12897 12766 97 08:29 ?        05:58:10 SH -c at -l
This brings me to my questions.
>> What is this "SH -c at..."?
>> Why is it running when we are just browsing for a job? This particular "SH..." also starts when we are running jobs.
>> Why is it still running when the session is terminated?

P.S: I do not mean to create multiple posts for this, so please feel free to tell me if my question falls into the same realm as this post so that I have everything under one post. :)

Looking forward to your replies... :D
Vivek Gadwal

Experience is what you get when you didn't get what you wanted
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The at -l command suggests that the Director was open in schedule view, and the user probably got impatient waiting for the result of this command. dsapi_slave is the agent process of the Director client connection, and therefore the parent process of the at command.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
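
To see which processes hang off a Director connection agent, the parent/child relationship described above can be checked directly from ps output. The following is a minimal sketch, not an IBM-supplied tool: the function name list_dsapi_children is made up for illustration, and it assumes a ps that accepts the standard -eo pid,ppid,pcpu,args format.

```shell
# Hypothetical helper (name is illustrative, not part of DataStage).
# Reads "ps -eo pid,ppid,pcpu,args" style output on stdin and prints
# PID, PPID, %CPU and command for every child of a dsapi_slave process.
list_dsapi_children() {
  awk '
    $1 ~ /^[0-9]+$/ {                 # skip the header line
      pid = $1
      ppid[pid] = $2
      cpu[pid]  = $3
      cmd = $4
      for (i = 5; i <= NF; i++) cmd = cmd " " $i
      args[pid] = cmd
      if ($4 ~ /dsapi_slave$/) slave[pid] = 1   # remember the agents
      order[++n] = pid
    }
    END {
      for (i = 1; i <= n; i++) {
        p = order[i]
        parent = ppid[p]
        if (parent in slave) print p, parent, cpu[p], args[p]
      }
    }
  '
}

# Example: ps -eo pid,ppid,pcpu,args | list_dsapi_children
```

On a system in the state shown in the first post, the at -l shell (PID 12897) would be expected to show up here, since its parent 12766 is a dsapi_slave. A child that keeps a high %CPU long after its client session was disconnected is a candidate for the kind of stray process discussed in this thread.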

Post by vivekgadwal »

Ray - You are right on. That is exactly what happened with this particular process. However, it ran for 15 minutes without returning anything, so he killed/terminated it.

My question is: why did it keep running even after he terminated that connection from the web console? This also happens a lot when jobs run. It is becoming a huge problem, and at one point it locked all of us out of logging in to the Director in Production. We had to wait until the whole process was done (a few hours) before we could log in and monitor anything.

These stray processes have become the norm, and the Linux admins contact us every time it happens, which is almost every day now. We have rebooted the machine many times in the hope of avoiding them, but in vain.
How can we avoid these in the future or, more practically, reduce these occurrences?

Post by ray.wurlod »

Stay on the case with your official support provider.

Meanwhile, do you have the DataStage deadlock daemon running? It has a secondary task of cleaning up orphaned processes. (Note that processes waiting on database responses are not orphaned.)

Post by vivekgadwal »

ray.wurlod wrote:...do you have the DataStage deadlock daemon running?
Thanks for the response, Ray. I do not know anything about the "dslockd" process. I am unsure whether it should run continuously or only during job execution. Can you provide some more information about this daemon process? Also, is there any harm in running it at any time?

Post by vivekgadwal »

I forgot to mention one more thing. I ran:

ps -ef | grep dslockd
and it returned nothing. However, I ran this when no jobs were running.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

vivekgadwal wrote:I do not know anything about the "dslockd" process.
Well, it seems you know enough about the deadlock daemon to know that the actual process name is 'dslockd'. :wink:

When enabled, it runs 'all the time', mostly sleeping but waking up every 'x' (a configurable parameter) to check for locks. Seeing as how your grep returned nothing, it must not be enabled. A search here should turn up the syntax needed for that, which I don't recall off the top of my head.
-craig

"You can never have too many knives" -- Logan Nine Fingers

Post by ray.wurlod »

There's a file in the DSEngine directory called dsdlockd.config. In here you can configure the daemon to start automatically when the engine starts.

Post by vivekgadwal »

chulett wrote: Well, it seems you know enough about the deadlock daemon to know that the actual process name is 'dslockd'. :wink:
:lol: I did a quick little search before I posted and so it appears that I know the name of it, but in reality that was the first time I saw this name. :)

I opened up the file "dsdlockd.config". This is how it looks right now:

start=0
timer=900
res=0
log=
What do these entries mean? How should I change them? Since this is production, I would like to be extra sure about it. Can you please help me with that?

Also, a question here. This file is the same in Prod and on the other box (we occasionally use it for testing too), and there does not seem to be much of this problem there. How do I construe this?

Thanks.

Post by chulett »

The simplest answer is that you change "start" to "1" for true rather than the zero that means false. The timer means it will check every 900 seconds (15 minutes). Not sure about "res" (resident?), but log would allow you to override the default log file name/location, I assume.

As to the test versus production question, it's all about your access. Less access there means fewer incidents of people doing silly things like terminating connections, which equates to less of a need for a process like the deadlock daemon.
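
For anyone who wants to check these two settings without editing anything, here is a minimal read-only sketch. It only interprets start and timer as described in this post; the function name describe_dsdlockd_config is invented for illustration, and the example path assumes $DSHOME points at the DSEngine directory.

```shell
# Hypothetical helper (name is illustrative). Reads a dsdlockd.config-style
# file of key=value lines and reports the autostart flag and the timer.
# It does not modify the file.
describe_dsdlockd_config() {
  awk -F= '
    $1 == "start" { start = $2 }
    $1 == "timer" { timer = $2 }
    END {
      # start=1 means the daemon starts with the engine; 0 means it does not
      printf "autostart: %s\n", (start == 1 ? "enabled" : "disabled")
      # timer is in seconds; also show it in whole minutes
      if (timer != "") printf "check interval: %d s (%d min)\n", timer, timer / 60
    }
  ' "$1"
}

# Example (path is an assumption):
#   describe_dsdlockd_config "$DSHOME/dsdlockd.config"
```

With the file contents posted earlier in the thread (start=0, timer=900), this reports autostart as disabled and a check interval of 900 s (15 min).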

Post by ray.wurlod »

To complete what Craig surmised, res stands for "resolution", a coded value for how to resolve deadlocks in the "UniVerse" database (you can kill the process with the newest-begun transaction, you can kill the process with the fewest locks in a transaction, you can kill one of the deadlocked processes at random). Not really relevant for DataStage.

The log parameter allows you to specify a non-default location for the dsdlockd.log file (its default location is in $DSHOME, and that's usually OK).

Incidentally, it's dsdlockd (not dslockd) so you might want to try your ps command again.
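
A minimal sketch of the corrected check, for reference. It filters ps -ef style output for the dsdlockd name and ignores the grep command itself, so a stray "grep dsdlockd" line is not mistaken for the daemon. The function name dsdlockd_in_ps is hypothetical.

```shell
# Hypothetical helper (name is illustrative). Reads "ps -ef" style output on
# stdin and succeeds (exit status 0) only if a dsdlockd process line is
# present, not counting any grep command that mentions the name.
dsdlockd_in_ps() {
  grep -v 'grep' | grep -q 'dsdlockd'
}

# Example:
#   ps -ef | dsdlockd_in_ps && echo "dsdlockd is running" || echo "dsdlockd is not running"
```

Note that the earlier search for dslockd could never have matched the daemon, because "dslockd" is not a substring of "dsdlockd".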

Post by chulett »

Ah, yes... thanks for the clarification on the exact spelling of the daemon and what "res" stands for. I remember it now, but it has been a while, so the details were not forthcoming. I never saw a need to change that setting and honestly never realized it wasn't relevant for DataStage.

Post by vivekgadwal »

Thank you very much, Ray and Craig, for your responses.
ray.wurlod wrote:Incidentally, it's dsdlockd (not dslockd) so you might want to try your ps command again.
My understanding from Craig's post is that if the "start" value is '0', then it means that the "dsdlockd" process is not ON. Am I right in understanding that?

I executed ps again just in case and I did not find the process running (no jobs were running at that time). Another question is, whenever we reboot the server, do we have to come in and manually set the values in this file "dsdlockd.config"?

[As an FYI, our Linux admin did a detailed analysis and check on his end to see how the server is configured and he found no issues on that end. So, he says the onus is on us (DataStage team) to get the configurations right.]

Post by chulett »

Ray just meant you missed a "d" in the name you grepped for, so you should correct the spelling and try again to be certain.

Post by vivekgadwal »

Got it. I ran ps and did not find it in the prod, development, or test environments. I am in the process of trying to convince people to do this, as I cannot directly implement this change in Prod. The thing is, I am not sure whether enabling dsdlockd in Prod will have repercussions elsewhere in the processing.

The curious thing here is that nothing is different between the test/devl/prod environments, yet the jobs ran a lot slower in prod than on the other boxes. That is another part that confuses me. However, thanks for your help, and I will keep you all posted.