Error calling DSRunJob -99

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

edwds
Premium Member
Posts: 32
Joined: Mon May 01, 2006 1:06 pm

Error calling DSRunJob -99

Post by edwds »

Good morning. I need your help, guys! For the past 14 evenings we have had a particular sequence fail. This sequence fires off about 100 simultaneous server and parallel jobs (mostly server). All the jobs are very simple; no hashed files are used in them. All they do is read from one Oracle table and load to another. Every evening we get the above error associated with a different job, but always as part of the same sequence. Once it fails, we submit the sequence again 5 minutes later and it works. How can we debug this problem?
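
For context: a sequence like this compiles down to job-control BASIC that attaches each child job and calls DSRunJob on it, and the -99 in the title is the code coming back from that call. Below is a minimal hand-written sketch of that pattern, not the generated sequence code; the child job name "ChildJob" is made up.

Code:

* Sketch only -- hand-written equivalent of what the sequence does for each child.
hJob = DSAttachJob("ChildJob", DSJ.ERRFATAL)
If NOT(hJob) Then
   Call DSLogFatal("Could not attach to ChildJob", "JobControl")
End

ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
If ErrCode <> DSJE.NOERROR Then
   * This is the call failing here: a non-zero code (-99 in this thread)
   * means the engine could not start the child at all. The generated
   * sequence code raises an exception at this point rather than logging.
   Call DSLogWarn("DSRunJob returned ":ErrCode, "JobControl")
End

ErrCode = DSWaitForJob(hJob)
Status = DSGetJobInfo(hJob, DSJ.JOBSTATUS)   ;* e.g. DSJS.RUNOK
ErrCode = DSDetachJob(hJob)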
sachin1
Participant
Posts: 325
Joined: Wed May 30, 2007 7:42 am
Location: india

Re: Error calling DSRunJob -99

Post by sachin1 »

What is the error message? Please post it.
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Re: Error calling DSRunJob -99

Post by kwwilliams »

Sounds like your server is overloaded. DataStage cannot run everything you are asking of it because it does not have the resources, so a job fails. That is why you can rerun the sequence five minutes later without a problem.
edwds
Premium Member
Posts: 32
Joined: Mon May 01, 2006 1:06 pm

Error message

Post by edwds »

23:44:04: Exception raised: @srcTRANTYP, Error calling DSRunJob(srcTRANTYP), code=-99 [General repository interface 'other error']

We have looked at memory and CPU once the job has failed, and both are around 50% when this error occurs.
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Re: Error message

Post by kwwilliams »

Those aren't the only resources. By starting up 100 jobs, how many PIDs are you creating? Have you hit the limit for your user? Can DataStage spawn them fast enough? After a certain period of time it will abort because it could not spawn the next process.

Try not starting one hundred jobs at one time. I'm pretty sure that will solve your problem.
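
One way to do that from hand-written job-control BASIC is to start the children in waves and wait for each wave to finish before launching the next. Sketch only: the job names are made up and the wave size of 25 is arbitrary.

Code:

MaxConcurrent = 25
Dim Handles(MaxConcurrent)
JobList = "Load_A" : @FM : "Load_B" : @FM : "Load_C"   ;* ...and so on for the ~100 children
NumJobs = DCOUNT(JobList, @FM)

For BatchStart = 1 To NumJobs Step MaxConcurrent
   BatchEnd = BatchStart + MaxConcurrent - 1
   If BatchEnd > NumJobs Then BatchEnd = NumJobs

   * Launch one wave
   For J = BatchStart To BatchEnd
      hJob = DSAttachJob(JobList<J>, DSJ.ERRFATAL)
      ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
      Handles(J - BatchStart + 1) = hJob
   Next J

   * Wait for the whole wave to finish before launching the next one
   For J = BatchStart To BatchEnd
      hJob = Handles(J - BatchStart + 1)
      ErrCode = DSWaitForJob(hJob)
      ErrCode = DSDetachJob(hJob)
   Next J
Next BatchStart

The same effect can be had inside the sequence itself by rearranging the job activities so that fewer of them start together, which is roughly what was done later in this thread.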
edwds
Premium Member
Posts: 32
Joined: Mon May 01, 2006 1:06 pm

Post by edwds »

But then why would the exact same sequence run fine 5 minutes later?
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Post by kwwilliams »

Resources became free at that point, whether PIDs or anything else. Having 100 jobs kick off at the same time is going to run into issues. Is there a reason to kick them all off at the same time, or was it just easier?
edwds
Premium Member
Posts: 32
Joined: Mon May 01, 2006 1:06 pm

Post by edwds »

To save time we kick them off simultaneously. Also, none of them are dependent on any of the others, so we figured it was safe to do so. It's been running fine for a couple of years, but we do add about 10 to 20 jobs to this sequence each year. After more research and more failures I ran into this error:

Code:

Program "DSD.Init": Line 41, Unable to allocate Type 30 descriptor, table is full.
DataStage Job 318 Phantom 28130
Job Aborted after Fatal Error logged.
Program "DSD.WriteLog": Line 250, Abort.
Attempting to Cleanup after ABORT raised in stage seqLMS_SRC..JobControl

We have our T30FILE property set to the default of 200. Do you think changing this will fix the problem? The reason I ask is: why would this same sequence run fine later on and not give the error above? I would think changing this uvconfig parameter would only help if it failed consistently every time.
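
For reference, a quick way to compare what uvconfig asks for with what the engine actually has in shared memory; the /.dshome file and the bin/ path are assumptions for a typical Unix install, and analyze.shm is the same check Ray gives further down.

Code:

cd `cat /.dshome`                    # the engine's DSHOME directory
grep T30FILE uvconfig                # value requested in the config file (200 here)
bin/analyze.shm -t | grep T30FILE    # value actually built into shared memory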
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Changing T30FILE is the solution to this problem. Increase it to 1000.
You will then need to stop, regenerate and restart the DataStage server.

T30FILE is the total number of dynamic hashed files that can be open simultaneously. Although you assert that your job designs do not use hashed files, the Repository tables are all hashed files; for every job that runs there are three or four hashed files open to record run-time process metadata.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
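
A sketch of the change Ray describes, assuming a quiet system with no jobs or clients running and a typical Unix install; the editor and the /.dshome path are illustrative, the uv -admin commands are the standard engine admin commands.

Code:

cd `cat /.dshome`            # the engine's DSHOME directory
bin/uv -admin -stop          # stop the DataStage server engine
vi uvconfig                  # change the line to: T30FILE 1000
bin/uv -admin -regen         # regenerate shared memory from uvconfig
bin/uv -admin -start         # restart the engine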
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

edwds wrote: The reason I ask is: why would this same sequence run fine later on and not give the error above?
It's a resource constraint and is all about what is running in total at the time the error occurs. That's why it 'runs fine later on' and why changing the parameter, as Ray notes, is needed.
-craig

"You can never have too many knives" -- Logan Nine Fingers
edwds
Premium Member
Posts: 32
Joined: Mon May 01, 2006 1:06 pm

Post by edwds »

We changed T30FILE to 1000 and then the error changed to -14. We then moved the jobs in the sequence around so that fewer run simultaneously. This solved the problem. Instead of running 100 jobs at the same time, we are now down to about 75.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Did you regenerate and restart DataStage after changing T30FILE? Execute the command analyze.shm -t | grep T30FILE to find out whether T30FILE has indeed been increased.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.