Job appears not to have started after 60 secs

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Amos.Rosmarin
Premium Member
Posts: 385
Joined: Tue Oct 07, 2003 4:55 am

Job appears not to have started after 60 secs

Post by Amos.Rosmarin »

Hi,

I get this message when I run multiple processes:
JobName appears not to have started after 60 secs
I guess it has something to do with a lack of resources, and is related to the T30FILE/MFILES parameters... am I right?

My T30FILE is now 2000 and MFILES=500, and uvregen does not let me increase them any more. What is the related kernel parameter?
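For reference, this is roughly how I check and regenerate them (just a sketch assuming the standard $DSHOME layout; please double-check the paths and commands on your own install):

Code: Select all

# show the current settings in the config file
cd $DSHOME
egrep "^(T30FILE|MFILES)" uvconfig

# after editing uvconfig, the engine has to be stopped and regenerated
bin/uv -admin -stop
bin/uvregen                 # or: bin/uv -admin -regen
bin/uv -admin -start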

Can someone give me an idea?

Amos
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I think if it was T30FILE related you would get a different message. This, as far as I know, is just a sign of an overloaded machine bumping up against a hard-coded limit in the engine. You might want to define what 'multiple' means and describe your server's hardware.

See this thread for an example.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Amos.Rosmarin
Premium Member
Posts: 385
Joined: Tue Oct 07, 2003 4:55 am

Post by Amos.Rosmarin »

Thanks,

It is Solaris 2.9 with 16 GB and 8 CPUs.

I read the link you gave me and it's the same problem. I thought of raising the T30, but uvregen does not let me go higher than 500.

There were about 40 instances of the job that failed, plus some other jobs that use big (static) hash files and some PX jobs.

So in terms of DataStage there was a lot of work going on, but in terms of the machine it was working hard but not 100% loaded (about 80-90% idle and 2 GB of memory free).
The ulimit is 2000.

I guess it is a DS tuning issue. I ran shmtest and changed uvconfig according to the results, and brought MFILES and T30FILE to the maximum possible. Is there something to do with the kernel parameters?
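In case it helps, these are the kernel-side settings I have been looking at so far (a sketch only; the names are from our Solaris 9 box and the values are examples, so please confirm with your SA before touching /etc/system):

Code: Select all

# current per-process open file limit for this shell
ulimit -n

# current shared memory limits as the kernel reports them
sysdef | grep -i shm

# /etc/system entries (reboot required) -- example values only
set rlim_fd_cur=2048                   # default per-process open file limit
set rlim_fd_max=4096                   # hard per-process open file limit
set shmsys:shminfo_shmmax=4294967295   # max shared memory segment size
set semsys:seminfo_semmni=1024         # max number of semaphore sets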


Thanks,
Amos
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

How many jobs are you starting at the same time?
Mamu Kim
Amos.Rosmarin
Premium Member
Posts: 385
Joined: Tue Oct 07, 2003 4:55 am

Post by Amos.Rosmarin »

It's about 40 jobs.

Some are sequencers; for the ones that are executed in parallel I put a little sleep of 3 seconds between jobs.




Thanks,
Amos
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Amos.Rosmarin wrote:So in terms of DataStage there was a lot of work going on, but in terms of the machine it was working hard but not 100% loaded (about 80-90% idle and 2 GB of memory free)
I'm not sure I'd call 80-90% idle "working hard". :wink: Did you mean 10-20% idle?

You'll probably need to cut back on the number of jobs you run simultaneously; it doesn't sound like a 3-second sleep between launches is going to cut it. Maybe run portions of them in 'waves'?
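Something along these lines if you're launching from a shell script -- just a sketch, and the dsjob options are from memory, so check the syntax for your release (the project name and the wave1.txt list are made up):

Code: Select all

# run the 40 jobs in waves of 10 instead of all at once
PROJECT=MyProject                 # hypothetical project name
for JOB in `cat wave1.txt`        # wave1.txt lists the first 10 job names
do
   $DSHOME/bin/dsjob -run -jobstatus $PROJECT $JOB &
done
wait                              # let wave 1 finish before kicking off wave 2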
-craig

"You can never have too many knives" -- Logan Nine Fingers
Amos.Rosmarin
Premium Member
Posts: 385
Joined: Tue Oct 07, 2003 4:55 am

Post by Amos.Rosmarin »

Of course, you are right :oops: ... it's the opposite (staying till 23:00 at the office, which is the time in central Europe right now).


The problem is that I must have the data as fast as possible, and the jobs are very short; each is different and they cannot be joined.

Is there an upper limit for the T30?
For example, if I set the kernel's per-process open file limit to 2008, does T30 = 2000 make sense?

(still looking for the name of this kernel parameter)



Cheers,
Amos
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Yeesh, go home.
Amos.Rosmarin wrote:Is there an upper limit for the T30? For example, if I set the kernel's per-process open file limit to 2008, does T30 = 2000 make sense?
There's an upper limit to everything, I would think, but I'm afraid I don't know what that one is. Is it documented in the uvconfig file?
Then Amos.Rosmarin wrote:(still looking for the name of this kernel parameter)
You'll probably need to talk to your SA to find out for sure.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Amos.Rosmarin
Premium Member
Posts: 385
Joined: Tue Oct 07, 2003 4:55 am

Post by Amos.Rosmarin »

Oops again, I see now that I confused MFILES and T30...

I guess I'll go to sleep.


If anyone has some thoughts, I'll be happy to hear them.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

You're on Solaris 2.9, so use

prstat -a

from a unix command line to monitor server processes and load. DS jobs show up as "phantom" processes. If you have 8 CPUs, then a process fully utilizing one CPU shows as 1/8, or about 13%. If the sum of user processes (the -a option shows a top-5 user summary at the bottom) approaches 100%, then your machine is HAMMERED. DS has issues with job control, and you need to talk to tech support about any patches that mitigate this issue.
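As a rough example (5-second samples; the per-user totals print at the bottom of each screen -- and the "phantom" grep only works if your engine processes show that in the command line, as ours do):

Code: Select all

# per-user CPU summary refreshed every 5 seconds
prstat -a 5

# rough count of engine job processes
ps -ef | grep phantom | grep -v grep | wc -l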
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Amos.Rosmarin
Premium Member
Posts: 385
Joined: Tue Oct 07, 2003 4:55 am

Post by Amos.Rosmarin »

Thanks Kenneth

The machine is not hammered; it's working hard but not 100% utilized.
It looks like a DS configuration issue.

Amos
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi Amos,
how do you start the jobs?
You did mention a 3-second wait?
And having 40 jobs?
Even starting all 40 in the same sequence job will not be instantaneous;
they will gradually all come up, but some later than others.

Can you explain a bit more about how you run all of them in parallel?

Another thing: you did mention multi-instance jobs?
Imagine 20 multi-instance jobs bashing the poor log simultaneously while a new instance comes up, not to mention if the log file has got big...

Come to think of it (lmao), we had this at one of our customers.
Their problem was big log files on multi-instance jobs.
Our solution was to purge the logs periodically (we log all important info to an ASCII log anyway), at a frequency that depends on how often you run the jobs, to make sure they stay small.
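Just to sketch what I mean (the commands are from memory and the job name/number are made up; setting auto-purge from the Administrator is the gentler option, and never clear a log while an instance of that job is running):

Code: Select all

# from the project directory, source the environment and open the engine shell
cd /path/to/Project        # hypothetical path to your project directory
. $DSHOME/dsenv
$DSHOME/bin/uvsh

# then, at the uvsh prompt (not the unix shell):
#   LIST DS_JOBS 'MyJob' JOBNO      <- find the job's number, e.g. 42
#   CLEAR.FILE RT_LOG42             <- clears that job's log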


IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

I think Roy's solution makes sense. Also, changing the 3 seconds to 5 or 10 makes sense. I cannot imagine that these all have to run in as few seconds as possible. If so, then change your design.

If you are processing log files and need to do it as fast as possible, then you should rotate the log file and process the old one; that way you do not lose transactions. I assume that kind of situation is behind the need for speed here.

Describe why 40 processes need to run at the same time when your machine is incapable of doing this, especially when these jobs are small. There has got to be another solution available to you. Explain your options.
Mamu Kim
Luciana
Participant
Posts: 60
Joined: Fri Jun 10, 2005 7:22 am
Location: Brasil

Post by Luciana »

Code: Select all

Job control fatal error (-14)  
(DSRunJob) Job "Name" appears not to have started after 60 secs  
The -14 error happens when the server is overloaded at certain times. The 60-second timeout is not a parameter that can be altered.

There are some parameters in the uvconfig file that can be adjusted:

1. Stop the DataStage service using the command:
$DSHOME/bin/uv -admin -stop

2. Double the values of the parameters GLTABSZ, RLTABSZ and MAXRLOCK in the file $DSHOME/uvconfig.
E.g.: if the values are (75, 75, 74) respectively, set them to (150, 150, 149).

Note: MAXRLOCK cannot be greater than RLTABSZ - 1.

3. As user dsadm, execute:
$DSHOME/bin/uv -admin -regen

Note: If the command above does not execute successfully, change the values of the variables Nmemoff, Cmemoff, Pmemoff and Dmemoff in uvconfig to "0x0", and execute the command from step 3 again.

4. Restart DataStage using the command:
$DSHOME/bin/uv -admin -start
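Putting the steps above together, the whole sequence on the server looks roughly like this (run as dsadm; the doubled values are just the example from step 2):

Code: Select all

. $DSHOME/dsenv
$DSHOME/bin/uv -admin -stop      # 1. stop the engine

vi $DSHOME/uvconfig              # 2. double GLTABSZ, RLTABSZ, MAXRLOCK
                                 #    e.g. 75/75/74 -> 150/150/149
                                 #    (MAXRLOCK must stay <= RLTABSZ - 1)

$DSHOME/bin/uv -admin -regen     # 3. regenerate the shared memory segment

$DSHOME/bin/uv -admin -start     # 4. start the engine again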
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Hello Luciana,

you seem to have picked up a long-dead thread (from June) with your response. Also, I think the solution you proposed doesn't address the original problem. The 60-second value is hardcoded, and the suggested solutions were about making the job actually start quicker on a heavily loaded system.

Your solution will have a positive effect for systems that have group and record lock contention. There was no indication in the thread that this was the case, and changing these values is something to be done only when necessary and with care.