
unable to run DS job

Posted: Wed Mar 15, 2006 6:33 pm
by xli
Hi,

I have an old test project. I can create and compile test jobs in it, but I cannot run these jobs. Can anybody tell me what may cause this issue?

Thanks in advance

xli

Posted: Wed Mar 15, 2006 6:39 pm
by ray.wurlod
More information please, Jonny. After you compile the jobs, does the status change to Compiled in Director? How are you trying to run the job: from Director, from dsjob, or by some other means? What symptom indicates that the job cannot run? Are any events recorded in the job log?

Posted: Wed Mar 15, 2006 6:57 pm
by xli
Hi, Ray

The job status changes to "Compiled" in Director after I compile it. I tried running it from both Designer and Director, and nothing happens. When I tried to run it with dsjob, it took a while and then issued the message below:

$ bin/dsjob -run Training SimpleSevTest
Error running job

Status code = -14 DSJE_TIMEOUT

Obviously, there are no events in the job log.
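If you are scripting around dsjob, it helps to pull the numeric status code and its symbolic name out of the error text so a wrapper can branch on DSJE_TIMEOUT. A minimal Python sketch, assuming the `Status code = <n> <NAME>` format shown above; `parse_dsjob_status` is a hypothetical helper, not part of dsjob itself:

```python
import re

# Matches lines like "Status code = -14 DSJE_TIMEOUT" as printed by dsjob above.
_STATUS_RE = re.compile(r"Status code\s*=\s*(-?\d+)\s+(\w+)")


def parse_dsjob_status(output: str):
    """Return (code, name) from dsjob output, or None if no status line is found."""
    match = _STATUS_RE.search(output)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


if __name__ == "__main__":
    sample = "Error running job\n\nStatus code = -14 DSJE_TIMEOUT\n"
    print(parse_dsjob_status(sample))  # (-14, 'DSJE_TIMEOUT')
```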

Thanks

Posted: Wed Mar 15, 2006 7:11 pm
by ray.wurlod
What else is the machine doing? This code is usually an indication that the machine is overloaded; either the number of processes exceeds the limit, or the total demand for resources hugely exceeds supply. Failure to start in a timely fashion can also be influenced by too small a value for the T30FILE setting, or by very many entries in that job's log and/or in the &PH& directory. You need to check all of these things. For example, use top or sar to monitor how busy the system is.
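Two of those checks can be scripted. The Python sketch below counts the files in a project's &PH& directory and reads T30FILE from uvconfig. It assumes the uvconfig file uses plain `NAME value` lines with `#` comments; the helper names and any paths you pass in are hypothetical and depend on where DataStage is installed. It does not replace watching overall load with top or sar.

```python
from pathlib import Path


def count_ph_entries(project_dir: str) -> int:
    """Count the files in a project's &PH& directory (0 if the directory is missing)."""
    ph_dir = Path(project_dir) / "&PH&"
    if not ph_dir.is_dir():
        return 0
    return sum(1 for entry in ph_dir.iterdir() if entry.is_file())


def read_t30file(uvconfig_path: str):
    """Return the T30FILE value from a uvconfig file, or None if it is not set.

    Assumes the plain 'NAME value' layout; '#' comment lines are skipped
    because their first token is '#', not 'T30FILE'.
    """
    for line in Path(uvconfig_path).read_text().splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "T30FILE":
            return int(parts[1])
    return None
```

A very large &PH& count for the failing project, compared with one that works, would point at the project rather than the machine.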

Keep in mind, too, that a parallel job will want to create many processes; one conductor process, one section leader process per processing node, and up to one process per stage in the job design. Have you tried running on a single-node configuration file, to reduce the startup time and the total number of processes?
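As a back-of-envelope check on why a single-node configuration helps, here is a Python sketch of that process count. It assumes one player per stage on every node and no operator combination, so it is an upper bound rather than what any particular job will actually start; `estimate_px_processes` is a hypothetical name.

```python
def estimate_px_processes(nodes: int, stages: int) -> int:
    """Rough upper bound on processes started by one parallel job run.

    Assumes one conductor, one section leader per node, and one player per
    stage on every node (operator combination would lower the player count).
    """
    if nodes < 1 or stages < 0:
        raise ValueError("nodes must be >= 1 and stages >= 0")
    conductor = 1
    section_leaders = nodes
    players = stages * nodes
    return conductor + section_leaders + players


if __name__ == "__main__":
    for nodes in (1, 4):
        print(nodes, estimate_px_processes(nodes, stages=10))
```

Under these assumptions a 10-stage job drops from 45 processes on four nodes to 12 on one node.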

Posted: Wed Mar 15, 2006 7:45 pm
by xli
Hi, Ray

It doesn't work even for the simplest server job. However, the same server job runs with no problem in another project on the same server.

xli

Posted: Wed Mar 15, 2006 8:14 pm
by daniel0623
Hi,
I once had the same issue. Please export your job and import it into a new project, then run it in the new project. You'd better restart the server before running it. Good luck.

Posted: Wed Mar 15, 2006 8:18 pm
by ray.wurlod
Did you check any of the suggestions I made? Even though it's a server job (you posted in the parallel job forum) these suggestions are still valid.

Posted: Wed Mar 15, 2006 9:57 pm
by xli
Well, I don't think the machine is overloaded, as I can still run the same job in another project on the same machine. To simplify the problem, I also created a very simple server job to process a few . The test result is the same.

I presume that something is wrong with this project, but I am not able to figure out what happened.
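The "same job, different project" comparison can be made repeatable with a small Python sketch. It runs one job in several projects via `dsjob -run -jobstatus` and labels each outcome, recording the raw exit code without interpreting it. The helper names are hypothetical, and it assumes dsjob is on PATH and the job exists in every project listed; the runner is passed in so the comparison logic can be exercised without DataStage.

```python
import subprocess
from typing import Callable, Dict, Iterable, Tuple

# A runner takes (project, job) and returns (exit_code, combined_output).
Runner = Callable[[str, str], Tuple[int, str]]


def dsjob_runner(project: str, job: str, timeout_s: int = 600) -> Tuple[int, str]:
    """Run a job with dsjob and wait for it; assumes dsjob is on PATH."""
    try:
        proc = subprocess.run(
            ["dsjob", "-run", "-jobstatus", project, job],
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return -1, "wrapper timeout"
    return proc.returncode, proc.stdout + proc.stderr


def compare_projects(job: str, projects: Iterable[str],
                     run: Runner = dsjob_runner) -> Dict[str, str]:
    """Run the same job in each project and label each outcome."""
    results = {}
    for project in projects:
        code, output = run(project, job)
        if "DSJE_TIMEOUT" in output or output == "wrapper timeout":
            results[project] = "timeout"
        else:
            results[project] = f"exit {code}"
    return results
```

For example, `compare_projects("SimpleSevTest", ["Training", "OtherProject"])`, where `OtherProject` stands in for the project in which the job works. A timeout in only one project supports the idea that the problem lies in that project.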

Posted: Wed Mar 15, 2006 10:45 pm
by kumar_s
Well, in that case, try restarting the server.
Also try executing this command in the Administrator client:


COUNT DS_JOBOBJECTS 
Make sure you get a valid count number rather than an error.
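If you capture the command's output (for example by copying it from the Administrator client's command window), a Python sketch like this can tell a valid count from an error. It assumes the success line looks like `20849 records counted.`; `parse_count_output` is a hypothetical helper.

```python
import re

# COUNT prints a summary such as "20849 records counted." when it succeeds.
_COUNT_RE = re.compile(r"^\s*(\d+)\s+records?\s+counted\.?\s*$", re.MULTILINE)


def parse_count_output(output: str):
    """Return the record count from COUNT output, or None if no count line appears."""
    match = _COUNT_RE.search(output)
    return int(match.group(1)) if match else None
```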

Posted: Wed Mar 15, 2006 11:21 pm
by xli
OK, I'll have to arrange a server restart later.

I ran COUNT DS_JOBOBJECTS against this project:

20849 records counted.

It seems there are too many objects in this project.

xli

Posted: Thu Mar 16, 2006 1:46 am
by ArndW
xli, there are not too many objects in your project. The reason for running COUNT DS_JOBOBJECTS was to make DS traverse the whole set of file keys and confirm that each link was working; otherwise you would have gotten a fatal error message.

Can you try creating a very simple server job with no stages? In the Job Control section, put in the statement CALL DSLogInfo("Hello World",""). Compile it and try to run it from either the command line or Director.
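From the command line, the check can be scripted in Python: run the stage-less job, then look for the "Hello World" message in its log summary. This is a sketch under assumptions: the job name `HelloWorldTest` is hypothetical, dsjob is on PATH, and the `-logsum` output includes the message text. The command runner is injectable so the logic can be tested without DataStage.

```python
import subprocess
from typing import Callable, List

# Hypothetical name for the stage-less test job described above.
TEST_JOB = "HelloWorldTest"

Runner = Callable[[List[str]], str]


def run_cmd(args: List[str]) -> str:
    """Run a command and return its combined stdout and stderr."""
    proc = subprocess.run(args, capture_output=True, text=True, timeout=600)
    return proc.stdout + proc.stderr


def smoke_test(project: str, job: str = TEST_JOB, run: Runner = run_cmd) -> bool:
    """Run the job, then check that its log summary contains the DSLogInfo message."""
    run_output = run(["dsjob", "-run", "-jobstatus", project, job])
    if "DSJE_TIMEOUT" in run_output:
        return False
    log_output = run(["dsjob", "-logsum", project, job])
    return "Hello World" in log_output
```

If this returns False because of a timeout even for a job with no stages, the stages themselves can be ruled out as the cause.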

If you still get a timeout, then there is something very wrong in the project. The first stop is to use DS.TOOLS to reindex the repository files. If that fails, just out of curiosity, try creating a dummy routine in Manager with just the one line above and see whether you can compile and test it in Manager.