Job Corruption problem
Moderators: chulett, rschirm, roy
-
- Premium Member
- Posts: 39
- Joined: Tue Dec 23, 2003 3:47 am
- Location: India
We have a problem in production.
All of a sudden one of the DS jobs is not running properly.
It has an Oracle -> hashed file stage, followed by lookups and a sequential file, and finally an update/insert Oracle stage. It's a fairly big job.
The job was initially hanging while creating the hashed file. We then cleared the existing hashed file on Unix and renamed the hashed file in the job; now the hashed file does get created, but the job progresses no further. It has been standing still for 12 hrs, whereas it normally finishes in 3-4 hrs.
Director shows 0 rows, Designer shows blue links, and there are no Oracle sessions at all.
We tried all of the following:
1. Recompiled the job.
2. Renamed the job.
3. Saved a copy, renamed and compiled it.
4. Deleted the job from the server and imported it again.
5. Imported it under a different name.
Nothing worked!!
When we log into the project through Director, we frequently get this error message:
Cannot open executable job file RT_CONFIG1219
We cleared the job's logs, status file and resources through Unix and Director. We also imported the job under a different name; again, the new job hangs.
All the other jobs are running fine.
Totally clueless...
Any ideas??
A job that just hangs there forever without doing anything is most likely waiting on a lock - any playing around with "kill -9" or other hanky-panky with the engine processes can leave behind a lock that never gets cleared. It looks like the lock is on the hashed file, which is why running a different copy of the job makes no difference.
I would recommend you start your deadlock daemon to clear this up; or, better yet, stop and restart DataStage to make sure your locks are clean.
The RT_CONFIG error means you should also use DS.TOOLS to clean up your repository indices, and it would make sense to couple this with your DataStage restart, since both require that all users are out. You can use the same window to clean up the project job files from DS.TOOLS as well.
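The engine restart described above can be sketched as a small shell script. This is a minimal sketch, not a verified procedure: the default $DSHOME path below is an assumption, and you should check the `uv -admin` stop/start commands against your own engine's documentation before running them. The guard at the top makes the script a harmless no-op on machines where the engine isn't installed.

```shell
#!/bin/sh
# Sketch: bounce the DataStage engine so the lock table comes back clean.
# ASSUMPTION: default engine install path; adjust DSHOME for your site.
DSHOME="${DSHOME:-/opt/IBM/InformationServer/Server/DSEngine}"

restart_engine() {
    if [ ! -x "$DSHOME/bin/uv" ]; then
        echo "engine not found under $DSHOME - skipping"
        return 0
    fi
    cd "$DSHOME" && . ./dsenv     # load the engine environment
    ./bin/uv -admin -stop         # stop the engine (all users must be out)
    sleep 30                      # give server processes time to exit
    ./bin/uv -admin -start        # restart; stale locks are gone
}

restart_engine
```

Run it as root, after confirming in Director that no jobs are active, since the stop is unconditional once the engine is found.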
Sankar,
you will need to run options (2) to rebuild the indices and (4) to check the integrity of the job files.
Jim,
the DataStage $DSHOME directory has a file called "dsdlock.config" containing a line that reads either start=1 or start=0, which controls whether the lock daemon is fired off when DataStage is started.
You need to be root to start the deadlock daemon manually. Change to the DataStage home directory and enter the command "bin/dsdlock -config".
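As a quick way to see which way an installation is configured, the check above can be sketched like this. The "dsdlock.config" file name and the "bin/dsdlock -config" command are taken straight from the post; the default $DSHOME path is an assumption for illustration only.

```shell
#!/bin/sh
# Sketch: report whether the lock daemon auto-starts with the engine.
# ASSUMPTION: default engine install path; adjust DSHOME for your site.
DSHOME="${DSHOME:-/opt/IBM/InformationServer/Server/DSEngine}"

check_lock_daemon() {
    cfg="$DSHOME/dsdlock.config"
    if [ ! -f "$cfg" ]; then
        echo "no dsdlock.config under $DSHOME"
        return 0
    fi
    if grep -q '^start=1' "$cfg"; then
        echo "lock daemon auto-starts with the engine"
    else
        echo "lock daemon disabled - start it manually (as root):"
        echo "  cd $DSHOME && bin/dsdlock -config"
    fi
}

check_lock_daemon
```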
Yes, the engine needs to be up while running options (2) and (4), and you should ensure that no users are in DataStage.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
If RT_CONFIG1219 cannot be opened, re-indexing will have no benefit at all, because there are no indexes on the configuration files. Probably the best thing to do is to make a copy of the job, then compile and run that (and delete the old version of the job, the one with the corrupted configuration file).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.