Job Corruption problem
Moderators: chulett, rschirm, roy
-
- Premium Member
- Posts: 39
- Joined: Tue Dec 23, 2003 3:47 am
- Location: India
We have a problem in production.
All of a sudden one of the DS jobs is not running properly.
It has an Oracle -> hashed file stage, followed by lookups and a sequential file, and finally an update/insert Oracle stage. It's a fairly big job.
The job was initially hanging while creating the hashed file. We then cleared the existing hashed file on Unix and renamed the hashed file in the job; now the hashed file does get created, but the job progresses no further. It has been standing still for 12 hrs, whereas it normally finishes in 3-4 hrs.
Director shows 0 rows, Designer shows blue links, and there are no Oracle sessions at all.
We tried all of the following:
1. Recompiled the job.
2. Renamed the job.
3. Saved a copy, renamed and compiled it.
4. Deleted the job from the server and imported it again.
5. Imported it under a different name.
Nothing worked!!
When we log into the project through Director, we frequently get this error message:
Cannot open executable job file RT_CONFIG1219
We cleared the job's logs, status file and resources through Unix and Director. We also imported the job under a different name; again, the new job hangs.
All the other jobs are running fine.
Totally clueless...
Any ideas??
A job that just hangs there forever without doing anything is most likely waiting on a lock - any playing around with "kill -9" or other hanky-panky with the engine processes can leave behind a lock that never gets cleared. It looks like the lock is on the hashed file, which is why running a different copy of the job makes no difference.
I would recommend you start your deadlock daemon to clear this up; or, better yet, stop and restart DataStage to make sure your locks are clean.
The RT_CONFIG error means you should also use DS.TOOLS to clean up your repository indices, and it would make sense to couple this with your DataStage restart, since both require that all users are out. You can use the same window to clean up the project job files from DS.TOOLS as well.
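The engine restart described above can be sketched as a small shell script. This is a minimal sketch, not a verified procedure: the default $DSHOME path below is an assumption, and you should check the `uv -admin` stop/start commands against your own engine's documentation before running them. The guard at the top makes the script a harmless no-op on machines where the engine isn't installed.

```shell
#!/bin/sh
# Sketch: bounce the DataStage engine so the lock table comes back clean.
# ASSUMPTION: default engine install path; adjust DSHOME for your site.
DSHOME="${DSHOME:-/opt/IBM/InformationServer/Server/DSEngine}"

restart_engine() {
    if [ ! -x "$DSHOME/bin/uv" ]; then
        echo "engine not found under $DSHOME - skipping"
        return 0
    fi
    cd "$DSHOME" && . ./dsenv     # load the engine environment
    ./bin/uv -admin -stop         # stop the engine (all users must be out)
    sleep 30                      # give server processes time to exit
    ./bin/uv -admin -start        # restart; stale locks are gone
}

restart_engine
```

Run it as root, after confirming in Director that no jobs are active, since the stop is unconditional once the engine is found.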
Sankar,
you will need to run options (2) to rebuild the indices and (4) to check the integrity of the job files.
Jim,
the DataStage $DSHOME directory has a file called "dsdlock.config" containing a line that reads either start=1 or start=0, which controls whether the lock daemon is fired off when DataStage is started.
You need to be root to start the deadlock daemon manually. Change to the DataStage home directory and enter the command "bin/dsdlock -config".
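As a quick way to see which way an installation is configured, the check above can be sketched like this. The "dsdlock.config" file name and the "bin/dsdlock -config" command are taken straight from the post; the default $DSHOME path is an assumption for illustration only.

```shell
#!/bin/sh
# Sketch: report whether the lock daemon auto-starts with the engine.
# ASSUMPTION: default engine install path; adjust DSHOME for your site.
DSHOME="${DSHOME:-/opt/IBM/InformationServer/Server/DSEngine}"

check_lock_daemon() {
    cfg="$DSHOME/dsdlock.config"
    if [ ! -f "$cfg" ]; then
        echo "no dsdlock.config under $DSHOME"
        return 0
    fi
    if grep -q '^start=1' "$cfg"; then
        echo "lock daemon auto-starts with the engine"
    else
        echo "lock daemon disabled - start it manually (as root):"
        echo "  cd $DSHOME && bin/dsdlock -config"
    fi
}

check_lock_daemon
```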
Yes, the engine needs to be up while running options (2) and (4), and you should ensure that no users are in DataStage.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
If RT_CONFIG1219 cannot be opened, re-indexing will have no benefit at all, because there are no indexes on the configuration files. Probably the best thing to do is to make a copy of the job, then compile and run that (and delete the old version of the job, the one with the corrupted configuration file).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.