Job Designs Gone Missing

johnmwilliams · Post by **johnmwilliams** » Thu Jul 20, 2006 12:34 am

Our server running 7.0 became unstable yesterday and started exhibiting strange behaviour, including jobs that apparently hung in mid-process, then subsequently continued; an inability to log in using Designer, Admin, etc.; and weird messages during compilation for jobs that had previously been fine.

At a point somewhere in the middle of all this, in a project currently being worked on, all the Job Designs disappeared from the Repository directory. The jobs still seem to be there, as the ones most recently opened can be re-opened, but most are invisible.

After restarting the server, even importing the previous day's backup does not restore the Job Designs to visibility. However, I was able to import them into a new project, so we haven't lost anything significant. But I'd like to understand a couple of things:

1. how could this have happened?
2. how could it be fixed?
3. I don't know if something happened yesterday or if something was already corrupted and just waiting for the right circumstance. Is there a 'health check' script or command/job that could be run to verify a project? I've seen lots of commands that can be run from the Admin screen (in response to similar questions posted in the past), but I am far from proficient in TCL/Universe, so I am not sure what they mean or when (if ever) they are safe to run.

Many thanks,
John Williams

ArndW · Post by **ArndW** » Thu Jul 20, 2006 12:56 am

Hello John,

Quite often you will see re-indexing suggested as a possible fix for some of the problems you have mentioned. When DataStage accesses records in hashed files, it automatically uses indices as criteria when selecting records. If those indices are corrupt, a record will not be found even though it still exists, and it might even be accessible using some other selection method that doesn't use the corrupt index.
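
To illustrate the point, here is a toy model in Python. It is not DataStage or UniVerse code, and the class, method, and field names are made up for the example. It only shows how a record can drop out of an indexed select while still being present in the file, and how rebuilding the index from the records brings it back.

```python
class IndexedFile:
    """Toy keyed file with one secondary index (illustration only)."""

    def __init__(self):
        self.records = {}  # record id -> record (dict of fields)
        self.index = {}    # category value -> set of record ids

    def write(self, rec_id, record):
        # Store the record and keep the secondary index in step.
        self.records[rec_id] = record
        self.index.setdefault(record["category"], set()).add(rec_id)

    def select_by_index(self, category):
        # Fast path: trusts the index completely.
        return sorted(self.index.get(category, set()))

    def select_by_scan(self, category):
        # Slow path: reads every record and ignores the index.
        return sorted(rid for rid, rec in self.records.items()
                      if rec["category"] == category)

    def rebuild_index(self):
        # Throw the index away and recreate it from the records.
        self.index = {}
        for rid, rec in self.records.items():
            self.index.setdefault(rec["category"], set()).add(rid)


jobs = IndexedFile()
jobs.write("JobA", {"category": "Loads"})
jobs.write("JobB", {"category": "Loads"})

# Simulate corruption: the index loses an entry, the record is untouched.
jobs.index["Loads"].discard("JobB")

indexed_before = jobs.select_by_index("Loads")  # ['JobA']
scanned = jobs.select_by_scan("Loads")          # ['JobA', 'JobB']

jobs.rebuild_index()
indexed_after = jobs.select_by_index("Loads")   # ['JobA', 'JobB']

print(indexed_before, scanned, indexed_after)
```

In the toy model "JobB" plays the part of a job that has vanished from the Repository view but can still be opened by name.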

Going into the ADMINistrator or TCL to fix this is often sufficient to restore full access. It is important to ensure that all DataStage users are logged off so that the rebuild of the indices will work (otherwise you will have completely corrupt indices instead of just partially corrupt ones).

I can think of two causes for problems with DataStage accounts and files that occur most frequently: a full disk in a busy project directory partition, and someone opening an important hashed file in vi and exiting with wq!. The first condition might cause corruption; the latter will definitely do so.
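
The disk-full case is the one that can be watched for from outside DataStage. As a minimal sketch, assuming Python is available on the server: the function name and the 10% threshold below are arbitrary choices for the example, and the path you pass would be your own project directory.

```python
import shutil


def free_space_report(path, min_free_fraction=0.10):
    """Report free space on the partition that holds `path`.

    The default threshold is an arbitrary example value, not a
    DataStage recommendation.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    free_fraction = usage.free / usage.total if usage.total else 0.0
    return {
        "path": path,
        "free_bytes": usage.free,
        "free_fraction": free_fraction,
        "low": free_fraction < min_free_fraction,
    }


# "." is a placeholder; substitute the project directory to be checked.
report = free_space_report(".")
print(report)
```

Run from cron or a scheduler, something like this could warn before the partition fills up, but it says nothing about whether damage has already occurred.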

There is no built-in complete health check function you can call. There are the DS.ADMIN tools and one more command that can be called from TCL (I'll have to add that later when I'm at a DS installation; I think it is DS.CHECKER but am not certain) which will look through the DS files and search for inconsistencies and orphaned files. I've seen homegrown applications that go through the whole project and do a "resize * * *" or a "count" on each file to see if there is any internal hashed file corruption, but these might be overkill in most situations.
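
For what it is worth, a much weaker check than "count" can be done at the operating system level. The sketch below assumes that dynamic (type 30) hashed files appear on disk as directories containing a DATA.30 and an OVER.30 part, and it only flags directories where one of the two is missing. It does not look inside the files, so it cannot detect index or group corruption, and the function name is made up for the example.

```python
import os

# Assumption: a dynamic hashed file is a directory holding both of these parts.
DYNAMIC_PARTS = ("DATA.30", "OVER.30")


def check_dynamic_files(project_dir):
    """Return {directory name: [problems]} for incomplete dynamic files."""
    problems = {}
    for entry in sorted(os.listdir(project_dir)):
        path = os.path.join(project_dir, entry)
        if not os.path.isdir(path):
            continue  # single-file objects are not examined here
        present = [part for part in DYNAMIC_PARTS
                   if os.path.isfile(os.path.join(path, part))]
        if not present:
            continue  # does not look like a dynamic hashed file
        missing = [f"missing {part}" for part in DYNAMIC_PARTS
                   if part not in present]
        if missing:
            problems[entry] = missing
    return problems
```

Calling `check_dynamic_files` on a project directory returns an empty dictionary when nothing is flagged. It only reads directory listings, so it should be safe to run while users are logged on, unlike a re-index.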

loveojha2 · Post by **loveojha2** » Thu Jul 20, 2006 12:58 am

Reindexing might help, but it is a last resort.
Did you try that? Is it on a production or a development server?
Is there any chance that your dsx file contains only the program sources and the executables, and no job designs?
Can you rename/delete a corrupted job? Then import it from the backup.