Sequencer getting errors trying to run

lgharis
Premium Member
Posts: 56
Joined: Wed May 26, 2004 10:08 am
Location: Dallas, TX

Sequencer getting errors trying to run

Post by lgharis »

My developers are having recurring problems with Sequencer jobs in DS 7.5.1A. Sequencers are unable to get the correct status of the jobs they invoke: although the jobs are in a compiled or finished state, the sequencer cannot attach them.

They normally resolve the problem by recompiling the job. Does anyone have any suggestions as to what might be causing these errors?

Example: sequencer sqPM_DwHoldMonthly gives the following messages when trying to invoke job jbPM_SyncMlyHoldTransDW:

sqPM_DwHoldMonthly.1.JobControl (DSPrepareJob): Error getting status for job jbPM_SyncMlyHoldTransDW.1
sqPM_DwHoldMonthly.1.JobControl (@jbPM_SyncMlyHoldTransDW): Controller problem: Error calling DSPrepareJob(jbPM_SyncMlyHoldTransDW.1)
(DSGetJobInfo) Failed to open RT_STATUS1439 file.
The sequencer then aborts with a fatal error:
sqPM_DwHoldMonthly.1.JobControl (fatal error from @Coordinator): Sequence job will abort due to previous unrecoverable errors

The same happens with sequencer sqPM_DwHoldDaily trying to invoke jbPM_SyncDlyAcctRtnsDW:

sqPM_DwHoldDaily.1.JobControl (DSPrepareJob): Error getting status for job jbPM_SyncDlyAcctRtnsDW.1
sqPM_DwHoldDaily.1.JobControl (@jbPM_SyncDlyAcctRtnsDW1): Controller problem: Error calling DSPrepareJob(jbPM_SyncDlyAcctRtnsDW.1)
(DSGetJobInfo) Failed to open RT_STATUS1429 file.

sqPM_DwHoldDaily.1.JobControl (fatal error from @Coordinator): Sequence job will abort due to previous unrecoverable errors

The jobs the sequencers are trying to invoke are in Compiled status.
Leroy Gharis

Dallas, TX
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

If job control is having trouble getting information from the log and status files, you are probably hitting the limit on the number of dynamic hashed files that can be open at once. Can you confirm your T30FILE setting in the uvconfig file? If it's too low, like the default of 200, then you can't properly execute jobs. The entire repository is based on dynamic hashed files; even if you're only running PX jobs, you are still using the internal repository and therefore fall under this setting. Consider upping the value to 1000 or 2000, but not too high. This will probably fix your problem.
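
The rough procedure for checking and changing it looks like this (a sketch only; it assumes the usual 7.x layout with the engine under $DSHOME, and note that it stops DataStage, so schedule the restart):

Code:

cd $DSHOME                       # DataStage engine directory

grep T30FILE uvconfig            # value in the config file
bin/smat -t | grep T30FILE       # value the running engine was built with

bin/uv -admin -stop              # stop DataStage before regenerating
vi uvconfig                      # raise T30FILE, e.g. 200 -> 1000
bin/uvregen                      # rebuild the binary config from uvconfig
bin/uv -admin -start             # restart DataStage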

The other thing to consider is looking at the server node and measuring CPU and disk activity, to see whether it is struggling to manage the repository and to start and stop jobs.
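
Something along these lines gives a quick read (standard Unix tools; flags vary by platform, and the phantom process name is an assumption about how your jobs show up in ps):

Code:

vmstat 5 5                       # CPU, run queue and memory pressure
iostat -x 5 5                    # per-disk utilization and wait times
ps -ef | grep -c phantom         # rough count of running DataStage phantoms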
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Did the file system on which your project exists ever become full? If so, some of the repository tables (which are hashed files) may have become corrupted.

Search the (server) forum for ways to check the integrity of hashed files in the project. To check a single file you can run a query against it that must touch every page, for example:

Code:

SELECT COUNT(*) FROM RT_STATUS1429;
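
To sweep every status file in a project at once, a small loop works too (a sketch; it assumes $DSHOME is set and is run from the project directory, where the RT_STATUSnnn hashed files live):

Code:

cd /path/to/your/project         # substitute your real project directory
for f in RT_STATUS*
do
    echo "Checking $f ..."
    # COUNT must touch every group in the hashed file, so a damaged
    # file will normally throw an error here
    echo "COUNT $f" | $DSHOME/bin/uv
done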
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
lgharis
Premium Member
Posts: 56
Joined: Wed May 26, 2004 10:08 am
Location: Dallas, TX

Post by lgharis »

kcbland,
Thanks, yes, the T30FILE setting was still at the default of 200. I see that we changed it to 800 on another server, but this one is newer and had not been changed. We will update the uvconfig.

Is it also possible that the hashed files become corrupted when developers update a job design after the job completes but while the sequencer that executed it is still active? Or when they reset the status of the job before the sequencer completes?


ray,
Thanks for that suggestion, but I do not believe the filesystem ever filled up.
Leroy Gharis

Dallas, TX
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

It's perfectly OK (if not recommended practice) to work on a job design while the job is running. The running job uses the compiled version of the job (the generated OSH and any C++ components), not the design components, and you will find that DataStage prevents you from compiling a job that is actually running.

Resetting the status similarly performs an UPDATE on the RT_STATUSnnn hashed file; it is almost impossible that this would corrupt the hashed file (apart from hardware errors and so on).
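
For what it's worth, a reset issued from the dsjob command line client goes through the same supported path as one done from the Director (a sketch; the project name is a placeholder):

Code:

# Reset an aborted/stopped job from the command line
$DSHOME/bin/dsjob -run -mode RESET MyProject jbPM_SyncMlyHoldTransDW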
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.