
dsjob restart: "Unable to load value file"

Posted: Mon Dec 12, 2016 4:39 pm
by asorrell
Using v11.3, I've got a small test Job Sequencer with checkpointing enabled that also contains one parameter set and one parameter. The parameter set contains several environment variables all set to $PROJDEF.

I start the job from the command line with these options:
dsjob -authfile /home/dsadm/.authfile -run -mode NORMAL -wait -param JOB_RUN_ID=6660225 dstage1 Test_Seq

When the job aborts and is restarted with the Director (run, no reset), the checkpointing works fine.

However, if I restart the aborted job from the command line:
dsjob -authfile /home/dsadm/.authfile -run -mode RESTART -wait -param JOB_RUN_ID=6660225 dstage1 Test_Seq

It aborts again, placing this error in the Director log:
Unable to load value file (As pre-defined)

I think it wants me to set the parameter set values on the command line to (As pre-defined). I find this puzzling since I didn't do that on the "start". This is also problematic since this script is supposed to be generic enough that we can use it to restart all checkpointed jobs.

Any suggestions as to how to resolve?

Posted: Mon Dec 12, 2016 7:58 pm
by JRodriguez
Andy,
Since your parameter set contains only environment variables set to $PROJDEF, try running dsjob without -mode for both the regular run and the restart. Note that a DataStage job sequence, if configured to be restartable, will always start from the point of failure, so you should not, I hope, see the value file error.

Hope this helps.

Regards

Posted: Tue Dec 13, 2016 9:06 am
by FranklinE
I've seen something that looks similar. I offer it for comparison.

We invoke a generic script from our scheduler, which builds and sends the dsjob command line. We don't modify any command line parameters to rerun an aborted job, such as you show for -mode. We depend on the checkpoint to issue a reset of the aborted parallel job, if that's what was involved with the failure.

We do pass parameters (like process date) from the scheduler to the script. If the rerun changes any of those parameters, the restart/reset will fail because of that difference.
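
A minimal sketch of the kind of wrapper described above, assuming the same command line is used for both the first run and the rerun. The function name build_dsjob_cmd is hypothetical; the authfile path, project, job and parameter are taken from the first post. It only prints the command rather than running it, and it leaves out -mode as suggested earlier in the thread, so the checkpoint rather than the caller decides where the sequence resumes.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): builds one dsjob command line
# that is reused unchanged for the initial run and for any rerun.
# Authfile path copied from the original post; adjust for your environment.
AUTHFILE="/home/dsadm/.authfile"

# Usage: build_dsjob_cmd PROJECT JOB [NAME=VALUE ...]
build_dsjob_cmd() {
    project="$1"
    job="$2"
    shift 2

    # No -mode option: the same line serves the first run and the rerun.
    cmd="dsjob -authfile $AUTHFILE -run -wait"

    # Pass every parameter exactly as supplied by the scheduler, so a rerun
    # sees the same values as the run that aborted.
    for p in "$@"; do
        cmd="$cmd -param $p"
    done

    printf '%s\n' "$cmd $project $job"
}

# Print the command for the job from the first post.
build_dsjob_cmd dstage1 Test_Seq JOB_RUN_ID=6660225
```

To execute instead of print, the scheduler script could run the printed line, for example with eval "$(build_dsjob_cmd dstage1 Test_Seq JOB_RUN_ID=6660225)". Parameter values containing spaces would need extra quoting that this sketch does not handle.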