How to submit a shell script on a grid?
When running on a grid, what is the best way to execute a shell script to ensure that it will run on an available compute node rather than the head/conductor node?
I tried using the ExecCommand activity in a sequence job, but that executes the script on the head node and never even calls Load Leveler.
I have also used the External Source stage in a PX job to submit the script, and that works. I'm just not sure this is the "right" way.
Is there a better way?
Bob
To create a dynamic configuration file for a job sequence, you must invoke the sequencer.sh script. In your downstream Job Activity stages, did you use the expression Field(<stagename>.$CommandOutput, " ", <fieldnum>) to accept the values passed from the sequencer.sh script? For each job in the sequence, did you set the APT_GRID_ENABLE parameter to NO?
Is there a particular reason/need to run the script on a compute node rather than the head node?
Sequence jobs, because they are at heart server jobs, can run only on the head node (the Engine server or tier). The ExecCommand activity, as well as BeforeJob and AfterJob ExecSH functions (I believe so, at least), also execute their targets on the head node (as you have already seen). If you need to run the script on a compute node, you can use one of the External stages (External Source/Target/Filter).
Another option would be to submit the script directly to Load Leveler using its standard commands, which you may even be able to do using ExecCommand in a sequence job. Look through the Load Leveler documentation (it is available online) to figure out how to do that.
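For reference, a minimal sketch of what a direct Load Leveler submission might look like. This is a hedged example, not your site's actual setup: all paths are placeholders, and you should check your Load Leveler documentation for the classes/requirements your grid expects.

```shell
# Hypothetical Load Leveler job command file (e.g. run_script.cmd).
# All paths below are placeholders -- adjust for your environment.
# @ job_name   = run_my_script
# @ executable = /path/to/your_script.sh
# @ output     = /tmp/run_my_script.out
# @ error      = /tmp/run_my_script.err
# @ queue

# Then submit it (this could even be the command in an ExecCommand activity):
# llsubmit run_script.cmd
```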
Regards,
- james wiles
All generalizations are false, including this one - Mark Twain.
jwiles wrote: Is there a particular reason/need to run the script on a compute node rather than the head node?
Yes, the processing being done in the shell script causes a high CPU wait percentage on the head node, and running it on a compute node doesn't have that problem. We need to reserve the head node for sequence/conductor-type processes to keep things running smoothly.
Bob
lstsaur wrote: That's exactly how to (using compute node info passed from the sequencer.sh script) get your ExecCommand activity job processed on a compute node.
I guess there is a piece of this that I am still not understanding. Are you saying that if I execute sequencer.sh, I can somehow coax a subsequent ExecCommand stage into running the application script on a compute node?
Currently, my sequence job is merely the ExecCommand stage executing the application shell script.
Or (and this is the only thing I have found so far that will get the application script to run on a compute node) the sequence contains one Job Activity that calls a parallel job containing nothing but an External Source stage, which executes the application script, and a Peek stage.
Thanks,
Bob
You may be able to get it to work by using the ssh command in the ExecCommand activity:
ssh hostname scriptname
Pull the hostname from the values returned by sequencer.sh. If you're feeling adventurous, pull a hostname from one of the node entries in the config file returned by sequencer.sh.
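To make that concrete, here is a hedged sketch of pulling a compute-node fastname out of the dynamic APT configuration file and running the script there over ssh. The file names, paths, and the assumption that the config contains `fastname "..."` entries are mine, not from the thread; adapt them to what sequencer.sh actually hands you.

```shell
#!/bin/sh
# Hypothetical helpers -- paths and config layout are assumptions.

# An APT config file contains entries like:  fastname "computenode01"
# Return the first fastname found in the given config file.
first_fastname() {
    sed -n 's/.*fastname[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -1
}

# Run a script on the first compute node listed in the config file,
# capturing stdout and stderr so they land in the job log.
run_on_compute_node() {
    config_file="$1"
    script="$2"
    target=$(first_fastname "$config_file")
    if [ -z "$target" ]; then
        echo "No compute node found in $config_file" >&2
        return 1
    fi
    ssh "$target" "$script" 2>&1
}

# Example usage (placeholder paths):
# run_on_compute_node /tmp/dynamic_config.apt /path/to/your_script.sh
```

Note this assumes passwordless ssh is already set up between the head node and the compute nodes, which is usually the case on a grid since the engine needs it anyway.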
Regards,
- james wiles
All generalizations are false, including this one - Mark Twain.
Bobyon, there are two ways I see you doing your task.
External source stage and limit the execution of the stage to Node1.
or
Exec Sequencer, and have your script call the proper grid dispatching command that you would normally be doing via DynamicGrid.sh.
Not sure which flavor of Grid you have, but the easy path is external source stage.
Talk to the DataStage admins within your environment. They most likely can direct you to your answer. If you want your grid resource manager to track work activity, then go that route. You will have to redirect stdout and stderr to a log file you can use later. The External Source stage might kick that into your job log for you.
slapping shell scripts onto your work horses is EXACTLY THE RIGHT THING TO DO... if the overhead of dispatching the work is less than the work itself.
I HATE using the Head Node (Conductor) as a number cruncher or data mover. Hate.
Remember that if you choose to have your script do the grid dispatching call, use the same resource requirements that your DataStage jobs are being submitted with. You still want to hit the same pool of servers that are dedicated to your DataStage setup (mainly because your mounts are all present there).
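The stdout/stderr capture mentioned above can be as simple as a small wrapper around the application script. A minimal sketch, assuming nothing about your grid beyond a writable log directory (all names here are placeholders):

```shell
#!/bin/sh
# Hypothetical wrapper: run the application script, capturing stdout and
# stderr to a log file, then echo the log so an External Source stage
# (or whatever dispatched the script) can forward it to the job log.
run_logged() {
    app_script="$1"
    log_file="/tmp/$(basename "$app_script").$$.log"

    "$app_script" > "$log_file" 2>&1
    status=$?

    cat "$log_file"        # forward the captured output
    rm -f "$log_file"      # tidy up; drop this line to keep the log
    return $status
}

# Example usage (placeholder path):
# run_logged /path/to/app_script.sh
```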
PaulVL wrote: Bobyon, there are two ways I see you doing your task. External Source stage and limit the execution of the stage to Node1.
This looks like the approach I am going to take. I'll let Load Leveler determine which node to run it on, but will set the grid parameters to a 1x1 config.
PaulVL wrote: Talk to the DataStage admins within your environment.
The mirror has not been much help on this one.
Thanks to all contributors for your help and advice.
Bob