DataStage V11.3 Grid - Restart
We have created a grid environment (1 primary and 6 conductors) and installed DataStage V11.3.1.2, then migrated our jobs from 8.5 into the new 11.3.1.2. We submitted a sequence that runs 40 jobs in parallel. We have installed Platform LSF 9.3. All the jobs are submitted to the same host. Can you please suggest how to distribute the load across multiple hosts?
Insert a 1-second delay into your job submission.
in lsb.queues, add this
JOB_ACCEPT_INTERVAL = 1
(read about it)
add this to lsb.params
MBD_SLEEP_TIME = 1 #Amount of time in seconds used for calculating parameter values
That should simulate a round robin yet still maintain load balancing.
You might also want to stop submitting jobs to a compute node if it is over a certain CPU threshold:
ut = 0.85
(in lsb.queues)
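Putting the two changes together, the fragments would look something like this (the queue name and exact file locations are placeholders for your own cluster's config; note that JOB_ACCEPT_INTERVAL is measured in units of MBD_SLEEP_TIME dispatch turns, which is why both are set to 1):

```
# lsb.queues -- queue name is a placeholder
Begin Queue
QUEUE_NAME          = normal
JOB_ACCEPT_INTERVAL = 1       # wait 1 dispatch turn before a host takes another job
ut                  = 0.85    # stop considering hosts above 85% CPU utilization
End Queue

# lsb.params
Begin Parameters
MBD_SLEEP_TIME = 1            # seconds per dispatch turn
End Parameters
```

After editing, run `badmin reconfig` to reload the batch configuration.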
Test it.
What you are running into is the fact that the host you are submitting to is a valid candidate for a job because you told it that it had a maximum number of job slots (maybe 64), and you are submitting 40 jobs. At the time of submission, your grid does not understand how much CPU each job will use. So 40 goes into 64 just fine. Boom, the first server offered up to you is fair game.
Even if you put in the "ut=0.85", that is not enough to stop the flooding of one server. It will just stop considering that host to be a candidate if the CPU is above 85%. At the start of your sequencer, that host would not be at 85%+. So all 40 jobs get sent to it because it's still fair game.
By introducing a 1-second delay before a host can accept another job, you will basically be able to dispatch X jobs per second, where X is the number of compute nodes in your pool: 1 per compute node, with the remaining jobs held in the queue. On second #2 you dispatch another X jobs, and so on until the backlog is done.
So yes, there would be a delay in your job submissions because of the wait time in the queue.
That's load balancing for ya. Test the load on the box... then deploy.
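To see the arithmetic of that drain, here is a toy simulation (plain Python, not LSF itself; it just models "each host accepts at most one job per dispatch turn") for the 40-job, 6-node scenario from the question:

```python
def simulate_drain(num_jobs, num_hosts, accept_interval=1):
    """Model JOB_ACCEPT_INTERVAL-style dispatch: each host accepts at
    most one job per interval, so the queue drains num_hosts jobs/interval."""
    pending = num_jobs
    per_host = [0] * num_hosts
    seconds = 0
    while pending > 0:
        # One dispatch turn: every host may take one queued job.
        for h in range(num_hosts):
            if pending == 0:
                break
            per_host[h] += 1
            pending -= 1
        seconds += accept_interval
    return seconds, per_host

seconds, per_host = simulate_drain(num_jobs=40, num_hosts=6)
print(seconds)   # 7 -- the backlog drains in ceil(40/6) dispatch turns
print(per_host)  # [7, 7, 7, 7, 6, 6] -- near-even spread instead of 40 on one host
```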
Of course, the above technique is totally thrown out the window if you are using the sequencer.sh method of pre-generating your APT file in the sequencer and then passing it to the jobs. The jobs at that point are not grid jobs that are individually load balanced.