Execute n number of jobs at a time
Posted: Thu Jan 22, 2015 3:28 am
Hi,
I am not sure which forum my question belongs in, so I have posted it in the General forum. I request the moderators to move the post to the appropriate forum.
My Question:
We are working on DataStage 8.7 and have around 200 jobs in a DataStage project. There are no dependencies between the jobs. We want to design a sequence such that at any given point there will be 15 jobs running. Currently we do this with a shell script: we store the list of all the jobs in a file and, looping through the file, execute the jobs with the "dsjob" command.
We take the first 15 job names and trigger them using dsjob. We then wait for at least one of the jobs to finish and trigger another job. This way we try to keep only 15 jobs executing at a given time. If any job aborts, we store its name in another file (like aborted_jobs.txt), and once the shell script has finished executing we run it again. If there is any data in aborted_jobs.txt, then only those jobs are executed.
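For reference, the "keep 15 running, start the next one as soon as any finishes" logic described above can be sketched roughly as below. This is only an illustration, not our actual script: it is written in Python rather than shell, the function names are made up, and it assumes that "dsjob -run -jobstatus" waits for the job and returns 1 (finished OK) or 2 (finished with warnings) on success.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 15  # at most this many jobs run at once


def run_dsjob(project, job):
    """Run one job, block until it ends, and return the dsjob exit code.

    Assumption: with -jobstatus, dsjob waits for the job and exits with
    1 (finished OK) or 2 (finished with warnings); any other code is
    treated as a failure by run_all() below.
    """
    result = subprocess.run(["dsjob", "-run", "-jobstatus", project, job])
    return result.returncode


def run_all(project, jobs, runner=run_dsjob, max_parallel=MAX_PARALLEL):
    """Run every job through a bounded pool; return the names that failed."""
    def worker(job):
        return job, runner(project, job)

    failed = []
    # The pool hands the next queued job to whichever worker frees up first,
    # so a slow job never holds back the jobs listed after it.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        for job, status in pool.map(worker, jobs):
            if status not in (1, 2):
                failed.append(job)
    return failed
```

The list returned by run_all() is what would be written to aborted_jobs.txt and fed back in as the job list on the next run.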
However, we are now trying to implement this design as a sequence. The reason is that we suspect the signalling between the shell script and DataStage sometimes takes time, so even though the jobs themselves finish faster, the overall time taken by the script increases. If we put everything in a sequence, at least it will all be done inside the DataStage process, with no need to send a signal back to the script for each job.
To do this in a sequence, we have designed it as shown below.
When the sequence is started, Job1, Job2 ... Job15 are started, and once Job1 finishes (success or failure) Job16 is triggered. Once all the jobs have finished, I check the status of all the jobs using a NestedCondition stage and abort the sequence if at least one job has aborted. When I restart the sequence, only the aborted jobs are executed!
The only drawback I have with this approach is the fixed chain of links I am creating. Let's say Job70, Job71, Job72, Job73 and Job74 are chained in that order in one row of the sequence. Job70 is taking a long time to execute and all the other jobs have finished. At this point I would like Job71 to be triggered without waiting for Job70 to finish, as there is no dependency. One idea is to identify the jobs that take a long time and keep them in a separate sequence, but we cannot really predict which jobs those will be: we have seen the same job take 10 minutes on some days and 40 minutes on others, depending on the data we pull and load.
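To make the drawback concrete, here is a small simulation of the two scheduling styles. It is only a sketch: the durations are made-up numbers in minutes, the function names are hypothetical, and it models scheduling only, not DataStage itself.

```python
import heapq


def makespan_fixed_chains(durations, lanes):
    """Job i is pinned to chain i % lanes; each chain runs its jobs in order."""
    totals = [0] * lanes
    for i, minutes in enumerate(durations):
        totals[i % lanes] += minutes
    return max(totals)


def makespan_shared_queue(durations, lanes):
    """Each job starts on whichever lane becomes free first."""
    free_at = [0] * lanes  # min-heap of the time each lane is next idle
    heapq.heapify(free_at)
    for minutes in durations:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + minutes)
    return max(free_at)


# One slow job (40 min) followed by four quick ones (10 min), two lanes.
durations = [40, 10, 10, 10, 10]
print(makespan_fixed_chains(durations, 2))  # 60: two quick jobs queue behind the slow one
print(makespan_shared_queue(durations, 2))  # 40: the quick jobs all use the other lane
```

With fixed chains the jobs behind the slow one sit idle even though another lane is free, which is exactly the Job70/Job71 situation described above.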
Maybe I am being silly in asking this question, but I really want to know whether there is any design approach or solution for this scenario using a sequence or something similar. Please help if anyone has come across this before!
Sequence design (15 parallel chains feeding one Sequencer):
Job1->Job16->Job31.....|
Job2->Job17->Job32.....|
Job3->Job18->Job33.....|
Job4->Job19->Job34.....|--Sequencer-->NestedCondition->Termination
Job5->Job20->Job35.....|
Job6->Job21->Job36.....|
.......................|
.......................|
.......................|
Job15->Job30->Job45....|
Example of a single chain from the sequence:
Job70->Job71->Job72->Job73->Job74