Execute n number of jobs at a time

Posted: Thu Jan 22, 2015 3:28 am
by austin_316
Hi,
I am not sure as to which forum my question goes. So I have posted in General forum. Requesting moderator to move the post to appropriate forum.

My Question:
We are working on DataStage 8.7 and have around 200 jobs in a DataStage project. There is no dependency between the jobs. We want to design the sequence in such a way that at any given point there are 15 jobs running. Currently we do this with a shell script: we store the list of all the jobs in a file and, looping through the file, execute the jobs with the "dsjob" command.
We take the first 15 job names and trigger them using dsjob. Then we wait for at least one of the jobs to finish and trigger another job. This way we try to have only 15 jobs executing at any given time. If a job aborts, we store its name in another file (like aborted_jobs.txt), and once the shell script has finished executing we run it again. If there is any data in aborted_jobs.txt, then only those jobs are executed.
But we are trying to implement this design using a sequence. The reason is that we suspect the signalling between the shell script and DataStage sometimes takes time, so even though a job finishes quickly, the overall time taken by the script increases. If we put it in a sequence, at least everything is done inside the DataStage process and there is no need to send a signal back to the script for each job.
So to do this in a sequence, we have designed it as shown below.

Job1->Job16->Job31.....|
Job2->Job17->Job32.....|
Job3->Job18->Job33.....|
Job4->Job19->Job34.....|--Sequencer-->NestedCondition->Termination
Job5->Job20->Job35.....|
Job6->Job21->Job36.....|
.......................|
.......................|
.......................|
Job15->Job30->Job45....|
When the sequence is started, Job1, Job2 ... Job15 are started, and once Job1 has finished (success or fail) Job16 gets triggered. Once all the jobs have finished, I check the status of all the jobs using a NestedCondition stage and abort the sequence (in case at least one job has aborted). When I restart the sequence, only the aborted jobs are executed! The only drawback I have with this approach is with the chain of links I am creating. Let's say

Job70->Job71->Job72->Job73->Job74
are in a sequence. Job70 is taking a lot of time to execute and all the other jobs have finished. At this point I would like to trigger Job71 rather than wait for Job70 to finish, as there is no dependency. One idea is to identify the jobs that take a long time and keep them in a separate sequence. But we really cannot predict which jobs will take long. We have seen jobs take 10 minutes on some days and 40 minutes on others, depending on the data that we pull and load.
Maybe I am being silly :oops: in asking this question, but I really want to know if there is any design approach or solution for this scenario using a sequence or similar. Please help if anyone has come across this!
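
To put rough numbers on the drawback described above, here is a small simulation sketch. It only compares total elapsed time for fixed chains against a shared pool; it is not DataStage code. The run times are made up, and the function names chained_makespan and pooled_makespan are hypothetical.

```python
import heapq


def chained_makespan(durations, lanes):
    """Jobs are dealt round-robin into fixed chains; each chain runs serially."""
    totals = [0] * lanes
    for i, minutes in enumerate(durations):
        totals[i % lanes] += minutes
    # The sequence ends when the slowest chain ends.
    return max(totals)


def pooled_makespan(durations, lanes):
    """Each job starts as soon as any of the `lanes` slots becomes free."""
    free_at = [0] * lanes  # min-heap of the times at which each slot frees up
    heapq.heapify(free_at)
    finish = 0
    for minutes in durations:
        start = heapq.heappop(free_at)  # earliest available slot
        end = start + minutes
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish


# Made-up run times in minutes: one slow job followed by four short ones.
durations = [40, 10, 10, 10, 10]
print(chained_makespan(durations, 2))  # 60
print(pooled_makespan(durations, 2))   # 40
```

With two slots, the chained layout queues two short jobs behind the 40-minute job, while the pool lets all four short jobs use the other slot in the meantime.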

Posted: Thu Jan 22, 2015 6:45 am
by eph
Hi,

I would suggest implementing your previous script logic in a job control so that you can execute the jobs as if they were in a pool, instead of having them in several serialized chains. You can loop over your job list, start jobs until a counter reaches your limit, then hand the freed slot to the next job as each one finishes, etc.
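
As a language-neutral illustration of the pool idea, here is a minimal Python sketch; it is not DataStage job control code. The names run_pool and run_with_dsjob are hypothetical. It assumes that "dsjob -run -jobstatus" is on the PATH and blocks until the job ends, and that exit codes 1 and 2 mean the job finished (OK or with warnings); please verify both against your own installation.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_with_dsjob(project, job):
    """Run one job, block until it ends, and return the dsjob exit code.

    Assumption: `dsjob -run -jobstatus <project> <job>` waits for the job
    and reports the job status through its exit code.
    """
    return subprocess.call(["dsjob", "-run", "-jobstatus", project, job])


def run_pool(jobs, run_job, limit=15, ok_codes=(1, 2)):
    """Run every job with at most `limit` in flight; return the failed jobs.

    `run_job` is any callable that takes a job name and returns a status
    code. `ok_codes` is an assumption about which codes count as success.
    """
    jobs = list(jobs)
    # Each worker thread picks up the next job as soon as its current one ends,
    # so a slow job never holds back the jobs queued behind it.
    with ThreadPoolExecutor(max_workers=limit) as pool:
        codes = list(pool.map(run_job, jobs))
    return [job for job, code in zip(jobs, codes) if code not in ok_codes]
```

It could be called as run_pool(job_names, lambda job: run_with_dsjob("MyProject", job)), where "MyProject" is a placeholder project name. The returned list plays the role of aborted_jobs.txt and can be fed back in for a rerun.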

Eric

Posted: Thu Jan 22, 2015 7:14 am
by Mike
I suggest investing as little effort as possible in managing workload in version 8.7.

When you upgrade to version 9.1 or later, workload management is part of the infrastructure and there is no longer a need to manage it within your application.

Mike

Posted: Thu Jan 22, 2015 9:23 am
by chulett
Stick with your script. There's no way to emulate that behavior in a Sequence job.

Posted: Thu Jan 22, 2015 12:05 pm
by qt_ky
Ditto on using the workload management feature. Create a queue with a job limit of 15. :D

Posted: Thu Jan 22, 2015 12:27 pm
by chulett
That's awesome they finally added that feature. 8)

Posted: Tue Jan 27, 2015 7:12 am
by austin_316
Thanks everyone. I guess I will have to stick with the shell script for now. Hopefully we will migrate to 9.1 :lol: and I will get to use Workload Management, or do I have it in 8.7 already? :roll:
I would love to see how that helps me with my design scenario.

Thanks again for your suggestions.

Posted: Tue Jan 27, 2015 11:28 am
by qt_ky
The Workload Management (WLM) feature was introduced in version 9.1 and is there in version 11.3.x also.

Posted: Tue Jan 27, 2015 10:32 pm
by austin_316
Thanks Eric :)

Posted: Wed Jan 28, 2015 8:11 am
by chulett
pssstttt... he's suggesting you skip 9.1 and aim your upgrade sights a little... higher. :wink:

Posted: Wed Jan 28, 2015 8:14 am
by qt_ky
Indeed. :lol:

Posted: Tue Apr 07, 2015 9:07 am
by austin_316
chulett wrote: pssstttt... he's suggesting you skip 9.1 and aim your upgrade sights a little... higher. :wink:
I guess I did not read it right :lol:

We have now upgraded DataStage to 9.1. I am not sure when we will reach the 11.x version that Eric was mentioning :roll:
Could you please advise where I can find this feature and how I can use it in 9.1?

Posted: Tue Apr 07, 2015 11:39 am
by qt_ky
I'm not sure about 9.1; I skipped that release. Try searching the IBM Knowledge Center on "Workload Management." There is probably a config file you have to edit on the server. I vaguely recall that the feature is off by default on 9.1, whereas it is on by default in 11.3.

Posted: Wed Apr 08, 2015 3:32 pm
by chulett
Looks to be here, copy/paste as a direct link won't work:

http://www-01.ibm.com/support/knowledge ... s/wlm.html