Design help on Multi Instance job

Post by **New2DS** » Thu May 12, 2005 10:52 am

Hello All,

I have seen many posts about the divide-and-conquer approach, and I plan to follow it to improve the performance of my job. The job reads a sequential file with 30 million records and writes them to four different files based on constraints.

I need some ideas on building a multi-instance job. I saw the option in the job parameters where we check the multi-instance box, but my questions are: what should my parameters be for the input and the four outputs, and how do I run the job multiple times if we don't use a job sequencer? The job scheduler we use is Autosys. We have an 8-CPU machine.

-Thanks

Post by **vmcburney** » Thu May 12, 2005 5:13 pm

Make sure you have the Multiple instance check box checked in job properties. Instead of calling the job by its standard name, ProcessBigAssFile, you call it with an instance ID appended, such as ProcessBigAssFile.one and ProcessBigAssFile.two. You can name these instance suffixes anything you want, but it is good to choose names that match your partitioning method.
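
For running the instances without a sequencer, one option is the dsjob command-line client, where the instance suffix is appended to the job name after a dot. The following is a minimal sketch only: the project name MyProject, the parameter name SourceFile, and the file paths are hypothetical, and the script prints the commands rather than running them so that each line can be reviewed first.

```shell
#!/bin/sh
# Hypothetical names: adjust project, parameter, and paths to your environment.
PROJECT="MyProject"
JOB="ProcessBigAssFile"

# instance_cmd ID FILE: print the dsjob command line for one instance.
# The invocation ID is appended to the job name as JOB.ID.
instance_cmd() {
    printf 'dsjob -run -jobstatus -param SourceFile=%s %s %s.%s\n' \
        "$2" "$PROJECT" "$JOB" "$1"
}

# One command per instance; nothing is executed here.
for id in one two three four; do
    instance_cmd "$id" "/data/in/part_$id"
done
```

Each printed line could serve as the command of a separate Autosys job, so that the scheduler starts the four instances in parallel. With -jobstatus, dsjob waits for the job to finish and returns an exit code derived from the job status, so check the dsjob documentation for which codes mean success before relying on them in the scheduler.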

If you have partitioning in your job so that each instance processes every 4th row starting at a different row number, then you could number the instances 1 to 4. It might be faster to break your source file up with some operating system scripts before you start processing.
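
The every-4th-row scheme can be illustrated outside DataStage with awk. This is a sketch under assumptions: the function name rows_for_instance and the sample rows are made up for illustration, and inside the job the equivalent would be a constraint that compares the row number modulo 4 against an instance-number parameter.

```shell
#!/bin/sh
# rows_for_instance K N FILE: print the rows that instance K of N would keep.
# Instance 1 takes rows 1, N+1, 2N+1, ...; instance 2 takes rows 2, N+2, ...
rows_for_instance() {
    awk -v k="$1" -v n="$2" '(NR - 1) % n == k - 1' "$3"
}

# Ten sample rows standing in for the real 30-million-row file.
sample=$(mktemp)
printf '%s\n' r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 > "$sample"

for k in 1 2 3 4; do
    echo "instance $k: $(rows_for_instance "$k" 4 "$sample" | paste -sd' ' -)"
done

rm -f "$sample"
```

Note that with this scheme every instance still reads the whole file and discards three rows out of four, which is why splitting the file beforehand may be faster.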

Post by **kduke** » Thu May 12, 2005 8:51 pm

I would use the UNIX split command or dd to split this file into multiple files before running the jobs. This lets each job instance read only its own portion instead of the whole file. You will need enough disk space to hold the split copies alongside the original.
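
A minimal sketch of the split approach, assuming one record per line and four instances. The temporary directory, the part_ prefix, and the ten-line stand-in file are hypothetical. Splitting by line count with split -l keeps each record whole, whereas a byte-based split with dd would only be safe for fixed-width records.

```shell
#!/bin/sh
# Small stand-in for the real source file (hypothetical paths).
workdir=$(mktemp -d)
input="$workdir/input.txt"
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > "$input"

parts=4
total=$(wc -l < "$input")

# Round up so that split produces at most $parts files.
lines_per_part=$(( (total + parts - 1) / parts ))

# Writes part_aa, part_ab, ... each holding up to $lines_per_part lines.
split -l "$lines_per_part" "$input" "$workdir/part_"

# Show the resulting pieces and their line counts.
wc -l "$workdir"/part_*
```

Each resulting file can then be passed to one job instance as its input file parameter.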