Multiple instance job: Different input files, one output file
Posted: Fri Oct 21, 2011 8:44 am
We have a parallel job that has five input files and writes to two output files. We are planning to make this job a multiple instance job because of the record counts involved. For example, one of the input files has about 15 million records, and we might run three instances of the same job.
My questions are:
1. I need to divide the 15 million record input file into three different files based on a column value, and then have each instance of the job read one of those files. How can I specify a different file for each instance?
2. The output file must be created/overwritten the first time, and the remaining two instances must append to that created/overwritten output file. How can I have the first instance that completes create/overwrite the output file and the remaining instances append to it? One way might be to create three different output files and then merge them. Any other suggestions?
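To make the fallback in question 2 concrete, here is a rough sketch in Python of what I mean by splitting on a column value and then merging the per-instance outputs. It runs outside the job itself and is not how the job is currently built. The function names, the `input_<value>.csv` naming, and the assumptions that the input is a CSV with a header row and that the per-instance outputs have no header rows are all made up for illustration.

```python
import csv
import shutil
from pathlib import Path


def split_by_column(src, key_column, out_dir):
    """Write one CSV per distinct value of key_column; return {value: path}.

    Assumes src is a CSV with a header row (an assumption for this sketch).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    handles, writers, paths = {}, {}, {}
    try:
        with open(src, newline="") as f:
            reader = csv.DictReader(f)
            for row in reader:
                key = row[key_column]
                if key not in writers:
                    # Hypothetical naming scheme: one input file per key value.
                    paths[key] = out_dir / f"input_{key}.csv"
                    handles[key] = open(paths[key], "w", newline="")
                    writers[key] = csv.DictWriter(
                        handles[key], fieldnames=reader.fieldnames
                    )
                    writers[key].writeheader()
                writers[key].writerow(row)
    finally:
        for handle in handles.values():
            handle.close()
    return paths


def merge_outputs(part_paths, final_path):
    """Recreate final_path, then append each per-instance output in order.

    Assumes the per-instance outputs carry no header rows.
    """
    # Mode "wb" truncates, so the final file is overwritten on every run.
    with open(final_path, "wb") as out:
        for part in part_paths:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
```

With this approach no instance needs to know whether it finished first: each one always overwrites its own output file, and a single merge step that runs after all three instances produces the final file.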
Any suggestions are greatly appreciated.
Thanks.