Running multi-instance job for two files
-
- Participant
- Posts: 5
- Joined: Fri Dec 30, 2011 2:51 am
Hi,
I have a multi-instance job that runs for multiple files, which arrive in various directories.
How should I handle running the job in multi-instance mode when two files arrive in the same directory?
I have to implement this without a loop: a loop would run the instances one after another, but my requirement is to run the sequence in parallel for both files.
For example, if two files named abc.txt and abc1.txt arrive in a source directory, I have to execute my jobs in parallel to process both files.
Please suggest the best way.
Thanks
-
- Premium Member
- Posts: 730
- Joined: Tue Nov 04, 2008 10:14 am
- Location: Bangalore
The whole concept of multi-instance is to be able to run the same process in parallel, but with different sets of data. To accomplish the parallel run, you need to parameterize the file name and run the job with a different invocation ID for each file.
I assume you don't have any dependencies between the processing of the different files.
- Zulfi
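The suggestion above (one parameterized file name and one invocation ID per file) can be sketched as follows. This is only an illustration under assumptions: the project name `MyProject`, the job name `LoadFile`, and the job parameter `SourceFile` are hypothetical, and the `dsjob -run -param name=value project job.invocationid` form should be checked against your DataStage version. The invocation ID is derived from the file name here; names containing characters that are not valid in an invocation ID would need sanitizing.

```python
from pathlib import Path
import subprocess


def build_dsjob_commands(project, job, files, param_name="SourceFile"):
    """Return one dsjob argument list per input file, each with its own invocation ID."""
    commands = []
    for path in files:
        # Use the file name without its extension as the invocation ID (abc.txt -> abc).
        invocation_id = Path(path).stem
        commands.append([
            "dsjob", "-run",
            "-param", f"{param_name}={path}",   # hypothetical job parameter name
            project, f"{job}.{invocation_id}",  # assumed job.invocationid form
        ])
    return commands


def launch_all(commands):
    """Start every command without waiting in between, then wait for all of them."""
    processes = [subprocess.Popen(cmd) for cmd in commands]
    return [process.wait() for process in processes]


if __name__ == "__main__":
    files = ["/data/in/abc.txt", "/data/in/abc1.txt"]
    for cmd in build_dsjob_commands("MyProject", "LoadFile", files):
        print(" ".join(cmd))
```

Because every instance is started before any of them is waited on, the two files are handled concurrently rather than one after another, which is what a loop with a blocking call would do.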
pandeesh wrote:
The files will be processed one by one in case of the file pattern method.
But he wants to process all the files in parallel.
I believe that using a file pattern and reading more than one file processes the data in parallel. If there are 5 files, all of them are read in parallel and passed to the downstream stages. With this method you may need to add logic to know which records belong to which file.
Nag
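The point about needing extra logic to know which records belong to which file can be illustrated outside DataStage with a small plain-Python sketch. It is an analogy only, not how the Sequential File stage works internally; the function name `read_with_source` is made up for this example.

```python
import glob
import os


def read_with_source(pattern):
    """Yield (file_name, record) pairs for every line of every file matching pattern."""
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                # Carry the source file name alongside each record so that
                # downstream logic can still tell the files apart.
                yield os.path.basename(path), line.rstrip("\n")
```

Calling `list(read_with_source("/data/in/abc*.txt"))` would return each record paired with the name of the file it came from, so the records remain separable after they are merged into one stream.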
As per the documentation:
File pattern
Specifies a group of files to import. Specify file containing a list of files or a job parameter representing
the file. The file could also contain be any valid shell expression, in Bourne shell syntax, that generates a
list of file names.
So, if there are 3 files, 1.txt, 2.txt and 3.txt, how will they be processed with the file pattern method? Is each file processed separately, or are all the records combined (e.g. cat 1.txt 2.txt 3.txt > final.txt) and final.txt processed in a single run?
Can anyone shed some light on this?
Thanks
pandeeswaran
By default, a list of files whether generated by a file pattern, command output or hardcoded will be concatenated together and read into the Sequential File stage sequentially. In order to read the files in parallel with each other, add the APT_IMPORT_PATTERN_USES_FILESET environment variable mentioned by pandeesh. This variable is discussed in the product documentation.
Regards,
- james wiles
All generalizations are false, including this one - Mark Twain.
Suppose you are trying to read all the files whose names start with datastage_file; you would specify datastage_file*. I think that if you use the file pattern without setting the variable APT_IMPORT_PATTERN_USES_FILESET, it will look for a file literally named 'datastage_file*'.
You need to set that variable to make it work as expected. Please correct me if I am wrong.
Nag
James is correct. By default, without the env var set, it will expand any wildcards into the matching filenames and cat those files together (sequential). With the env var set, if there are multiple files, then it will process the files in parallel.
Either way, it will expand the wildcard and read the same files.
Choose a job you love, and you will never have to work a day in your life. - Confucius
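The two behaviours described in this thread (files concatenated and read by one reader by default, or read concurrently when APT_IMPORT_PATTERN_USES_FILESET is set) can be mimicked in plain Python. This is a rough analogy under assumptions, not the engine's implementation: the helper names are made up, and the parallel version here merges results in file order for simplicity, which a real parallel read need not do.

```python
import glob
from concurrent.futures import ThreadPoolExecutor


def expand(pattern):
    """Expand the wildcard into the matching file names (same list in both modes)."""
    return sorted(glob.glob(pattern))


def read_lines(path):
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def read_concatenated(pattern):
    """One reader walks the files one after another, like cat f1 f2 f3."""
    records = []
    for path in expand(pattern):
        records.extend(read_lines(path))
    return records


def read_in_parallel(pattern, workers=4):
    """One reader per file, running concurrently; results are merged afterwards."""
    paths = expand(pattern)
    if not paths:
        return []
    with ThreadPoolExecutor(max_workers=min(workers, len(paths))) as pool:
        per_file = list(pool.map(read_lines, paths))
    return [record for chunk in per_file for record in chunk]
```

Both functions call the same `expand` step, which mirrors the last point above: the wildcard matches the same files either way, and only the way they are read differs.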