Scenario: Multiple files of the same format (CSV, same columns) need to be processed. Each file also has 3 extra lines at the start (description line, blank line, header line).
What is the best way to process these, such that if one file fails, the others still continue to be processed?
Option 1. cat *.csv (in the Sequential stage filter) is not an option, as there are 3 unwanted lines in each file.
Option 2. Use a Folder Stage (say in Job1) and then call another job (say Job2) using UtilityRunJob, e.g.:
Job1: [Folder Stage] > [Transformer - Invokes UtilityJob for each row]
Job2: [Sequential Stage] > [Transformer] > [DynamicRDBMS]
In option 2, what happens if one of the file loads fails for some reason? Will the others still be processed?
Any other options? Thanks for your help!
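To illustrate the "one file fails, the others continue" requirement outside of DataStage, here is a minimal shell sketch. The function name process_each and the idea of passing in a loader command are assumptions made for illustration only; the loader stands in for whatever actually loads a single file (for example, a wrapper that runs Job2) and is not something named in this thread.

```shell
# Hypothetical helper: run a caller-supplied loader command once per file.
# A failure for one file is reported on stderr and the loop carries on
# with the remaining files. Returns non-zero if any file failed.
process_each() {
    loader=$1
    shift
    failed=0
    for f in "$@"; do
        if ! "$loader" "$f"; then
            echo "FAILED: $f" >&2
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}
```

It would be called as process_each my_loader *_091907.csv, where my_loader is a placeholder for the real per-file load command.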
chulett wrote: I would still cat them. You could pass them thru something like 'sed' at the same time to strip the unwanted lines, or simply skip processing them inside the job. ...

Assuming I use cat, how do I skip processing them inside the job? The job errors out at the Sequential Stage itself, because the 3 extra lines are not in the expected format (the columns defined in the Sequential Stage).
Are you suggesting that I use the "Incomplete Column" option (Discard, or Retain and Warn) and then somehow filter them in the Transformer?
Yes, I would think you should be able to use the Incomplete Column option set to a non 'warning or error' option (like Replace) and then constrain your Transformer to only pass a row on where a 'key' field (say the first) is not null. Pretty sure I've done that in the past.
Don't worry about the cat right off the bat. Take a single file (or perhaps just a small number of records from a real file) and see if you can make it work for that first in a test bed job.
-craig
"You can never have too many knives" -- Logan Nine Fingers
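For the other route mentioned earlier in the thread (stripping the unwanted lines while concatenating), here is a minimal sketch. It uses awk rather than sed, because sed '1,3d' given several files numbers lines across the whole stream and so would only strip the first file's preamble, while awk's FNR counter restarts at 1 for each input file. The function name strip_preamble is made up for illustration, and the sketch assumes every file has exactly 3 leading lines.

```shell
# Hypothetical helper: concatenate the given CSV files, dropping the first
# 3 lines (description, blank, header) of each one.
# FNR is awk's per-file record number, so the test applies to every file.
strip_preamble() {
    awk 'FNR > 3' "$@"
}
```

Run from the directory holding the files, strip_preamble *_091907.csv would emit only the data rows of a_091907.csv, b_091907.csv and c_091907.csv. A per-file loop running sed '1,3d' on each file would do the same job.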
chulett wrote: Yes, I would think you should be able to use the Incomplete Column option set to a non 'warning or error' option (like Replace) and then constrain your Transformer to only pass a row on where a 'key' field (say the first) is not null. Pretty sure I've done that in the past.
Don't worry about the cat right off the bat. Take a single file (or perhaps just a small number of records from a real file) and see if you can make it work for that first in a test bed job.

OK, Replace works. I tried it with one file. Now I am trying the cat thing.
Files are:
a_091907.csv
b_091907.csv
c_091907.csv
So, in the Sequential Stage Filter option: cat *_091907.csv
What do I specify in the File Name option?