Multiple files of same format (same columns, csv)

Posted: Thu Sep 20, 2007 7:35 am
by news78
Scenario: Multiple files of the same format (same columns, csv) need to be processed. Each file also has 3 extra lines at the start (description line, blank line, header line).

What is the best way to process these, such that if one file fails, the others still continue to be processed?

Option 1. cat *.csv (in the Sequential stage filter) is not an option, as there are 3 unwanted lines in each file.

Option 2. Using a Folder Stage (say in Job1) and then calling another job (say Job2) using UtilityRunJob, e.g.
Job1: [Folder Stage] > [Transformer - Invokes UtilityRunJob for each row]
Job2: [Sequential Stage] > [Transformer] > [DynamicRDBMS]

In option 2, what happens if one of the file loads fails for some reason? Will the others still be processed?

Any other options? Thanks for your help!

Posted: Thu Sep 20, 2007 7:43 am
by chulett
I would still cat them. You could pass them through something like 'sed' at the same time to strip the unwanted lines, or simply skip processing them inside the job.
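
A minimal sketch of the stripping step, assuming POSIX awk is available on the server. It swaps awk in for sed, because a plain `sed '1,3d'` run over several files treats them as one stream and would only drop the first three lines of the first file. The file names and sample rows are made up for illustration.

```shell
# Hypothetical sample data: two files, each with a description line,
# a blank line and a header line ahead of the real rows.
workdir=$(mktemp -d)
printf 'Report A\n\nid,name\n1,alpha\n2,beta\n' > "$workdir/a.csv"
printf 'Report B\n\nid,name\n3,gamma\n' > "$workdir/b.csv"

# FNR restarts at 1 for every input file, so this drops the first
# three lines of each file, not just of the first one.
combined=$(awk 'FNR > 3' "$workdir"/*.csv)
printf '%s\n' "$combined"

# Clean up the scratch directory.
rm -rf "$workdir"
```

The same `awk 'FNR > 3' *.csv` one-liner could then stand in for the plain cat wherever the files are concatenated.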

Posted: Thu Sep 20, 2007 7:54 am
by news78
Assuming I use cat, how do I skip processing them inside the job? The job errors out at the Sequential Stage itself, because the rows (3 lines) are not in the defined format (the columns specified in the Sequential Stage).
Are you suggesting that I use the "Incomplete Column" option (Discard, or Retain and Warn) and then somehow filter them in the Transformer?

Posted: Thu Sep 20, 2007 7:59 am
by chulett
Yes, I would think you should be able to set the Incomplete Column option to something other than a 'warning' or 'error' option (like Replace) and then constrain your Transformer to only pass a row on where a 'key' field (say the first) is not null. Pretty sure I've done that in the past. :?

Don't worry about the cat right off the bat. Take a single file (or perhaps just a small number of records from a real file) and see if you can make it work for that first in a test bed job.
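
One way to carve out such a test file, as a rough sketch: keep the three leading lines plus a handful of data rows with head. The file below is a generated stand-in, not real data, and the name sample.csv is just a placeholder.

```shell
# Hypothetical stand-in for a real file: description line, blank line,
# header line, then 100 data rows.
workdir=$(mktemp -d)
{
  printf 'Daily extract\n\nid,name\n'
  i=1
  while [ "$i" -le 100 ]; do
    printf '%s,name%s\n' "$i" "$i"
    i=$((i + 1))
  done
} > "$workdir/a_091907.csv"

# Keep the three leading lines plus the first five data rows.
head -n 8 "$workdir/a_091907.csv" > "$workdir/sample.csv"

sample_lines=$(wc -l < "$workdir/sample.csv" | tr -d ' ')
last_row=$(tail -n 1 "$workdir/sample.csv")
echo "$sample_lines lines, last row: $last_row"

# Clean up the scratch directory.
rm -rf "$workdir"
```

Keeping the three leading lines in the sample means the test job sees the same malformed rows it will meet in the full files.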

Posted: Thu Sep 20, 2007 8:43 am
by news78
OK. Replace works. I tried it with one file. Now I am trying the cat thing.
Files are:
a_091907.csv
b_091907.csv
c_091907.csv
So in the Sequential Stage, in the Filter option: cat *_091907.csv
What do I specify in the File Name option?

Posted: Thu Sep 20, 2007 9:28 am
by chulett
If you do it in the Filter, there is no filename. You cat them to standard out and the stage reads it as a stream. However, since it requires a value, I would put something like '/dev/null' as the 'Filename'.