Multiple files of same format (same columns, csv)

news78
Participant
Posts: 44
Joined: Fri Jul 07, 2006 1:37 pm

Post by news78 »

Scenario: Multiple files of the same format (same columns, csv) need to be processed. Each file also has 3 extra lines at the start (description line, blank line, header line).

What is the best way to process these, such that if one file fails, the others still continue to be processed?

Option 1. cat *.csv (in the Sequential stage Filter) is not an option, as there are 3 unwanted lines at the start of each file.

Option 2. Using a Folder Stage (say in Job1) and then calling another job (say Job2) using UtilityRunJob, e.g.
Job1: [Folder Stage] > [Transformer - invokes UtilityRunJob for each row]
Job2: [Sequential Stage] > [Transformer] > [DynamicRDBMS]

In option 2, what happens if one of the file loads fails for some reason? Will the others still be processed?

Any other options? Thanks for your help!
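
A minimal sketch of the "one failure must not stop the rest" requirement, written as a shell loop outside the job design. It is not one of the options above. `LOAD_CMD` is a hypothetical placeholder for whatever command starts the load for a single file (for instance, a wrapper script around the job); the name `process_each` is likewise an assumption.

```shell
# Run a loader command once per file and keep going when one file fails.
# LOAD_CMD is a placeholder (assumption) for the command that loads one file.
process_each() {
  failed=""
  for f in "$@"; do
    if $LOAD_CMD "$f"; then
      echo "OK   $f"
    else
      # Record the failure, then carry on with the next file.
      echo "FAIL $f"
      failed="$failed $f"
    fi
  done
  # Return non-zero if any file failed, but only after all were attempted.
  [ -z "$failed" ]
}
```

Called as `process_each *_091907.csv`, it reports a status line per file and returns a failing exit code only at the end, so every file gets its attempt.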
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I would still cat them. You could pass them through something like 'sed' at the same time to strip the unwanted lines, or simply skip processing them inside the job.
-craig

"You can never have too many knives" -- Logan Nine Fingers
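
A minimal sketch of stripping the unwanted lines while concatenating, assuming every file carries exactly 3 leading lines. It uses awk rather than sed because awk's `FNR` counter restarts for each input file, whereas a plain `sed '1,3d' *.csv` treats all the files as one stream and would drop only the first file's leading lines. The function name `strip_preamble` is hypothetical.

```shell
# Print every file given as an argument, minus its first 3 lines
# (description line, blank line, header line).
strip_preamble() {
  # FNR is awk's per-file line number, so the skip restarts for each file.
  awk 'FNR > 3' "$@"
}

# The same one-liner, as it might appear in a Filter command:
#   awk 'FNR > 3' *_091907.csv
```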
news78
Participant
Posts: 44
Joined: Fri Jul 07, 2006 1:37 pm

Post by news78 »

Assuming I use cat, how do I skip processing them inside the job? The job errors out at the Sequential Stage itself, because the 3 extra rows are not in the expected format (the columns defined in the Sequential Stage).
Are you suggesting that I use the "Incomplete Column" option (Discard, or Retain and Warn) and then somehow filter them in the Transformer?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Yes, I would think you should be able to set the Incomplete Column option to something other than a 'warning or error' choice (like Replace) and then constrain your Transformer to only pass a row on where a 'key' field (say the first) is not null. Pretty sure I've done that in the past.

Don't worry about the cat right off the bat. Take a single file (or perhaps just a small number of records from a real file) and see if you can make it work for that first in a test bed job.
-craig
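
A minimal sketch of cutting such a small test file from a real one, assuming the 3 leading lines should stay in the sample so the test exercises them too. The function name `make_sample` and the output file name are hypothetical.

```shell
# make_sample FILE ROWS
# Print the 3 leading lines of FILE plus its first ROWS data rows.
make_sample() {
  head -n "$(( $2 + 3 ))" "$1"
}

# Example: keep 20 data rows from one real file.
#   make_sample a_091907.csv 20 > sample_091907.csv
```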
news78
Participant
Posts: 44
Joined: Fri Jul 07, 2006 1:37 pm

Post by news78 »

OK. Replace works. I tried with one file. Now I am trying the cat approach.
The files are:
a_091907.csv
b_091907.csv
c_091907.csv
So in the Sequential Stage, the Filter option is: cat *_091907.csv
What do I specify in the File Name option?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

If you do it in the Filter, there is no filename. You cat them to standard out and the stage reads it as a stream. However, since it requires a value, I would put something like '/dev/null' as the 'Filename'.
-craig