
CSV

Posted: Mon Sep 19, 2011 6:46 am
by jellecommerce
I want to load the data of a CSV file into a DB2 database. The CSV file is quite complex: it contains different sections, and each section has a different number of columns. So the records of the CSV file could look like:

a,b,c
d,f
g,h,i,j,k,l,m

Currently I am making use of the Sequential File stage, with columns defined up to the maximum number of columns. In the example above, that would be 7 columns of datatype Varchar(255) with Nullable set to Yes.

But now I'm having the problem that only those records that actually have 7 columns are coming through. The records that do not have 7 columns are not being read.

Why is this?
I want to load all data from the CSV file into DB2 no matter how many columns. How can I do this?

Posted: Mon Sep 19, 2011 7:06 am
by chulett
The PX Sequential File stage is quite militant about file structure and I'm not sure there is a way to handle that with the properties it has. A Server job would handle it quite easily, however. I'd be curious what others say.

For PX, you may need to read the file as one long string and then parse the fields out "in job". Is there some way to recognize each section, say by the first value or by counting the delimiters?
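Outside of DataStage, the delimiter-counting idea is easy to picture. This is just a Python sketch of the logic (the sample rows are the ones from the original post; nothing here is DataStage syntax):

```python
def section_of(line: str, delimiter: str = ",") -> int:
    """Identify a record's section by its field count (delimiters + 1)."""
    return line.count(delimiter) + 1

# Sample records from the post: sections with 3, 2 and 7 columns.
for line in ["a,b,c", "d,f", "g,h,i,j,k,l,m"]:
    print(section_of(line), line)
```

In the job, the same count would be computed on the single long-string column and used to decide which section the record belongs to.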

Posted: Mon Sep 19, 2011 7:27 am
by chulett
I'd look at doing the detection in a Transformer. You'll need to recognize which section you are in and then control your outputs via constraints; stage variables would help with that. Each section-specific output link could then use a Column Import stage to parse the fields.
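As a rough illustration of that constraint-style routing (a Python sketch only; in the job a Transformer would do this with stage variables and link constraints, not code):

```python
from collections import defaultdict

def route_by_field_count(lines, delimiter=","):
    """Group records into per-section buckets, keyed by field count.

    Analogous to one output link per section, each of which would feed
    its own Column Import stage and target.
    """
    buckets = defaultdict(list)
    for line in lines:
        n = line.count(delimiter) + 1   # the "constraint" picking the output
        buckets[n].append(line.split(delimiter))
    return buckets

buckets = route_by_field_count(["a,b,c", "d,f", "g,h,i,j,k,l,m"])
```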

What's the bottom line here, how does it all come together? Does each section have a different target table?

Posted: Mon Sep 19, 2011 7:36 am
by jellecommerce
Indeed, each section has a different end target. My first idea was to load the flat file into one staging table, and then use that staging table to load the different target tables.

But it might be better to load every section into a different staging table right away.

Posted: Mon Sep 19, 2011 7:53 am
by chulett
I don't see a burning need to stage everything first and would probably just suggest you load them directly. However, is there RI between the target tables? That would complicate things...

Posted: Mon Sep 19, 2011 3:34 pm
by ray.wurlod
Server jobs are much more "sensible" (less militant) about handling data like this.

Or you can use a server Sequential File stage in a server Shared Container in a parallel job to leverage the same flexibility.

Posted: Tue Sep 20, 2011 1:59 am
by jellecommerce
I have created a Server Shared Container with a Sequential File stage to read my flat file. But I get an error saying that a required column is missing.

I have specified 41 columns, because the longest record in the CSV file is 41 fields long. But the first record, for example, contains only one field, so now the stage is complaining about missing columns.

How can I set up the Sequential File stage so that it doesn't care whether it finds one column or 41?

Thank you kindly for your help.

Posted: Tue Sep 20, 2011 3:00 am
by battaliou
Craig already suggested you read the data as a single string column and process it in a Transformer. Is there a problem here?

Posted: Tue Sep 20, 2011 5:36 am
by ray.wurlod
Scroll to the right in the Columns grid. There you will find more rules, including "contains terminator" and "missing column". For example you could have the stage output NULL if the column is missing.

Posted: Tue Sep 20, 2011 6:54 am
by chulett
You'll need to set the "Missing" property of every field except the first to "Replace", so that the stage doesn't error out and instead replaces each missing column with a null.
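The effect of that "Missing = Replace" setting can be pictured with a small Python sketch (an illustration only: short records get padded with nulls up to the fixed column count, so nothing is rejected):

```python
def pad_record(line, width, delimiter=","):
    """Split a record and pad missing trailing columns with None,
    mimicking "replace each missing column with a null"."""
    fields = line.split(delimiter)
    return fields + [None] * (width - len(fields))

pad_record("d,f", 7)  # -> ['d', 'f', None, None, None, None, None]
```

A full-width record passes through unchanged, since there is nothing to pad.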