CSV

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
jellecommerce
Participant
Posts: 4
Joined: Mon Oct 11, 2010 1:29 am

CSV

Post by jellecommerce »

I want to load the data of a CSV file into a DB2 database. The CSV file is quite complex as it contains different section and each sections has a different number of columns. So records of a csv file could be like:

a,b,c
d,f
g,h,i,j,k,l,m

Currently I am making use of the Sequential File Stage, I've specified the columns to the maximum number of columns. In the example above, that would be 7 columns of datatype Varchar(255) with Nullable ste to Yes.

But now I'm having the problem that only those records are coming through which actually have 7 columns. So the records that do not have 7 columns are not being read.

Why is this?
I want to load all data from the CSV file into DB2 no matter how many columns. How can I do this?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

The PX Sequential File stage is quite militant about file structure and I'm not sure there is a way to handle that with the properties it has. A Server job would handle that quite easily, however. Be curious what others say.

For PX, you may need to read the file as one long string and then parse the fields out "in job". Is there some way to recognize each section, the first value or counting the delimiters?
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I'd look at doing the detection in a Transformer. You'll need to recognize the section you are in and then control your outputs via constraints. Stage variables would help with that. Each section specific output link could then use a Column Import stage to parse them.

What's the bottom line here, how does it all come together? Does each section have a different target table?
-craig

"You can never have too many knives" -- Logan Nine Fingers
jellecommerce
Participant
Posts: 4
Joined: Mon Oct 11, 2010 1:29 am

Post by jellecommerce »

Indeed each section has a different end target. My first idea was to load the flat file to one staging table. And then I would use this staging table to load to the differtent target tables.

But it might be better to load every section to a different target staging table already.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I don't see a burning need to stage everything first and would probably just suggest you load them directly. However, is there RI between the target tables? That would complicate things...
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Server jobs are much more "sensible" (less militant) about handling data like this.

Or you can use a server Sequential File stage in a server Shared Container in a parallel job to leverage the same flexibility.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
jellecommerce
Participant
Posts: 4
Joined: Mon Oct 11, 2010 1:29 am

Post by jellecommerce »

I have created a Server Shared Container with a Sequential File Stage to read my flat file. But I get an error, which says that I have a required column missing.

I have specified 41 columns, because the longest record in the CSV file is 41 fields long. But the first record for example contains only one field, so now he's complaiining that he he columns missing.

How can I make the Sequential File Stage in a way that he doesn't care if he find only one column or 41 columns?

Thank you kindly for your help.
battaliou
Participant
Posts: 155
Joined: Mon Feb 24, 2003 7:28 am
Location: London
Contact:

Post by battaliou »

Craig already suggested you read the data as 1 column in a string and process it in a transformer. Is there a problem here?
3NF: Every non-key attribute must provide a fact about the key, the whole key, and nothing but the key. So help me Codd.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Scroll to the right in the Columns grid. There you will find more rules, including "contains terminator" and "missing column". For example you could have the stage output NULL if the column is missing.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You'll need to set all except the first field's "Missing" property to "Replace" so that it doesn't error and replaces each missing column with a null.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply