
Processing a sequential file with a varying number of columns

Posted: Thu Jun 12, 2008 9:07 pm
by kld05
What is the best way to process a file with a varying number of columns? I'm thinking of defining the metadata as one very large varchar column and then using the Field() function to parse out the fields that I need. Is this an acceptable approach? I'm concerned about performance. How is space allocated when you define a column as, say, varchar 9999? Does it do the equivalent of a malloc and then return the leftover memory after the fact?
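Something along these lines in the Transformer, where the link name, output columns, and delimiter are just placeholders for illustration:

    OutRec.Field1  derivation:  Field(lnk_in.RawRecord, ",", 1)
    OutRec.Field2  derivation:  Field(lnk_in.RawRecord, ",", 2)
    OutRec.Field3  derivation:  Field(lnk_in.RawRecord, ",", 3)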

Posted: Thu Jun 12, 2008 11:32 pm
by ray.wurlod
Reading as a single column is an excellent approach. Investigate using a Column Import stage to parse the record - it may be more efficient than multiple Field() functions in a Transformer stage.

Also investigate using a Schema File with a Sequential File stage, rather than explicitly defined columns. You need to enable runtime column propagation in this case.
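As a rough sketch, a schema file for a comma-delimited record looks something like the following; the field names, lengths, and delimiter here are only examples, not something you have to match:

    record {final_delim=end, delim=',', quote=double}
    (
        cust_id:   string[max=10];
        cust_name: string[max=50];
        amount:    string[max=20];
    )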