Processing a sequential file with a varying number of columns

kld05
Charter Member
Posts: 36
Joined: Fri Apr 28, 2006 8:12 am

Processing a sequential file with a varying number of columns

Post by kld05 »

What is the best way to process a file with a varying number of columns? I'm thinking of defining the metadata as one very large varchar column and then using the Field function to parse out the fields that I need. Is this an acceptable approach? I'm concerned about performance. How is space allocated when you define a column as, say, varchar(9999)? Does it do the equivalent of a malloc and then return the leftover memory after the fact?
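
To make the parsing idea concrete, this is roughly what I have in mind for the Transformer derivations (link and column names are made up, and I'm assuming a pipe-delimited file):

    Field(%string%, %delimiter%, %occurrence% [, %number%])

    * RawRecord read from the Sequential File stage as a single VarChar(9999) column
    * Output column derivations in the Transformer:
    OutLink.CustomerId : Field(InLink.RawRecord, "|", 1)
    OutLink.OrderDate  : Field(InLink.RawRecord, "|", 2)
    OutLink.Amount     : Field(InLink.RawRecord, "|", 3)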
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Reading as a single column is an excellent approach. Investigate using a Column Import stage to parse the record - it may be more efficient than multiple Field() functions in a Transformer stage.
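
As a rough sketch of the Column Import setup (property names quoted from memory, so verify against your version of the stage; the column names are the hypothetical ones from the question):

    Column Import stage:
      Input link carries the single column, e.g. RawRecord
      Stage > Properties:
        Import Input Column = RawRecord
        Column Method       = Explicit  (or Schema File)
        Column to Import    = CustomerId, OrderDate, Amount (one property per column)
      Output > Format tab: Delimiter = | , Final Delimiter = end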

Also investigate using a Schema File with a Sequential File stage, rather than explicitly defined columns. You need to enable runtime column propagation in this case.
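A minimal schema file for a pipe-delimited version of the example above might look like this (column names and types are illustrative only):

    record
    {final_delim=end, delim='|', quote=none}
    (
      CustomerId: string[max=20];
      OrderDate:  string[max=10];
      Amount:     nullable decimal[10,2];
    )

Point the Sequential File stage's Schema File property at this file and leave the Columns grid empty; with runtime column propagation enabled, the parsed columns flow downstream without being defined in the job design.
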
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.