Processing a sequential file with a varying number of columns

kld05
Charter Member
Posts: 36
Joined: Fri Apr 28, 2006 8:12 am

Processing a sequential file with a varying number of columns

Post by kld05 »

What is the best way to process a file with a varying number of columns? I'm thinking of defining the metadata as one very large varchar column and then using the Field function to parse out the fields that I need. Is this an acceptable approach? I'm concerned about performance. How is space allocated when you define a column as, say, varchar(9999)? Does it do the equivalent of a malloc and then return the leftover memory after the fact?
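
To make the parsing idea concrete, this is roughly what I have in mind for the Transformer derivations (link and column names are made up, and I'm assuming a pipe-delimited file):

    Field(%string%, %delimiter%, %occurrence% [, %number%])

    * RawRecord read from the Sequential File stage as a single VarChar(9999) column
    * Output column derivations in the Transformer:
    OutLink.CustomerId : Field(InLink.RawRecord, "|", 1)
    OutLink.OrderDate  : Field(InLink.RawRecord, "|", 2)
    OutLink.Amount     : Field(InLink.RawRecord, "|", 3)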
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Reading as a single column is an excellent approach. Investigate using a Column Import stage to parse the record - it may be more efficient than multiple Field() functions in a Transformer stage.
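
As a rough sketch of the Column Import setup (property names quoted from memory, so verify against your version of the stage; the column names are the hypothetical ones from the question):

    Column Import stage:
      Input link carries the single column, e.g. RawRecord
      Stage > Properties:
        Import Input Column = RawRecord
        Column Method       = Explicit  (or Schema File)
        Column to Import    = CustomerId, OrderDate, Amount (one property per column)
      Output > Format tab: Delimiter = | , Final Delimiter = end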

Also investigate using a Schema File with a Sequential File stage, rather than explicitly defined columns. You need to enable runtime column propagation in this case.
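A minimal schema file for a pipe-delimited version of the example above might look like this (column names and types are illustrative only):

    record
    {final_delim=end, delim='|', quote=none}
    (
      CustomerId: string[max=20];
      OrderDate:  string[max=10];
      Amount:     nullable decimal[10,2];
    )

Point the Sequential File stage's Schema File property at this file and leave the Columns grid empty; with runtime column propagation enabled, the parsed columns flow downstream without being defined in the job design.
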
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.