Reading Multiformat Records

Post by **parimi123** » Mon Mar 26, 2007 9:03 pm

The input sequential file has both insert and delete records. These records are of variable length and the fields are pipe (|) delimited. The first field of each record indicates whether it is a delete record or an insert/update record.
I know I could separate the records with an awk script, but I don't want to do that.

Given below is the sample data.

DEL|KEY
INS|KEY|FIELD1|FIELD2|
INS|KEY|FIELD1|FIELD2|
DEL|KEY

I would like a single job that reads both record layouts and processes each accordingly.

Thanks in advance for any inputs.

Post by **ray.wurlod** » Mon Mar 26, 2007 9:08 pm

Method 1. Define a four-column record schema and use it to read the file with a Sequential File stage that has a reject link. INS records will pass; DEL records will not match the schema and will be sent down the reject link. There, place a Column Import stage to re-parse the raw string with a two-column record schema.
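To make the flow of Method 1 concrete, here is a minimal sketch in Python rather than DataStage. It only imitates the idea: try the four-column layout first, and re-parse whatever is rejected with the two-column layout. The names `parse_fixed` and `read_with_reject` are hypothetical, and tolerating the trailing pipe on the INS rows is an assumption based on the sample data; how the actual stages treat that trailing delimiter depends on the format properties you set.

```python
DELIM = "|"

def parse_fixed(line, n_cols):
    """Return the fields if the line has exactly n_cols columns, else None."""
    body = line.rstrip("\n")
    if body.endswith(DELIM):
        # Assumption: a single trailing delimiter (as on the sample INS rows)
        # does not count as an extra, empty column.
        body = body[:-1]
    fields = body.split(DELIM)
    return fields if len(fields) == n_cols else None

def read_with_reject(lines):
    """Mimic a four-column read whose rejects are re-parsed as two columns."""
    accepted, reparsed, unparsed = [], [], []
    for line in lines:
        row = parse_fixed(line, 4)       # main output link: INS layout
        if row is not None:
            accepted.append(row)
            continue
        row = parse_fixed(line, 2)       # reject link: re-parse as DEL layout
        if row is not None:
            reparsed.append(row)
        else:
            unparsed.append(line)        # matches neither layout
    return accepted, reparsed, unparsed

sample = [
    "DEL|KEY",
    "INS|KEY|FIELD1|FIELD2|",
    "INS|KEY|FIELD1|FIELD2|",
    "DEL|KEY",
]
ins_rows, del_rows, bad_rows = read_with_reject(sample)
```

Note that this routes purely on column count, just as the reject link does; it never looks at the INS/DEL value itself.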

Method 2. Define the record to be a single VarChar. In an immediately following Transformer stage, use Field() functions to parse the records. This stage has two outputs, constrained on the first three characters (or first field) of the input row. The record type should be evaluated in a stage variable, since you may also need it as an output column value and will want to calculate it only once.
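The logic of Method 2 can be sketched outside DataStage as well. The Python below is only an illustration under stated assumptions: `field` is a hypothetical helper that approximates a delimiter-and-occurrence lookup like Field(), `route` stands in for the Transformer, and the output column names (`type`, `key`, `field1`, `field2`) are made up for the example. The record type is computed once per row, in the spirit of the stage variable, and then reused both for routing and as an output value.

```python
def field(record, delim, occurrence):
    """Return the 1-based occurrence-th delimited field, or "" if absent."""
    parts = record.split(delim)
    if 1 <= occurrence <= len(parts):
        return parts[occurrence - 1]
    return ""

def route(lines):
    """Split whole-line records into insert and delete outputs."""
    inserts, deletes = [], []
    for line in lines:
        record = line.rstrip("\n")
        # Evaluated once per row and reused below (the stage-variable idea).
        rec_type = field(record, "|", 1)
        if rec_type == "INS":
            inserts.append({
                "type": rec_type,
                "key": field(record, "|", 2),
                "field1": field(record, "|", 3),
                "field2": field(record, "|", 4),
            })
        elif rec_type == "DEL":
            deletes.append({"type": rec_type, "key": field(record, "|", 2)})
        # Rows with any other record type are dropped in this sketch.
    return inserts, deletes

sample = [
    "DEL|KEY",
    "INS|KEY|FIELD1|FIELD2|",
    "INS|KEY|FIELD1|FIELD2|",
    "DEL|KEY",
]
inserts, deletes = route(sample)
```

Because each field is picked out by position, the trailing pipe on the INS rows does no harm here: it only produces an empty fifth field that is never requested.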