Reading Multiformat Records

parimi123
Participant
Posts: 12
Joined: Fri Nov 04, 2005 9:43 am
Location: Atlanta

Post by parimi123 »

The input sequential file has both insert and delete records. These records are of variable length and fields are PIPE (|) delimited. The first field of the record indicates if the record is a delete record or insert/update record.
I know I could separate the records with an awk script, but I don't want to do that.

Given below is the sample data.

DEL|KEY
INS|KEY|FIELD1|FIELD2|
INS|KEY|FIELD1|FIELD2|
DEL|KEY

I would like to build one job that reads both record layouts and processes each accordingly.

Thanks in advance for any inputs.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Method 1. Define a four-column record schema and use it to read the file with a Sequential File stage that has a reject link. INS records will pass; DEL records will not match the schema and will be sent down the reject link. On that link, place a Column Import stage to re-parse the raw string with a two-column record schema.
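To make the flow of Method 1 concrete, here is a rough Python sketch of the same idea outside DataStage. It is only an illustration, not DataStage code: the column names, `parse_with_schema`, and `read_with_reject` are all made up for the example, and it assumes the trailing pipe on the INS sample rows is a terminator rather than an extra empty field.

```python
# Hypothetical column names; the original post only shows KEY, FIELD1, FIELD2.
INS_COLUMNS = ("rec_type", "key", "field1", "field2")
DEL_COLUMNS = ("rec_type", "key")


def parse_with_schema(line, columns, delim="|"):
    """Return a dict of column values, or None if the field count does not match."""
    text = line.rstrip("\r\n")
    # Assumption: one trailing delimiter (as in the INS sample rows) ends the
    # record and does not introduce an extra empty field.
    if text.endswith(delim):
        text = text[: -len(delim)]
    values = text.split(delim)
    if len(values) != len(columns):
        return None
    return dict(zip(columns, values))


def read_with_reject(lines):
    """Try the four-column schema first; re-parse rejects with the two-column one."""
    passed, rejected = [], []
    for line in lines:
        row = parse_with_schema(line, INS_COLUMNS)
        if row is not None:
            passed.append(row)  # main output link
            continue
        # Reject link: the raw string is parsed again with the shorter schema.
        row = parse_with_schema(line, DEL_COLUMNS)
        if row is not None:
            rejected.append(row)
    return passed, rejected


if __name__ == "__main__":
    sample = ["DEL|K1", "INS|K2|a|b|", "INS|K3|c|d|", "DEL|K4"]
    ins_rows, del_rows = read_with_reject(sample)
    print(ins_rows)
    print(del_rows)
```

Note that the split here happens on field count, not on the record type value, which mirrors how the reject link behaves: anything that does not fit the four-column layout falls through to the second parse.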

Method 2. Define the record as a single VarChar column. In an immediately following Transformer stage, use Field() functions to parse the records. This stage has two outputs, constrained on the first three characters (or first field) of the input row. Evaluate that value in a stage variable, since you may also need it as an output column value and it is better to calculate it only once.
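The logic of Method 2 can be sketched in Python as well. Again this is only an illustration under assumptions: `field` is a hypothetical helper loosely modelled on a 1-based delimited-substring lookup, `route_records` and the output column names are invented for the example, and `rec_type` plays the role of the stage variable that is computed once per row.

```python
def field(text, delim, occurrence):
    """Return the 1-based delimited substring, or "" if it does not exist."""
    parts = text.split(delim)
    if 1 <= occurrence <= len(parts):
        return parts[occurrence - 1]
    return ""


def route_records(lines, delim="|"):
    """Read each line as one string, then route it on its first field."""
    inserts, deletes = [], []
    for line in lines:
        text = line.rstrip("\r\n")
        # Computed once per row, like a stage variable, then reused both in
        # the output constraints and as an output column value.
        rec_type = field(text, delim, 1)
        if rec_type == "INS":
            inserts.append({
                "rec_type": rec_type,
                "key": field(text, delim, 2),
                "field1": field(text, delim, 3),
                "field2": field(text, delim, 4),
            })
        elif rec_type == "DEL":
            deletes.append({"rec_type": rec_type, "key": field(text, delim, 2)})
        # Rows with any other record type are dropped in this sketch.
    return inserts, deletes


if __name__ == "__main__":
    sample = ["DEL|K1", "INS|K2|a|b|", "INS|K3|c|d|", "DEL|K4"]
    ins_rows, del_rows = route_records(sample)
    print(ins_rows)
    print(del_rows)
```

Because each field is picked out by position, the trailing pipe on the INS rows needs no special handling here; it just produces an unused empty fifth field.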