Reusable jobs for Fixed record width sources

gsherry1 · Post by **gsherry1** » Tue Aug 23, 2005 7:23 am

I have many files that are all fixed format. They differ in record length, but the first 3 columns are the same. I wish to create a job that manipulates only the first few fields and can run on all of my data sources. However, I seem unable to do this in Server Edition, as I don't know how to set up the column metadata for the remaining fields, which differ in each case.

For delimited files I solved this problem by setting up a varchar that covered the entire record, and parsed the individual common columns in transform. This solution won't work for fixed files.
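For readers outside DataStage, the delimited workaround described above can be sketched in plain Python. This is only an illustration of the idea (treat the record as one string, peel off the common leading columns, carry the rest along untouched); the function name `split_common`, the comma delimiter and the count of 3 are assumptions made for the example, not anything taken from the actual job.

```python
def split_common(line, delimiter=",", n_common=3):
    """Split off the first n_common delimited fields; keep the rest as one string."""
    # maxsplit=n_common leaves everything after the common fields unsplit.
    parts = line.split(delimiter, n_common)
    if len(parts) < n_common:
        raise ValueError("record has fewer than %d fields" % n_common)
    common = parts[:n_common]
    # The remainder is empty when the record has exactly n_common fields.
    rest = parts[n_common] if len(parts) > n_common else ""
    return common, rest


common, rest = split_common("a,b,c,d,e")
```

The same routine handles files with any number of trailing columns, which is why this approach is reusable for delimited sources.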

Is this one of the situations that RCP in PX was meant to handle? Can it be done in Server Edition?

TIA.

ArndW · Post by **ArndW** » Tue Aug 23, 2005 7:42 am

Yes, this is where column propagation works wonders in PX. You could also declare the sequential input with just the initial 3 columns, then select "suppress row truncation warnings" and ignore the rest of the input columns.

Sainath.Srinivasan · Post by **Sainath.Srinivasan** » Tue Aug 23, 2005 8:10 am

You can define the file to be of varying length to achieve this.

gsherry1 · Post by **gsherry1** » Tue Aug 23, 2005 11:02 am

ArndW wrote: Yes, this is where column propagation works wonders in PX. You could also declare the sequential input with just the initial 3 columns, then select "suppress row truncation warnings" and ignore the rest of the input columns.

Yes that would work, but it implies that I don't want the trailing columns to be written to my output file, and also implies that my fixed data sources have newlines to terminate the record.

chulett · Post by **chulett** » Tue Aug 23, 2005 11:16 am

If you've got a true fixed width file with no record terminators, I don't see how you can expect anything to be able to handle that in any kind of 'reusable' fashion.

It needs to use the metadata to know how many bytes to read in for each record... and if that can differ on each file...

Perhaps in PX with a custom BuildOp and then only if there's something in the first few bytes of the record that can be used to determine the length of the rest of the record.
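To illustrate the point about record length, here is a minimal Python sketch (not DataStage code) of reading a fixed-width file that has no record terminators. The helper name `read_fixed_records` and the sample data are hypothetical. The reader has no way to find record boundaries on its own, so the length has to be passed in for each file.

```python
import io


def read_fixed_records(stream, record_length):
    """Yield successive records of exactly record_length bytes from a binary stream."""
    if record_length <= 0:
        raise ValueError("record_length must be positive")
    while True:
        chunk = stream.read(record_length)
        if not chunk:
            break  # clean end of file
        if len(chunk) != record_length:
            # A short final chunk usually means the supplied length is wrong.
            raise ValueError("trailing partial record; check record_length")
        yield chunk


# Two 9-byte records with no terminator between them (made-up sample data).
data = b"AAA111xyz" + b"BBB222pqr"
records = list(read_fixed_records(io.BytesIO(data), 9))
```

With the wrong length the records are silently misaligned until the end of the file, so a reusable job would need the length supplied as a parameter or derived from something in the data itself.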

ray.wurlod · Post by **ray.wurlod** » Tue Aug 23, 2005 5:19 pm

Define a table definition consisting of four columns. The fourth column is the remainder of the record; figure out its width from the metadata. Or, if it's variable length as you say, then there must be a line terminator, and you declare the data type as VarChar.

You can read the file using this table definition in your Sequential File stage and do whatever you like with the fourth column (e.g. discard it).

Get the stage property "line terminator" correct - UNIX-style, DOS-style or None.
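The layout described above (the common leading columns plus one catch-all column for the remainder) can be sketched in Python as follows. This is an illustration only, assuming each record has already been read as a string; the names `split_record` and `process_record` and the widths (2, 3, 3) are invented for the example.

```python
def split_record(record, common_widths):
    """Slice the common leading fields off a record; return (fields, remainder)."""
    fields = []
    offset = 0
    for width in common_widths:
        fields.append(record[offset:offset + width])
        offset += width
    # Everything past the common fields is kept as one opaque column.
    return fields, record[offset:]


def process_record(record, common_widths, transform):
    """Apply transform to each common field and reattach the untouched remainder."""
    fields, remainder = split_record(record, common_widths)
    return "".join(transform(f) for f in fields) + remainder


fields, remainder = split_record("AB123XYZtail-data", (2, 3, 3))
result = process_record("ab123xyzTail", (2, 3, 3), str.upper)
```

Because the remainder is passed through as a single column, the trailing data can be written back out unchanged, provided the transform keeps each common field at its original width.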