Complex Flat File stage - variable record lengths

djbarham
Participant
Posts: 34
Joined: Wed May 07, 2003 4:39 pm
Location: Brisbane, Australia

Complex Flat File stage - variable record lengths

Post by djbarham »

I'm having a problem with variable record lengths on the CFF stage.

I think my problem is the same as in this thread, but I don't think the respondents there really understood the problem:

viewtopic.php?p=391304

I have several different record types and I have the meta data defined for each record type. All records end with a Unix newline.

The problem is that some of the fields at the end of some records are optional and, as a result, some input records end before the end of the definition. In this case, these records are rejected with an input buffer overrun error.

It doesn't seem to matter how I define the fields that are missing from the input record (nullable or set a default), I still get the error.

Whether or not the optional fields exist does not depend on the record type.

For example, if I have record types A, B and C.

Record types A and B are always filled and work fine. Record type C will be rejected if the record is shorter than the metadata.

If I remove extra fields from the definition of record type C, then these records no longer give an error. As soon as I define even a single field that goes past the physical end of the shortest record, these shorter records are rejected. The longer type C records still come through.

Is there a solution to this or is this a limitation of the CFF stage?
djbarham
Participant
Posts: 34
Joined: Wed May 07, 2003 4:39 pm
Location: Brisbane, Australia

Re: Complex Flat File stage - variable record lengths

Post by djbarham »

Pending a resolution of this with the CFF stage, I think I will build another job to preprocess the file by padding out records to their respective maximum lengths.
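
For what it's worth, a minimal sketch of that padding pre-process (in Python rather than a DataStage job, assuming the record type is the first character of each line; the lengths and file names below are placeholders, not the real layout):

    # Pad each record out to a fixed maximum length per record type.
    # MAX_LEN values and the file names here are hypothetical.
    MAX_LEN = {"A": 120, "B": 150, "C": 200}

    with open("input.dat") as src, open("padded.dat", "w") as dst:
        for line in src:
            rec = line.rstrip("\n")
            target = MAX_LEN.get(rec[:1], len(rec))
            dst.write(rec.ljust(target) + "\n")

The CFF stage would then see every type C record at its full defined length, with the optional columns arriving as blanks.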
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

CFF does not readily support a single record type having varying numbers of columns, outside of the support of the "OCCURS DEPENDING ON" COBOL FD clause.

The solution you mention (pre-processing the file) will likely work by creating your missing columns for you. Another possibility, should your file format support it, is to define the 'C' record type as a single variable-length column in CFF then use a transformer to parse out the columns which are present while providing default values for the missing columns.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
djbarham
Participant
Posts: 34
Joined: Wed May 07, 2003 4:39 pm
Location: Brisbane, Australia

Post by djbarham »

jwiles wrote: CFF does not readily support a single record type having varying numbers of columns, outside of the support of the "OCCURS DEPENDING ON" COBOL FD clause.

Hmmm ... I wouldn't call it a varying number of columns so much as optional columns. ;)

If it hits the end of record before the end of the definition, I just want it to default the remaining columns. Enhancement request maybe. :)

jwiles wrote: The solution you mention (pre-processing the file) will likely work by creating your missing columns for you.

Yep, already built this and it works fine. It just pads out every record to a predefined length based on the record type.

jwiles wrote: Another possibility, should your file format support it, is to define the 'C' record type as a single variable-length column in CFF then use a transformer to parse out the columns which are present while providing default values for the missing columns.

Yeah, ah ... no, that is what I was trying to avoid.

Thanks, you have confirmed what I was beginning to suspect - that the CFF stage cannot handle records shorter than the record definition (it does not seem to mind if they are longer).
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

Varying columns/optional columns: Means the same thing :)

A variation of the second solution would be to make the last always-there column a varchar and use a transformer to parse it and your "optional" columns (or a transformer to pad and a column importer to parse). I generally prefer this solution because it's relatively simple and doesn't require an additional pass of the data file (and hence additional storage for the modified file). That can make a difference as data volumes increase.
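
A rough illustration of that parsing step (a sketch only, in Python rather than Transformer derivations; the optional column names, widths and defaults below are invented):

    # Split the trailing varchar into its optional fixed-width columns,
    # defaulting any column that falls past the end of the record.
    OPTIONAL_FIELDS = [("opt_col_1", 10, " " * 10), ("opt_col_2", 5, "00000")]

    def parse_tail(tail):
        out, pos = {}, 0
        for name, width, default in OPTIONAL_FIELDS:
            value = tail[pos:pos + width]
            out[name] = value if value else default
            pos += width
        return out

In a real job the same per-column logic would sit in Transformer derivations (or a Column Import stage), but the idea is identical: take what is there, default what is not.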

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK
Contact:

Post by PhilHibbs »

The way we have got around this is by setting the FTP transfer mode to "linemode=rdw" on the mainframe that sends us the file, and defining two 2-byte big-endian fields, "RDW_LEN" and "RDW_FILLER", at the beginning of the record. Then, use RDW_LEN in the Records ID tab to identify what kind of record it is based on the length. Bad luck if you have two different record types of the same length...
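
For readers who have not met RDWs before, a quick sketch of how that 4-byte prefix decodes (illustration only, outside DataStage, assuming the usual z/OS convention that the length includes the RDW itself):

    import struct

    def split_rdw_records(data):
        # Walk a byte stream of RDW-prefixed records: 2-byte big-endian
        # length (including the 4-byte RDW), then 2 filler bytes.
        pos = 0
        while pos < len(data):
            rdw_len, _filler = struct.unpack(">HH", data[pos:pos + 4])
            yield rdw_len, data[pos + 4:pos + rdw_len]
            pos += rdw_len

That length value is what the Records ID tab keys off in this approach, which is why two record types of the same length cannot be told apart.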
Phil Hibbs | Capgemini
Technical Consultant