Complex File Stage: variable block files

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

michaeld
Premium Member
Posts: 88
Joined: Tue Apr 04, 2006 8:42 am
Location: Toronto, Canada

Complex File Stage: variable block files

Post by michaeld »

What is the trick to load in variable block data using the complex file stage? I see the variable block dropdown option, but I do not know how to import a variable block layout. Has anybody done this before?
Mike
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The COBOL FD importer should handle it automatically. You probably end up with a tagged subrecord in the schema definition.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
michaeld
Premium Member
Posts: 88
Joined: Tue Apr 04, 2006 8:42 am
Location: Toronto, Canada

Post by michaeld »

I wish it did, but it doesn't. If I merge two copybooks together and import them, it says that the record length of one does not match the record length of the other.

Each record definition is in a separate copybook, and they all have different record lengths. The CFF stage in DataStage version 8 has a tab that lets you specify multiple record layouts, but it is not there in version 7.5. We are planning to use the mainframe utility FILE-AID to create one file per schema, but this is a workaround that adds overhead to the mainframe.
Mike
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Yes, DataStage expects a constant record length. If FILE-AID is too burdensome, could the application add a FILLER to make the shorter record the same length as the longer one?

Otherwise you're going to have to have DataStage read and combine the two files.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
michaeld
Premium Member
Posts: 88
Joined: Tue Apr 04, 2006 8:42 am
Location: Toronto, Canada

Post by michaeld »

We cannot modify the source; it is shared among many legacy applications. We're on a tight timeline, so using FILE-AID seems like the best solution for now. Thanks for your help though.
Mike
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Simple DataStage job - read each line as VarChar and output as Char. Your filler is now done.
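A plain illustration of that idea in C (purely a sketch: it assumes the input is newline-delimited, which the next post points out this data is not, and it pads to an assumed 5-byte width):

    /* Pad each newline-delimited record with trailing spaces to a fixed
       width - effectively what the VarChar-to-Char conversion does. */
    #include <stdio.h>
    #include <string.h>

    #define FIXED_WIDTH 5

    int main(void)
    {
        char line[256];

        while (fgets(line, sizeof line, stdin) != NULL) {
            line[strcspn(line, "\r\n")] = '\0';      /* strip the terminator */
            printf("%-*s\n", FIXED_WIDTH, line);     /* space-pad to the full width */
        }
        return 0;
    }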
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
michaeld
Premium Member
Posts: 88
Joined: Tue Apr 04, 2006 8:42 am
Location: Toronto, Canada

Post by michaeld »

We are dealing with fixed-width records, so there is no record delimiter. The problem is that we do not know how big a record is until we read it, so the alignment will get messed up. For example:

Record type A: 5 bytes
Record type B: 3 bytes
Position 1: identifier of record type

sample data with 5 records:
A1234A1234B12B12A1234

Read in 5-byte chunks (because that is the longest record width):

A1234 - so far so good.
A1234 - so far so good.
B12B1 - we see that it is type B, so we truncate the last 2 bytes
2A123 - now the alignment is messed up and the data is corrupt
4 - now DataStage reports a short read and gives an error



Maybe what we could do is read the file in chunks of the smallest record length as VarChar and use a stage variable to hold the leftover bytes from the previous chunk. That way we can combine multiple chunks into a single record of the correct size based on the record type (a rough C sketch of this logic follows the trace below).

in: A12   sv:        out:
in: 34A   sv: A12    out: A1234
in: 123   sv: A      out:
in: 4B1   sv: A123   out: A1234
in: 2B1   sv: B1     out: B12
in: 2A1   sv: B1     out: B12
in: 234   sv: A1     out: A1234
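
A rough C sketch of this reassembly logic (illustrative only: the 3-byte chunk size and the 'A' = 5 byte / 'B' = 3 byte record lengths are taken from the example above, and each reassembled record is written with a line terminator):

    /* Read the stream in 3-byte chunks (the shortest record length),
       carry the leftover bytes between chunks, and emit a complete
       record whenever enough bytes for the current type have arrived. */
    #include <stdio.h>
    #include <string.h>

    #define CHUNK   3
    #define MAX_REC 5

    static int record_length(char type)
    {
        return (type == 'A') ? 5 : 3;       /* assumes only types A and B */
    }

    int main(void)
    {
        char   buf[CHUNK + MAX_REC];        /* leftover bytes + newest chunk */
        size_t have = 0, n;
        char   chunk[CHUNK];

        while ((n = fread(chunk, 1, CHUNK, stdin)) > 0) {
            memcpy(buf + have, chunk, n);
            have += n;

            /* emit as many complete records as the buffer now holds */
            while (have > 0 && have >= (size_t)record_length(buf[0])) {
                int len = record_length(buf[0]);
                printf("%.*s\n", len, buf);
                memmove(buf, buf + len, have - len);
                have -= len;
            }
        }
        if (have > 0)
            fprintf(stderr, "warning: %zu trailing byte(s) left over\n", have);
        return 0;
    }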
Mike
michaeld
Premium Member
Posts: 88
Joined: Tue Apr 04, 2006 8:42 am
Location: Toronto, Canada

Post by michaeld »

OK, so I tried implementing what I listed above, and it works except that I can't seem to load a file in blocks when the file size is not divisible by the block size. I'm using a Sequential File stage with record type = implicit and field delimiter = none.

Example: 5 bytes of data in the file, 2-byte field size.

If I set the field type to CHAR(2) in the sequential file stage then it rejects the last row.

If I set the field type to VARCHAR(2) in the sequential file stage then it only loads the first record and skips the rest.
:cry:
Any ideas how to get it to load the last record into the CHAR field and pad the missing bytes, or to load the records into a VarChar field?
Mike
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You could pre-process the file to add terminators. In DataStage BASIC you would use ReadBlk to read one byte (record type) then ReadBlk to read the requisite number of bytes for the remainder of that data. Use WriteSeq to write the result with an automatically-added line terminator.

You could do the same in C or C++, processing the file sequentially: read the record type byte, then read the appropriate number of bytes, and write them into the target file followed by a line terminator.
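
A minimal C sketch of that pre-processing pass (the 'A' = 5 byte / 'B' = 3 byte lengths are the example from earlier in this thread, not a general solution):

    /* Read the one-byte record type, read the rest of that record based
       on the type, and write it back out followed by a line terminator
       so the file can then be read as ordinary delimited records. */
    #include <stdio.h>

    int main(void)
    {
        int type;

        while ((type = fgetc(stdin)) != EOF) {
            int  rest = (type == 'A') ? 4 : 2;   /* assumes only types A and B */
            char body[8];

            if (fread(body, 1, (size_t)rest, stdin) != (size_t)rest) {
                fprintf(stderr, "short read: truncated %c record at end of file\n", type);
                return 1;
            }
            printf("%c%.*s\n", type, rest, body);
        }
        return 0;
    }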
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.