Complex File Stage: variable block files

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

michaeld
Premium Member
Posts: 88
Joined: Tue Apr 04, 2006 8:42 am
Location: Toronto, Canada

Complex File Stage: variable block files

Post by michaeld »

What is the trick to load in variable block data using the complex file stage? I see the variable block dropdown option, but I do not know how to import a variable block layout. Has anybody done this before?
Mike
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The COBOL FD importer should handle it automatically. You probably end up with a tagged subrecord in the schema definition.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
michaeld
Premium Member
Posts: 88
Joined: Tue Apr 04, 2006 8:42 am
Location: Toronto, Canada

Post by michaeld »

I wish it did, but it doesn't. If I merge two copybooks together and import them, it says that the record length of one does not match the record length of the other.

Each record definition is in a separate copybook, and they all have different record lengths. The CFF stage in DataStage version 8 has a tab that lets you specify multiple record layouts, but it is not there in version 7.5. We are planning to use the mainframe utility FILE-AID to create one file per schema, but this is a workaround that adds overhead to the mainframe.
Mike
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Yes, DataStage expects a constant record length. If FILE-AID is too burdensome, could the application add a FILLER to make the shorter record the same length as the longer one?

Otherwise you're going to have to have DataStage read and combine the two files.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
michaeld
Premium Member
Posts: 88
Joined: Tue Apr 04, 2006 8:42 am
Location: Toronto, Canada

Post by michaeld »

We cannot modify the source; it is shared among many legacy applications. We're on a tight timeline, so using FILE-AID seems like the best solution for now. Thanks for your help though.
Mike
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Simple DataStage job - read each line as VarChar and output as Char. Your filler is now done.
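A plain illustration of that idea in C (purely a sketch: it assumes the input is newline-delimited, which the next post points out this data is not, and it pads to an assumed 5-byte width):

    /* Pad each newline-delimited record with trailing spaces to a fixed
       width - effectively what the VarChar-to-Char conversion does. */
    #include <stdio.h>
    #include <string.h>

    #define FIXED_WIDTH 5

    int main(void)
    {
        char line[256];

        while (fgets(line, sizeof line, stdin) != NULL) {
            line[strcspn(line, "\r\n")] = '\0';      /* strip the terminator */
            printf("%-*s\n", FIXED_WIDTH, line);     /* space-pad to the full width */
        }
        return 0;
    }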
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
michaeld
Premium Member
Posts: 88
Joined: Tue Apr 04, 2006 8:42 am
Location: Toronto, Canada

Post by michaeld »

We are dealing with fixed-width records, so there is no record delimiter. The problem is that we do not know how big a record is until we read it, so the alignment will get messed up. For example:

Record type A: 5 bytes
Record type B: 3 bytes
Position 1: identifier of record type

sample data with 5 records:
A1234A1234B12B12A1234

Read in 5-byte chunks (because that is the longest record width):

A1234 - so far so good.
A1234 - so far so good.
B12B1 - we see that it is type B, so we truncate the last 2 bytes
2A123 - now the alignment is messed up and the data is corrupt
4 - now DataStage reports a short read and gives an error



Maybe what we could do is read the file in chunks of the smallest record length as VarChar and use a stage variable to hold the leftover bytes from the previous chunk. That way we can combine multiple chunks into a single record of the correct size based on the record type (a rough C sketch of this logic follows the trace below).

in: A12   sv:        out:
in: 34A   sv: A12    out: A1234
in: 123   sv: A      out:
in: 4B1   sv: A123   out: A1234
in: 2B1   sv: B1     out: B12
in: 2A1   sv: B1     out: B12
in: 234   sv: A1     out: A1234
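
A rough C sketch of this reassembly logic (illustrative only: the 3-byte chunk size and the 'A' = 5 byte / 'B' = 3 byte record lengths are taken from the example above, and each reassembled record is written with a line terminator):

    /* Read the stream in 3-byte chunks (the shortest record length),
       carry the leftover bytes between chunks, and emit a complete
       record whenever enough bytes for the current type have arrived. */
    #include <stdio.h>
    #include <string.h>

    #define CHUNK   3
    #define MAX_REC 5

    static int record_length(char type)
    {
        return (type == 'A') ? 5 : 3;       /* assumes only types A and B */
    }

    int main(void)
    {
        char   buf[CHUNK + MAX_REC];        /* leftover bytes + newest chunk */
        size_t have = 0, n;
        char   chunk[CHUNK];

        while ((n = fread(chunk, 1, CHUNK, stdin)) > 0) {
            memcpy(buf + have, chunk, n);
            have += n;

            /* emit as many complete records as the buffer now holds */
            while (have > 0 && have >= (size_t)record_length(buf[0])) {
                int len = record_length(buf[0]);
                printf("%.*s\n", len, buf);
                memmove(buf, buf + len, have - len);
                have -= len;
            }
        }
        if (have > 0)
            fprintf(stderr, "warning: %zu trailing byte(s) left over\n", have);
        return 0;
    }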
Mike
michaeld
Premium Member
Posts: 88
Joined: Tue Apr 04, 2006 8:42 am
Location: Toronto, Canada

Post by michaeld »

OK, so I tried implementing what I listed above, and it works except that I can't seem to load a file in blocks when the file size is not divisible by the block size. I'm using a Sequential File stage with record type = implicit and field delimiter = none.

Example: 5 bytes of data in the file, 2-byte field size.

If I set the field type to CHAR(2) in the sequential file stage then it rejects the last row.

If I set the field type to VARCHAR(2) in the sequential file stage then it only loads the first record and skips the rest.
:cry:
Any ideas how to get it to load the last record into the CHAR field and pad the missing bytes, or to load the records into a VarChar field?
Mike
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You could pre-process the file to add terminators. In DataStage BASIC you would use ReadBlk to read one byte (record type) then ReadBlk to read the requisite number of bytes for the remainder of that data. Use WriteSeq to write the result with an automatically-added line terminator.

You could do the same in C or C++, processing the file sequentially: read the record type byte, then read the appropriate number of bytes, and write them into the target file followed by a line terminator.
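
A minimal C sketch of that pre-processing pass (the 'A' = 5 byte / 'B' = 3 byte lengths are the example from earlier in this thread, not a general solution):

    /* Read the one-byte record type, read the rest of that record based
       on the type, and write it back out followed by a line terminator
       so the file can then be read as ordinary delimited records. */
    #include <stdio.h>

    int main(void)
    {
        int type;

        while ((type = fgetc(stdin)) != EOF) {
            int  rest = (type == 'A') ? 4 : 2;   /* assumes only types A and B */
            char body[8];

            if (fread(body, 1, (size_t)rest, stdin) != (size_t)rest) {
                fprintf(stderr, "short read: truncated %c record at end of file\n", type);
                return 1;
            }
            printf("%c%.*s\n", type, rest, body);
        }
        return 0;
    }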
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.