Issue Reading in a Sequential File with trailing blanks

iq_etl · Post by **iq_etl** » Tue Nov 19, 2013 12:58 pm

We are having issues reading in Sequential Files with blank values in the final columns.

For example, lets say I have a Sequential File with 5 columns:

Col_1 CHAR(5)
Col_2 CHAR(5)
Col_3 CHAR(1)
Col_4 CHAR(2)
Col_5 CHAR(6)

If my record has values in Col_1, Col_2, and Col_3, but Col_4 and Col_5 are empty, DataStage is not reading in the record properly.

If I specify the Sequential File as follows:

Record delimiter = UNIX newline
Record length = fixed

I get the following error:

FileIn,0: Bad record delimiter after fixed-length record: expected "
", got " "

It is reading beyond the UNIX newline and into the next record.

When I examine my dataset with a Hex viewer, the UNIX newline is where it's supposed to be.

If I change the Sequential File to specify, only the Record delimiter (and remove the Record length specification), I get the following error:

FileIn,0: Input buffer overrun at field "FE_PRINT_ALTERNATE_SW", at offset: 103

That field is the first blank field of the final three fields and the offset is the first byte of that field.

When I look at the record with the Hex viewer, this observation is confirmed. Starting from that field on, the remaining fields are blank.

If I edit the Sequential File and add a final byte filled with "X" (a dummy field), DataStage reads in the file and processes the data.

My problem is that we are in the midst of a conversion and we have hundreds of sequential files created by outside programmers that potentially have blank values at the ends of some of the incoming records. We would like to be able to read in the Sequential Files without having to artificially add a dummy field to the end of them.

Does anyone have any suggestions?

FranklinE · Post by **FranklinE** » Tue Nov 19, 2013 1:27 pm

I'm just a bit confused by your description. Are the "blank" fields actually filled with spaces to the length shown, are they null fields, or do they have only one space where more are expected?

From the rest of your description, one workaround is to read the file as a single Varchar column (length max to the expected fixed-lenght record) with Unix newline as the record delimiter. From there you'll have to parse the single column to the required columns for each file.

It's not a good idea, just the one that I know can work.

iq_etl · Post by **iq_etl** » Tue Nov 19, 2013 1:36 pm

FranklinE wrote:I'm just a bit confused by your description. Are the "blank" fields actually filled with spaces to the length shown, are they null fields, or do they have only one space where more are expected?

Franklin, they are filled with spaces to the length shown.

FranklinE · Post by **FranklinE** » Tue Nov 19, 2013 2:08 pm

Franklin, they are filled with spaces to the length shown.

I was afraid that was the answer.

I suggest you verify as best you can all the control characters in your sequential files, and make sure the stage settings are accurate. One possibility is removing the record length = fixed setting and see how that works, because it might be based on including control characters.

That's my best guess based on experience, not actual advice.

IBM Analytics Champion 2009 - 2020 · Post by **asorrell** » Tue Nov 19, 2013 3:25 pm

I was so intrigued by this one that I actually took a few minutes during lunch to write a quick test. On my system it works perfectly.

I have sequential file defined as delimiter "None", quotes "None", Final Delimiter = "End".

I wrote and read 5 char fields, with data in first field, blanks in other fields. Data came back clean.

When you looked at the file in Hex, did it show 0x20 for the spaces? Did any of the data fields contain 0x00?

Do you have Extended set to "Unicode" for the fields or NCHAR as a data type?

iq_etl · Post by **iq_etl** » Wed Nov 20, 2013 10:14 am

I owe an apology to Franklin. I was mistaken. The UNIX Newline comes after the last filled position, so the remaining fields seem to be truncated.

I tried what you specified above (delimiter "None", quotes "None", Final Delimiter = "End") and I got the Input buffer overrun error again.

I think I need to go back and look at my incoming dataset to determine why everything is truncated after the last filled field.

Thank you to everyone. This has been most helpful!

IBM Analytics Champion 2009 - 2020 · Post by **asorrell** » Wed Nov 20, 2013 10:36 am

Using my settings for the file will only work if the fields were written out correctly, which you've just confirmed it wasn't. I suspect the output fields may have been empty VARCHAR fields instead of CHAR. If that's the case, then they will need to be switched to CHAR and space-filled or to use a field delimiter before you can read it correctly.

FranklinE · Post by **FranklinE** » Wed Nov 20, 2013 11:17 am

Thank you, no apology needed. I'm a firm believer in at least one pair of fresh eyes on a problem, even if all they do is help your eyes look better.

I once had to debug a complex IF statement in Cobol -- didn't write it, don't ask why EVALUATE wasn't used instead -- that was four 8x11 pages long (10 font). I printed it and used multiple colored pens to keep track of all the else-then branches, and not so silently cursed at it for three hours.

A fellow developer came in and in 60 seconds found the problem: a misplaced period. From then til now, I watch the clock. If I haven't found the problem in 20 minutes, I ask for help.