Issue Reading in a Sequential File with trailing blanks

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
iq_etl
Premium Member
Premium Member
Posts: 105
Joined: Tue Feb 08, 2011 9:26 am

Issue Reading in a Sequential File with trailing blanks

Post by iq_etl »

We are having issues reading in Sequential Files with blank values in the final columns.

For example, lets say I have a Sequential File with 5 columns:

Col_1 CHAR(5)
Col_2 CHAR(5)
Col_3 CHAR(1)
Col_4 CHAR(2)
Col_5 CHAR(6)

If my record has values in Col_1, Col_2, and Col_3, but Col_4 and Col_5 are empty, DataStage is not reading in the record properly.

If I specify the Sequential File as follows:

Record delimiter = UNIX newline
Record length = fixed

I get the following error:

FileIn,0: Bad record delimiter after fixed-length record: expected "
", got " "

It is reading beyond the UNIX newline and into the next record.

When I examine my dataset with a Hex viewer, the UNIX newline is where it's supposed to be.

If I change the Sequential File to specify, only the Record delimiter (and remove the Record length specification), I get the following error:

FileIn,0: Input buffer overrun at field "FE_PRINT_ALTERNATE_SW", at offset: 103

That field is the first blank field of the final three fields and the offset is the first byte of that field.

When I look at the record with the Hex viewer, this observation is confirmed. Starting from that field on, the remaining fields are blank.

If I edit the Sequential File and add a final byte filled with "X" (a dummy field), DataStage reads in the file and processes the data.

My problem is that we are in the midst of a conversion and we have hundreds of sequential files created by outside programmers that potentially have blank values at the ends of some of the incoming records. We would like to be able to read in the Sequential Files without having to artificially add a dummy field to the end of them.

Does anyone have any suggestions?
Last edited by iq_etl on Tue Nov 19, 2013 1:33 pm, edited 1 time in total.
FranklinE
Premium Member
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

I'm just a bit confused by your description. Are the "blank" fields actually filled with spaces to the length shown, are they null fields, or do they have only one space where more are expected?

From the rest of your description, one workaround is to read the file as a single Varchar column (length max to the expected fixed-lenght record) with Unix newline as the record delimiter. From there you'll have to parse the single column to the required columns for each file.

It's not a good idea, just the one that I know can work.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
iq_etl
Premium Member
Premium Member
Posts: 105
Joined: Tue Feb 08, 2011 9:26 am

Post by iq_etl »

FranklinE wrote:I'm just a bit confused by your description. Are the "blank" fields actually filled with spaces to the length shown, are they null fields, or do they have only one space where more are expected?
Franklin, they are filled with spaces to the length shown.
FranklinE
Premium Member
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

Franklin, they are filled with spaces to the length shown.
I was afraid that was the answer.

I suggest you verify as best you can all the control characters in your sequential files, and make sure the stage settings are accurate. One possibility is removing the record length = fixed setting and see how that works, because it might be based on including control characters.

That's my best guess based on experience, not actual advice.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

I was so intrigued by this one that I actually took a few minutes during lunch to write a quick test. On my system it works perfectly.

I have sequential file defined as delimiter "None", quotes "None", Final Delimiter = "End".

I wrote and read 5 char fields, with data in first field, blanks in other fields. Data came back clean.

When you looked at the file in Hex, did it show 0x20 for the spaces? Did any of the data fields contain 0x00?

Do you have Extended set to "Unicode" for the fields or NCHAR as a data type?
Last edited by asorrell on Tue Nov 19, 2013 3:29 pm, edited 1 time in total.
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
iq_etl
Premium Member
Premium Member
Posts: 105
Joined: Tue Feb 08, 2011 9:26 am

Post by iq_etl »

I owe an apology to Franklin. I was mistaken. The UNIX Newline comes after the last filled position, so the remaining fields seem to be truncated.

I tried what you specified above (delimiter "None", quotes "None", Final Delimiter = "End") and I got the Input buffer overrun error again.

I think I need to go back and look at my incoming dataset to determine why everything is truncated after the last filled field.

Thank you to everyone. This has been most helpful!
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

Using my settings for the file will only work if the fields were written out correctly, which you've just confirmed it wasn't. I suspect the output fields may have been empty VARCHAR fields instead of CHAR. If that's the case, then they will need to be switched to CHAR and space-filled or to use a field delimiter before you can read it correctly.
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
FranklinE
Premium Member
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

Thank you, no apology needed. I'm a firm believer in at least one pair of fresh eyes on a problem, even if all they do is help your eyes look better.

I once had to debug a complex IF statement in Cobol -- didn't write it, don't ask why EVALUATE wasn't used instead -- that was four 8x11 pages long (10 font). I printed it and used multiple colored pens to keep track of all the else-then branches, and not so silently cursed at it for three hours.

A fellow developer came in and in 60 seconds found the problem: a misplaced period. From then til now, I watch the clock. If I haven't found the problem in 20 minutes, I ask for help. 8)
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
Post Reply