Problem reading binary separators in cobol char field

manuel.gomez
Premium Member
Posts: 291
Joined: Wed Sep 26, 2007 11:23 am
Location: Madrid, Spain

Problem reading binary separators in cobol char field

Post by manuel.gomez »

Hello guys,

Let me introduce my scenario and the problem I am having.

I am reading a host file using the FTP Enterprise stage. At the moment I am just performing a test: reading a file with one FTP Enterprise stage and passing it through another FTP Enterprise stage to write it back to the same host. The result should be an exact copy of the source.

Project NLS is defined as UTF-8, and I can perfectly read special Spanish characters.

But there is a slight difference in the result file: a name field (read as char) contains binary separators (to establish the breaks between name and surname). These separators are not being written the same way at the destination.

When reading/writing this particular field in binary format, I get the same result at the destination as was read from the source. The problem comes when I read the field as character.
The char field is NOT defined as Unicode (if it is, I get read errors).
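
A rough way to picture the difference between the two modes, sketched in Python rather than DataStage. Everything here is an assumption for illustration: the sample name and the `copy_as_binary` / `copy_as_char` helpers are made up, and `cp037` is only a stand-in EBCDIC code page from Python's standard library, not necessarily the host's code page.

```python
# Made-up name field in EBCDIC: "ANA", separator 0xFB, "GOMEZ", separator 0xFE.
field = "ANA".encode("cp037") + b"\xfb" + "GOMEZ".encode("cp037") + b"\xfe"


def copy_as_binary(raw: bytes) -> bytes:
    """Pass the bytes through untouched, as a binary-typed column would."""
    return bytes(raw)


def copy_as_char(raw: bytes, read_map: str, write_map: str) -> bytes:
    """Decode to text with one character map, then re-encode with another."""
    return raw.decode(read_map).encode(write_map, errors="replace")


binary_copy = copy_as_binary(field)
same_map = copy_as_char(field, "cp037", "cp037")     # read and write maps agree
mixed_map = copy_as_char(field, "cp037", "latin-1")  # read and write maps differ

print(binary_copy == field)  # binary path keeps every byte
print(same_map == field)     # char path keeps them only when the maps agree
print(mixed_map == field)    # a mismatched map changes the bytes
```

Only the binary copy is byte-for-byte by construction; the character copy matches only when the read and write maps agree and both define every byte in the field.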

I just made a final test: read with FTP and write to a sequential file, coded as EBCDIC. I downloaded this text file to my local computer, opened it with UltraEdit and checked the hexadecimal code for this separator; the code was '1A'.

We asked the mainframe team for the host code page; they answered that it is 1145.
I tried reading with the FTP Enterprise stage, changing its stage-level NLS map to IBM001145. The stage failed to read, as if the file no longer existed on the host. The returned message was:
wrong uri syntax for the file ....
I get the same error message with NLS = ibm-1145_P100-1997.
After changing back to UTF-8, the stage could read the file again.

My main objective is to be able to read the field as char, write it as char, and get the same value that I have in the source.

Any suggestions?
manuel.gomez
Premium Member
Posts: 291
Joined: Wed Sep 26, 2007 11:23 am
Location: Madrid, Spain

Re: Problem reading binary separators in cobol char field

Post by manuel.gomez »

manuel.gomez wrote: I just made a final test: read with FTP and write to a sequential file, coded as EBCDIC. I downloaded this text file to my local computer, opened it with UltraEdit and checked the hexadecimal code for this separator; the code was '1A'.
I was wrong here

The hex codes for these separators are: FB and FE
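
One way to double-check separator values without relying on an editor is to scan the raw bytes directly. The sketch below is in Python and uses a made-up EBCDIC record; the `find_bytes` helper and the sample data are hypothetical, and on a real file the bytes would need to come from a binary-mode transfer so that no conversion has already been applied.

```python
def find_bytes(data: bytes, targets=(0xFB, 0xFE)):
    """Return (offset, hex value) for every byte in `data` that is in `targets`."""
    return [(offset, f"{value:02X}") for offset, value in enumerate(data) if value in targets]


# Made-up record: EBCDIC "ANA", 0xFB, EBCDIC "GOMEZ", 0xFE.
record = bytes.fromhex("C1D5C1FBC7D6D4C5E9FE")

print(find_bytes(record))  # [(3, 'FB'), (9, 'FE')]
```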
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

Hello, Manuel. You seem to be having much fun dealing with your mainframe system. :wink:
manuel.gomez wrote: These separators are not being written in the same way in destination.
Can you be more specific? Do the characters get dropped, or are they being written as different characters/hex-values?

I should warn you that I have little experience with non-English character sets. That said, I'm concerned about the FB and FE "separators" your system is using. Is there a reason for this choice, and also why are there two of them? Particularly for name fields, there is a wide choice of display characters that do not appear in names that can be used as separators/delimiters. Using one of them makes reading the data much easier. Character set translations should not be made difficult with non-standard usages.

Finally, it would help to know what your actual goal is here. I understand wanting to test, but if your actual job is going to read from a dataset and write to another one, I wonder why DataStage is being used instead of a local, mainframe process. It's possible that if your actual destination is a platform other than the mainframe, the separator changing may not be a problem.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
manuel.gomez
Premium Member
Posts: 291
Joined: Wed Sep 26, 2007 11:23 am
Location: Madrid, Spain

Post by manuel.gomez »

I would not say "fun" is the most appropriate word, hahaha

I don't mean to be rude at all (thanks for taking the trouble to answer me), and I apologize if my English sounds that way, but "philosophy" questions like the ones you are raising are outside the scope of this topic. What you are saying is true and very reasonable. These points have already been raised:
- "Build your process with mainframe coding", but we have to do this data integration project with ETL, which, by the way, allows much shorter development times.

- "Use different separators, such as # or |, that are ASCII readable", but unfortunately we have no possibility of asking for a different encoding in our source files.

- "This data can be read/written in binary format if you are just passing it through", yes, but we want to know what's going on, and why the characters cannot be read.

So let's focus on the problem itself, whatever the reasons behind it.

To answer your question: what is being written at the destination (instead of the source character) is a dot, hexadecimal 1A.
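
For context, 0x1A is the ASCII SUB (substitute) control character, which conversion routines commonly write in place of a character they cannot map; many editors display it as a dot. Whether that is what the stage is doing here is an assumption, but the effect can be reproduced with a small Python sketch. The record is made up, `encode_with_sub` is a hypothetical helper, and `cp037` is a stand-in EBCDIC code page rather than the host's.

```python
SUB = b"\x1a"  # ASCII "substitute" control character


def encode_with_sub(text: str, target: str = "ascii") -> bytes:
    """Encode `text` to `target`, writing SUB for any character the target lacks."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(target)
        except UnicodeEncodeError:
            out += SUB  # unmappable character: substitute instead of failing
    return bytes(out)


# Made-up record: EBCDIC "ANA", 0xFB, EBCDIC "GOMEZ", 0xFE.
source = bytes.fromhex("C1D5C1FBC7D6D4C5E9FE")
text = source.decode("cp037")   # 0xFB and 0xFE decode to non-ASCII characters
result = encode_with_sub(text)  # those characters have no ASCII equivalent

print(result.hex(" ").upper())
```

In this sketch both separators collapse into the same 0x1A byte, so the original FB/FE distinction cannot be recovered from the output.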