Special Character reading issue
Moderators: chulett, rschirm, roy
When you have an issue like this, start a new post rather than replying to anything similar you can find. I've taken the liberty of splitting your post out on its own and deleting the others so we can have the conversation in one place... here.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
-
- Premium Member
- Posts: 20
- Joined: Tue Jun 22, 2010 9:02 am
You will first need to find out which EBCDIC the original data is encoded in. There are many variants and without knowing which one you will not be able to convert the special characters into any ASCII representation.
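A quick way to do that variant hunting, sketched in Python using the EBCDIC code pages it ships as codecs. The byte value 0x7C below is only a placeholder; substitute the actual byte taken from your source file:

```python
# Decode one suspect byte under each EBCDIC codec Python provides and
# see which variant produces the glyph you expect.
sample = bytes([0x7C])  # hypothetical byte value from the source file

for codec in ("cp037", "cp273", "cp500", "cp875", "cp1026", "cp1140"):
    print(f"{codec}: {sample.decode(codec)!r}")
```

Whichever codec prints the character you expect is a candidate for the source's EBCDIC variant; note that Python's codec list covers only a handful of the IBM code pages, so a miss here is not conclusive.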
Without knowing which EBCDIC character set you are sourcing from, you cannot correctly convert your characters. Do you know which character/glyph the "?" is supposed to map to? If so, you can check the source EBCDIC binary value and look through the various EBCDIC implementations to see which one matches.
It will be easier and quicker to get your host people to tell you which character set they used.
The binary value of the '@' which is not getting read is "0110110 0110100". So how can this help?
I can read all the data; the only problem is with this kind of character, using a Sequential File stage as source with the settings below:
NLS Map: ISO_8859-1:1987
Record Length = fixed
Field Defaults: Delimiter = None
Type Defaults ==> General:
Character Set = EBCDIC
Byte Order = Big-endian
Data Format = Binary
I have asked the mainframe resource about the "EBCDIC character set" and, as per them, it is an EBCDIC fixed-width file. So what do they need to check on the mainframe to understand which EBCDIC character set is in use? Any help from anyone?
Last edited by pran.praveen on Thu Oct 17, 2013 3:45 am, edited 1 time in total.
Praveen
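For debugging outside DataStage, the fixed-width EBCDIC read described above can be mimicked in a few lines of Python. This is a minimal sketch under stated assumptions: the record bytes, the field offsets, and the cp037 code page are all stand-ins, since the thread's whole point is that the real variant is still unknown:

```python
# Decode one hypothetical fixed-width EBCDIC record by slicing fields
# at fixed offsets, then decoding them with an assumed code page.
record = b"\xd7\x99\x81\xa5\x85\x85\x95\x40\x40\x40"  # hypothetical 10-byte record

# Text field occupying the whole record; 0x40 is the EBCDIC space.
name = record[0:10].decode("cp037").rstrip()
print(name)
```

If the decoded text comes out with "?" or control characters in place of the special characters, that is the same symptom as in the Sequential File stage: the bytes exist, but the chosen code page has no glyph for them.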
There are at least 20 different EBCDIC variants around, probably many more. The standard LATIN-1 characters are the same in most of them, but the extra characters aren't.
The binary code you gave is 54 decimal, which the common EBCDIC maps as a special code NBS (numeric backspace). The default position of the "@" character is decimal 124.
If you are certain that in your EBCDIC the "@" sign is represented by 54 decimal, then you need to find an EBCDIC variant that has that mapping, and when you define that EBCDIC as your input then DataStage will correctly convert the character.
As it is, there is no glyph for EBCDIC 54 and thus it correctly gets mapped to "?" by DataStage.
I don't know how often I need to repeat this answer - nobody can give you a definitive answer. The characters you posted are not part of the standard characters in EBCDIC (those are the letters a-z, A-Z, 0-9, and a couple of punctuation characters). All the rest can be different.
As mentioned before, you need to find one of these characters in your source, get the numeric value of that character and then check the EBCDIC table to see if you have a match.
As stated earlier, the "@" character, if it really is mapped at 54 decimal in your EBCDIC is non-standard and you need to find out which EBCDIC is being used.
Does your data come from a DB2 database on the Host? From a text editor? Which OS and version are you using? I am sure that if you speak with an operator that he or she can determine which EBCDIC variant is being used.
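One way to "get the numeric value of that character", as suggested above, is simply to hex-dump the raw bytes of the suspect field and read the value off directly. A small Python sketch; the byte string is a hypothetical stand-in for bytes copied from the source file, and cp037 is only one candidate decoding:

```python
raw = b"\x7c\xc1\xc2\xc3"  # hypothetical slice of the raw EBCDIC record

for offset, value in enumerate(raw):
    glyph = raw[offset:offset + 1].decode("cp037")  # assumed code page
    print(f"offset {offset}: dec {value:3d}  hex {value:02X}  cp037 -> {glyph!r}")
```

The decimal column is what you then look up across the published EBCDIC code-page tables.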
Arnd - Yes, the data is coming from a DB2 database, and I see these characters there when I query it. It is surprising that DataStage cannot read them properly, since DataStage and DB2 are both IBM products and highly compatible. I need to check with the mainframe guy to see which variant of EBCDIC it is. Hopefully I will get an answer ASAP. I appreciate your time in looking at this issue.
Praveen
Hi Guys,
I have found a solution to this problem, or rather a workaround. I was not able to read these special characters in EBCDIC format, so I had the mainframe side send me the file in ASCII format instead. I did face a problem reading the decimal fields of this ASCII file, but some setting changes sorted that out. So finally it looks good. Thanks to the people who helped me with some clues... Have a great day and a greater week ahead...
Praveen
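For anyone hitting the same decimal-field problem: mainframe decimal columns are often packed decimal (COMP-3), which must be unpacked nibble by nibble rather than decoded as text. A hedged Python sketch of that format (the sample byte strings are invented for illustration):

```python
def unpack_comp3(data, scale=0):
    """Decode an IBM packed-decimal (COMP-3) field: two BCD digits per
    byte, with the low nibble of the final byte holding the sign
    (0xD = negative; 0xC and 0xF = positive)."""
    digits = "".join(f"{b >> 4}{b & 0x0F}" for b in data[:-1])
    digits += str(data[-1] >> 4)                  # high nibble of last byte
    sign = -1 if (data[-1] & 0x0F) == 0x0D else 1 # low nibble is the sign
    value = sign * int(digits)
    return value / 10**scale if scale else value

print(unpack_comp3(b"\x12\x34\x5c"))       # 12345
print(unpack_comp3(b"\x01\x2d", scale=1))  # -1.2
```

Note that COMP-3 bytes survive an EBCDIC-to-ASCII text conversion badly, which is one likely cause of the decimal trouble described above; binary fields should be transferred untranslated.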
Hi Guys,
I have found one more solution to this problem. Just wanted to share it, as it may help some of you. Read the EBCDIC file with the following settings: in the Format tab set String ==> Export EBCDIC as ASCII, and in General set Character Set = EBCDIC and Byte Order = Big-endian. Read the decimal columns as binary and packed; columns with the special characters should be read as ASCII. Later, in a Transformer, use StringToUString(Sourcecolumn, 'IBM01142') for the columns with the special characters, and the job NLS should be "ISO_8859-1:1987". Hope this helps..
Praveen
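DataStage specifics aside, the core of the fix above is decoding the troublesome columns with an explicitly named EBCDIC code page instead of the job default. The same idea in Python terms; cp1140 (EBCDIC US with euro) stands in here only because Python does not ship an IBM01142 codec, and the bytes are invented for illustration:

```python
raw = b"\x7c\xc8\x85\x93\x93\x96"  # hypothetical column bytes including the '@'
print(raw.decode("cp1140"))
```

The default decoding replaced the unmapped byte with "?"; naming the right code page recovers the intended glyph instead.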