Special character reading issue

Post questions here relating to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

When you have an issue like this, you start a new post rather than replying to anything similar you can find. I've taken the liberty of splitting your post out on its own and deleting the others so we can have a conversation in one place... here.
-craig

"You can never have too many knives" -- Logan Nine Fingers
pran.praveen
Premium Member
Posts: 20
Joined: Tue Jun 22, 2010 9:02 am

Post by pran.praveen »

Hi chulett,

I understand your point; however, the option to start a new post was disabled for me, so I replied to a similar kind of post. Anyway, does anyone know a solution to this problem?
Praveen
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

You will first need to find out which EBCDIC code page the original data is encoded in. There are many variants, and without knowing which one you will not be able to convert the special characters into any ASCII representation.
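To illustrate the point, here is a minimal sketch in Python (not DataStage) using the EBCDIC code pages shipped in the standard library; it shows the same byte decoding to different characters in different variants:

Code:
# The same EBCDIC byte yields different characters in different
# variants - here via EBCDIC code pages from Python's stdlib.
for codec in ("cp037", "cp500", "cp273", "cp1140"):
    print(codec, repr(bytes([0x7C]).decode(codec)))
# cp037/cp500/cp1140 map 0x7C to '@', while German cp273 maps it
# to 'Ö' - hence the variant must be known before converting.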
sajidkp
Participant
Posts: 114
Joined: Thu Apr 30, 2009 12:17 am
Location: New Delhi

Post by sajidkp »

There can be non-printable characters in mainframe data, and DataStage may not be able to show them as they appear on the mainframe, e.g. low values (NULs). Try to find out the hex/octal values for these and do a transformation.
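A minimal sketch of that check, in Python rather than in DataStage; the file name and record length are hypothetical:

Code:
# Dump each byte of one fixed-width record as hex so non-printable
# EBCDIC values (e.g. low values, 0x00) become visible.
with open("input.dat", "rb") as f:   # hypothetical source file
    record = f.read(80)              # hypothetical record length
print(" ".join(f"{b:02X}" for b in record))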
Regards,
Sajid KP
pran.praveen
Premium Member
Posts: 20
Joined: Tue Jun 22, 2010 9:02 am

Post by pran.praveen »

Hi ArndW,

I am not sure how it is encoded, but it is definitely coming from an IBM mainframe. The output looks like the line below when I Peek it. That means DataStage is not identifying this character and is giving it a "?" sign. Any suggestions?
Peek_13,1: ABC:02063700DKDK?G
Praveen
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Without knowing which EBCDIC character set you are sourcing from you cannot correctly convert your characters. Do you know what character/glyph the "?" is supposed to map to? If so, you can check the source EBCDIC binary value and look through the various EBCDIC implementations to see which one matches.

It will be easier and quicker to get your host people to tell you which character set they used.
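For what it's worth, here is a sketch of that lookup in Python, probing only the EBCDIC code pages in the standard library (so the list is far from exhaustive):

Code:
# Probe common EBCDIC code pages for one that maps the observed
# byte value (54 decimal, from the posts below) to the expected '@'.
for codec in ("cp037", "cp273", "cp500", "cp875", "cp1026", "cp1140"):
    if bytes([54]).decode(codec, errors="replace") == "@":
        print(codec, "maps 54 decimal to '@'")
# If nothing prints, the host uses a variant outside this list (or
# the byte value was misread); ask the host team for the code page.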
pran.praveen
Premium Member
Posts: 20
Joined: Tue Jun 22, 2010 9:02 am

Post by pran.praveen »

The binary value of the '@' that is not getting read is "0110110 0110100". How can this help?

I can read all the data; the only problem is with this kind of character. I am using a Sequential File stage as the source with the settings below:

NLS map: ISO_8859-1:1987
Record length = fixed
Field defaults: Delimiter = none
Type defaults ==> General:
Character set = EBCDIC
Byte order = Big-endian
Data format = Binary

I have asked the mainframe resource about the EBCDIC character set, and according to them it is an EBCDIC fixed-width file. So what do they need to check on the mainframe to identify the EBCDIC character set? Any help from anyone?
Last edited by pran.praveen on Thu Oct 17, 2013 3:45 am, edited 1 time in total.
Praveen
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

There are at least 20 different EBCDIC variants around, probably many more. The standard LATIN-1 characters are the same in most of them, but the extra characters aren't.

The binary code you gave is 54 decimal, which common EBCDIC variants map to the special code NBS (numeric backspace). The default position of the "@" character is decimal 124.

If you are certain that in your EBCDIC the "@" sign is represented by 54 decimal, then you need to find an EBCDIC variant that has that mapping; once you define that EBCDIC as your input, DataStage will correctly convert the character.

As it is, there is no glyph for EBCDIC 54, and thus it correctly gets mapped to "?" by DataStage.
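Those two figures can be checked against a common table, e.g. cp037 (US/Canada EBCDIC) as shipped with Python; a quick sketch:

Code:
# Decimal 124 (0x7C) carries the '@' glyph in cp037; decimal 54
# (0x36) falls in the control range and has no printable glyph.
print(bytes([124]).decode("cp037"))       # prints: @
print(repr(bytes([54]).decode("cp037")))  # prints a control character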
pran.praveen
Premium Member
Posts: 20
Joined: Tue Jun 22, 2010 9:02 am

Post by pran.praveen »

Thanks for replying quickly. For standard LATIN-1 characters, what should the NLS mapping be? Basically, this data is from northern Europe.
Praveen
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

I don't know how often I need to repeat this answer - nobody can give you a definitive one. The characters you posted are not part of the standard characters in EBCDIC (those are the letters a-z, A-Z, 0-9, and a couple of punctuation characters). All the rest can be different.

As mentioned before, you need to find one of these characters in your source, get the numeric value of that character and then check the EBCDIC table to see if you have a match.

As stated earlier, the "@" character, if it really is mapped to 54 decimal in your EBCDIC, is non-standard, and you need to find out which EBCDIC is being used.

Does your data come from a DB2 database on the host? From a text editor? Which OS and version are you using? I am sure that if you speak with an operator, he or she can determine which EBCDIC variant is being used.
pran.praveen
Premium Member
Posts: 20
Joined: Tue Jun 22, 2010 9:02 am

Post by pran.praveen »

No, it's not. But my client is from northern Europe; they have customers all over the world. Please suggest a solution if you have one. It does not matter where the data is from.
Praveen
pran.praveen
Premium Member
Posts: 20
Joined: Tue Jun 22, 2010 9:02 am

Post by pran.praveen »

Arnd - yes, the data is coming from a DB2 database, and I see these characters there when I query it. It is indeed surprising that DataStage is unable to read it properly, because DataStage is highly compatible with DB2, as both are IBM products. I need to check with the mainframe guy to see which EBCDIC variant it belongs to; hopefully I will get an answer ASAP. I appreciate your time looking at this issue.
Praveen
pran.praveen
Premium Member
Posts: 20
Joined: Tue Jun 22, 2010 9:02 am

Post by pran.praveen »

Hi Guys,

I have found a solution to this problem, or rather a workaround. I was not able to read these special characters in EBCDIC format, so I had the mainframe send me the file in ASCII format. I did face a problem reading the decimal fields of this ASCII file, but some setting changes fixed that. So finally it looks good. Thanks to the people who helped me with some clues... Have a great day and a greater week ahead...
Praveen
pran.praveen
Premium Member
Posts: 20
Joined: Tue Jun 22, 2010 9:02 am

Post by pran.praveen »

Hi guys,

I have found one more solution to this problem. Just wanted to share it; maybe it will help some of you. Read the EBCDIC file with the following settings: in the Format tab, set the String ==> 'Export EBCDIC as ASCII' option; in General, set Character set = EBCDIC and Byte order = Big-endian; read the decimal columns as binary and packed; and columns with special characters should be read as ASCII. Later, in a Transformer, use StringToUString(Sourcecolumn,'IBM01142') on the columns with special characters, and the job NLS should be "ISO_8859-1:1987". Hope this helps.
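For reference, the same IBM01142 (EBCDIC Denmark/Norway with euro) conversion can be reproduced outside DataStage. This sketch assumes the third-party Python package "ebcdic" (pip install ebcdic), which registers that code page as the codec 'cp1142'; the file name is hypothetical:

Code:
import ebcdic  # importing registers extra EBCDIC codecs, e.g. cp1141/cp1142

# Decode the raw host bytes through the same IBM01142 table that
# StringToUString(col, 'IBM01142') uses inside the job.
with open("input.dat", "rb") as f:  # hypothetical EBCDIC source file
    raw = f.read()
print(raw.decode("cp1142"))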
Praveen