How to read a multibyte character byte by byte


vijaydev
Participant
Posts: 54
Joined: Sun May 20, 2007 6:31 pm

How to read a multibyte character byte by byte

Post by vijaydev »

I want to read a multibyte character one byte at a time, to find out the total length it occupies, including single-byte characters.
Vijay
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Never had to do it, but if I had to, I would go to the online help Index and type "byte" to see what turned up.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The functions you require are listed in this post.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vijaydev
Participant
Posts: 54
Joined: Sun May 20, 2007 6:31 pm

Post by vijaydev »

My aim is to find and remove the broken character. Normally a character in a multibyte character set occupies two bytes, but a broken character occupies only a single byte. That is why I am trying to find the length. If you know a way to find it, please let me know.
Vijay
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

I still can't see that you have any proof that a character is "broken", whatever that means.

It is possible to read one byte at a time and to determine the byte value and byte type, but you will need to write your own routine, and specify NONE as the map - otherwise the map will attempt to convert the Korean characters into UV-UTF8 encoding of Unicode, which is used internally by DataStage.

The functions you may need are:
BYTELEN()
BYTE()
BYTETYPE()
BYTEVAL()

All of these can be found in the DataStage BASIC manual. You will need to have a reference manual for the specific encoding (for example PC1040, KSC5601) so that you understand what each individual byte in the multi-byte encoding is doing.
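
To illustrate the idea of walking a string one byte at a time and recording each byte's value and type, here is a rough sketch in Python rather than DataStage BASIC. It is not a DataStage routine and does not use BYTEVAL() or BYTETYPE(). It assumes the data is in the EUC-KR form of KSC5601, where single-byte characters are below 0x80 and both bytes of a two-byte character fall in 0xA1 to 0xFE; check the reference for your actual encoding before relying on those ranges. The function name classify_bytes and the type labels are made up for this example.

```python
def classify_bytes(data: bytes):
    """Return a (byte_value, kind) pair for every byte.

    Assumes EUC-KR: bytes below 0x80 are single-byte characters and
    both bytes of a two-byte character lie in the range 0xA1..0xFE.
    """
    result = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # Plain single-byte character
            result.append((b, "single"))
            i += 1
        elif (0xA1 <= b <= 0xFE and i + 1 < len(data)
              and 0xA1 <= data[i + 1] <= 0xFE):
            # Complete two-byte character: lead byte plus trail byte
            result.append((b, "lead"))
            result.append((data[i + 1], "trail"))
            i += 2
        else:
            # High byte with no valid partner, or a byte outside the
            # assumed EUC-KR ranges
            result.append((b, "orphan"))
            i += 1
    return result


# "A" followed by one Hangul syllable (U+D55C), encoded as EUC-KR
text = "A\ud55c".encode("euc_kr")
print(len(text))  # total length in bytes, single-byte characters included
for value, kind in classify_bytes(text):
    print(hex(value), kind)
```

The total length in bytes is simply the length of the byte string; the per-byte types are what let you spot a two-byte character that has lost its second byte.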
vijaydev
Participant
Posts: 54
Joined: Sun May 20, 2007 6:31 pm

Post by vijaydev »

In the job I am getting Korean characters as input. In the front end, users copy and paste large text into a text box, more than its size allows. When they press Enter, only one byte of a two-byte character is stored in the database. I want to remove that character. In DataStage it shows as ? and in the database it shows as something else. Please help, and provide the logic as well if possible.
Vijay
vijaydev
Participant
Posts: 54
Joined: Sun May 20, 2007 6:31 pm

Post by vijaydev »

Can anyone help me read Korean characters byte by byte, or tell me how to remove the half character (the leftover single byte)?
Vijay
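
For what it is worth, the removal logic asked for here might look something like the sketch below. It is Python, not DataStage BASIC, so treat it as an outline of the logic only. It assumes the raw bytes are EUC-KR (single-byte characters below 0x80, both bytes of a two-byte character in 0xA1 to 0xFE) and that the bytes are read unconverted, for example with the map set to NONE as suggested earlier in the thread. The name strip_orphan_bytes is invented for this example.

```python
def strip_orphan_bytes(data: bytes) -> bytes:
    """Drop high bytes that are not part of a complete two-byte pair.

    Assumes EUC-KR: bytes below 0x80 are single-byte characters and
    both bytes of a two-byte character lie in the range 0xA1..0xFE.
    """
    out = bytearray()
    i = 0
    n = len(data)
    while i < n:
        b = data[i]
        if b < 0x80:
            # Single-byte character: keep it
            out.append(b)
            i += 1
        elif 0xA1 <= b <= 0xFE and i + 1 < n and 0xA1 <= data[i + 1] <= 0xFE:
            # Complete two-byte character: keep both bytes
            out += data[i:i + 2]
            i += 2
        else:
            # Leftover half of a character (or an invalid byte): skip it
            i += 1
    return bytes(out)


# Two Hangul syllables (U+D55C, U+AE00) encoded as EUC-KR: four bytes
sample = "\ud55c\uae00".encode("euc_kr")
# Simulate the truncation: the second character loses its trail byte,
# then a newline follows
broken = sample[:3] + b"\n"
cleaned = strip_orphan_bytes(broken)
print(len(broken), len(cleaned))
```

One limitation to be aware of: because lead and trail bytes share the same range in EUC-KR, a leftover byte that is immediately followed by another two-byte character will pair up with that character's first byte, and this scan cannot detect it. It only catches a leftover byte at the end of the data or just before a single-byte character such as a line break.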