Handling non-English characters that are part of a string
Posted: Mon Feb 04, 2013 1:27 am
Hello,
We are processing fields containing non-English (European) characters through DataStage 8.0, with both the database and DataStage set to UTF-8. The data is ultimately loaded into SAP.
One open challenge is applying transformations to fields that contain non-English characters. There is a source field "NAME" which is 40 bytes long, but the target stores it in two fields, NAME1 and NAME2:
NAME1 is of length 25
NAME2 is of length 25
Since these non-English characters occupy more than one byte, when we apply the transformation to split the value into the two fields, we get junk characters in NAME2.
Is there a way to check whether a character is non-English, and then move that character into the second half of the name if it falls on the 25th position?
One way could be to iterate through each character, check its hex code to see whether it belongs to the extended character set, and record its position. If a non-English character occurs at the 25th position of the source field, it would then be moved to NAME2 together with the rest of the characters, because it actually occupies more than one byte.
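
To make the idea concrete, here is a minimal sketch in Python (for illustration only, not DataStage transformer code). It assumes the data is valid UTF-8 and that the limit of 25 is measured in bytes; the function name split_utf8 is hypothetical. Rather than testing each character against a list of hex codes, it relies on the fact that every UTF-8 continuation byte has the bit pattern 10xxxxxx, so the cut point can simply be moved back until it lands on the first byte of a character.

```python
from typing import Tuple


def split_utf8(value: str, limit: int = 25) -> Tuple[str, str]:
    """Split value into two parts so that the first part is at most
    `limit` bytes in UTF-8 and no multi-byte character is cut in half."""
    data = value.encode("utf-8")
    if len(data) <= limit:
        # Everything fits in the first field.
        return value, ""

    cut = limit
    # UTF-8 continuation bytes look like 0b10xxxxxx. If the byte at the
    # cut point is one, the cut is inside a character, so step back to
    # the character's lead byte.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1

    return data[:cut].decode("utf-8"), data[cut:].decode("utf-8")


# 24 ASCII letters, then a 2-byte character straddling bytes 25 and 26.
name1, name2 = split_utf8("A" * 24 + "é" + "B" * 5)
```

In this example the "é" would be split across the two fields by a plain 25-byte cut, so the sketch moves it whole into the second part. It does not check that the second part also fits in 25 bytes; with a 40-byte source that should hold, but that is an assumption about the data.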
Can anyone provide some thoughts on this?
Thanks,
Sreeja R