Replacing Multibyte Characters with Single-Byte Characters
Moderators: chulett, rschirm, roy
-
- Premium Member
- Posts: 783
- Joined: Mon Jan 16, 2006 10:17 pm
- Location: Sydney, Australia
My requirement is to replace any (and all) multibyte characters with a single white space.
If I disable NLS, each of these characters is replaced with two white spaces.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Use a server job or a BASIC Transformer stage. Examine each character with the BYTELEN() function - this will be greater than 1 if you have a multi-byte character.
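As a rough illustration of the per-character check described above, here is a minimal C++ sketch. It is not DataStage BASIC and does not use BYTELEN(); it only shows the same idea under the assumption that the input is standard UTF-8. The function name `replace_multibyte` is hypothetical.

```cpp
#include <string>

// Hypothetical helper: returns a copy of `in` in which every multi-byte
// UTF-8 character is replaced by exactly one space.
// Assumption: `in` is standard UTF-8. A run of stray continuation bytes
// (invalid input) is also collapsed into a single space.
std::string replace_multibyte(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {
            // Single-byte (ASCII) character: keep it unchanged.
            out.push_back(in[i]);
            ++i;
        } else {
            // Start of a multi-byte character: emit one space, then skip
            // all following continuation bytes (bit pattern 10xxxxxx).
            out.push_back(' ');
            ++i;
            while (i < in.size() &&
                   (static_cast<unsigned char>(in[i]) & 0xC0) == 0x80) {
                ++i;
            }
        }
    }
    return out;
}
```

For example, the four-byte input "a", 0xC3, 0xA9, "b" (an "é" between two ASCII letters) comes back as "a b". If each byte were instead treated as a character of its own, the same "é" would turn into two spaces, which may be what happens when NLS is disabled.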
But do you really know what you're doing? For example, if you're using "pure" Unicode, every character is multi-byte (either two or four). Different UTF-8 encodings (such as UV_UTF-8) have different ways to store the same character, which may or may not be a single byte. For example, to preserve dynamic array delimiters as single-byte characters, the high-end ISO8859-1 characters are moved to the "private use area" and stored as multi-byte.
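To make the point about encodings concrete, the following C++ sketch computes how many bytes a single Unicode code point needs in standard UTF-8 and in UTF-16. The function names are hypothetical, and the sketch covers only the standard encodings; it does not model the internal UV_UTF-8 mapping mentioned above.

```cpp
#include <cstdint>

// Bytes needed to encode code point `cp` in standard UTF-8 (1 to 4).
// Returns 0 if `cp` is beyond the Unicode range. Surrogate code points
// are not treated specially in this sketch.
int utf8_length(std::uint32_t cp) {
    if (cp < 0x80) return 1;      // ASCII
    if (cp < 0x800) return 2;     // includes the high-end ISO8859-1 range
    if (cp < 0x10000) return 3;   // rest of the Basic Multilingual Plane
    if (cp < 0x110000) return 4;  // supplementary planes
    return 0;
}

// Bytes needed to encode code point `cp` in UTF-16: always 2 or 4.
int utf16_length(std::uint32_t cp) {
    if (cp < 0x10000) return 2;   // one 16-bit code unit
    if (cp < 0x110000) return 4;  // surrogate pair
    return 0;
}
```

Under these rules the letter "A" (U+0041) is one byte in UTF-8 but two bytes in UTF-16, and "é" (U+00E9) is two bytes in both, so "multi-byte" only has a meaning once the encoding is fixed.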
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Premium Member
- Posts: 783
- Joined: Mon Jan 16, 2006 10:17 pm
- Location: Sydney, Australia
The BASIC transform stage will suit you just fine, it will run on as many nodes as your job does. We recently had to do something similar and instead of writing a buildop we used a BASIC stage and ByteLen() to determine actual string sizes with multibyte characters.
keshav0307 wrote: I am looking for something which can be used in a parallel stage. The BASIC Transformer will run on the head node only.
I also thought that the BASIC Transformer runs in sequential mode when used in a parallel job, but it does not: it runs in parallel. I think it is considered an overhead because it creates three processes (input, transformer, and output) and also consumes more CPU than a normal Transformer. Bottom line: the BASIC Transformer runs on all nodes (in parallel), but it is not as efficient as the Transformer.
"given enough eyeballs, all bugs are shallow" - Eric S. Raymond
-
- Premium Member
- Posts: 783
- Joined: Mon Jan 16, 2006 10:17 pm
- Location: Sydney, Australia
Are you on an MPP or grid, or on a single machine? If you are in a distributed environment then you are correct in your restriction. I don't know the C++ function akin to ByteLen(); I think I couldn't find one and then decided to do the whole thing in a BASIC transform stage.
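For readers who do want to try this in C++, here is a minimal sketch of a ByteLen()-style comparison. It is not a DataStage API and the function names are hypothetical. It assumes the string holds valid standard UTF-8, in which case the byte length is simply `size()` and the character count is the number of bytes that are not continuation bytes.

```cpp
#include <string>

// Number of characters in a UTF-8 string: count every byte that is NOT a
// continuation byte (bit pattern 10xxxxxx).
// Assumption: `s` is valid standard UTF-8.
std::size_t utf8_char_count(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}

// True if the string contains at least one multi-byte character, i.e. its
// byte length differs from its character count.
bool has_multibyte(const std::string& s) {
    return utf8_char_count(s) != s.size();
}
```

A string consisting of "a", the two bytes of "é" (0xC3 0xA9) and "b" has a byte length of 4 and a character count of 3, so `has_multibyte` reports true for it.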
-
- Premium Member
- Posts: 33
- Joined: Wed Jul 23, 2008 7:33 am
- Location: Mechelen, Belgium
- Contact:
ArndW,
I'm also getting this error:
BASIC_Transformer_9,0: Unable to open project '<the project name>' - 81016.
We recently moved to a 2-server configuration instead of just one. Does this mean we have to try to get rid of all our BASIC Transformers, or will the second server go unused by all these processes?
Is there any workaround? I couldn't find any here.
-
- Premium Member
- Posts: 15
- Joined: Wed Feb 20, 2008 3:33 pm
ArndW wrote: The BASIC transform stage will suit you just fine, it will run on as many nodes as your job does. We recently had to do something similar and instead of writing a buildop we used a BASIC stage and ByteLen() to determine actual string sizes with multibyte characters.
I'm facing a similar issue here. Could you please give more details on how you did it?
I have two types of sources: one is Oracle and the other is DB2 UDB with a Unicode database setting. However, we really don't care about the Unicode part of it and just want to get some data from it. I used to have NLS enabled in DS, and it always gave us some weird (???) output, so I tried disabling NLS entirely. However, I'm now having trouble with some of the data from DB2, which fails with this type of error: [IBM][CLI Driver] CLI0002W Data truncated. SQLSTATE=01004. So apparently it has something to do with the Unicode database setup. Now I'm wondering whether I could convert the Unicode data from DB2 to single-byte ASCII data. I'm not quite sure whether this is the right approach, but it is at least worth a try.
Thanks!