Replacing Multibyte Characters with Single-Byte Characters

keshav0307 · Post by **keshav0307** » Tue Aug 12, 2008 9:07 pm

My requirement is to replace any (and all) multibyte characters with a single white space.
If I disable NLS, all characters are replaced with two white spaces.

ray.wurlod · Post by **ray.wurlod** » Tue Aug 12, 2008 10:02 pm

Use a server job or a BASIC Transformer stage. Examine each character with the BYTELEN() function - this will be greater than 1 if you have a multi-byte character.
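To illustrate the per-character check outside DataStage, here is a minimal Python sketch (not DataStage BASIC). It assumes the data is treated as UTF-8 and mimics a byte-length test by encoding each character; the function name `replace_multibyte` and its parameters are made up for illustration.

```python
def replace_multibyte(text: str, replacement: str = " ", encoding: str = "utf-8") -> str:
    """Replace every character that needs more than one byte with `replacement`.

    Assumption: `encoding` (UTF-8 by default) is the encoding whose byte
    lengths matter for the target system.
    """
    result = []
    for ch in text:
        # A character is "multibyte" if its encoded form is longer than one byte.
        if len(ch.encode(encoding)) > 1:
            result.append(replacement)
        else:
            result.append(ch)
    return "".join(result)


if __name__ == "__main__":
    # Each multibyte character collapses to exactly one space.
    print(repr(replace_multibyte("caf\u00e9 \u65e5\u672c")))
```

Because the test is done per character rather than per byte, each multibyte character yields one replacement, not one per byte.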

But do you really know what you're doing? For example, if you're using "pure" Unicode, every character is multi-byte (either two or four). Different UTF-8 encodings (such as UV_UTF-8) have different ways to store the same character, which may or may not be a single byte. For example, to preserve dynamic array delimiters as single-byte characters, the high-end ISO8859-1 characters are moved to the "private use area" and stored as multi-byte.
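A small Python sketch of why "multi-byte" depends on the encoding: the same character takes a different number of bytes in UTF-8, UTF-16, and UTF-32. It only uses the standard Python codecs; the UV_UTF-8 private-use-area mapping mentioned above is not modelled here, and the helper name `byte_lengths` is hypothetical.

```python
def byte_lengths(ch: str) -> dict:
    """Return the encoded size in bytes of one character under several encodings."""
    # The "-le" variants are used so that no byte-order mark is counted.
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {enc: len(ch.encode(enc)) for enc in encodings}


if __name__ == "__main__":
    for ch in ("A", "\u00e9", "\u20ac", "\U0001F600"):
        print(f"U+{ord(ch):04X}", byte_lengths(ch))
```

A plain ASCII letter is single-byte only in UTF-8; in UTF-16 every character is two or four bytes, so a byte-length test flags everything as multi-byte there.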

keshav0307 · Post by **keshav0307** » Wed Aug 13, 2008 12:43 am

Thanks, Ray.
I am looking for something that can be used in a Parallel stage. The BASIC Transformer will run on the head node only.

ArndW · Post by **ArndW** » Wed Aug 13, 2008 12:44 am

The BASIC Transformer stage will suit you just fine; it will run on as many nodes as your job does. We recently had to do something similar, and instead of writing a buildop we used a BASIC stage and ByteLen() to determine actual string sizes with multibyte characters.

mahadev.v · Post by **mahadev.v** » Wed Aug 13, 2008 12:49 am

keshav0307 wrote: I am looking for something that can be used in a Parallel stage. The BASIC Transformer will run on the head node only.

Even I thought that the BASIC Transformer runs in sequential mode when used in a parallel job, but it does not: it runs in parallel. I think it is considered an overhead because it creates three processes (input, transformer, and output) and also consumes more CPU than a normal Transformer. Bottom line: the BASIC Transformer runs on all nodes (in parallel), but it is not as efficient as the Transformer.

keshav0307 · Post by **keshav0307** » Wed Aug 13, 2008 1:29 am

OK, then I may be missing something.

I get this error

BASIC_Transformer_9,0: Unable to open project '<the project name>' - 81016.

if I don't use a node constraint, or if I map it to any compute node.

It works fine when I map the node constraint to the conductor node.

ArndW · Post by **ArndW** » Wed Aug 13, 2008 1:40 am

Are you on an MPP, a grid, or a single machine? If you are in a distributed environment then you are correct in your restriction. I don't know of a C++ function akin to ByteLen(); I think I couldn't find one and then decided to do the whole thing in a BASIC Transformer stage.

keshav0307 · Post by **keshav0307** » Wed Aug 13, 2008 1:50 am

On a grid.

telenet_bi · Post by **telenet_bi** » Wed Oct 29, 2008 3:34 pm

ArndW,

I'm also getting this error:

BASIC_Transformer_9,0: Unable to open project '<the project name>' - 81016.

We recently moved to a two-server configuration instead of just one. Does this mean we have to get rid of all our BASIC Transformers, or else the second server will go unused by all these processes?
Is there any workaround? I couldn't find one here.

ufl_developer · Post by **ufl_developer** » Thu Oct 30, 2008 2:39 pm

ArndW wrote: The BASIC Transformer stage will suit you just fine; it will run on as many nodes as your job does. We recently had to do something similar, and instead of writing a buildop we used a BASIC stage and ByteLen() to determine actual string sizes with multibyte characters.

I'm facing a similar issue here. Could you please give more details on how you did it?

I have two types of sources: one is Oracle and the other is DB2 UDB with a Unicode database setting. However, we really don't care about the Unicode part of it and just want to get some data from it. I used to have NLS enabled in DS, and it always gave us some weird (???) stuff, so I tried to disable NLS entirely. However, I'm now having trouble with some of the data from DB2, which fails with this type of error: [IBM][CLI Driver] CLI0002W Data truncated. SQLSTATE=01004. So apparently it has something to do with the Unicode database setup. Now I'm wondering whether I could convert the Unicode data from DB2 to single-byte ASCII data. I'm not quite sure whether this is the right approach, but it's at least worth a try...
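To make the idea concrete, here is a rough Python sketch of what I mean by converting Unicode data to single-byte ASCII. It is only an illustration, not something DataStage provides: the function name `to_ascii` and the `fallback` parameter are made up, and it assumes that dropping accents and blanking everything else is acceptable for the data.

```python
import unicodedata


def to_ascii(text: str, fallback: str = " ") -> str:
    """Convert text to plain ASCII, one output unit per input character.

    Assumption: accented letters may be reduced to their base letter, and
    any character with no ASCII equivalent may be replaced by `fallback`.
    """
    out = []
    for ch in text:
        if ch.isascii():
            out.append(ch)
            continue
        # NFKD splits e.g. "ü" into "u" plus a combining diaeresis.
        decomposed = unicodedata.normalize("NFKD", ch)
        base = "".join(c for c in decomposed if c.isascii())
        out.append(base if base else fallback)
    return "".join(out)


if __name__ == "__main__":
    print(repr(to_ascii("Z\u00fcrich \u65e5\u672c")))
```

Every character in the result occupies exactly one byte, so byte length and character length agree afterwards.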

Thanks!