Replacing Multibyte Characters with Single-Byte Characters

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

keshav0307
Premium Member
Posts: 783
Joined: Mon Jan 16, 2006 10:17 pm
Location: Sydney, Australia

Post by keshav0307 »

My requirement is to replace any (and all) multibyte characters with a single white space.
If I disable NLS, all such characters are replaced with two white spaces.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Use a server job or a BASIC Transformer stage. Examine each character with the BYTELEN() function - this will be greater than 1 if you have a multi-byte character.
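
The per-character check described above can be sketched outside DataStage. The Python below only illustrates the logic and is not DataStage BASIC: it assumes the data is UTF-8, and `byte_len` and `replace_multibyte` are hypothetical helper names standing in for a BYTELEN()-style test.

```python
def byte_len(ch: str, encoding: str = "utf-8") -> int:
    """Return how many bytes one character occupies (UTF-8 is an assumption)."""
    return len(ch.encode(encoding))


def replace_multibyte(text: str, replacement: str = " ", encoding: str = "utf-8") -> str:
    """Replace every character that needs more than one byte with `replacement`."""
    return "".join(
        replacement if byte_len(ch, encoding) > 1 else ch
        for ch in text
    )


# Sample with one 2-byte character (u-umlaut) and two 3-byte characters (kanji).
sample = "Z\u00fcrich \u6771\u4eac ok"
cleaned = replace_multibyte(sample)
print(repr(cleaned))
```

Because the test is applied per character rather than per byte, each multibyte character yields exactly one space, so the character count of the string is unchanged.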

But do you really know what you're doing? For example, if you're using "pure" Unicode, every character is multi-byte (either two or four). Different UTF-8 encodings (such as UV_UTF-8) have different ways to store the same character, which may or may not be a single byte. For example, to preserve dynamic array delimiters as single-byte characters, the high-end ISO8859-1 characters are moved to the "private use area" and stored as multi-byte.
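
To make the encoding point concrete, here is a small Python sketch (not DataStage code) that reports how many bytes the same character takes under several standard encodings. The helper name `encoded_sizes` is hypothetical, and UV_UTF-8 is not available in Python, so only standard codecs are shown.

```python
def encoded_sizes(ch: str) -> dict:
    """Map encoding name to byte count for one character (None if not representable)."""
    sizes = {}
    for enc in ("latin-1", "utf-8", "utf-16-le", "utf-32-le"):
        try:
            sizes[enc] = len(ch.encode(enc))
        except UnicodeEncodeError:
            sizes[enc] = None  # the character does not exist in this encoding
    return sizes


# "A", e-acute (U+00E9) and the euro sign (U+20AC)
for ch in ("A", "\u00e9", "\u20ac"):
    print(repr(ch), encoded_sizes(ch))
```

Under the 16-bit and 32-bit encodings even plain "A" takes more than one byte, which is why a "greater than 1 byte" test only makes sense once you know which encoding the data is actually stored in.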
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
keshav0307
Premium Member
Posts: 783
Joined: Mon Jan 16, 2006 10:17 pm
Location: Sydney, Australia

Post by keshav0307 »

Thanks, Ray.
I am looking for something which can be used in a Parallel stage. The BASIC Transformer will run on the head node only.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

The BASIC Transformer stage will suit you just fine; it will run on as many nodes as your job does. We recently had to do something similar, and instead of writing a buildop we used a BASIC Transformer stage and ByteLen() to determine actual string sizes with multibyte characters.
mahadev.v
Participant
Posts: 111
Joined: Tue May 06, 2008 5:29 am
Location: Bangalore

Post by mahadev.v »

keshav0307 wrote: I am looking for something which can be used in a Parallel stage. The BASIC Transformer will run on the head node only.
I too thought that the BASIC Transformer runs in sequential mode when used in a parallel job, but it does not: it runs in parallel. I think it is considered an overhead because it creates three processes (input, transformer, and output) and consumes more CPU than a normal Transformer. Bottom line: the BASIC Transformer runs on all nodes (in parallel), but it is not as efficient as the Transformer.
"given enough eyeballs, all bugs are shallow" - Eric S. Raymond
keshav0307
Premium Member
Posts: 783
Joined: Mon Jan 16, 2006 10:17 pm
Location: Sydney, Australia

Post by keshav0307 »

OK, then I may be missing something.

I get this error if I don't use a node constraint, or if I map the stage to any compute node:
BASIC_Transformer_9,0: Unable to open project '<the project name>' - 81016.

It works fine when I map the node constraint to the conductor node.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Are you on an MPP or grid system, or on a single machine? If you are in a distributed environment, then you are correct about your restriction. I don't know of a C++ function akin to ByteLen(); I think I couldn't find one and then decided to do the whole thing in a BASIC Transformer stage.
keshav0307
Premium Member
Posts: 783
Joined: Mon Jan 16, 2006 10:17 pm
Location: Sydney, Australia

Post by keshav0307 »

On a grid.
telenet_bi
Premium Member
Posts: 33
Joined: Wed Jul 23, 2008 7:33 am
Location: Mechelen, Belgium

Post by telenet_bi »

ArndW,

I'm also getting this error:
BASIC_Transformer_9,0: Unable to open project '<the project name>' - 81016.
We recently moved to a two-server configuration instead of just one. Does this mean we have to try to get rid of all our BASIC Transformers, or else the second server will go unused by all these processes?
Is there any workaround? I couldn't find one here.
ufl_developer
Premium Member
Posts: 15
Joined: Wed Feb 20, 2008 3:33 pm

Post by ufl_developer »

ArndW wrote: The BASIC Transformer stage will suit you just fine; it will run on as many nodes as your job does. We recently had to do something similar, and instead of writing a buildop we used a BASIC Transformer stage and ByteLen() to determine actual string sizes with multibyte characters.
I'm facing a similar issue here. Could you please give more details on how you did it?

I have two types of sources: one is Oracle and the other is DB2 UDB with a Unicode database setting. However, we really don't care about the Unicode part of it and just want to get some data from it. I used to have NLS enabled in DS, and it always gave us some weird (???) stuff, so I tried disabling NLS entirely. However, I'm now having trouble with some of the data from DB2, which fails with this type of error: [IBM][CLI Driver] CLI0002W Data truncated. SQLSTATE=01004. So apparently it has something to do with the Unicode database setup. Now I'm wondering whether I could convert the Unicode data from DB2 to single-byte ASCII data. I'm not quite sure whether this is the right approach, but it is at least worth a try.

thanks!
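
The conversion idea in the post above can be pictured with a short Python sketch (not DataStage or DB2 code). It assumes the source data is UTF-8 and that the target column is sized in bytes; both are assumptions, and the thread does not establish that this is the cause of the CLI0002W warning. `to_ascii` and `utf8_overflow` are hypothetical helper names.

```python
def to_ascii(text: str, replacement: str = " ") -> str:
    """Keep 7-bit ASCII characters; substitute `replacement` for everything else."""
    return "".join(ch if ord(ch) < 128 else replacement for ch in text)


def utf8_overflow(text: str, byte_limit: int) -> int:
    """Bytes by which the UTF-8 form of `text` exceeds a byte-sized column (0 if it fits)."""
    return max(0, len(text.encode("utf-8")) - byte_limit)


value = "Caf\u00e9 M\u00fcller"  # 11 characters, two of them non-ASCII
print(len(value), len(value.encode("utf-8")))  # character count vs. UTF-8 byte count
print(repr(to_ascii(value)))
print(utf8_overflow(value, 11))                # bytes that would not fit in 11 bytes
```

If the column length is counted in bytes, a value that fits by character count can still be too long once encoded, which is one possible source of truncation warnings; converting to plain ASCII avoids that at the cost of losing the original characters.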