Parallel routines and Unicode

PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK

Post by PhilHibbs »

So far, my parallel routines that take strings have always just accepted char* pointers, and that's been fine. I've assumed the strings coming in are null-terminated ASCII.

But isn't all DataStage data processed internally as Unicode? Is it all being flattened to ASCII before it is passed to a parallel routine? How would I write a PX routine that processes Unicode text?
Phil Hibbs | Capgemini
Technical Consultant
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

With NLS enabled it should always be Unicode internally. With NLS not installed I understand it is ASCII internally.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
PhilHibbs

Post by PhilHibbs »

ray.wurlod wrote: With NLS enabled it should always be Unicode internally. With NLS not installed I understand it is ASCII internally.
NLS is enabled, so does that mean the strings are in UTF-8 format? I just ran a test, and passing in a string that contains a non-ASCII character does seem to work, but I suspect my single-character replacement method will mishandle multi-byte characters. I need to read up on handling UTF-8 in C.
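
To illustrate the multi-byte concern, here is a minimal sketch in plain C, assuming the routine really does receive a null-terminated UTF-8 `char*`. The names `utf8_seq_len`, `utf8_count` and `replace_ascii` are made up for this example and are not part of any DataStage API. It relies on one property of UTF-8: every byte of a multi-byte sequence has its high bit set, so swapping one ASCII byte for another cannot corrupt a multi-byte character, whereas anything that treats one byte as one character can.

```c
#include <stddef.h>

/* Length in bytes of the UTF-8 sequence introduced by lead byte b,
   or 0 if b is a continuation byte or not a valid lead byte. */
size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;                /* 0xxxxxxx: plain ASCII */
    if (b >= 0xC2 && b <= 0xDF) return 2;  /* 110xxxxx */
    if (b >= 0xE0 && b <= 0xEF) return 3;  /* 1110xxxx */
    if (b >= 0xF0 && b <= 0xF4) return 4;  /* 11110xxx */
    return 0;
}

/* Count code points (not bytes) in a null-terminated UTF-8 string.
   A malformed or truncated sequence is counted as one unit per stray byte. */
size_t utf8_count(const char *s)
{
    size_t n = 0;
    while (*s) {
        size_t len = utf8_seq_len((unsigned char)*s);
        size_t i = 1;
        if (len == 0) len = 1;
        /* Only step over genuine continuation bytes (10xxxxxx), so a
           truncated sequence can never carry us past the terminator. */
        while (i < len && ((unsigned char)s[i] & 0xC0) == 0x80) i++;
        s += i;
        n++;
    }
    return n;
}

/* Replace one ASCII byte with another, in place. Safe on UTF-8 input
   as long as both 'from' and 'to' are below 0x80. */
void replace_ascii(char *s, char from, char to)
{
    for (; *s; s++)
        if (*s == from) *s = to;
}
```

For the string "café" encoded as UTF-8 (`"caf\xC3\xA9"`), `strlen` reports 5 bytes while `utf8_count` reports 4 characters. That gap is where a byte-at-a-time replacement goes wrong once either the old or the new character is non-ASCII.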
paultechm
Participant
Posts: 27
Joined: Wed Jul 25, 2007 2:09 am

Post by paultechm »

Can you try passing it as stringtoUstring(columnvalue,'UTF-8')?
PhilHibbs

Post by PhilHibbs »

It looks like that would convert the string to UTF-8 before passing it to my routine, but it is already being passed as UTF-8. What I want is an easy way to process UTF-8 strings within my C or C++ routine. I'm looking at http://site.icu-project.org/ at the moment, when I have time and access.
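
As a rough idea of what processing UTF-8 inside the routine involves without any library, the sketch below decodes one code point at a time and writes a copy of the string with one code point swapped for another. All of the names (`utf8_decode`, `utf8_encode`, `utf8_replace_cp`) are hypothetical helpers written for this post. It assumes null-terminated UTF-8 input and a caller-supplied output buffer, since the replacement may be a different number of bytes than the original. A library such as ICU would handle this more thoroughly.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one code point starting at s. Returns the number of bytes consumed.
   Malformed, overlong or surrogate input yields U+FFFD and consumes 1 byte. */
size_t utf8_decode(const unsigned char *s, uint32_t *cp)
{
    static const uint32_t min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};
    size_t len, i;
    uint32_t c;

    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; }
    else { *cp = 0xFFFD; return 1; }

    for (i = 1; i < len; i++) {
        /* A NUL terminator fails this check too, so we never read past it. */
        if ((s[i] & 0xC0) != 0x80) { *cp = 0xFFFD; return 1; }
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min_cp[len] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) {
        *cp = 0xFFFD;
        return 1;
    }
    *cp = c;
    return len;
}

/* Encode cp (assumed to be a valid code point up to U+10FFFF) into out,
   which must have room for 4 bytes. Returns the number of bytes written. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) { out[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Copy src into dst, replacing every occurrence of code point 'from' with 'to'.
   Other bytes, including malformed ones, are copied through unchanged
   (unless 'from' is U+FFFD, in which case malformed bytes match it).
   Returns bytes written excluding the terminator, or (size_t)-1 if dst is too small. */
size_t utf8_replace_cp(const char *src, uint32_t from, uint32_t to,
                       char *dst, size_t dst_size)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t out = 0;

    while (*s) {
        uint32_t cp;
        unsigned char enc[4];
        const unsigned char *bytes = s;
        size_t n = utf8_decode(s, &cp);
        size_t m = n, k;

        if (cp == from) { m = utf8_encode(to, enc); bytes = enc; }
        if (out + m >= dst_size) return (size_t)-1;  /* keep room for the NUL */
        for (k = 0; k < m; k++) dst[out++] = (char)bytes[k];
        s += n;
    }
    if (dst_size == 0) return (size_t)-1;
    dst[out] = '\0';
    return out;
}
```

For example, `utf8_replace_cp("caf\xC3\xA9", 0xE9, 0x65, out, sizeof out)` writes "cafe" and returns 4. Replacing in the other direction makes the output longer than the input, which is why the sketch writes to a separate buffer rather than editing in place.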