Parallel routines and Unicode

Posted: Fri Nov 25, 2011 8:29 am
by PhilHibbs
So far, my parallel routines that take strings have always just accepted char* pointers, and that's been fine. I've assumed the strings coming in are null-terminated ASCII.

But isn't all DataStage data processed internally as Unicode? Is it being flattened down to ASCII before it is passed to a parallel routine? How would I write a px routine that processes Unicode text?

Posted: Fri Nov 25, 2011 1:44 pm
by ray.wurlod
With NLS enabled it should always be Unicode internally. With NLS not installed I understand it is ASCII internally.

Posted: Fri Nov 25, 2011 2:22 pm
by PhilHibbs
ray.wurlod wrote: With NLS enabled it should always be Unicode internally. With NLS not installed I understand it is ASCII internally.
NLS is enabled, so does that mean the strings are in UTF-8 format? I just ran a test, and indeed, passing in a string that has a non-ASCII character seems ok, but I suspect my single-character replacement method might make a mistake with multi-byte characters. I need to read up on handling UTF-8 in C.
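To illustrate the multi-byte concern above, here is a minimal sketch in plain C, assuming the routine receives valid, null-terminated UTF-8. The function names (utf8_seq_len, utf8_count, replace_ascii) are hypothetical and not part of any DataStage API. It shows how the lead byte gives the sequence length, and why swapping one ASCII character for another is still safe byte by byte: bytes below 0x80 never occur inside a multi-byte sequence. Replacing a non-ASCII character is a different matter, because it occupies more than one byte.

```c
#include <assert.h>
#include <stddef.h>

/* Length in bytes of the UTF-8 sequence starting with lead byte `lead`,
   or 0 if `lead` is a continuation byte or not a lead byte at all.
   This sketch does not reject overlong or otherwise invalid encodings. */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;            /* 0xxxxxxx: ASCII */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;                             /* 10xxxxxx or invalid */
}

/* Count code points in a null-terminated UTF-8 string by counting
   every byte that is not a continuation byte (10xxxxxx). */
static size_t utf8_count(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s; ++s)
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    return n;
}

/* Replace ASCII byte `from` with ASCII byte `to`, in place.
   Safe on UTF-8 input because both bytes are below 0x80 and such
   bytes never appear inside a multi-byte sequence. */
static void replace_ascii(char *s, char from, char to)
{
    assert(s != NULL);
    assert((unsigned char)from < 0x80 && (unsigned char)to < 0x80);
    for (; *s; ++s)
        if (*s == from)
            *s = to;
}
```

For example, the 13-byte string "caf\xC3\xA9-au-lait" holds 12 code points, and replace_ascii(buf, '-', ' ') leaves the two bytes of the accented character untouched.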

Posted: Fri Nov 25, 2011 4:43 pm
by paultechm
Can you try passing it as StringToUString(columnvalue, 'UTF-8')?

Posted: Tue Dec 20, 2011 7:39 pm
by PhilHibbs
It looks like that would convert the string to UTF-8 before passing it to my routine, but it is already being passed as UTF-8. What I want is an easy way to process UTF-8 strings within my C or C++ routine. I'm looking at http://site.icu-project.org/ at the moment, when I have time and access.
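For simple cases a library may not be needed. The sketch below uses plain C only (it does not use ICU) and assumes the input is valid, null-terminated UTF-8. The function name utf8_replace and its signature are hypothetical. It treats the character to be replaced as a short byte string rather than a single char, so the old and new characters can have different encoded lengths. Plain byte matching is enough here because UTF-8 is self-synchronizing: the encoding of one complete character cannot match starting in the middle of another.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy `src` into `dst`, replacing every occurrence of the UTF-8 encoded
   character `from` with `to`. All strings are null-terminated; `from`
   and `to` may differ in byte length. Returns the number of bytes
   written (excluding the terminator), or (size_t)-1 if `dst` is too
   small to hold the result. */
static size_t utf8_replace(const char *src, const char *from, const char *to,
                           char *dst, size_t dst_size)
{
    size_t from_len, to_len, out = 0;

    assert(src != NULL && from != NULL && to != NULL && dst != NULL);
    from_len = strlen(from);
    to_len = strlen(to);
    assert(from_len > 0);

    while (*src) {
        if (strncmp(src, from, from_len) == 0) {
            /* Keep one byte free for the terminator. */
            if (out + to_len >= dst_size) return (size_t)-1;
            memcpy(dst + out, to, to_len);
            out += to_len;
            src += from_len;
        } else {
            if (out + 1 >= dst_size) return (size_t)-1;
            dst[out++] = *src++;
        }
    }
    if (out >= dst_size) return (size_t)-1;
    dst[out] = '\0';
    return out;
}
```

A call such as utf8_replace(input, "\xC3\xA9", "e", out, sizeof out) replaces each two-byte U+00E9 with a plain "e". Anything beyond this, such as case mapping, normalization or collation, is where a library like ICU becomes worthwhile.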