Parallel routines and Unicode

PhilHibbs · Post by **PhilHibbs** » Fri Nov 25, 2011 8:29 am

So far, my parallel routines that take strings have always just accepted char* pointers, and that's been fine. I've assumed the strings coming in are null-terminated ASCII.

But isn't all DataStage data processed internally as Unicode? Is it being flattened down to ASCII before it is passed to a parallel routine? How would I write a px routine that processes Unicode text?

ray.wurlod · Post by **ray.wurlod** » Fri Nov 25, 2011 1:44 pm

With NLS enabled it should always be Unicode internally. With NLS not installed I understand it is ASCII internally.

PhilHibbs · Post by **PhilHibbs** » Fri Nov 25, 2011 2:22 pm

ray.wurlod wrote: With NLS enabled it should always be Unicode internally. With NLS not installed I understand it is ASCII internally.

NLS is enabled, so does that mean the strings are in UTF-8 format? I just ran a test, and passing in a string that contains a non-ASCII character does seem to work, but I suspect my single-character replacement method might go wrong with multi-byte characters. I need to read up on handling UTF-8 in C.
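To illustrate the multi-byte concern, here is a minimal C sketch. It assumes the routine receives valid, null-terminated UTF-8. The function names `utf8_count_codepoints` and `replace_ascii_char` are hypothetical and not part of any DataStage API. It shows that the byte length and the character count can differ. It also shows that swapping one ASCII byte for another is safe, because bytes below 0x80 never appear inside a multi-byte UTF-8 sequence.

```c
#include <assert.h>
#include <stddef.h>

/* Count Unicode code points in a null-terminated UTF-8 string.
 * Continuation bytes look like 10xxxxxx, so every byte that is NOT a
 * continuation byte starts a new code point.
 * Assumption: the input is valid UTF-8. */
size_t utf8_count_codepoints(const char *s)
{
    assert(s != NULL);
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; ++p) {
        if ((*p & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

/* Replace one ASCII character with another, in place.
 * Returns the number of replacements, or -1 if either character is
 * not a non-zero ASCII byte (in which case nothing is modified). */
int replace_ascii_char(char *s, char from, char to)
{
    assert(s != NULL);
    unsigned char f = (unsigned char)from;
    unsigned char t = (unsigned char)to;
    if (f == 0 || t == 0 || f >= 0x80 || t >= 0x80) {
        return -1;  /* a non-ASCII character would need a multi-byte edit */
    }
    int replaced = 0;
    for (; *s != '\0'; ++s) {
        if (*s == from) {
            *s = to;
            ++replaced;
        }
    }
    return replaced;
}
```

The in-place approach stops being enough as soon as either the character being replaced or its replacement is non-ASCII, since the string length in bytes may then change.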

paultechm · Post by **paultechm** » Fri Nov 25, 2011 4:43 pm

Can you try passing it as StringToUstring(columnvalue,'UTF-8')?

PhilHibbs · Post by **PhilHibbs** » Tue Dec 20, 2011 7:39 pm

It looks like that would convert the string to UTF-8 before passing it to my routine, but it is already being passed as UTF-8. What I want is an easy way to process UTF-8 strings within my C or C++ routine. I'm looking at ICU (http://site.icu-project.org/) at the moment, when I have time and access.
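For simple cases a full library may not be needed. The following is a rough sketch in plain C, not ICU, and the function name `utf8_replace` is hypothetical. It assumes both the input and the search string are valid UTF-8. Under that assumption, a byte-level search with `strstr` can only match on character boundaries, because a lead byte never looks like a continuation byte. The result goes into a separate buffer, since the replacement may be a different length in bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Replace every occurrence of the UTF-8 sequence `from` with `to`,
 * writing the result into `out` (capacity `out_size`, terminator included).
 * Returns 0 on success, -1 if `from` is empty or `out` is too small.
 * Assumption: `in`, `from`, and `to` are valid null-terminated UTF-8. */
int utf8_replace(const char *in, const char *from, const char *to,
                 char *out, size_t out_size)
{
    assert(in != NULL && from != NULL && to != NULL && out != NULL);
    size_t from_len = strlen(from);
    size_t to_len = strlen(to);
    size_t used = 0;

    if (from_len == 0 || out_size == 0) {
        return -1;
    }
    while (*in != '\0') {
        const char *hit = strstr(in, from);
        /* Bytes to copy unchanged before the next match (or to the end). */
        size_t chunk = hit ? (size_t)(hit - in) : strlen(in);
        size_t need = chunk + (hit ? to_len : 0);
        if (need >= out_size - used) {
            return -1;  /* always keep one byte free for the terminator */
        }
        memcpy(out + used, in, chunk);
        used += chunk;
        if (hit == NULL) {
            break;
        }
        memcpy(out + used, to, to_len);
        used += to_len;
        in = hit + from_len;
    }
    out[used] = '\0';
    return 0;
}
```

A call such as `utf8_replace(input, "\xC3\xA9", "e", buffer, sizeof buffer)` would replace each "é" with a plain "e". Anything beyond exact byte-sequence matching, such as case folding or normalization, is where a library like ICU becomes worthwhile.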