Issue with loading "few" non-English characters in

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

What method are you using in your Oracle Enterprise stage, 'load' or 'upsert'? That might make a bit of a difference. On the face of it, if the sequential file output is correct, then I don't see a source for the error, particularly as Unicode covers the whole MS1252 repertoire; note, though, that the German umlaut characters are encoded in two bytes in UTF-8 versus just one in MS1252.
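
To see that size difference concretely, here is a minimal Python sketch (illustrative only, not part of the original thread) printing the hex bytes of the umlaut characters under each encoding:

# How the same umlaut is stored in one byte under MS1252 (Windows-1252)
# but in two bytes under UTF-8.
for ch in "äöüÄÖÜß":
    print(ch, "cp1252 =", ch.encode("cp1252").hex(), " utf-8 =", ch.encode("utf-8").hex())

# Example output line:  ü cp1252 = fc  utf-8 = c3bc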
vinodn
Charter Member
Posts: 93
Joined: Tue Dec 13, 2005 11:00 am

Post by vinodn »

Can you try using Unicode across your whole job?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

What does the Director log show for the runtime value of NLS_LANG (no dollar-sign prefix)?
swades
Premium Member
Posts: 323
Joined: Mon Dec 04, 2006 11:52 pm

Post by swades »

It shows NLS_LANG=AMERICAN_AMERICA.AL32UTF8 in the second log entry, the one that starts with "Environment variable settings:".
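
(For reference, NLS_LANG has the form LANGUAGE_TERRITORY.CHARACTERSET, so the client character set here is AL32UTF8, Oracle's name for UTF-8. A small illustrative Python sketch, not from the thread, that pulls the pieces apart:)

import os

# NLS_LANG has the form LANGUAGE_TERRITORY.CHARACTERSET,
# e.g. AMERICAN_AMERICA.AL32UTF8 as seen in the Director log.
nls_lang = os.environ.get("NLS_LANG", "AMERICAN_AMERICA.AL32UTF8")
lang_territory, _, charset = nls_lang.partition(".")
print("client character set:", charset)  # -> AL32UTF8 (Oracle's UTF-8)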
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

What about the NLS tab of the Oracle Enterprise stage? Could you set that to use the project default?
swades
Premium Member
Premium Member
Posts: 323
Joined: Mon Dec 04, 2006 11:52 pm

Post by swades »

That is already set to the project default (UTF-8) for both 'NCHAR/NVARCHAR2' and 'Other types'.
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

There are literally dozens of possible maps, with anywhere from one to four bytes per character. There is also some overlap between certain maps, in that two maps may represent the same special character the same way but represent other characters differently. You can't assume that if it's not UTF8 it must be WE8ISO8859P1 just because the data is stored in one or two bytes per character.

You must put the responsibility for designating the correct map on the data provider. They MUST know what map they are using and provide you with the correct specification.

If they are too inept to do that, then your safest bet would be to request a test string of data that contains every character used in their database, then verify the hex representations on a character-by-character basis until you find a map that correctly identifies all the characters: a tedious and possibly risky process.
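
One way to automate that character-by-character check is to try each candidate map against the provider's test string; a minimal Python sketch (the candidate list and sample value are assumptions for illustration, not from the thread):

# Given the raw bytes of the test string and the text it is supposed
# to represent, keep every candidate map that decodes it correctly.
CANDIDATES = ["utf-8", "cp1252", "latin-1", "iso8859-15"]  # assumed shortlist

def matching_maps(raw: bytes, expected: str) -> list[str]:
    hits = []
    for codec in CANDIDATES:
        try:
            if raw.decode(codec) == expected:
                hits.append(codec)
        except UnicodeDecodeError:
            pass  # bytes are invalid in this encoding
    return hits

# 0xFC decodes to 'ü' under the single-byte maps but is invalid in UTF-8:
print(matching_maps(b"M\xfcnchen", "München"))  # -> ['cp1252', 'latin-1', 'iso8859-15']

Note that several maps can match the same short sample, which is exactly the overlap described above; the more complete the test string, the fewer false matches survive.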
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020