Issue with loading "few" non-English characters in

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

What method are you using in your Oracle Enterprise stage, 'load' or 'upsert'? That might make a bit of a difference. On the face of it, if the sequential file output is correct, then I don't see a source for the error, particularly as Unicode covers the whole MS1252 repertoire; note, though, that the German umlaut characters are encoded in two bytes in UTF-8 versus just one in MS1252.
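
To see that size difference concretely, here is a minimal Python sketch (illustrative only, not part of the original thread) printing the hex bytes of the umlaut characters under each encoding:

# How the same umlaut is stored in one byte under MS1252 (Windows-1252)
# but in two bytes under UTF-8.
for ch in "äöüÄÖÜß":
    print(ch, "cp1252 =", ch.encode("cp1252").hex(), " utf-8 =", ch.encode("utf-8").hex())

# Example output line:  ü cp1252 = fc  utf-8 = c3bc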
vinodn
Charter Member
Posts: 93
Joined: Tue Dec 13, 2005 11:00 am

Post by vinodn »

Can you try using Unicode across your whole job?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

What does the Director log show for the runtime value of NLS_LANG (no dollar-sign prefix)?
swades
Premium Member
Posts: 323
Joined: Mon Dec 04, 2006 11:52 pm

Post by swades »

It shows NLS_LANG=AMERICAN_AMERICA.AL32UTF8 in the second log entry, the one that starts with "Environment variable settings:".
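
(For reference, NLS_LANG has the form LANGUAGE_TERRITORY.CHARACTERSET, so the client character set here is AL32UTF8, Oracle's name for UTF-8. A small illustrative Python sketch, not from the thread, that pulls the pieces apart:)

import os

# NLS_LANG has the form LANGUAGE_TERRITORY.CHARACTERSET,
# e.g. AMERICAN_AMERICA.AL32UTF8 as seen in the Director log.
nls_lang = os.environ.get("NLS_LANG", "AMERICAN_AMERICA.AL32UTF8")
lang_territory, _, charset = nls_lang.partition(".")
print("client character set:", charset)  # -> AL32UTF8 (Oracle's UTF-8)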
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

What about the NLS tab of the Oracle Enterprise stage? Could you set that to use the project default?
swades
Premium Member
Premium Member
Posts: 323
Joined: Mon Dec 04, 2006 11:52 pm

Post by swades »

That is already set to the project default (UTF-8) for both 'NCHAR/NVARCHAR2' and 'Other types'.
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

There are literally dozens of possible maps, with anywhere from one to four bytes per character. There is also some overlap between certain maps, in that two maps may represent the same special character the same way but represent other characters differently. You can't assume that if it's not UTF8 it must be WE8ISO8859P1 just because the data is stored in one or two bytes per character.

You must put the responsibility for designating the correct map on the data provider. They MUST know what map they are using and provide you with the correct specification.

If they are too inept to do that, then your safest bet would be to request a test string of data that contains every character used in their database, then verify the hex representations on a character-by-character basis until you find a map that correctly identifies all the characters: a tedious and possibly risky process.
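
One way to automate that character-by-character check is to try each candidate map against the provider's test string; a minimal Python sketch (the candidate list and sample value are assumptions for illustration, not from the thread):

# Given the raw bytes of the test string and the text it is supposed
# to represent, keep every candidate map that decodes it correctly.
CANDIDATES = ["utf-8", "cp1252", "latin-1", "iso8859-15"]  # assumed shortlist

def matching_maps(raw: bytes, expected: str) -> list[str]:
    hits = []
    for codec in CANDIDATES:
        try:
            if raw.decode(codec) == expected:
                hits.append(codec)
        except UnicodeDecodeError:
            pass  # bytes are invalid in this encoding
    return hits

# 0xFC decodes to 'ü' under the single-byte maps but is invalid in UTF-8:
print(matching_maps(b"M\xfcnchen", "München"))  # -> ['cp1252', 'latin-1', 'iso8859-15']

Note that several maps can match the same short sample, which is exactly the overlap described above; the more complete the test string, the fewer false matches survive.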
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020