Handling data in UTF8 Character Set

dhletl · Post by **dhletl** » Thu Jul 07, 2005 8:23 pm

Hi -

We have a data migration project using only the DS Server components.
In this, we receive the source data in the UTF8 character set, which also contains some special/non-ASCII characters that we need to handle.

Solution attempted-
For this purpose, we changed the DS project-level NLS to UTF8. Logically, this NLS setting should flow down to all the jobs/stages used in the DS project, right?

Result-
However, the output comes out with some junk characters or "?" wherever it finds any non-ascii characters.

The fields that contain the above-mentioned special characters are of datatype Varchar.

We are not making any NLS changes at the individual job/stage level. When I check the NLS settings reflected in any job, I find-
value for Default map for stages is 'Project default (UTF8)'
value for Default locale categories CType is 'Project (DEFAULT)'

Am I missing something here..? Is there a better way of handling the subject requirement? Appreciate your help on this.

Additional Info/ details-
- DS version is 7.1
- We are deploying the server components/jobs on PX - though the configuration being used is 1x1 nodes.

Thanks,
Nitin

ArndW · Post by **ArndW** » Fri Jul 08, 2005 1:28 am

hello dhletl,

Seeing a "?" in a DataStage NLS context does not necessarily mean that the value is really a question mark; many editors and viewers are not NLS-enabled and will convert UTF-8 non-Latin characters to a "?" in their output. This also applies to the DS view-data windows, which mask undisplayable characters with a "?".
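To illustrate the difference between what is stored and what a viewer shows, here is a minimal Python sketch. It is not DataStage code; `mask_for_display` is a hypothetical helper that only mimics a viewer limited to ASCII, and the sample string is made up for the demonstration.

```python
def mask_for_display(text: str) -> str:
    """Mimic a viewer that can only show ASCII: every other character becomes '?'."""
    return text.encode("ascii", errors="replace").decode("ascii")


original = "Müller, São Paulo"            # made-up sample with two non-ASCII characters
stored = original.encode("utf-8")         # the bytes that would actually sit in the file
shown = mask_for_display(original)        # what a non-NLS viewer might display

print(shown)                              # the viewer's rendering, with '?' substitutes
print(stored.hex(" "))                    # the real bytes, for comparison
print(b"?" in stored)                     # is there a literal 0x3F byte in the data?
```

The point is that the "?" exists only in the rendering; the stored bytes still decode back to the original text.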

So it is important to be 100% certain that this is not the cause of the problem. Use the same editor, on the same workstation, as the same user, to look at a file with multibyte characters (I usually use some Japanese-language help file on Windows, or create a dummy file on UNIX). If that file displays correctly but the DataStage output still shows "?" marks, then something has gone wrong; more often than not, though, the output ends up displaying correctly.
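If no suitable viewer is at hand, the same check can be done on the raw bytes instead of by eye. The sketch below is only an assumption of how one might do it in Python: `inspect_bytes` and `inspect_file` are hypothetical helper names, and the dummy reference file is written to a temporary location rather than to any real DataStage output path.

```python
import os
import tempfile


def inspect_bytes(data: bytes) -> dict:
    """Report whether data is valid UTF-8 and how many '?' and non-ASCII bytes it holds."""
    try:
        data.decode("utf-8")
        valid = True
    except UnicodeDecodeError:
        valid = False
    return {
        "valid_utf8": valid,
        "non_ascii_bytes": sum(1 for b in data if b >= 0x80),
        "literal_question_marks": data.count(b"?"),
    }


def inspect_file(path: str) -> dict:
    """Read a file in binary mode so no viewer or codec can mask its contents."""
    with open(path, "rb") as f:
        return inspect_bytes(f.read())


# Create a dummy reference file with known multibyte content.
sample = "日本語 Müller"
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as f:
    f.write(sample.encode("utf-8"))
    ref_path = f.name

report = inspect_file(ref_path)
os.remove(ref_path)
print(report)
```

Running `inspect_file` on the job's output file would then tell the two cases apart: literal question marks with no non-ASCII bytes suggest the characters were lost during processing, while valid UTF-8 with non-ASCII bytes present suggests the data is intact and only the display is masking it.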