Unmappable characters

asitagrawal · Post by **asitagrawal** » Tue Apr 17, 2007 3:05 pm

Hi,

I am having an issue with character translation.

The job under test is run against exactly the same
database and with the same input job parameters; the only difference is the operating system of the Ascential server.

In case #1, the job is run on a Windows 2000 Pro edition server,
and in case #2 on Windows Server 2003 Enterprise Edition.

I get an "Unmappable characters" warning in case #2 but
not in case #1. You may refer to the log files (see the URLs below).

Can anyone throw some light on this situation?
Case#1 Log File http://asit.agrawal.googlepages.com/Local.txt
Case#2 Log File http://asit.agrawal.googlepages.com/Production.txt

Warm Regards,
Asit

asitagrawal · Post by **asitagrawal** » Tue Apr 17, 2007 3:25 pm

On the DB2 stage I have NLS set to Project default (MS1252),

while for writing to the flat file I have UNICODE.

Also, note that the same job, with the same settings, gives different logs
when run on servers with different operating systems!

asitagrawal · Post by **asitagrawal** » Tue Apr 17, 2007 3:27 pm

I tried UTF8 for the DB2 stage, but no luck.

How did you resolve it? Did you start getting correct characters in the file?

ray.wurlod · Post by **ray.wurlod** » Tue Apr 17, 2007 3:36 pm

DataStage is not a translation tool. Use a consistent mapping throughout. The other important aspect is that you use the correct map. Lacking other information, this choice is likely to be an educated guess. Think about ISO 8859-1 as a possibility.
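To make "unmappable" concrete, here is a minimal Python sketch that lists the characters a given map cannot represent. It uses Python's own codecs (`cp1252`, `iso-8859-1`) as stand-ins for the DataStage NLS maps, which is an assumption; the function name `unmappable_chars` and the sample string are made up for illustration.

```python
def unmappable_chars(text, encoding):
    """Return the characters in `text` that `encoding` cannot represent."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            # This character has no code point in the target map.
            bad.append(ch)
    return bad


# Hypothetical sample data: euro sign, en dash, and an accented letter.
sample = "10€ – café"

print(unmappable_chars(sample, "cp1252"))      # []
print(unmappable_chars(sample, "iso-8859-1"))  # ['€', '–']
```

The same data maps cleanly under one map and produces unmappable characters under another, so a check like this on a sample of the real data may help turn the educated guess into an informed one.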

rafik2k · Post by **rafik2k** » Tue Apr 17, 2007 3:48 pm

asitagrawal wrote: I tried UTF8 for the DB2 stage, but no luck.

How did you resolve it? Did you start getting correct characters in the file?

In my case I used UTF8 for the entire job.
Try changing to some other map.
The invalid, unhandled characters were truncated to some other character.

asitagrawal · Post by **asitagrawal** » Wed Apr 18, 2007 5:55 am

Now I am testing to find the correct NLS map for my case.

I am reading from the database and writing to a sequential file (Seq #1).

In the next job, I read from Seq #1 and write its data to a new sequential file (Seq #2).

The NLS setting for the Sequential File stages is UNICODE, with the byte-swapped and byte-order mark options selected.

When Seq #1 contains data, both jobs run OK, i.e. they finish OK.

The problem is that if Seq #1 does not contain any data, i.e. Seq #1 is empty, then the first job (the one that creates Seq #1) runs OK, but the second job, which reads from Seq #1 and writes to Seq #2, aborts.
The error message in the Director is: ds_seqgetnext: Unable to read byte order mark - The operation completed successfully.

ray.wurlod · Post by **ray.wurlod** » Wed Apr 18, 2007 3:09 pm

DataStage NLS relies upon a byte order mark being present in the file header. This tells DataStage whether the byte order of the machine is big-endian or little-endian. Byte order is critically important when there can be more than one byte per character.

For example, in a hashed file the first two bytes are 0xacef or 0xefac depending on the byte-order of the machine.

Clearly a totally empty file cannot have a byte order mark.
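As an illustration of what a byte order mark looks like in a two-byte Unicode file, here is a small Python sketch using the standard UTF-16 BOM constants. It assumes the UNICODE map writes a UTF-16 style BOM, which this thread does not confirm; `detect_byte_order` is a made-up helper name.

```python
import codecs

# The same character written with each byte order, BOM first.
little = codecs.BOM_UTF16_LE + "A".encode("utf-16-le")
big = codecs.BOM_UTF16_BE + "A".encode("utf-16-be")

print(little.hex())  # fffe4100
print(big.hex())     # feff0041


def detect_byte_order(data):
    """Classify raw bytes by their leading two-byte BOM, if any."""
    head = data[:2]
    if head == codecs.BOM_UTF16_LE:
        return "little-endian"
    if head == codecs.BOM_UTF16_BE:
        return "big-endian"
    return "unknown"  # includes the empty-file case: nothing to read


print(detect_byte_order(little))  # little-endian
print(detect_byte_order(b""))     # unknown
```

Zero bytes of input leave the reader with no mark to inspect, which matches the "Unable to read byte order mark" abort on the empty Seq #1.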

Your solution will involve pre-checking the file's size and only executing the second job where this is greater than or equal to 2 bytes.
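A minimal sketch of such a pre-check, written in Python rather than as a DataStage routine. The function name `safe_to_read` and the file path in the usage note are hypothetical, and how the result gates the second job (job sequence, before-job routine, external script) is left open.

```python
import os


def safe_to_read(path, min_bytes=2):
    """Return True only if the file exists and is large enough to hold a BOM."""
    try:
        return os.path.getsize(path) >= min_bytes
    except OSError:
        # Missing or inaccessible file: treat as not safe to read.
        return False
```

For example, a wrapper script could call `safe_to_read("Seq1.txt")` and skip the Seq #1 to Seq #2 job when it returns False.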

ray.wurlod · Post by **ray.wurlod** » Mon Apr 23, 2007 4:42 pm

"UTF8" is like "UNIX" - every vendor has their own idea about what it should be. There are many eight-bit encodings of Unicode, all of which are entitled to call themselves "UTF8". There is no single UTF8.