A question about NLS

Archive of postings to DataStageUsers@Oliver.com. This forum is intended only as a reference and cannot be posted to.

Moderators: chulett, rschirm

admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

A question about NLS

Post by admin »

Hi all,

It's a simple question, but I want to know exactly, to be sure. When I start a project with NLS, is it true that all interaction with the DS Server and the workflow of jobs is carried out in Unicode? As I understand it, the overhead is one extra byte per byte, i.e. that's a performance impact. Am I right? Is it true that there's no need to turn it on?

Thanks in advance

Best regards,
Marat mailto:maratkotik@mail.ru

Post by admin »

At 04:36 PM 1/31/01 +0300, Marat S. Salimov wrote:
>Hi all,
>
>It's a simple question, but I want to know exactly, to be sure. When I
>start a project with NLS, is it true that all interaction with the DS
>Server and the workflow of jobs is carried out in Unicode? As I understand
>it, the overhead is one extra byte per byte, i.e. that's a performance
>impact. Am I right? Is it true that there's no need to turn it on?



Sort of. The DS Server (UniVerse) employs a full range of NLS capabilities, including both locale support and character mapping.

For character mapping, all data is held internally in what is known as UTF-8 format, which is a variable-length encoding of Unicode. A fixed-width double-byte Unicode encoding (such as UCS-2) spends two bytes on every character, meaning the size of single-byte data sets doubles for everything you do. It also makes it very hard to have reserved characters that are available in any language. This poses a problem for the DS Server because it uses a series of special characters to help delimit data (what we call the mark characters) as well as to represent the SQL NULL value.

What UTF-8 does is provide a mapping whereby the byte sequence is self-defining. It provides two important features:
a) All ASCII data, which is traditionally represented with a single byte, is still represented in that fashion. For shops that deal predominantly with the ISO8859 character set, this greatly reduces your storage space compared with a double-byte encoding, as well as your mapping requirements.
b) It allows us to create a private reserved set of characters in the mapping that are uniquely ours and available.
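To make the size argument concrete, here is a minimal sketch in Python using standard UTF-8 and UTF-16. It only illustrates the general encodings; the sample strings and function names are made up for illustration, and the DS Server's internal variation (with its reserved mark characters) is not modeled here.

```python
def encoded_sizes(text):
    """Return (characters, UTF-8 bytes, UTF-16 bytes) for text."""
    # "utf-16-le" is used so that no byte-order mark is added to the count.
    return len(text), len(text.encode("utf-8")), len(text.encode("utf-16-le"))


def utf8_sequence_length(lead_byte):
    """Return how many bytes a standard UTF-8 sequence occupies, judging
    only by its first byte. This is what "self-defining" means here."""
    if lead_byte < 0x80:            # 0xxxxxxx: plain ASCII, one byte
        return 1
    if lead_byte >> 5 == 0b110:     # 110xxxxx: two-byte sequence
        return 2
    if lead_byte >> 4 == 0b1110:    # 1110xxxx: three-byte sequence
        return 3
    if lead_byte >> 3 == 0b11110:   # 11110xxx: four-byte sequence
        return 4
    raise ValueError("not a valid UTF-8 lead byte")


# Hypothetical sample values: ASCII, Latin-1 and Cyrillic text.
samples = {
    "ASCII": "DataStage",
    "Latin-1": "caf\u00e9",
    "Cyrillic": "Марат",
}

for label, text in samples.items():
    chars, utf8, utf16 = encoded_sizes(text)
    print(f"{label:8} chars={chars} utf8={utf8} utf16={utf16}")
```

Pure ASCII stays at one byte per character in UTF-8, accented Latin-1 characters and Cyrillic letters take two bytes each, and the fixed two-byte form costs two bytes for everything.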

Now, that's how data is handled internally. At every external boundary, you can establish the appropriate mapping to occur, so that you can map the data into the appropriate external representation. This means, if need be, you can define multiple different maps for different data sources/targets. For example, you may have some data sources/targets that want ISO8859-5 (for Cyrillic), or perhaps PC866 if the data comes from an NT data source. Additionally, perhaps one of your sources uses KOI8-R, another version of the Russian/Cyrillic character set. And still another uses ISO8859-1. All could be correctly handled using NLS.
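As a rough illustration of mapping at the boundaries, the Python sketch below decodes the same word from three external character sets into one internal UTF-8 form and maps it back out again. The function names and the sample word are hypothetical, and Python's codecs merely stand in for NLS maps; this is not how DataStage itself is configured.

```python
# The same Russian word as it might arrive from three different sources.
WORD = "Привет"
incoming = {
    "iso8859_5": WORD.encode("iso8859_5"),  # ISO8859-5
    "cp866": WORD.encode("cp866"),          # PC866
    "koi8_r": WORD.encode("koi8_r"),        # KOI8-R
}


def to_internal(raw, source_map):
    """Decode external bytes using the source's map, then hold as UTF-8."""
    return raw.decode(source_map).encode("utf-8")


def to_external(internal, target_map):
    """Map the internal UTF-8 form out to the target's character set."""
    return internal.decode("utf-8").encode(target_map)


# Three different external byte strings become one internal form.
internal_forms = {name: to_internal(raw, name) for name, raw in incoming.items()}

for name, raw in incoming.items():
    print(f"{name:10} external={raw.hex()} internal={internal_forms[name].hex()}")
```

The external byte strings all differ, but once each source is read through its own map the internal representation is identical, so a value read from a KOI8-R source can be written to a PC866 target without loss.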

Because of all this, the answer to your question of whether you need it is "it depends". If all your data exchange is done using ISO8859-1, then you don't need it. Otherwise, it's very possible you do.
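One rough way to get a feel for which side of that line your data falls on is to test whether sample values can be represented in ISO8859-1 at all. The Python sketch below does that; the function name and sample strings are made up for illustration, and it says nothing about how the DS Server itself decides.

```python
def fits_iso8859_1(text):
    """Return True if every character in text exists in ISO8859-1."""
    try:
        text.encode("latin-1")  # "latin-1" is Python's name for ISO8859-1
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical sample values from a data source.
for value in ["DataStage", "caf\u00e9", "Марат"]:
    print(value, fits_iso8859_1(value))
```

Western European text passes, while Cyrillic text does not, which is the case where a map such as ISO8859-5, PC866 or KOI8-R would come into play.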

If you have any other questions, let me know...

Dave

========================================================================
David T. Meeks || "All my life I'm taken by surprise
Development Engineer, DataStage || I'm someone's waste of time
Ascential Software || Now I walk a balanced line
dave.meeks@ascentialsoftware.com || and step into tomorrow" - IQ
========================================================================

Post by admin »

Marat,

Our UTF-8 uses single-byte representations for ASCII characters, so not all traffic is double-byte, and it won't incur all the excess overhead of conversion.

In any case, the volume of I/O between client and server isn't large, so it shouldn't impact performance even across overloaded networks or slow connections.

-Arnd.

-----Original Message-----
From: Marat S. Salimov [mailto:maratkotik@mail.ru]
Sent: Wednesday, January 31, 2001 06:36
To: informix-datastage@oliver.com
Subject: A question about NLS



Post by admin »

Dave,

Sorry, I posted my response before checking to see that you had answered it (in more detail and with more accuracy than I had). Thanks, -Arnd.

-----Original Message-----
From: David T. Meeks [mailto:dave.meeks@informix.com]
Sent: Wednesday, January 31, 2001 06:56
To: informix-datastage@oliver.com; informix-datastage@oliver.com
Subject: Re: A question about NLS

