Teradata String conversion Warnings

DS_FocusGroup · Post by **DS_FocusGroup** » Mon Oct 20, 2008 10:16 am

I am running a job that shows this warning on all character fields:

When binding input interface field "Field" to field "Field": Implicit conversion from source type "ustring[1]" to result type "string[1]": Converting ustring to string using codepage ISO-8859-1.

The source is a Teradata connector stage with the Client character set property set to ASCII. The next stage is a Transformer stage, which is generating this warning.

ray.wurlod · Post by **ray.wurlod** » Mon Oct 20, 2008 3:03 pm

They are alerts, and they are OK. If you don't want them to be warnings, use a message handler.

shershahkhan · Post by **shershahkhan** » Wed Oct 22, 2008 2:00 am

If these are not harmful, why are they shown as warnings in the first place? Is there any particular cast that would remove these warnings rather than suppressing them? Secondly, how do I demote them to informational messages at project level, and what is the procedure for moving these handlers to other environments when the jobs are moved to the test/prd environment?

hamzaqk · Post by **hamzaqk** » Wed Oct 22, 2008 2:31 am

Correct me if I am wrong here, people, but ustring is a Unicode string in UTF-8, UTF-16 or even UTF-32 format, and looking at this alert it seems the project is set to the ISO-8859-1 character set, which is defined at project level. So I guess DS is trying to convert the data from its present character set to the one now defined in the project. As far as I remember, you can set the character set at project level, and that may get rid of the warning.

hamzaqk · Post by **hamzaqk** » Wed Oct 22, 2008 2:32 am

P.S. I think this can also be avoided by using the StringToUstring and UstringToString functions in DS.

shershahkhan · Post by **shershahkhan** » Mon Nov 03, 2008 11:08 am

I enabled LATIN1_0A on Teradata and used the same character set in the Teradata connector stage. It is working fine now, and the warnings no longer appear.

toshea · Post by **toshea** » Thu Nov 06, 2008 12:23 pm

As you discovered, that warning can occur when the connector's Client character set does not match the job's NLS map. The connector's default Client character set is UTF8, but a job's default NLS map is iso-8859-1. Whenever there is a mismatch in character sets, the connector must convert from the Client character set (UTF8) to ustring (UTF16) and then from ustring to the NLS map (iso-8859-1). Converting from Unicode to iso-8859-1 could lose characters, since not all Unicode characters can be represented in iso-8859-1. By setting the Client character set to LATIN1_0A, no conversion is required, so no warning is necessary.
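To make the two-step conversion concrete, here is a minimal sketch in Python. It is an illustration only: Python's codecs stand in for the connector's conversions, and the helper name `to_nls_map` is hypothetical, not a DataStage or Teradata API.

```python
from typing import Tuple


def to_nls_map(client_bytes: bytes,
               client_charset: str = "utf-8",
               nls_map: str = "iso-8859-1") -> Tuple[bytes, bool]:
    """Decode client bytes to Unicode, then encode them to the NLS map.

    Returns the converted bytes and a flag that is True when at least one
    character could not be represented in the NLS map (hypothetical helper).
    """
    text = client_bytes.decode(client_charset)  # client charset -> Unicode
    try:
        return text.encode(nls_map), False      # every character fits
    except UnicodeEncodeError:
        # Unrepresentable characters are replaced with '?', so data is lost.
        return text.encode(nls_map, errors="replace"), True


# An accented Latin character survives the trip to iso-8859-1.
ok_bytes, ok_lossy = to_nls_map("caf\u00e9".encode("utf-8"))

# Japanese characters do not exist in iso-8859-1 and are replaced.
bad_bytes, bad_lossy = to_nls_map("\u65e5\u672c".encode("utf-8"))

# When both sides already use the same character set, bytes pass unchanged.
same_bytes, same_lossy = to_nls_map(b"caf\xe9", client_charset="iso-8859-1")
```

The second call is the case the warning is about: the conversion still succeeds, but only by substituting characters.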

johnreece@talk21.com · Fri Nov 07, 2008 9:15 am

I've got a similar problem, but LATIN1_0A is not helping.

Using the Teradata connector as an extract with LATIN1_0A as the character set and RCP enabled, loading to a dataset (via a forced copy), I'm getting ustrings for my character data even though Unicode is not set on the database.

Conversely, if I take RCP off and define a field as char in the schema, it's fine, and if I define it as nchar (the ustring equivalent), it moans about having to convert it from iso-8859-1 to UTF16.

Any ideas why I can't get 'string' out from RCP? Does LATIN1_0A have to be enabled elsewhere, maybe?

toshea · Post by **toshea** » Fri Nov 07, 2008 12:25 pm

When you define a field as NChar or set the Extended attribute of a Char to Unicode, the PX Engine uses a ustring representation for transmitting the data to the next stage. A ustring is in UTF16 format. If you are using ustring, then you should set the Teradata Connector's Client character set property to UTF16 to avoid any character set conversion. If any character set conversion is going on, you will get a warning.

johnreece · Post by **johnreece** » Mon Nov 10, 2008 8:25 am

Thanks for the reply, toshea. I appreciate what a ustring is (and its connection to nchar).

However, the core of my problem is that I am downloading using RCP (i.e. no columns defined) from a Teradata table with character data.

The Teradata column is defined as Latin, so no involvement of UTF16 here. I'm using LATIN1_0A as the client character set, so no UTF16 here. So why am I getting ustrings defined for the character data when I look at the dataset schema produced?

toshea · Post by **toshea** » Mon Nov 10, 2008 1:37 pm

If you are using RCP and truly had no columns defined, then you would not get a warning. The column definitions would be generated at run-time as a result of preparing the SELECT statement and there would be no type mismatch. The warning occurs when you define the column's data type in the Columns tab, and that definition does not match the result of preparing the SELECT statement. The connector uses ustring to describe the result of a SELECT, because a ustring length is specified in characters, and that length is not affected by the client character set. A string length must be specified in bytes, and that length may vary depending on the client character set. If you had imported the column definition using the connector metadata import, the column would have been imported with the Extended attribute set to Unicode, and the length would be specified in characters, so there would be no mismatch between the definition on the Columns tab and the result of the prepared SELECT.
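The characters-versus-bytes point can be checked with a short Python sketch. This is an illustration under assumptions: Python codecs stand in for the client character sets, and the `lengths` helper and sample value are made up for the example.

```python
def lengths(text: str) -> dict:
    """Return the character count and the byte count in several encodings."""
    return {
        # len() counts code points, which equals the character count here
        # because the sample stays within the Basic Multilingual Plane.
        "characters": len(text),
        "iso-8859-1 bytes": len(text.encode("iso-8859-1")),
        "utf-8 bytes": len(text.encode("utf-8")),
        "utf-16 bytes": len(text.encode("utf-16-le")),
    }


# "Zoë Müller": 10 characters, two of them accented.
sample = "Zo\u00eb M\u00fcller"
result = lengths(sample)
```

The character count stays at 10 whatever the encoding, while the byte count is 10, 12 or 20 depending on the character set, which is why a byte-based string definition depends on the client character set.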

The server character set on a Teradata column doesn't really matter all that much. What matters is the client character set. Whether a column is defined as LATIN or UNICODE, the data will still come out of Teradata in multi-byte Unicode if the client character set is UTF8. The server character set only restricts the set of characters that the column can contain, but it does not affect its representation when received by the connector. A LATIN column can only contain European characters, and a CHAR(10) would use at most 20 bytes in UTF8, since European accented characters use 2 bytes in UTF8. A UNICODE column can contain Asian characters, and a CHAR(10) could take up to 30 bytes in UTF8.
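The 20-byte and 30-byte figures can be reproduced with a quick Python check. This is only a sketch of the arithmetic: the sample strings are assumed worst cases for a CHAR(10), and `utf8_bytes` is a hypothetical helper.

```python
def utf8_bytes(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


# Worst case for a LATIN CHAR(10): ten accented characters (e-acute).
latin_worst = "\u00e9" * 10

# A UNICODE CHAR(10) filled with ten CJK characters.
unicode_worst = "\u65e5" * 10

latin_in_utf8 = utf8_bytes(latin_worst)      # 2 bytes per character
unicode_in_utf8 = utf8_bytes(unicode_worst)  # 3 bytes per character

# The same LATIN value in a single-byte client character set.
latin_in_latin1 = len(latin_worst.encode("iso-8859-1"))
```

With these samples the counts come out to 20, 30 and 10 bytes respectively.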

If your client character set is LATIN1_0A and your NLS map is iso-8859-1, then a CHAR(10) CHARACTER SET LATIN would take no more than 10 bytes and there'd be no need for conversion between Teradata and a DataStage string. I agree that you shouldn't get a warning if your column is defined as string[10]. It's a bug.

DSXchange
