Teradata String conversion Warnings

DS_FocusGroup
Premium Member
Posts: 197
Joined: Sun Jul 15, 2007 11:45 pm
Location: Prague

Teradata String conversion Warnings

Post by DS_FocusGroup »

I am running a job which is showing warnings on all character fields:

Code:

When binding input interface field "Field" to field "Field": Implicit conversion from source type "ustring[1]" to result type "string[1]": Converting ustring to string using codepage ISO-8859-1.
The source is a Teradata connector stage with the Client character set property set to ASCII. The next stage is a Transformer stage, which is generating this warning.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

They are alerts, and they are OK. If you don't want them to be warnings use a message handler.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
shershahkhan
Participant
Posts: 64
Joined: Fri Jan 25, 2008 4:41 am

Post by shershahkhan »

If these are not harmful, why are they shown as warnings in the first place? Is there any particular casting that can be done to remove these warnings rather than suppressing them? Secondly, how can they be demoted to informational at project level, and what is the procedure for moving these handlers to other environments when the jobs are moved to the test/prd environment?
hamzaqk
Participant
Posts: 249
Joined: Tue Apr 17, 2007 5:50 am
Location: islamabad

Post by hamzaqk »

Correct me if I am wrong here, people, but ustring is a Unicode string in UTF-8, UTF-16 or even UTF-32 format, and looking at this alert it seems the project is set to the ISO-8859-1 character set, which is defined at project level. So I guess DS is trying to convert the data from its present character set to the one defined in the project. As far as I remember, you can set the character set at project level, and that may get rid of the warning.
Teradata Certified Master V2R5
hamzaqk
Participant
Posts: 249
Joined: Tue Apr 17, 2007 5:50 am
Location: islamabad

Post by hamzaqk »

P.S. I think this can also be avoided by using the StringToUstring and UstringToString functions in DS.
Teradata Certified Master V2R5
shershahkhan
Participant
Posts: 64
Joined: Fri Jan 25, 2008 4:41 am

Post by shershahkhan »

I enabled LATIN1_0A on Teradata and used the same character set in the Teradata connector stage, and it's working fine now; the warnings no longer appear.
toshea
Participant
Posts: 79
Joined: Thu Aug 14, 2008 6:46 pm

Post by toshea »

As you discovered, that warning can occur when the connector's Client character set does not match the job's NLS map. The connector's default Client character set is UTF8, but a job's default NLS map is iso-8859-1. Whenever there is a mismatch in character sets, the connector must convert from the Client character set (UTF8) to ustring (UTF16) and then from ustring to the NLS map (iso-8859-1). Converting from Unicode to iso-8859-1 could lose characters, since not all Unicode characters can be represented in iso-8859-1. By setting the Client character set to LATIN1_0A, no conversion is required, so no warning is necessary.
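The point about losing characters can be illustrated outside DataStage. The following is only a sketch in Python, not anything the connector runs: the helper names `to_latin1` and `is_lossless` are made up for illustration, and Python's `iso-8859-1` codec is assumed to be a reasonable stand-in for the job's NLS map.

```python
def to_latin1(text: str) -> bytes:
    """Encode text as ISO-8859-1; raises UnicodeEncodeError if a character has no mapping."""
    return text.encode("iso-8859-1")


def is_lossless(text: str) -> bool:
    """Return True when every character of text can be represented in ISO-8859-1."""
    try:
        to_latin1(text)
        return True
    except UnicodeEncodeError:
        return False


# Western European accented text fits in ISO-8859-1.
print(is_lossless("caf\u00e9"))
# Japanese text and the euro sign have no ISO-8859-1 code point.
print(is_lossless("\u65e5\u672c\u8a9e"))
print(is_lossless("\u20ac"))
```

Data that passes a check like this would convert without loss, which is why the warning is harmless for purely Latin data but worth noticing otherwise.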
johnreece@talk21.com
Participant
Posts: 1
Joined: Fri Nov 07, 2008 7:12 am

Extract via rcp

Post by johnreece@talk21.com »

I've got a similar problem, but LATIN1_0A is not helping.

Using the Teradata connector as an extract with LATIN1_0A as the character set and RCP enabled, loading to a dataset (via a forced copy), I'm getting ustrings for my character data even though Unicode is not set on the database.

Conversely, if I turn RCP off and define a field as Char in the schema it's fine, and if I define it as NChar (the ustring equivalent) it complains about having to convert it from iso-8859-1 to UTF16.

Any ideas why I can't get 'string' out from RCP? Does LATIN1_0A have to be enabled elsewhere maybe?
toshea
Participant
Posts: 79
Joined: Thu Aug 14, 2008 6:46 pm

Post by toshea »

When you define a field as NChar or set the Extended attribute of a Char to Unicode, the PX Engine uses a ustring representation for transmitting the data to the next stage. A ustring is in UTF16 format. If you are using ustring, then you should set the Teradata Connector's Client character set property to UTF16 to avoid any character set conversion. If any character set conversion is going on, you will get a warning.
johnreece
Participant
Posts: 2
Joined: Mon Oct 09, 2006 8:23 am

Post by johnreece »

Thanks for the reply, toshea. I appreciate what a ustring is (and its connection to NChar).

However, the core of my problem is that I am downloading using RCP (i.e. no columns defined) from a Teradata table with character data.

The Teradata column is defined as Latin, so no involvement of UTF16 here. I'm using LATIN1_0A as the client character set, so no UTF16 here. So why am I getting ustrings defined for the character data when I look at the dataset schema produced?
toshea
Participant
Posts: 79
Joined: Thu Aug 14, 2008 6:46 pm

Post by toshea »

If you are using RCP and truly had no columns defined, then you would not get a warning. The column definitions would be generated at run-time as a result of preparing the SELECT statement and there would be no type mismatch. The warning occurs when you define the column's data type in the Columns tab, and that definition does not match the result of preparing the SELECT statement. The connector uses ustring to describe the result of a SELECT, because ustring is specified in terms of characters, and that length is not affected by the client character set. A string must be specified in terms of bytes, and that length may vary depending on the client character set. If you had imported the column definition using the connector metadata import, the column would have been imported with the Extended attribute set to Unicode, and the length would be specified in characters, so there would be no mismatch between the definition on the Columns tab and the result of the prepared SELECT.

The server character set on a Teradata column doesn't really matter all that much. What matters is the client character set. Whether a column is defined as LATIN or UNICODE, the data will still come out of Teradata in multi-byte Unicode if the client character set is UTF8. The server character set only restricts the set of characters that the column can contain, but it does not affect its representation when received by the connector. A LATIN column can only contain European characters, so a CHAR(10) would use at most 20 bytes in UTF8, since European accented characters use 2 bytes in UTF8. A UNICODE column can contain Asian characters, and a CHAR(10) could take up to 30 bytes in UTF8.

If your client character set is LATIN1_0A and your NLS map is iso-8859-1, then a CHAR(10) CHARACTER SET LATIN would take no more than 10 bytes and there'd be no need for conversion between Teradata and a DataStage string. I agree that you shouldn't get a warning if your column is defined as string[10]. It's a bug.
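The byte counts quoted above can be checked with a short Python sketch. This only illustrates the encodings themselves, not the connector: the helper `byte_length` is hypothetical, and Python's `iso-8859-1`, `utf-8` and `utf-16-le` codecs are assumed to approximate the LATIN1_0A, UTF8 and UTF16 client character sets.

```python
def byte_length(text: str, codec: str) -> int:
    """Number of bytes needed to store text in the given encoding."""
    return len(text.encode(codec))


# Ten-character values, standing in for a fully populated CHAR(10).
accented = "\u00e9" * 10   # ten accented Latin characters (e with acute)
asian = "\u65e5" * 10      # ten CJK characters

for label, value in (("accented", accented), ("asian", asian)):
    print(label, "utf-8 bytes:", byte_length(value, "utf-8"))

# In a single-byte Latin encoding, the byte count equals the character count.
print("accented iso-8859-1 bytes:", byte_length(accented, "iso-8859-1"))

# UTF-16 without a byte-order mark uses 2 bytes per character here.
print("accented utf-16 bytes:", byte_length(accented, "utf-16-le"))
```

The character count stays at 10 in every case while the byte count changes with the encoding, which matches the reasoning for describing SELECT results as ustring rather than string.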