Updating MS SQL for Korean (UTF-8 encoded) characters

ArndW · Post by **ArndW** » Mon Oct 23, 2006 1:01 am

Hello Mayankrakesh and welcome to DSXChange.

mayankrakesh wrote:..3. The txt file is being opened in Notepad and being saved as UTF-8 encoded file....

There is no need to do this, and it might cause some issues, particularly as I don't trust MS Notepad to do this correctly. Just read the file into DataStage and specify that it is UTF-8 encoded.
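As a quick sanity check outside DataStage, one could confirm that the source file really is valid UTF-8 and see whether an editor has prepended a byte-order mark. The sketch below is only an illustration: `decode_utf8` is a hypothetical helper name, and the sample bytes stand in for the real file contents.

```python
import codecs
from typing import Tuple


def decode_utf8(raw: bytes) -> Tuple[str, bool]:
    """Decode bytes as strict UTF-8 and report whether a leading BOM was present."""
    had_bom = raw.startswith(codecs.BOM_UTF8)
    if had_bom:
        # Drop the three BOM bytes so they do not end up in the first field.
        raw = raw[len(codecs.BOM_UTF8):]
    # errors="strict" raises UnicodeDecodeError on any invalid byte sequence.
    return raw.decode("utf-8", errors="strict"), had_bom


# Sample data standing in for the file contents (assumption, not the poster's data).
sample = codecs.BOM_UTF8 + "한국어".encode("utf-8")
text, had_bom = decode_utf8(sample)
print(text, had_bom)
```

If the decode raises `UnicodeDecodeError`, the file is not UTF-8 and a different character map would need to be specified when reading it.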

mayankrakesh wrote:...The collate being used for the database is Latin...

Using Latin collation on non-Latin data will result in a very odd sort order. Latin collation is simple in that it takes the bytes of a string from the left and does a numeric greater-or-less-than comparison. This works correctly only for Latin text; other text, and multibyte characters in particular, will not collate correctly this way.
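To illustrate the kind of byte-wise comparison described here, the following sketch sorts a few strings purely by their UTF-8 bytes. It is not SQL Server's collation code, just a small demonstration with made-up sample words.

```python
# Sample words (assumptions for illustration only).
# "\u00e9clair" is "éclair" written with an escape so the bytes are unambiguous.
words = ["apple", "Zebra", "\u00e9clair", "가나", "zoo"]

# Compare strings left to right by the numeric value of their encoded bytes.
by_bytes = sorted(words, key=lambda w: w.encode("utf-8"))
print(by_bytes)
# ['Zebra', 'apple', 'zoo', 'éclair', '가나']
```

Upper case sorts before lower case, the accented word lands after "zoo", and the Korean word goes last, because multibyte sequences start with high byte values. None of this reflects the order a reader of those languages would expect.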

Try to output to a Unicode text file after doing your logic and see if the data is represented correctly. From experience with Japanese and Chinese text, I prefer wordpad.exe over notepad.exe for viewing the result. If this works, then output to your MS SQL database and see if things are being represented correctly there.

mayankrakesh · Post by **mayankrakesh** » Mon Oct 23, 2006 1:51 am

Thanks for the info

Just wanted to know whether there is some utility I can use to UPDATE records in the MS SQL database from the file, instead of using DataStage.

regards,
Mayank

ArndW · Post by **ArndW** » Mon Oct 23, 2006 2:35 am

SQL Server has a "BULK INSERT" capability to load sequential files directly. There are tricks and limitations in doing this, but you might be able to work around these issues.
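One of the limitations is likely to be the file encoding. As far as I know, BULK INSERT in SQL Server versions of this era does not read UTF-8 directly, but it can read a UTF-16 file when `DATAFILETYPE = 'widechar'` is specified. The sketch below assumes that route and converts UTF-8 bytes to UTF-16 little-endian with a BOM; `utf8_to_utf16le` is a hypothetical helper name, and reading and writing the actual files is left out.

```python
import codecs


def utf8_to_utf16le(raw: bytes) -> bytes:
    """Re-encode UTF-8 bytes as UTF-16 LE with a leading BOM."""
    # "utf-8-sig" decodes UTF-8 and silently drops a leading BOM if one exists.
    text = raw.decode("utf-8-sig")
    # Write the BOM explicitly so the result does not depend on the host byte order.
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


# Sample record standing in for a line of the source file (assumption).
converted = utf8_to_utf16le("1,가\r\n".encode("utf-8"))
print(converted)
```

The converted file could then be loaded into a staging table, with the UPDATE done in SQL from that table, since BULK INSERT itself only inserts rows.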

ray.wurlod · Post by **ray.wurlod** » Mon Oct 23, 2006 6:59 am

There is no guarantee that your step using Notepad will successfully translate all characters. What character map is used for encoding the original Korean characters?

roblew · Post by **roblew** » Thu Jan 11, 2007 7:49 pm

Hi,

I'm having similar issues loading Spanish Unicode data from Oracle (UTF8) into a SQL Server 2000 database.

I've tried using NCHAR datatypes, which didn't resolve the problem. But, after setting the ODBC Enterprise stage datatype on the column to LongVarBinary for the target SQL Server db, it seemed to work properly.

I don't know why this works. Does that make sense to anyone? Any comments?

I've started a new thread in the Parallel section.
viewtopic.php?p=213181
