Re: Updating MS SQL for Korean (UTF-8 encoded) characters

Posted: Mon Oct 23, 2006 1:01 am
by ArndW
Hello Mayankrakesh and welcome to DSXChange.
mayankrakesh wrote:..3. The txt file is being opened in Notepad and being saved as UTF-8 encoded file....
There is no need to do this and it might cause some issues, particularly as I don't trust MS Notepad to do this correctly. Just read the file into DataStage and specify that it is UTF-8 encoded.
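One issue that can come up with Notepad-saved UTF-8 files is a byte-order mark (BOM) at the start, which Notepad has historically written. That this is the issue meant above is an assumption. A minimal Python sketch for checking outside DataStage follows; the helper names `has_utf8_bom` and `decode_utf8` are made up for illustration.

```python
import codecs


def has_utf8_bom(raw: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 byte-order mark."""
    return raw.startswith(codecs.BOM_UTF8)


def decode_utf8(raw: bytes) -> str:
    """Decode UTF-8 bytes, dropping a leading BOM if one is present."""
    # "utf-8-sig" strips the BOM when present and otherwise acts like "utf-8".
    return raw.decode("utf-8-sig")


# Hypothetical sample: Korean text with a BOM in front, as Notepad might save it.
sample = codecs.BOM_UTF8 + "한국어".encode("utf-8")
print(has_utf8_bom(sample))
print(decode_utf8(sample) == "한국어")
```

If the BOM is not stripped, it tends to show up as a stray character at the start of the first field.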
mayankrakesh wrote:...The collate being used for the database is Latin...
Using Latin collation on non-Latin data will result in very odd sort order. Latin collation is simple in that it takes the bytes of a string from the left and does a numeric greater-or-less-than comparison. This only works correctly for Latin text; other texts, and multibyte characters in particular, will not collate correctly this way.
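To illustrate the byte-by-byte comparison described above, here is a small Python sketch. It assumes the comparison really is a plain numeric comparison of UTF-8 bytes and is not a model of any particular SQL Server collation. The word list and the name `bytewise_sorted` are invented for the example.

```python
def bytewise_sorted(words):
    """Sort strings by comparing their UTF-8 bytes numerically, left to right."""
    return sorted(words, key=lambda s: s.encode("utf-8"))


words = ["apple", "Zebra", "éclair"]

# Upper-case "Z" (0x5A) sorts before lower-case "a" (0x61), and "é" is two
# bytes starting with 0xC3, so it lands after every plain ASCII letter.
print(bytewise_sorted(words))  # ['Zebra', 'apple', 'éclair']

# A multibyte character read as single-byte Latin text becomes several
# unrelated characters: one Hangul syllable turns into three.
print(len("한".encode("utf-8").decode("latin-1")))  # 3
```

A dictionary would list these words as apple, éclair, Zebra, which is why a purely numeric comparison gives an odd-looking order.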

Try to output to a Unicode text file after doing your logic and see if the data is represented correctly. I would prefer using wordpad.exe over notepad.exe from experience (this is when looking at Japanese and Chinese text). If this works then output to your MS SQL and see if things are being represented correctly.
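The same check can be tried outside DataStage. The sketch below writes text to a temporary UTF-16 file and reads it back. It assumes "Unicode text file" is meant in the usual Windows sense of UTF-16. The function name `round_trip_utf16` is made up for illustration.

```python
import os
import tempfile


def round_trip_utf16(text: str) -> str:
    """Write text to a temporary UTF-16 file and return what is read back."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # newline="" keeps line endings exactly as given in both directions.
        with open(path, "w", encoding="utf-16", newline="") as f:
            f.write(text)
        with open(path, "r", encoding="utf-16", newline="") as f:
            return f.read()
    finally:
        os.remove(path)


original = "한국어 テスト\n"
print(round_trip_utf16(original) == original)
```

If the text survives this round trip but looks wrong after loading into the database, the problem is more likely in the target column type or the driver settings than in the file.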

Posted: Mon Oct 23, 2006 1:51 am
by mayankrakesh
Thanks for the info :)

Just wanted to know if there is some utility I can use to UPDATE records from the file to the MS SQL database instead of using DataStage.

regards,
Mayank

Posted: Mon Oct 23, 2006 2:35 am
by ArndW
SQL Server has a "BULK INSERT" capability to load sequential files directly. There are tricks and limitations in doing this, but you might be able to work around these issues.

Posted: Mon Oct 23, 2006 6:59 am
by ray.wurlod
There is no guarantee that your step using Notepad will successfully translate all characters. What character map is used for encoding the original Korean characters?
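One rough way to narrow down the character map is to try strict decoding with a few likely candidates. The Python sketch below is only a heuristic. The candidate list (UTF-8 plus the Korean code pages CP949 and EUC-KR) is an assumption about what the source might be, and `candidate_encodings` is a made-up helper name.

```python
def candidate_encodings(raw: bytes, candidates=("utf-8", "cp949", "euc_kr")):
    """Return the candidate encodings that decode the bytes without error."""
    ok = []
    for name in candidates:
        try:
            raw.decode(name)  # strict decoding: any invalid byte raises
        except UnicodeDecodeError:
            continue
        ok.append(name)
    return ok


# Hypothetical samples: the same Korean text in two different encodings.
as_cp949 = "한국어".encode("cp949")
as_utf8 = "한국어".encode("utf-8")

print(candidate_encodings(as_cp949))
print(candidate_encodings(as_utf8))
```

A successful decode does not prove the encoding, since some byte sequences are valid in more than one character map, but a failure does rule that candidate out.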

Posted: Thu Jan 11, 2007 7:49 pm
by roblew
Hi,

I'm having similar issues loading Spanish Unicode data from Oracle (UTF8) into a SQL Server 2000 database.

I've tried using NCHAR datatypes, which didn't resolve the problem. But, after setting the ODBC Enterprise stage datatype on the column to LongVarBinary for the target SQL Server db, it seemed to work properly.

I don't know why this works. Does that make sense to anyone? Any comments?

I've started a new thread in the Parallel section.
viewtopic.php?p=213181