Updating MS SQL for Korean (UTF-8 encoded) characters

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.


ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Re: Updating MS SQL for Korean (UTF-8 encoded) characters

Post by ArndW »

Hello Mayankrakesh and welcome to DSXChange.
mayankrakesh wrote: ...3. The txt file is being opened in Notepad and being saved as UTF-8 encoded file...
There is no need to do this, and it might cause some issues, particularly as I don't trust MS Notepad to do the conversion correctly. Just read the file into DataStage and specify that it is UTF-8 encoded.
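One concrete way the Notepad step can cause trouble: the Notepad shipped with Windows at the time writes a UTF-8 byte-order mark (bytes EF BB BF) at the start of files saved as "UTF-8", and a reader that does not expect it ends up with an extra U+FEFF character at the start of the first field. This is a minimal Python sketch for illustration only (it is not DataStage and makes no claim about how DataStage's UTF-8 reader behaves). It shows the difference between decoding with and without BOM handling.

```python
import codecs


def decode_utf8_file_bytes(raw: bytes) -> str:
    """Decode UTF-8 bytes, dropping a leading BOM if one is present."""
    # The 'utf-8-sig' codec strips a leading EF BB BF if present and
    # otherwise behaves like plain 'utf-8'.
    return raw.decode("utf-8-sig")


plain = "한글".encode("utf-8")
with_bom = codecs.BOM_UTF8 + plain  # what a BOM-writing editor produces

print(repr(with_bom.decode("utf-8")))          # BOM survives as U+FEFF
print(repr(decode_utf8_file_bytes(with_bom)))  # BOM removed
print(repr(decode_utf8_file_bytes(plain)))     # unchanged
```

If a BOM-tolerant read like this is not available in the tool chain, skipping the Notepad re-save avoids the problem altogether.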
mayankrakesh wrote: ...The collate being used for the database is Latin...
Using a Latin collation on non-Latin data will result in a very odd sort order. Latin collation is simple in that it takes the bytes of a string from the left and does a numeric greater-than/less-than comparison. This only works correctly for Latin text; other scripts, and multibyte characters in particular, will not collate correctly this way.
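To see why byte-wise Latin handling breaks down for Korean, the sketch below encodes two Hangul syllables as UTF-8 and then decodes the bytes one per character. Python's `latin-1` codec is used here as a simplified stand-in for a single-byte Latin code page; it is an assumption for illustration, not SQL Server's actual collation code.

```python
text = "한글"  # two Hangul syllables
utf8_bytes = text.encode("utf-8")

# Each Hangul syllable needs 3 bytes in UTF-8.
print(utf8_bytes.hex(" "))  # ed 95 9c ea b8 80

# Treating every byte as one Latin character turns 2 syllables into
# 6 unrelated "characters", so comparisons and sorting operate on
# byte fragments rather than on whole Korean characters.
as_latin = utf8_bytes.decode("latin-1")
print(len(text), len(as_latin))  # 2 6
```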

Try writing to a Unicode text file after applying your logic, and check whether the data is represented correctly. From experience (when looking at Japanese and Chinese text), I prefer wordpad.exe over notepad.exe for viewing such files. If that works, then write to your MS SQL database and check whether the data is represented correctly there.
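A minimal sketch of such a check file, assuming "Unicode text file" means what Windows tools usually call "Unicode": UTF-16LE with a byte-order mark and CRLF line endings. The sample rows and the file name `korean_check.txt` are hypothetical; in practice the rows would come from the job's output.

```python
import codecs
import os
import tempfile


def write_unicode_check_file(path: str, rows: list[list[str]]) -> None:
    """Write tab-separated rows as UTF-16LE with a BOM, for viewing in WordPad."""
    with open(path, "wb") as f:
        f.write(codecs.BOM_UTF16_LE)  # FF FE marks the file as UTF-16LE
        for row in rows:
            line = "\t".join(row) + "\r\n"
            f.write(line.encode("utf-16-le"))


rows = [["1", "한글"], ["2", "서울"]]  # hypothetical sample data
path = os.path.join(tempfile.gettempdir(), "korean_check.txt")
write_unicode_check_file(path, rows)

# Read the file back to confirm the characters round-trip.
with open(path, "rb") as f:
    raw = f.read()
print(raw[:2] == codecs.BOM_UTF16_LE)
print(raw.decode("utf-16").splitlines())
```

If the characters look right in this file but wrong in the database, the problem is more likely in the database load or the column definitions than in the transformation logic.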
mayankrakesh
Participant
Posts: 2
Joined: Wed Sep 27, 2006 10:41 pm

Post by mayankrakesh »

Thanks for the info :)

I just wanted to know whether there is some utility I can use to UPDATE records from the file into the MS SQL database instead of using DataStage.

regards,
Mayank
MayankRakesh

Post by ArndW »

SQL Server has a "BULK INSERT" capability to load sequential files directly. There are tricks and limitations in doing this, but you might be able to work around these issues.
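A minimal Python sketch of preparing such a load, under stated assumptions. The target server predates SQL Server 2016, because `CODEPAGE = '65001'` (UTF-8) for BULK INSERT is only supported from that version. For that reason the UTF-8 file is first converted to UTF-16LE, which BULK INSERT reads with `DATAFILETYPE = 'widechar'`. The table name `dbo.KoreanStaging` and the file paths are hypothetical. The data file path must also be readable by the SQL Server service itself, not just by the client.

```python
import os
import tempfile


def utf8_to_widechar(src: str, dst: str) -> None:
    """Re-encode a UTF-8 text file as UTF-16LE with a BOM and CRLF line ends."""
    # 'utf-8-sig' drops a leading BOM if one exists (e.g. from Notepad).
    with open(src, encoding="utf-8-sig") as fin, \
         open(dst, "w", encoding="utf-16-le", newline="\r\n") as fout:
        fout.write("\ufeff")    # encodes as FF FE, the UTF-16LE BOM
        fout.write(fin.read())  # each '\n' is written out as '\r\n'


def build_bulk_insert(table: str, data_file: str) -> str:
    """Return BULK INSERT text for a tab-delimited UTF-16LE ('widechar') file."""
    # '\t' and '\n' are BULK INSERT's own terminator escapes, so a raw string
    # keeps the backslashes in the SQL; BULK INSERT reads '\n' as CR+LF.
    return (
        f"BULK INSERT {table}\n"
        f"FROM '{data_file}'\n"
        r"WITH (DATAFILETYPE = 'widechar', FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n');"
    )


# Demo with hypothetical files in a temporary directory.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "korean_utf8.txt")
dst = os.path.join(workdir, "korean_widechar.txt")
with open(src, "w", encoding="utf-8", newline="") as f:
    f.write("1\t한글\n2\t서울\n")
utf8_to_widechar(src, dst)
stmt = build_bulk_insert("dbo.KoreanStaging", dst)
print(stmt)
```

BULK INSERT only adds rows. To apply updates, the rows would still need to be loaded into a staging table and then applied with a separate UPDATE.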
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

There is no guarantee that your step using Notepad will successfully translate all characters. What character map is used for encoding the original Korean characters?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
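One rough way to start answering the character-map question is to try decoding the raw bytes with candidate encodings. The candidate list here is an assumption: UTF-8 plus CP949, the Windows Korean code page, which is a superset of EUC-KR. A clean decode is only evidence, not proof. Pure ASCII data, for example, decodes under every candidate.

```python
def first_decoding(raw: bytes, candidates=("utf-8", "cp949")):
    """Return the first candidate encoding that decodes raw without errors."""
    for enc in candidates:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue
        return enc
    return None


print(first_decoding("한글".encode("utf-8")))  # utf-8
print(first_decoding("한글".encode("cp949")))  # cp949
```

UTF-8 is tried first because its strict byte structure rarely accepts data that was really written in CP949.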
roblew
Charter Member
Posts: 123
Joined: Mon Mar 27, 2006 7:32 pm
Location: San Ramon

Post by roblew »

Hi,

I'm having similar issues loading Spanish Unicode data from Oracle (UTF8) into a SQL Server 2000 database.

I tried using NCHAR data types, which didn't resolve the problem. However, after I set the ODBC Enterprise stage data type of the column to LongVarBinary for the target SQL Server database, it seemed to work properly.

I don't know why this works. Does that make sense to anyone? Any comments?

I've started a new thread in the Parallel section.
viewtopic.php?p=213181
Last edited by roblew on Thu Jan 11, 2007 11:18 pm, edited 1 time in total.