Viewing Korean Characters in Information Analyzer

sigma · Post by **sigma** » Thu Dec 18, 2008 12:57 pm

I have an Information Analyzer issue. The data comes from an Oracle table and consists of Korean characters stored as UTF-8.

When I import the data into Excel, the Korean looks normal (the 3rd column below is the NAME field):

1 100 대표 2008-12-18 08:56:54
1 101 안중태 2008-12-18 08:56:54
1 102 천정균 2008-12-18 08:56:54
1 103 이숙자 2008-12-18 08:56:54
1 104 김형래 2008-12-18 08:56:54
1 105 하현준 2008-12-18 08:56:54
1 106 김연규 2008-12-18 08:56:54
1 107 채희권 2008-12-18 08:56:54
1 108 김용애 2008-12-18 08:56:54
1 109 허성 2008-12-18 08:56:54

When I import the table and run column analysis on it (first 3 columns), the column analysis completes just fine. However, when I view or drill down into the data, it does not show Korean characters. I also do not believe it reports the cardinality of the NAME column accurately.
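The cardinality concern can be illustrated with a small Python sketch. Python is used here only for illustration and nothing in the thread uses it. The sketch assumes, hypothetically, that the profiler's view falls back to a single-byte code page such as Latin-1 and replaces every unmappable Hangul character with '?'. The thread does not establish what IA actually does. Under that assumption, names of equal length collapse into the same value, so the distinct count drops.

```python
# Sample NAME values copied from the rows above.
names = ["대표", "안중태", "천정균", "이숙자", "김형래",
         "하현준", "김연규", "채희권", "김용애", "허성"]

def cardinality(values):
    """Number of distinct values, as a column profiler would count them."""
    return len(set(values))

def lossy_roundtrip(value, encoding="latin-1"):
    """Hypothetical failure mode: pass the text through a code page that
    cannot represent Hangul, so each such character becomes '?'."""
    return value.encode(encoding, errors="replace").decode(encoding)

correct = cardinality(names)
damaged = cardinality([lossy_roundtrip(n) for n in names])
print(correct, damaged)  # 10 2
```

All ten sample names are distinct, but after the lossy round trip only "??" and "???" remain. A wrong cardinality is therefore a strong hint that the characters were lost somewhere, not just displayed badly.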

Is there a specific process to follow for profiling foreign-language data stored as UTF-8?

We do have NLS installed on the server.

In fact, this data file was created by a job using the UTF-8 character map.

Please advise.

ray.wurlod · Post by **ray.wurlod** » Thu Dec 18, 2008 3:20 pm

There is not a single "UTF-8" encoding. Unicode Transformation Format (8-bit) is implemented differently in different places. See the Unicode Consortium website for more information.
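One concrete way files labelled "UTF-8" or "Unicode" can differ is the optional byte order mark (BOM). The Python sketch below is an illustration only; it is not DataStage or IA behaviour. It encodes the same Korean name three ways: plain UTF-8, UTF-8 with a BOM (which some Windows tools write), and UTF-16 little-endian with a BOM (a common form for "Unicode text" on Windows). It then sniffs the leading bytes. The `sniff_bom` helper is hypothetical. A reader expecting one form can misinterpret another.

```python
import codecs

text = "안중태"

plain_utf8 = text.encode("utf-8")           # no byte order mark
utf8_with_bom = text.encode("utf-8-sig")    # EF BB BF prefix
utf16_le = codecs.BOM_UTF16_LE + text.encode("utf-16-le")  # FF FE prefix

def sniff_bom(data: bytes) -> str:
    """Guess the encoding from a leading byte order mark, if any."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 with BOM"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le with BOM"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be with BOM"
    return "no BOM (plain utf-8 or another encoding)"

for label, data in [("plain", plain_utf8), ("bom", utf8_with_bom), ("utf16", utf16_le)]:
    print(label, data[:6].hex(" "), sniff_bom(data))
```

A reader that decodes the BOM variant as plain UTF-8 sees an extra invisible U+FEFF character at the start of the first field. For Hangul syllables, which lie in the Basic Multilingual Plane, the CESU-8 variant would produce the same bytes as standard UTF-8, so a BOM is a more likely difference for this data.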

How was the source file created? Are you sure it wasn't one of the Korean-specific character maps? Have you tried any of these with DataStage?

sigma · Post by **sigma** » Fri Dec 19, 2008 9:11 am

How was the source file created?

Thanks Ray.
The Korean customers gave us an Excel file with sample data:
1) The Excel sheet was saved as Unicode Text (tab-separated).
2) The Unicode text file was converted to UTF-8 encoding using .NET classes.
3) The new UTF-8 file was read by a DataStage job using the UTF-8 NLS map, and only the first 3 fields were passed on.
4) The DataStage job loaded the Oracle table and created a flat file with those 3 fields.
5) When I import the data from the Oracle table, I see the Korean characters just fine.
6) But in IA it does not look like Korean at all.
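Steps 1 to 3 above can be approximated in a minimal Python sketch. This is a stand-in for the .NET conversion and the DataStage field selection; it is not the code used in the thread. It assumes Excel's "Unicode Text" export is UTF-16 with a BOM. It writes UTF-8 *without* a BOM, and whether DataStage and IA expect a BOM is an assumption worth checking. The file names and the `convert_unicode_text_to_utf8` helper are hypothetical.

```python
import csv
import os
import tempfile

def convert_unicode_text_to_utf8(src_path, dst_path, keep_fields=3):
    """Re-encode a tab-separated 'Unicode Text' export (assumed UTF-16 with BOM)
    as UTF-8 without BOM, keeping only the first `keep_fields` columns."""
    with open(src_path, "r", encoding="utf-16", newline="") as fin, \
         open(dst_path, "w", encoding="utf-8", newline="") as fout:
        reader = csv.reader(fin, delimiter="\t")
        writer = csv.writer(fout, delimiter="\t", lineterminator="\n")
        for row in reader:
            writer.writerow(row[:keep_fields])

# Hypothetical sample input mimicking the Excel export (UTF-16 with BOM, CRLF).
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "export_utf16.txt")
dst = os.path.join(tmp, "export_utf8.txt")
with open(src, "w", encoding="utf-16", newline="") as f:
    f.write("1\t100\t대표\t2008-12-18 08:56:54\r\n")
    f.write("1\t101\t안중태\t2008-12-18 08:56:54\r\n")

convert_unicode_text_to_utf8(src, dst)
with open(dst, "rb") as f:
    raw = f.read()
print(raw.decode("utf-8"))
```

Reading the output back as raw bytes, instead of through a text viewer, shows exactly what the next tool in the chain receives.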

Question:
When I view the data of the UTF-8 file created in step 2, I do not see Korean characters. I am assuming DataStage is still converting correctly, since it loads into Oracle just fine.
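When a viewer shows no Korean, it can help to check the bytes themselves before blaming the viewer. The Python sketch below uses a hypothetical `diagnose` helper. It reports whether the bytes start with a UTF-8 BOM, whether they decode as strict UTF-8, and how many precomposed Hangul syllables (U+AC00 to U+D7A3) they contain. For a real file, pass the result of reading it in binary mode; the path would be your own.

```python
import codecs

def diagnose(data: bytes) -> dict:
    """Report BOM presence, whether the bytes are valid UTF-8,
    and how many Hangul syllables (U+AC00..U+D7A3) they contain."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    try:
        text = data.decode("utf-8-sig")  # strips a leading BOM if present
    except UnicodeDecodeError as exc:
        return {"bom": has_bom, "valid_utf8": False, "error": str(exc), "hangul": 0}
    hangul = sum(1 for ch in text if "\uac00" <= ch <= "\ud7a3")
    return {"bom": has_bom, "valid_utf8": True, "hangul": hangul}

good = "1\t109\t허성\n".encode("utf-8")
wrong = "1\t109\t허성\n".encode("utf-16")  # UTF-16 bytes handed to a UTF-8 reader
print(diagnose(good))
print(diagnose(wrong))
```

If the file is valid UTF-8 and the Hangul count is non-zero, the characters survived conversion and the problem is more likely in how the viewing tool decodes or displays them.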

sigma · Post by **sigma** » Fri Dec 19, 2008 9:12 am

Sorry about the bold characters. I only intended the questions to be bolded, but I must have missed the closing tag.

chulett · Post by **chulett** » Fri Dec 19, 2008 9:37 am

So, go back and edit it.

sigma · Post by **sigma** » Fri Dec 19, 2008 10:10 am

Thanks, I have edited it so it is no longer bold.

Any suggestions for my real problem?