Viewing Korean Characters in Information Analyzer

This forum contains ProfileStage posts and now focuses on newer versions of InfoSphere Information Analyzer.

Moderators: chulett, rschirm

sigma
Premium Member
Posts: 83
Joined: Thu Aug 07, 2008 1:22 pm

Viewing Korean Characters in Information Analyzer

Post by sigma »

I have an Information Analyzer issue. The data comes from an Oracle table and consists of Korean characters stored as UTF-8.

When I import the data into Excel, the Korean displays normally (the third column below is the NAME field):

1 100 대표 2008-12-18 08:56:54
1 101 안중태 2008-12-18 08:56:54
1 102 천정균 2008-12-18 08:56:54
1 103 이숙자 2008-12-18 08:56:54
1 104 김형래 2008-12-18 08:56:54
1 105 하현준 2008-12-18 08:56:54
1 106 김연규 2008-12-18 08:56:54
1 107 채희권 2008-12-18 08:56:54
1 108 김용애 2008-12-18 08:56:54
1 109 허성 2008-12-18 08:56:54

When I import the above table and run column analysis on the first three columns, the analysis itself completes fine. But when I view or drill into the data, it does not show Korean characters, and I do not believe it reports the cardinality accurately for the NAME column.
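One way to cross-check the cardinality outside Information Analyzer is to count the distinct NAME values directly from a correctly decoded sample. The sketch below is only illustrative: the function name and the whitespace-separated layout are assumptions based on the sample rows above. The second part shows how a wrong decoding could, in principle, collapse different names into the same value. Whether Information Analyzer does anything like this is not known from this thread.

```python
def distinct_count(values):
    """Return the number of distinct values in a column."""
    return len(set(values))

# Three of the sample rows from the post, already decoded correctly.
sample_rows = [
    "1 100 대표 2008-12-18 08:56:54",
    "1 101 안중태 2008-12-18 08:56:54",
    "1 102 천정균 2008-12-18 08:56:54",
]

# Third whitespace-separated field is the NAME column (assumed layout).
names = [row.split()[2] for row in sample_rows]
true_cardinality = distinct_count(names)

# Hypothetical failure mode: UTF-8 bytes read as ASCII with replacement.
# Every non-ASCII byte becomes U+FFFD, so names of equal byte length
# become indistinguishable and the distinct count drops.
garbled = [n.encode("utf-8").decode("ascii", errors="replace") for n in names]
garbled_cardinality = distinct_count(garbled)

print(true_cardinality, garbled_cardinality)
```

If the count computed this way differs from what column analysis reports, that would suggest the values are being altered before they are compared.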


Is there a specific process to follow for profiling foreign-language data stored as UTF-8?


We do have NLS installed on the server.

In fact, this data file was created by a job using the UTF-8 character map.

Please advise.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

There is not a single "UTF-8" encoding. Unicode Transformation Format (8-bit) is implemented differently in different places. See the Unicode Consortium website for more information.

How was the source file created? Are you sure it wasn't one of the Korean-specific character maps? Have you tried any of these with DataStage?
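A quick way to test whether the bytes are really UTF-8 or one of the Korean-specific encodings is to try decoding a sample with each candidate and see which one yields Hangul. This is a rough sketch, not a definitive detector: the helper names and the candidate list (`utf-8`, `cp949`, `euc-kr`, `utf-16`) are assumptions chosen for illustration, and the sample bytes are generated in the script rather than taken from the real file.

```python
def hangul_ratio(text):
    """Fraction of non-whitespace characters in the Hangul Syllables block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    hangul = sum(1 for c in chars if "\uac00" <= c <= "\ud7a3")
    return hangul / len(chars)

def try_encodings(raw, candidates=("utf-8", "cp949", "euc-kr", "utf-16")):
    """Decode raw bytes with each candidate codec.

    Returns a dict mapping codec name to the Hangul ratio of the decoded
    text, or None when the bytes are not valid in that codec.
    """
    results = {}
    for name in candidates:
        try:
            results[name] = hangul_ratio(raw.decode(name))
        except UnicodeDecodeError:
            results[name] = None
    return results

# Stand-in for bytes read from the real file with open(path, "rb").
raw = "안중태".encode("cp949")
results = try_encodings(raw)
print(results)
```

A codec that decodes without error and gives a ratio close to 1.0 for a name-only sample is the most plausible candidate.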
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
sigma
Premium Member
Posts: 83
Joined: Thu Aug 07, 2008 1:22 pm

Post by sigma »

<B>How was the source file created?</B>

Thanks Ray.
The Korean customers gave us an Excel file with sample data.
1) The Excel sheet was saved as Unicode text (tab-separated).
2) The Unicode text file was converted to UTF-8 encoding using .NET classes.
3) The new UTF-8 file was read by a DataStage job using the UTF-8 NLS map, and only a few fields (the first 3) were passed on.
4) The DataStage job loaded the Oracle table and created a flat file with 3 fields.
5) When I import the data from the Oracle table, I see the Korean characters just fine.
6) But in IA it does not look Korean at all.
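For comparison with step 2, here is a minimal Python sketch of the same kind of conversion. It assumes that Excel's "Unicode Text" format is UTF-16 with a byte order mark, which is not stated in the thread. The function name and file names are hypothetical, and this is not the .NET code that was actually used.

```python
import os
import tempfile

def convert_utf16_to_utf8(src_path, dst_path):
    """Re-encode a UTF-16 text file as UTF-8 without a byte order mark.

    newline="" on both sides keeps the original line endings untouched.
    """
    with open(src_path, "r", encoding="utf-16", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)

# Small self-contained demonstration with a temporary tab-separated file.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "sample_unicode.txt")
    dst = os.path.join(tmp, "sample_utf8.txt")
    with open(src, "w", encoding="utf-16", newline="") as f:
        f.write("1\t100\t대표\r\n")
    convert_utf16_to_utf8(src, dst)
    with open(dst, "rb") as f:
        converted = f.read()

print(converted)
```

Producing the file this way and comparing it byte for byte with the output of step 2 would show whether the .NET conversion added or changed anything.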

<b>Question</b>
When I do a View Data on the UTF-8 file created in step 2, I do not see Korean characters. I am assuming DataStage is still converting it correctly, since it loads into Oracle just fine.
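One thing worth ruling out for the step 2 file is a leading byte order mark (BOM), since some conversion tools write one and some viewers do not expect it. The thread does not say whether the .NET conversion wrote a BOM, so this is only a check to run, not a diagnosis. The function name and labels below are made up for illustration.

```python
import codecs

def detect_bom(raw):
    """Return a label for the BOM at the start of raw bytes, or None."""
    # UTF-32 is checked before UTF-16 because the UTF-32 LE BOM
    # begins with the same two bytes as the UTF-16 LE BOM.
    boms = [
        ("UTF-8", codecs.BOM_UTF8),
        ("UTF-32 LE", codecs.BOM_UTF32_LE),
        ("UTF-32 BE", codecs.BOM_UTF32_BE),
        ("UTF-16 LE", codecs.BOM_UTF16_LE),
        ("UTF-16 BE", codecs.BOM_UTF16_BE),
    ]
    for label, bom in boms:
        if raw.startswith(bom):
            return label
    return None

# Stand-ins for the first bytes of a real file read with open(path, "rb").
with_bom = codecs.BOM_UTF8 + "대표".encode("utf-8")
without_bom = "대표".encode("utf-8")

print(detect_bom(with_bom), detect_bom(without_bom))
print(with_bom[:8].hex(" "))
```

Looking at the first few bytes in hex also confirms whether the file is UTF-8 at all: each Hangul syllable should appear as three bytes, the first in the range EA to ED.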
Last edited by sigma on Fri Dec 19, 2008 10:09 am, edited 1 time in total.
sigma
Premium Member
Posts: 83
Joined: Thu Aug 07, 2008 1:22 pm

Post by sigma »

Sorry about the bold characters. My apologies, I only intended the questions to be bolded but must have missed the closing tag.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

So, go back and edit it.
-craig

"You can never have too many knives" -- Logan Nine Fingers
sigma
Premium Member
Posts: 83
Joined: Thu Aug 07, 2008 1:22 pm

Post by sigma »

Thanks, I have edited it so it is not bold anymore.

Any suggestions for my real problem?