
double byte problem

Posted: Sun Mar 30, 2003 8:46 pm
by Bala
Hi,
We are planning a regional DWH.
My company has many branches in the Asia-Pacific region, and the head office is in Singapore.
Countries such as China, Korea and Taiwan have double-byte characters in their data.
My questions are:
1> Is it possible to have a single target database after transformation?
AS/400 DB --> ETL --> Oracle DB ==> for all countries
( OR )
2> Do I need a separate ETL server and target database for each double-byte country, and another for the single-byte countries?

3> Can I FTP the EBCDIC data into flat files in ASCII format, do the transformation, and then load it into the target Oracle DB? Will this idea work?
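Whether option 3 works depends on how the double-byte fields survive the EBCDIC-to-ASCII step. Host EBCDIC data that mixes single-byte and double-byte text conventionally brackets each double-byte run with shift-out (0x0E) and shift-in (0x0F) bytes, and a byte-for-byte single-byte translation table has no notion of those runs. The Python sketch below is only an illustration under that assumption: it splits a record into single-byte and double-byte runs and decodes the single-byte runs with cp037 (US EBCDIC), which is an assumed code page; the actual CCSIDs on the AS/400 would need to be checked. The function names are hypothetical.

```python
# Assumption: mixed SBCS/DBCS EBCDIC data brackets each double-byte run
# with shift-out (0x0E) and shift-in (0x0F) control bytes.
SO, SI = 0x0E, 0x0F


def split_mixed_ebcdic(record: bytes) -> list[tuple[str, bytes]]:
    """Split a mixed EBCDIC record into ('sbcs' | 'dbcs', bytes) runs.

    The SO/SI bytes themselves are dropped; they only mark a mode change.
    """
    runs: list[tuple[str, bytes]] = []
    mode, buf = "sbcs", bytearray()
    for byte in record:
        if byte == SO and mode == "sbcs":
            new_mode = "dbcs"
        elif byte == SI and mode == "dbcs":
            new_mode = "sbcs"
        else:
            buf.append(byte)
            continue
        # A mode change closes the current run (if it holds any bytes).
        if buf:
            runs.append((mode, bytes(buf)))
        mode, buf = new_mode, bytearray()
    if buf:
        runs.append((mode, bytes(buf)))
    return runs


def decode_sbcs(run: bytes, codepage: str = "cp037") -> str:
    """Decode a single-byte run; cp037 (US EBCDIC) is an assumed code page."""
    return run.decode(codepage)


if __name__ == "__main__":
    # "He", then one two-byte DBCS character (placeholder bytes), then "l".
    record = b"\xc8\x85\x0e\x41\x42\x0f\x93"
    for mode, run in split_mixed_ebcdic(record):
        if mode == "sbcs":
            print(mode, decode_sbcs(run))
        else:
            # Python's standard library has no DBCS EBCDIC codec, so these
            # bytes need the host's own double-byte code page to decode.
            print(mode, run.hex())
```

In practice the double-byte runs would presumably have to be converted with the host's DBCS code page (or by the ETL tool's maps) rather than by a single-byte translation table.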

If anyone has experienced this problem and has answers, please share your ideas and knowledge with me.

Thanks and Regards,
Bala

Posted: Sun Mar 30, 2003 11:35 pm
by ray.wurlod
One instance of DataStage will handle all the character sets. The mechanism is to use (a UTF-8 encoding of) Unicode within DataStage, and to position maps on all interfaces between DataStage and the "outside world" (primarily links connecting to passive stage types). Thus, for Korea you might use maps that convert to and from the Korean encoding; for China, simplified Chinese; and for Taiwan, traditional Chinese.
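A minimal Python sketch of the "maps on every interface" idea, assuming each branch delivers its extract in its own local encoding. The dictionary and function names are hypothetical, and the codec names are Python's built-in ones (where "ksc5601" is an alias for EUC-KR); this models the concept only and is not how DataStage itself is configured.

```python
# Hypothetical per-country "maps": the encoding each source feed arrives in.
# Names are Python codec names; "ksc5601" is an alias of euc_kr.
LINK_MAPS = {
    "KR": "ksc5601",  # Korean
    "CN": "gb2312",   # simplified Chinese
    "TW": "big5",     # traditional Chinese
    "SG": "ascii",    # single-byte English data
}


def map_in(country: str, raw: bytes) -> str:
    """Input-link map: external bytes -> internal Unicode text."""
    return raw.decode(LINK_MAPS[country])


def map_out(text: str, encoding: str) -> bytes:
    """Output-link map: internal Unicode text -> external bytes."""
    return text.encode(encoding)


if __name__ == "__main__":
    samples = {"KR": "한국", "CN": "中国", "TW": "臺灣", "SG": "Singapore"}
    # Simulate each branch's extract arriving in its own local encoding.
    feeds = {c: s.encode(LINK_MAPS[c]) for c, s in samples.items()}
    # Inside the job every row is plain Unicode, so rows can be merged freely.
    merged = [map_in(c, raw) for c, raw in feeds.items()]
    print(merged)
    # A single output map (UTF-8 here) serves every country on the way out.
    print(map_out(" | ".join(merged), "utf-8"))
```

Because the merge happens on Unicode strings, the processing logic does not need to know which country a row came from; only the input and output maps do.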
As for the target database, you can employ a similar scheme. Combining character sets in a single table is problematic unless a combined character set (such as Unicode) is used. Otherwise, you are likely to see a large number of "unmappable characters"; for example, most databases use the LANG environment variable, but it can have only one value at any one time.
The advantage of DataStage's approach is that the single-byte character set is simply a subset of Unicode, so both can co-exist happily within DataStage. The same cannot necessarily be guaranteed for all target databases. I do not know Oracle well enough to say either way, and the capability may well be revision-dependent.
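The "unmappable characters" problem can be illustrated by treating a target column as able to hold only what its character set can encode, which is a simplification of real database behaviour. This Python sketch, with hypothetical names and sample rows, checks one row per branch against several candidate character sets; only UTF-8 holds all of them, and plain ASCII text has identical bytes in UTF-8, which is the subset property mentioned above.

```python
ROWS = ["한국", "中国", "臺灣", "Singapore"]  # one sample row per branch, as Unicode


def unmappable(rows: list[str], encoding: str) -> list[str]:
    """Return the rows that a column stored in `encoding` could not hold."""
    bad = []
    for text in rows:
        try:
            text.encode(encoding)
        except UnicodeEncodeError:
            bad.append(text)
    return bad


if __name__ == "__main__":
    # latin-1 stands in for a single-byte database character set.
    for enc in ("latin-1", "gb2312", "big5", "ksc5601", "utf-8"):
        print(enc, unmappable(ROWS, enc))
    # ASCII is a subset of Unicode: its UTF-8 bytes are unchanged.
    print("Singapore".encode("utf-8") == "Singapore".encode("ascii"))
```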

Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518

Posted: Mon Mar 31, 2003 1:44 am
by Bala
Thanks a lot for your quick and valuable information.

Just one more question about this.
As I have very little knowledge of DataStage administration, please advise where and how to set up these maps between DataStage and the double-byte character sets.

Thanks and Regards,
Bala

Posted: Mon Mar 31, 2003 5:24 am
by ray.wurlod
The maps are supplied with DataStage.
When you install DataStage server with NLS enabled, you choose a default language (presumably English if you are in Singapore).
In the Administrator client you can configure a default character map and locale for a DataStage project. That character map is used on all connections to that project unless specifically overridden.
When designing a job, you can specify a different set of defaults for that job. Within a job you can override on a per-link or even on a per-column basis, depending on how the external data are encoded, or required to be encoded. In nearly every site in which I've worked in Asia (except Japan, which has very many encodings), the overall project default has not needed to be overridden when dealing with the data of a single country.
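The precedence described above (project default, overridden by job, then link, then column) amounts to "the most specific setting wins". The following Python model is hypothetical: the class, field and link names are invented for illustration and are not part of DataStage, and the map names are Python codec names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MapSettings:
    """Hypothetical model of the levels at which a character map can be set."""
    project_default: str
    job_default: Optional[str] = None
    link_maps: dict[str, str] = field(default_factory=dict)
    column_maps: dict[tuple[str, str], str] = field(default_factory=dict)

    def resolve(self, link: str, column: Optional[str] = None) -> str:
        """Most specific setting wins: column > link > job > project."""
        if column is not None and (link, column) in self.column_maps:
            return self.column_maps[(link, column)]
        if link in self.link_maps:
            return self.link_maps[link]
        if self.job_default is not None:
            return self.job_default
        return self.project_default


if __name__ == "__main__":
    settings = MapSettings(
        project_default="ascii",
        link_maps={"KR_in": "ksc5601"},             # whole link overridden
        column_maps={("TW_in", "NAME"): "big5"},    # one column overridden
    )
    print(settings.resolve("SG_in"))          # falls back to the project default
    print(settings.resolve("KR_in", "NAME"))  # link-level override
    print(settings.resolve("TW_in", "NAME"))  # column-level override
```

With no job, link or column overrides at all, every connection resolves to the project default, which matches the single-country case described above.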
In your case you will most likely need BIG5 for mapping Taiwanese ("traditional Chinese") characters, KSC5601 for mapping Korean characters, and GB2312 for mapping simplified Chinese characters (China).
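Picking the right map matters because the same bytes mean different characters under each of these encodings. This Python sketch (Python codec names assumed; "ksc5601" is an alias for EUC-KR, and the function name is hypothetical) decodes bytes produced with GB2312 using each of the three maps, replacing undecodable bytes with U+FFFD; only the GB2312 map gives back the original text.

```python
def decode_with(raw: bytes, encoding: str) -> str:
    """Decode with the given map, substituting U+FFFD for undecodable bytes."""
    return raw.decode(encoding, errors="replace")


if __name__ == "__main__":
    raw = "中国".encode("gb2312")  # data as it might arrive from the China branch
    for enc in ("gb2312", "big5", "ksc5601"):
        # Wrong maps yield other characters or replacement marks.
        print(enc, decode_with(raw, enc))
```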
More information can be found in the DataStage 6.0 documentation set, which is installed with your DataStage clients in a folder called Docs. The manual you require is called NLS.pdf ("Ascential DataStage NLS Guide").

Posted: Mon Mar 31, 2003 8:27 pm
by Bala
Thank you very much for your detailed reply.

Posted: Mon Mar 31, 2003 11:06 pm
by ray.wurlod
You can also find a summary of DataStage NLS at www.datastagexchange.com, under the "ETL Experts" section.

Posted: Tue Apr 01, 2003 12:10 am
by Bala
Thank you so much.