double byte problem

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.


Bala
Participant
Posts: 17
Joined: Mon Oct 14, 2002 8:05 pm


Post by Bala »

Hi,
We are planning a regional data warehouse (DWH).
My company has many branches in the Asia-Pacific region, and its head office is in Singapore.
Countries such as China, Korea and Taiwan have double-byte characters in their data.
My questions are below:
1> Is it possible to have a single target database after transformation?
AS/400 DB --> ETL --> Oracle DB ==> for all countries
( OR )
2> Do I need a separate ETL server and target database for each double-byte country, and another for the single-byte countries?

3> Can I FTP the EBCDIC data into flat files in ASCII format, do the transformation, and then load it into the target Oracle DB? Will this idea work?

If anyone has experienced the above problem and has answers, please share your ideas and knowledge with me.

Thanks and Regards,
Bala
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

One instance of DataStage will handle all the character sets. The mechanism is to use (a UTF-8 encoding of) Unicode within DataStage, and to position maps on all interfaces between DataStage and the "outside world" (primarily links connecting to passive stage types). Thus you might use maps that convert to and from a Korean encoding for Korea, a simplified Chinese encoding for China, and a traditional Chinese encoding for Taiwan.
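As a rough illustration of "Unicode inside, maps at the edges", here is a minimal Python sketch. It is not DataStage code: the function names are hypothetical, and Python codec names (`big5`, `euc_kr`, `gb2312`, `utf-8`) stand in for DataStage NLS maps. Python `str` plays the role of DataStage's internal Unicode form.

```python
# Minimal sketch (assumption: Python codecs stand in for DataStage NLS maps;
# inbound/outbound/pass_through are hypothetical names, not DataStage APIs).

def inbound(raw: bytes, source_map: str) -> str:
    """Map on an input link: external bytes -> Unicode (internal form)."""
    return raw.decode(source_map)


def outbound(text: str, target_map: str) -> bytes:
    """Map on an output link: Unicode (internal form) -> external bytes."""
    return text.encode(target_map)


def pass_through(raw: bytes, source_map: str, target_map: str) -> bytes:
    """Read through one map, work in Unicode, write through another map."""
    internal = inbound(raw, source_map)
    # ... transformations would operate on Unicode text here ...
    return outbound(internal, target_map)


if __name__ == "__main__":
    taiwan_bytes = "臺北".encode("big5")  # as a Big5 source might store it
    print(pass_through(taiwan_bytes, "big5", "utf-8").decode("utf-8"))
```

Only the edges know about the country encodings; everything in between sees the same Unicode text regardless of where the row came from.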
As for the target database, you can employ a similar scheme. Combining character sets in a single table is problematic unless a combined character set (such as Unicode) is used. Otherwise, you are likely to see a large number of "unmappable characters"; for example, most databases use the LANG environment variable, but this can have only one value at any one time.
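The "unmappable characters" problem can be sketched in Python by counting characters that a single target character set cannot represent. This is an illustration only, assuming Python codecs as stand-ins for the database character set; `count_unmappable` is a hypothetical helper.

```python
from typing import Iterable


def count_unmappable(values: Iterable[str], target_encoding: str) -> int:
    """Count characters in `values` that cannot be encoded in target_encoding."""
    bad = 0
    for value in values:
        for ch in value:
            try:
                ch.encode(target_encoding)
            except UnicodeEncodeError:
                bad += 1  # this character has no mapping in the target set
    return bad


# Hypothetical rows from several countries landing in one table.
rows = ["Singapore", "서울", "北京", "臺北"]

print(count_unmappable(rows, "latin-1"))  # 6: none of the CJK characters fit
print(count_unmappable(rows, "utf-8"))    # 0: a Unicode target holds them all
```

A single-byte target loses every double-byte character, while a combined character set such as a Unicode encoding accepts all rows.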
The advantage of DataStage's approach is that the single-byte character set is simply a subset of Unicode, so both can co-exist happily within DataStage. The same cannot necessarily be guaranteed for all target databases. I do not know Oracle well enough to say either way, and the capability may well be revision-dependent.
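The "subset" point can be checked in Python, with one refinement: every Latin-1 character is a Unicode code point with the same number, but only the ASCII range keeps identical bytes in UTF-8 (Latin-1 characters 128 to 255 become two bytes). The helper names below are hypothetical and only illustrate the idea.

```python
def latin1_code_points_match_unicode() -> bool:
    """Each Latin-1 byte value b decodes to the Unicode code point U+00bb == chr(b)."""
    return all(bytes([b]).decode("latin-1") == chr(b) for b in range(256))


def ascii_bytes_unchanged_in_utf8() -> bool:
    """ASCII characters encode to the very same single byte in UTF-8."""
    return all(chr(b).encode("utf-8") == bytes([b]) for b in range(128))


# Single-byte and double-byte text co-exist in one Unicode string.
mixed = "Singapore / 서울 / 北京 / 臺北 / café"
print(mixed.encode("utf-8").decode("utf-8") == mixed)  # True
print("é".encode("latin-1"), "é".encode("utf-8"))      # b'\xe9' b'\xc3\xa9'
```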

Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518

Post by Bala »

Thanks a lot for your quick and valuable information.

Just one more question regarding this.
As I have very little knowledge of the admin side of DataStage, please advise where and how to set up these maps between DataStage and the double-byte character sets.

Thanks and Regards,
Bala

Post by ray.wurlod »

The maps are supplied with DataStage.
When you install DataStage server with NLS enabled, you choose a default language (presumably English if you are in Singapore).
In the Administrator client you can configure a default character map and locale for a DataStage project. That character map is used on all connections to that project unless specifically overridden.
When designing a job, you can specify a different set of defaults for that job. Within a job you can override on a per-link or even on a per-column basis, depending on how the external data are encoded, or required to be encoded. In nearly every site in which I've worked in Asia (except Japan, which has very many encodings), the overall project default has not needed to be overridden when dealing with the data of a single country.
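The override order described above (project default, then job default, then link, then column) amounts to "the most specific setting wins". A minimal Python sketch of that lookup follows; `effective_map` and the `PROJECT_DEFAULT_MAP` placeholder are hypothetical and do not correspond to any DataStage API.

```python
from typing import Optional


def effective_map(project_default: str,
                  job_default: Optional[str] = None,
                  link_map: Optional[str] = None,
                  column_map: Optional[str] = None) -> str:
    """Return the most specific NLS map that has been set (hypothetical model)."""
    for candidate in (column_map, link_map, job_default):
        if candidate:
            return candidate
    return project_default  # nothing overridden: use the project setting


# Project default only, then a link-level override for a Korean source.
print(effective_map("PROJECT_DEFAULT_MAP"))                      # PROJECT_DEFAULT_MAP
print(effective_map("PROJECT_DEFAULT_MAP", link_map="KSC5601"))  # KSC5601
```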
In your case you will most likely need BIG5 for mapping Taiwanese ("traditional Chinese") characters, KSC5601 for mapping Korean characters, and GB2312 for mapping simplified Chinese characters (China).
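For experimenting outside DataStage, the three maps named above can be paired with roughly equivalent Python codecs. The pairing is an assumption (in particular, KSC5601 data is treated here as EUC-KR bytes), and `COUNTRY_MAPS` and `codec_for` are hypothetical names.

```python
import codecs

# Hypothetical table: country -> (DataStage map named above, assumed Python codec).
COUNTRY_MAPS = {
    "Taiwan": ("BIG5", "big5"),        # traditional Chinese
    "Korea": ("KSC5601", "euc_kr"),    # assumption: KS C 5601 stored as EUC-KR
    "China": ("GB2312", "gb2312"),     # simplified Chinese
}


def codec_for(country: str) -> codecs.CodecInfo:
    """Look up the Python codec assumed to correspond to the country's map."""
    _datastage_map, codec_name = COUNTRY_MAPS[country]
    return codecs.lookup(codec_name)


if __name__ == "__main__":
    for country, sample in [("Taiwan", "臺北"), ("Korea", "서울"), ("China", "北京")]:
        info = codec_for(country)
        encoded, _ = info.encode(sample)   # (bytes, characters consumed)
        decoded, _ = info.decode(encoded)  # (str, bytes consumed)
        print(country, COUNTRY_MAPS[country][0], decoded == sample)
```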
More information can be found in the DataStage 6.0 documentation set, which is installed with your DataStage clients in a folder called Docs. The manual you require is called NLS.pdf ("Ascential DataStage NLS Guide").

Post by Bala »

Thank you very much for your detailed reply.

Post by ray.wurlod »

You can also find a summary of DataStage NLS at www.datastagexchange.com, under the "ETL Experts" section.

Post by Bala »

Thank you so much.