Page 1 of 1

How to load data with Special Characters

Posted: Thu Nov 06, 2008 8:44 pm
by bmouton
Hi and thanks for reading ...

We have a database that we are extracting from that has the possibility of ALL and ANY special characters that is humanly / computerly (is that a word?) possible. So ... We can't use a pipe (|) delimiter or any other type of delimiter.

If our data was just varchar, we could use fixed length formats and dump the data, then move it to our target platform, then load the data. However, we are using DB2 8.2 and 9.5 which we have CLOB data. So, space is now an issue with fixed length records ...

Long story short, is there an easy method to dump the data in a variable length format with some kind of delimiter? Since our application / database accepts all special characters, I don't know how we can dump the data, then read it ...

We have DS Server 8.0.1 running on SUSE Linux 10. We were thinking we could extract and land the data in XML files. However, we are not having any luck with using the XML approach. We can't get the XSDs right and there is not much documentation, tutorials, or classes that address this that we have sought.

Please help ...

Thanks,

Benny

Posted: Thu Nov 06, 2008 9:53 pm
by John Smith
Can you please clarify "humanly/computerly" ? Do you mean what could be input from the keyboard? Or that your data will contain all possible ASCII characters eg char(167) ?
Producing XML is possible with the XML Output stage. Try the IBM Developerworks website , there are some handy tips/examples in there.

Posted: Thu Nov 06, 2008 10:01 pm
by ray.wurlod
DB2 9.5 can also produce XML, which may be an easier solution. And you could use it to extract the data from the version 8.2 database.

Re: How to load data with Special Characters

Posted: Fri Nov 07, 2008 12:42 am
by chulett
bmouton wrote:We can't get the XSDs right and there is not much documentation, tutorials, or classes that address this that we have sought.
Documentation on XML in DataStage? Have you gotten your hands on the XML Best Practices document that Kim Duke is hosting for us in his Tips section?

Posted: Fri Nov 07, 2008 3:29 am
by bmouton
Thanks for the information so far ... the XML Best Practices information seems to be helping us ...

Here is what we are doing. Based on a specific set of extract criteria, we use a DB2 API stage to pull data and land in a file (whether it be XML or Sequential). Due to the facts that any possible typed or otherwise entered (e.g. Escape Sequences, Chinese Characters, Greek Characters) COULD be in our character-based fields (including char, varchar, CLOB), we have to find a way to land that data in a format that another set of Datastage jobs can read.

It's the typical three-tier data architecture. Extract the source data and land it, clean that file in a staging area, then load it.

We have tried to just use a Sequential File stage with a variety of delimiters. No luck.

The target data structures are a DB2 9.5 relational database. We don't have the option to move toward an XML format due to the BI Tools we are using. Not only that, we are less than 90 days from implementation.

Our team is going to review the XML Best Practices documentation. It seems that our issue has been creating the "right" XSD to land the file and then read it ....

Again thanks for the help so far!!!