Page 1 of 1

Converting Non-Ascii characters

Posted: Fri Mar 25, 2011 3:46 am
by abhilashnair
I have a job where data is being fetched from SQL Server DB and populated into DB2 target. The target is truncated and written every time. I am using ODBC stage in the job(Source as well as target). There are two columns in the source table each of Varchar(8000). These columns are also present in target DB2 and with same data type and same length.

The job used to run fine but from the past week, it started aborting with SIGSEGV. Each time the job aborted it did so after inserting some rows into target. The number of rows inserted each time differed. We tried various things like running job on single node, dropping and recreating the target table etc...it did not work.


Then we tried running the job after removing above two columns from source. It ran fine. So we have concluded that this is a data issue. The source values contain junk characters which are being rejected by DB2.

We had a look at a small subset of source data and found junk characters eg:a box like character. So I need to have a function which converts characters like these to valid values say space or empty string. But the problem is that the source data has millions of records and it is not possible to identify all junk characters. Also it is quite possible that there are other junk characters in addition to the one identified above. So I need a generic validation such that only characters which are acceptable in Varchar is being populated, I agree It is strange that SQL Server Database has no problems storing this as Varchar data type but DB2 has.

Is there a way out ?

Posted: Fri Mar 25, 2011 11:54 am
by ray.wurlod
There's no such thing as junk characters. They are your client's data. If they're in the source database, they're valid, and you have to move them. First step is to find out what they actually are. When you've posted that, we should be able the help further.

Posted: Fri Mar 25, 2011 11:46 pm
by abhilashnair
ray.wurlod wrote:There's no such thing as junk characters. They are your client's data. If they're in the source database, they're valid, and you have to move them. First step is to find out what they actually are. ...
Source Database is different from target. Is that the reason ? Source is SQL Server, Target is DB2

Posted: Sat Mar 26, 2011 7:39 am
by chulett
As noted, the very important first step is to identify them. And that doesn't mean as junk or a 'box' but rather the actual ascii value of the character(s). Enlist the help of your SQL Server DBA if you are unsure how to do that. Then you'll be able to decide if/how they can be accomodated in the characterset of the DB2 database.