Converting Non-Ascii characters
Posted: Fri Mar 25, 2011 3:46 am
I have a job where data is being fetched from SQL Server DB and populated into DB2 target. The target is truncated and written every time. I am using ODBC stage in the job(Source as well as target). There are two columns in the source table each of Varchar(8000). These columns are also present in target DB2 and with same data type and same length.
The job used to run fine but from the past week, it started aborting with SIGSEGV. Each time the job aborted it did so after inserting some rows into target. The number of rows inserted each time differed. We tried various things like running job on single node, dropping and recreating the target table etc...it did not work.
Then we tried running the job after removing above two columns from source. It ran fine. So we have concluded that this is a data issue. The source values contain junk characters which are being rejected by DB2.
We had a look at a small subset of source data and found junk characters eg:a box like character. So I need to have a function which converts characters like these to valid values say space or empty string. But the problem is that the source data has millions of records and it is not possible to identify all junk characters. Also it is quite possible that there are other junk characters in addition to the one identified above. So I need a generic validation such that only characters which are acceptable in Varchar is being populated, I agree It is strange that SQL Server Database has no problems storing this as Varchar data type but DB2 has.
Is there a way out ?
The job used to run fine but from the past week, it started aborting with SIGSEGV. Each time the job aborted it did so after inserting some rows into target. The number of rows inserted each time differed. We tried various things like running job on single node, dropping and recreating the target table etc...it did not work.
Then we tried running the job after removing above two columns from source. It ran fine. So we have concluded that this is a data issue. The source values contain junk characters which are being rejected by DB2.
We had a look at a small subset of source data and found junk characters eg:a box like character. So I need to have a function which converts characters like these to valid values say space or empty string. But the problem is that the source data has millions of records and it is not possible to identify all junk characters. Also it is quite possible that there are other junk characters in addition to the one identified above. So I need a generic validation such that only characters which are acceptable in Varchar is being populated, I agree It is strange that SQL Server Database has no problems storing this as Varchar data type but DB2 has.
Is there a way out ?