Converting Non-Ascii characters

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
abhilashnair
Participant
Posts: 284
Joined: Fri Oct 13, 2006 4:31 am

Converting Non-Ascii characters

Post by abhilashnair »

I have a job where data is being fetched from SQL Server DB and populated into DB2 target. The target is truncated and written every time. I am using ODBC stage in the job(Source as well as target). There are two columns in the source table each of Varchar(8000). These columns are also present in target DB2 and with same data type and same length.

The job used to run fine but from the past week, it started aborting with SIGSEGV. Each time the job aborted it did so after inserting some rows into target. The number of rows inserted each time differed. We tried various things like running job on single node, dropping and recreating the target table etc...it did not work.


Then we tried running the job after removing above two columns from source. It ran fine. So we have concluded that this is a data issue. The source values contain junk characters which are being rejected by DB2.

We had a look at a small subset of source data and found junk characters eg:a box like character. So I need to have a function which converts characters like these to valid values say space or empty string. But the problem is that the source data has millions of records and it is not possible to identify all junk characters. Also it is quite possible that there are other junk characters in addition to the one identified above. So I need a generic validation such that only characters which are acceptable in Varchar is being populated, I agree It is strange that SQL Server Database has no problems storing this as Varchar data type but DB2 has.

Is there a way out ?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

There's no such thing as junk characters. They are your client's data. If they're in the source database, they're valid, and you have to move them. First step is to find out what they actually are. When you've posted that, we should be able the help further.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
abhilashnair
Participant
Posts: 284
Joined: Fri Oct 13, 2006 4:31 am

Post by abhilashnair »

ray.wurlod wrote:There's no such thing as junk characters. They are your client's data. If they're in the source database, they're valid, and you have to move them. First step is to find out what they actually are. ...
Source Database is different from target. Is that the reason ? Source is SQL Server, Target is DB2
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

As noted, the very important first step is to identify them. And that doesn't mean as junk or a 'box' but rather the actual ascii value of the character(s). Enlist the help of your SQL Server DBA if you are unsure how to do that. Then you'll be able to decide if/how they can be accomodated in the characterset of the DB2 database.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply