Remove special characters (¿) from the varchar data type

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
UCDI
Premium Member
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

This is a little hard to follow.

It is very easy to convert unicode to ascii - 256 and lose the extended unicode characters. If that is useful to you, I can show you how.

It gets more complicated in a hurry if you need to keep it as unicode.
rajudx
Participant
Posts: 45
Joined: Tue Nov 14, 2006 1:58 pm
Location: NJ

Post by rajudx »

Thanks for your help.
if we will convert from Unicode to ascii - 256 that require to change in datastage job metadata on output stage.
Ran
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

To me, this is a classic case of using BYTE syntax for a column like this when you really shouldn't. IMHO it would be worth asking your architect / DBA if it can be modified to use CHAR syntax instead so it stores 4000 characters rather than 4000 bytes in the field:

From
BIG_FIELD VARCHAR2(4000 BYTE)

To
BIG_FIELD VARCHARS(4000 CHAR)
-craig

"You can never have too many knives" -- Logan Nine Fingers
UCDI
Premium Member
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

rajudx wrote:if we will convert from Unicode to ascii - 256 that require to change in datastage job metadata on output stage.
Ascii byte streams are legal unicode. Depending on what exactly you DO, you may not have to change any metadata. It just depends on your approach.

Are there a few specific values in your data that are causing the trouble? In that case, a couple of nested ereplace might be all you need. If it is more than a few values, a routine can do the same more efficiently. Or you can do the reverse, and only keep certain characters.

The real question is data driven. Are these characters meaningful to your data? What do you expect to go into your target, based off the your input?

The algorithm you cook up depends on your specific data and needs.
UCDI
Premium Member
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

If that is the only character and its actual value, just directly eliminate it, you can use ereplace. I can't recall... is that the symbol for a generic unknown character?? Or is it actually, really, the inverted question mark symbol?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Yeah, it continues to confound me how many people encounter "special" or even better the so-called "junk" characters and simply want to nuke them. Far better IMHO to recognize what they are as your client's data and accommodate them properly, and which ways to do exactly that have been discussed here. Ah well.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply