Posted: Thu Dec 15, 2016 10:39 am
by UCDI
This is a little hard to follow.

It is very easy to convert Unicode to 256-character (single-byte) ASCII and lose the extended Unicode characters. If that is useful to you, I can show you how.
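
For illustration only, here is a minimal Python sketch of that kind of lossy conversion (it is not DataStage code). It assumes "ascii - 256" means a single-byte code page such as ISO-8859-1 (Latin-1); the function name `to_single_byte` is made up for this example.

```python
def to_single_byte(text: str, encoding: str = "latin-1") -> str:
    """Drop every character that the single-byte encoding cannot represent."""
    # errors="ignore" silently discards unencodable characters (lossy by design).
    return text.encode(encoding, errors="ignore").decode(encoding)


sample = "café €5 日本"
# The accented e survives in Latin-1; the euro sign and the kanji are dropped.
print(repr(to_single_byte(sample)))
# With strict 7-bit ASCII the accented e is dropped as well.
print(repr(to_single_byte(sample, "ascii")))
```

Whether silently dropping characters is acceptable depends entirely on what those characters mean in the data.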

It gets more complicated in a hurry if you need to keep it as Unicode.

Posted: Thu Dec 15, 2016 11:11 am
by rajudx
Thanks for your help.
If we convert from Unicode to 256-character ASCII, that will require a change to the DataStage job metadata on the output stage.

Posted: Thu Dec 15, 2016 11:39 am
by chulett
To me, this is a classic case of using BYTE syntax for a column like this when you really shouldn't. IMHO it would be worth asking your architect / DBA if it can be modified to use CHAR syntax instead so it stores 4000 characters rather than 4000 bytes in the field:

From
BIG_FIELD VARCHAR2(4000 BYTE)

To
BIG_FIELD VARCHAR2(4000 CHAR)
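
To show why the two length semantics behave differently, here is a small Python sketch (not Oracle code) that checks the same string against a byte limit and a character limit. It assumes a UTF-8 database character set, which the thread does not state; the helper name `fits` and the tiny limit of 5 are purely illustrative.

```python
def fits(text: str, limit: int, semantics: str = "BYTE", encoding: str = "utf-8") -> bool:
    """Return True if text fits a column declared as (limit BYTE) or (limit CHAR)."""
    if semantics == "BYTE":
        size = len(text.encode(encoding))  # bytes after encoding
    else:
        size = len(text)  # number of characters (code points)
    return size <= limit


value = "señor"  # 5 characters, but 6 bytes in UTF-8 because of the ñ
print(fits(value, 5, "BYTE"))  # False
print(fits(value, 5, "CHAR"))  # True
```

Any multi-byte character in the input can push a value past a BYTE limit even though its character count looks fine.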

Posted: Thu Dec 15, 2016 1:23 pm
by UCDI
rajudx wrote: If we convert from Unicode to 256-character ASCII, that will require a change to the DataStage job metadata on the output stage.
ASCII byte streams are legal Unicode. Depending on exactly what you do, you may not have to change any metadata. It just depends on your approach.
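
A quick way to convince yourself of that point, sketched in Python under the assumption that "Unicode" here means UTF-8 encoded data: pure 7-bit ASCII bytes decode to the same text either way. The function name `decodes_identically` is hypothetical.

```python
def decodes_identically(data: bytes) -> bool:
    """True if the bytes are plain ASCII and therefore decode the same as UTF-8."""
    try:
        return data.decode("ascii") == data.decode("utf-8")
    except UnicodeDecodeError:
        # Bytes above 0x7F are not ASCII (and may not be valid UTF-8 either).
        return False


print(decodes_identically(b"plain ASCII text 123"))  # True
print(decodes_identically("café".encode("utf-8")))   # False
```

Note that this holds for 7-bit ASCII only; bytes in the 128-255 range of a single-byte code page are not valid UTF-8 as they stand.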

Are there a few specific values in your data that are causing the trouble? In that case, a couple of nested Ereplace calls might be all you need. If it is more than a few values, a routine can do the same more efficiently. Or you can do the reverse, and only keep certain characters.
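
The two approaches above (replace a few known values, or keep only an allowed set) can be sketched roughly as follows. This is Python for illustration, not a DataStage routine; in a job the first approach would be nested Ereplace calls. The characters in `REPLACEMENTS` and `ALLOWED` are hypothetical examples, not values taken from this thread.

```python
import string

# Approach 1: replace a handful of known troublesome characters.
# Hypothetical mapping: drop the inverted question mark, straighten a curly apostrophe.
REPLACEMENTS = {"\u00bf": "", "\u2019": "'"}


def replace_known(text: str, replacements: dict = REPLACEMENTS) -> str:
    for bad, good in replacements.items():
        text = text.replace(bad, good)
    return text


# Approach 2 (the reverse): keep only characters from an allowed set.
ALLOWED = set(string.ascii_letters + string.digits + string.punctuation + " ")


def keep_allowed(text: str, allowed: set = ALLOWED) -> str:
    return "".join(ch for ch in text if ch in allowed)


print(replace_known("It\u2019s ok\u00bf"))  # It's ok
print(keep_allowed("café 5€"))              # caf 5
```

The first approach leaves everything else untouched, while the second discards anything you did not explicitly allow, so it is the more destructive of the two.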

The real question is data driven. Are these characters meaningful to your data? What do you expect to go into your target, based on your input?

The algorithm you cook up depends on your specific data and needs.

Posted: Fri Dec 16, 2016 10:13 am
by UCDI
If that is the only character, and that is its actual value, you can eliminate it directly with Ereplace. I can't recall... is that the symbol for a generic unknown character? Or is it really the inverted question mark symbol?
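
One way to settle that question is to look at the actual code point rather than at how the character is displayed. A minimal Python sketch, assuming you can get a sample of the data into a string; the function name `describe_non_ascii` is made up for this example.

```python
import unicodedata


def describe_non_ascii(text: str) -> list:
    """List (code point, Unicode name) for every non-ASCII character in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNNAMED"))
        for ch in text
        if ord(ch) > 127
    ]


# U+00BF is a real inverted question mark; U+FFFD is the generic replacement character.
for code_point, name in describe_non_ascii("a\u00bfb\ufffd"):
    print(code_point, name)
```

If the data really contains U+00BF, it may be a genuine character in the source, or it may be a substitute written by an earlier conversion step; the code point alone cannot tell you which.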

Posted: Sun Dec 18, 2016 8:50 am
by chulett
Yeah, it continues to confound me how many people encounter "special" or, even better, so-called "junk" characters and simply want to nuke them. Far better IMHO to recognize them as your client's data and accommodate them properly, and ways to do exactly that have been discussed here. Ah well.