How to remove non-ascii char from a string

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

NewPXUser
Participant
Posts: 17
Joined: Fri Feb 11, 2005 6:06 am

Post by NewPXUser »

Can anyone give me an idea of how to strip non-ASCII or other unwanted characters from a string? For example, removing tab characters.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

In PX you can use the Transformer derivation CONVERT(CHAR(009),' ',In.Column), which converts every horizontal tab (HT) in the string to a space.
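
For readers who want to try the same tab-to-space translation outside DataStage, here is a minimal C++ sketch. The helper name tabs_to_spaces is hypothetical and is not a DataStage function; it only mirrors, for this one character, what the derivation above is described as doing.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper (not part of DataStage): replace every horizontal
// tab (ASCII 9) in the string with a single space.
std::string tabs_to_spaces(std::string s) {
    std::replace(s.begin(), s.end(), '\t', ' ');
    return s;
}
```

The string is taken by value so the caller's copy is left untouched.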
bcarlson
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

If you are using custom buildops, you can take advantage of some of the C/C++ built-in functions. We wrote a function called stringToGraph that takes a string as input, loops through each character and uses the C macro isgraph(c) to test whether character c is a graphic character (letter, digit, punctuation, etc.; a space does not count as graphic). All non-graphic characters are converted to spaces.

There are a number of other C built-in macros (declared in ctype.h) that can be used as well:
  • isalpha(c) - c is a letter
  • isupper(c) - c is an uppercase letter
  • islower(c) - c is a lowercase letter
  • isdigit(c) - c is a digit
  • isalnum(c) - c is a letter or a digit (alpha-numeric)
  • isxdigit(c) - c is a hexadecimal digit
  • isspace(c) - c is a whitespace character
  • ispunct(c) - c is a punctuation character
  • isprint(c) - c is a printable character (including space)
  • isgraph(c) - c is printable, but not a space
  • iscntrl(c) - c is a control character
  • isascii(c) - c is an ASCII code (0-127); this one is a POSIX/BSD extension, not standard C
We wrote a function instead of embedding these macros directly in the buildop code (which could also be done) because each string has to be scanned with a for loop that looks at every individual char, and we use this function a lot (many times per record). By the way, it is very fast: adding this code to our ETL, even where it touches dozens, if not hundreds, of fields per record, has had negligible impact on performance.

Note: We include the function in our buildop, but I have heard that you can also create custom functions like these and make them accessible within a Transformer (parallel only, not basic). I don't know how to do this, but perhaps someone else does?
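
As a rough illustration of how the classification functions listed above can answer the original question, the sketch below drops every character that is not printable 7-bit ASCII. It uses plain std::string rather than any DataStage type, and the name keep_printable_ascii is hypothetical; treat it as one possible approach under those assumptions, not as the code used in the buildop.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: keep only printable 7-bit ASCII characters and
// drop everything else (tabs, other control characters, bytes >= 0x80).
std::string keep_printable_ascii(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (char ch : in) {
        // Cast first: passing a negative char value to the <cctype>
        // functions is undefined behaviour.
        unsigned char c = static_cast<unsigned char>(ch);
        if (c < 0x80 && std::isprint(c)) {
            out.push_back(ch);
        }
    }
    return out;
}
```

The explicit c < 0x80 test is used instead of isascii(c) because isascii is not available on every platform.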
jasjad999
Participant
Posts: 3
Joined: Sun Oct 10, 2004 1:26 am

Post by jasjad999 »

Hi bcarlson,

Would you mind showing us the C++ code for your stringToGraph function? I am having trouble passing my string from the Transformer to my function (at least I think that is where my problem is), and it seems you have done so successfully. I hope I am not asking too much. Thanks.
bcarlson
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

Remember, we use it in a buildop (Build/Logic/Definitions). I think there are ways to create your own C/C++ functions to be referenced from Transforms, but I do not know how.

Code:

// Convert any non-graphical characters to ' '
#include <ctype.h>   // isgraph()

APT_String stringToGraph(APT_String);

APT_String stringToGraph(APT_String str) {

    APT_String new_str;
    int i = 0, max = 0;

    // Walk the input one character at a time: graphic characters are
    // copied unchanged, everything else is replaced with a ' '.
    max = str.length();
    const char* ptr = str.terminatedContent();
    for (i = 0; i < max; i++) {
        // Cast to unsigned char: passing a negative char to isgraph()
        // is undefined behaviour.
        if ( !isgraph((unsigned char) ptr[i]) )
        {
            new_str = new_str + ' ';
        }
        else
        {
            new_str = new_str + ptr[i];
        }
    }
    return new_str;
}
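
To experiment with the same logic outside a buildop, here is a stand-alone sketch that swaps APT_String for std::string. The name string_to_graph is hypothetical, and the behaviour for bytes above 127 assumes the default "C" locale, where isgraph() reports them as non-graphic.

```cpp
#include <cctype>
#include <string>

// Stand-alone analogue of stringToGraph using std::string (an assumption
// made for testing outside DataStage; the buildop version uses APT_String).
std::string string_to_graph(std::string s) {
    for (char& ch : s) {
        // Replace anything that is not a graphic character with a space.
        if (!std::isgraph(static_cast<unsigned char>(ch))) {
            ch = ' ';
        }
    }
    return s;
}
```

Because each non-graphic character is replaced rather than removed, the output always has the same length as the input, which matters for fixed-width fields.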