Removing Unprintable characters

Naveen
Premium Member
Posts: 15
Joined: Sat Jan 07, 2006 10:51 pm

Post by Naveen »

Hi,

How can I remove unprintable characters in PX?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

How do you define unprintable? Is it characters with ASCII values below 32 or above 126? Are there any exceptions? How you code this depends on what you need done.
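
One way to answer the "any exceptions?" question is to look at which out-of-range byte values actually occur in the data before deciding what to strip. The following is only a rough sketch under the assumption that "unprintable" means anything outside 32..126; the names isOutsidePrintableAscii and profileUnprintable are made up for illustration and are not DataStage functions.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Assumed definition: a byte is "unprintable" if it falls outside 32..126.
inline bool isOutsidePrintableAscii(unsigned char c)
{
    return c < 32 || c > 126;
}

// Count how often each out-of-range byte value occurs in a field,
// keyed by its numeric value (for example 9 for TAB, 10 for LF).
std::map<int, std::size_t> profileUnprintable(const std::string &field)
{
    std::map<int, std::size_t> counts;
    for (unsigned char c : field) {
        if (isOutsidePrintableAscii(c)) {
            ++counts[c];
        }
    }
    return counts;
}
```

If the profile shows only, say, tabs and line feeds, a narrow rule may be enough; if it shows values above 126, accented characters probably need to be treated as an exception.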
throbinson
Charter Member
Posts: 299
Joined: Wed Nov 13, 2002 5:38 pm
Location: USA

Post by throbinson »

Could you type in a few examples?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You write a parallel routine. Or you invoke a server routine through a BASIC Transformer stage.

Printable characters are defined as
  • in the ASCII range 32 through 126 if NLS is not enabled (however, you may have cause to include the accented characters in the code point range 129 through 255 as well)
  • those characters defined as printable in the CTYPE locale category if NLS is enabled
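
The two definitions above can be sketched side by side in C++. This is an illustration only: it uses the C++ standard library's locale facet as a stand-in for the CTYPE category, and it is an assumption that this matches what an NLS-enabled DataStage engine would report. The function names are hypothetical.

```cpp
#include <locale>

// Non-NLS style rule: printable means code 32 through 126 inclusive.
inline bool isPrintableAsciiRange(char c)
{
    const unsigned char u = static_cast<unsigned char>(c);
    return u >= 32 && u <= 126;
}

// Locale-driven rule: ask the locale's ctype facet (the C++ counterpart of LC_CTYPE).
// Defaults to the classic "C" locale when no locale is supplied.
inline bool isPrintableInLocale(char c, const std::locale &loc = std::locale::classic())
{
    return std::isprint(c, loc);
}
```

Passing a different std::locale to isPrintableInLocale changes which characters count as printable, which is the practical difference between the two bullets.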
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
dspxlearn
Premium Member
Posts: 291
Joined: Sat Sep 10, 2005 1:26 am

Post by dspxlearn »

Naveen,

If you want to remove any control characters (low values) from your input fields, just use:

Code: Select all

If InputCol < Char(32)
Then <something>
Else <InputCol>
Thanks and Regards!!
dspxlearn
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

I posted a routine a while back that does this. See if the code in this post helps.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
JoshGeorge
Participant
Posts: 612
Joined: Thu May 03, 2007 4:59 am
Location: Melbourne

Post by JoshGeorge »

Here is a C++ program you can use as a parallel routine to remove non-printable characters from an input string. You can change the ASCII range that is treated as printable by changing the values of
#define NP_RANGE_START 32 // change 32 to your value
#define NP_RANGE_END 126 // change 126 to your value
The function pxRemoveNPChars can be called with or without a second argument, i.e. a replacement character. If no replacement character is specified, the non-printable characters are removed. (The default value is '*', so '*' itself cannot be used as a replacement character.)

NOTE: This is the first version. Will try to post a modified version soon.

Code: Select all

#include<stdio.h>
#include<unistd.h> // for exit 
#include<stdlib.h>
#include<string.h>

#define NP_RANGE_START  32
#define NP_RANGE_END    126


bool IsCharNP( char InCharToCheck )
{
	bool IsNotPrintable = false;

//#if DEBUG
//	printf("\nChecking %c %d",InCharToCheck, InCharToCheck);
//#endif

	/* Both bounds are inclusive, so NP_RANGE_START and NP_RANGE_END themselves count as printable */
	if( (InCharToCheck >= NP_RANGE_START) && (InCharToCheck <= NP_RANGE_END) )
	{
		IsNotPrintable = false;
	}
	else
	{
		IsNotPrintable =  true;
	}
	
//#if DEBUG
//  printf("%s printable char", IsNotPrintable ? " NOT a " : " IS a ");
//#endif

	return IsNotPrintable;
}

void Abort(const char *MsgToAbortWith)
{
	/* Report the reason on stderr, then terminate; this function does not return */
	fprintf(stderr, "[Abort] %s\n", MsgToAbortWith);
	exit(EXIT_FAILURE);
}

char *GetNewString(int InLen)
{
	char *StrInMem = (char *) malloc(InLen);
	if( NULL == StrInMem )
		Abort("Insufficient memory - malloc failed");
	return StrInMem;
}
	
char *RemoveNPCharsWithoutRep(char *StrToRemove)
{
	int NewStrIndex = 0;
	char *TmpStr = GetNewString( strlen( StrToRemove) + 1);
	for( int index = 0; StrToRemove[index] ; index++)
	{
		/* Check if it is a Non Printable char ? */
		if(  true == IsCharNP(StrToRemove[index]) )
			continue;

		TmpStr[NewStrIndex++] = StrToRemove[index];
	}
	TmpStr[NewStrIndex] = '\0';
	return TmpStr;
} 


char* RemoveNPCharsWithRep(char *StrToRemove, char CharToReplace)
{
	for( int index = 0; StrToRemove[index] ; index++)
	{
		/* Check if it is a Non Printable char ? */
		if(  true == IsCharNP(StrToRemove[index]) )
		{
			StrToRemove[index] = CharToReplace;
		}
		else
			continue;
	}
	return StrToRemove;
} 

/* 
 * Function 	: 	pxRemoveNPChars
 * Input    	: 	char * InStrToRemove - IN - the input string
 * Input    	:	char ReplaceChar     - IN - the replacement char, or '*' (the default) to remove instead of replace
 * Description 	:	This function removes or replaces all the non-printable chars in the input string.
 * Note		:	Must be called with writable memory. Calling this function with read-only input such as the literal "Abcdefg" would fail.
 * 			Declare char Test[100] = "Abcdefg"; and call pxRemoveNPChars(Test), not pxRemoveNPChars("Abcdefg").
 * TODO		:	To be tested thoroughly before use
 *
 * */

char *pxRemoveNPChars( char *InStrToRemove , char ReplaceChar = '*')
{
        char *StrToRet = NULL;
        /* First check whether the Input string is valid ? */
        if( NULL == InStrToRemove )
        {
                /* The Input was NULL , do nothing and return NULL indicating error to caller */
                return StrToRet;
        }


        if( '*' == ReplaceChar )
        {
		/* User does not want to replace NP chars with anything */
		char *NewString = RemoveNPCharsWithoutRep(InStrToRemove);	
		StrToRet = NewString;
        }
	else
	{
                StrToRet = RemoveNPCharsWithRep(InStrToRemove, ReplaceChar);
	}
	
	return StrToRet;
}

To test this code, turn it into a stand-alone program: change
#define NP_RANGE_START 70
in the code above and append the code below as the last block.

Code: Select all

/* The printable chars have been moved from 32 to 70 for testing purposes only */ 
int main()
{
	char TestBuffer[]="ABCDmithAAAAAbcccABCqwwwwweehfsAqAe";
	printf("Input - %s \n", TestBuffer);

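	/* '*' is the default, so this call removes the characters and returns a newly allocated string */
	char *TmpStr = pxRemoveNPChars(TestBuffer,'*');
	printf("\nReturned string = %s\n", TmpStr);
	free(TmpStr);

	/* Any other character replaces in place and returns TestBuffer itself */
	TmpStr = pxRemoveNPChars(TestBuffer,'$');
	printf("\nReturned string = %s\n", TmpStr);
	return 0;
}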
This has not been properly tested as a parallel routine, but it should work fine :D . Read the function description to understand how to use it, and change the #define values at the top to make the code behave the way you want.
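
The note about read-only input applies because the replace path writes into the caller's buffer. As a rough sketch of one way around that outside DataStage, the replacement can be done on a copy, which makes string literals safe to pass. The helper name replaceNonPrintableCopy is hypothetical and not part of the routine above, and treating 32 through 126 inclusive as printable is an assumption.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: returns a copy of the input in which every byte
// outside 32..126 has been replaced; the caller's data is never modified.
std::string replaceNonPrintableCopy(const std::string &input, char replacement)
{
    std::string result(input);
    std::replace_if(result.begin(), result.end(),
                    [](char c) {
                        const unsigned char u = static_cast<unsigned char>(c);
                        return u < 32 || u > 126;
                    },
                    replacement);
    return result;
}
```

The cost is one extra copy per call, which may or may not matter at the row volumes a parallel job handles.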
Joshy George
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

I heard today that Massachusetts residents currently have 37 words for snow, not one of which is printable!
:lol:

(For people encountering this post after late December 2007: the north-east USA is presently in the grip of horrendous blizzards, snowstorms and other ugly weather.)
JoshGeorge
Participant
Posts: 612
Joined: Thu May 03, 2007 4:59 am
Location: Melbourne

Post by JoshGeorge »

Exotic Technologies Finish Road Test On Cosmic Highway. "We've taken these technologies around the test track, and now they're ready for the production line," said Dr. Marc Rayman, deputy mission manager and chief mission engineer for Deep Space 1 at NASA's Jet Propulsion Laboratory, Pasadena, CA.

"Of course, everything hasn't worked perfectly on the first try," Rayman added. "If it had, it would mean that we had not been sufficiently aggressive in selecting the technologies. Diagnosing the behavior of the various technologies is a fundamental part of Deep Space 1's objective of enabling future space science missions."


OOPstuff (.com) OOP! is a contemporary gift gallery featuring a whimsical collection of craft, specialty gifts and fun stuff from here and there. Sounds perfect for this holiday season. What else can be OOP? :lol:

!
Joshy George
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

ray.wurlod wrote:I heard today that Massachusetts residents currently have 37 words for snow, not one of which is printable!
:lol:
And eastern Australians for 'flood' as well, I understand. :shock:
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Nah, just a brief inundation. More your "flooding rains". Floods in other parts of the world are far worse.
JoshGeorge
Participant
Posts: 612
Joined: Thu May 03, 2007 4:59 am
Location: Melbourne

Post by JoshGeorge »

Modified and tested. Here is a C++ parallel routine that removes or replaces all the non-printable chars in the input string. Developed and tested on Linux.
Function : pxRemoveNPChars
Input : char * InStrToRemove - IN - the input string
Input : char * ReplaceChar - IN - the replacement char, or '' (empty) to remove instead of replace
Description : This function removes or replaces all the non-printable chars in the input string.

Code: Select all

#include<stdio.h> 
#include<unistd.h> // for exit 
#include<stdlib.h> 
#include<string.h> 

#define NP_RANGE_START  32 
#define NP_RANGE_END    126 


bool IsCharNP( char InCharToCheck ) 
{ 
   bool IsNotPrintable = false; 

   /* Both bounds are inclusive, so NP_RANGE_START and NP_RANGE_END themselves count as printable */
   if( (InCharToCheck >= NP_RANGE_START) && (InCharToCheck <= NP_RANGE_END) )
   { 
      IsNotPrintable = false; 
   } 
   else 
   { 
      IsNotPrintable =  true; 
   } 
    
   return IsNotPrintable; 
} 


char *GetNewString(int InLen) 
{ 
   char *StrInMem = (char *) malloc(InLen); 
   if( NULL != StrInMem ) 
      return StrInMem; 
   else 
      abort(); 
} 
    
char *RemoveNPCharsWithoutRep(char *StrToRemove) 
{ 
   int NewStrIndex = 0; 
   char *TmpStr = GetNewString( strlen( StrToRemove) + 1); 
   for( int index = 0; StrToRemove[index] ; index++) 
   { 
      /* Check if it is a Non Printable char ? */ 
      if(  true == IsCharNP(StrToRemove[index]) ) 
         continue; 

      TmpStr[NewStrIndex++] = StrToRemove[index]; 
   } 
   TmpStr[NewStrIndex] = '\0'; 
   return TmpStr; 
} 


char* RemoveNPCharsWithRep(char *StrToRemove, char CharToReplace) 
{ 
   for( int index = 0; StrToRemove[index] ; index++) 
   { 
      /* Check if it is a Non Printable char ? */ 
      if(  true == IsCharNP(StrToRemove[index]) ) 
      { 
         StrToRemove[index] = CharToReplace; 
      } 
      else 
         continue; 
   } 
   return StrToRemove; 
} 


/* 
 * Function    :   pxRemoveNPChars
 * Input       :   char * InStrToRemove - IN - the input string
 * Input       :   char * ReplaceChar   - IN - the replacement char, or '' (empty) to remove instead of replace
 * Description :   This function removes or replaces all the non-printable chars in the input string.
 */ 

char *pxRemoveNPChars( char *InStrToRemove , char *ReplaceChar) 
{ 
        char *StrToRet = NULL; 
        /* First check whether the Input string is valid ? */ 
        if (( NULL == InStrToRemove ))  // || (strlen(InStrToRemove) < 1))
        { 
                /* The Input was NULL , do nothing and return NULL indicating error to caller */ 
                return StrToRet; 
        } 
        if( (NULL == ReplaceChar) || ('\0' == ReplaceChar[0]) ) // NULL or '' (empty) means remove
        { 
          /* User does not want to replace NP chars with anything */
          char *NewString = RemoveNPCharsWithoutRep(InStrToRemove);    
          StrToRet = NewString; 
        } 
        else 
         { StrToRet = RemoveNPCharsWithRep(InStrToRemove, ReplaceChar[0]); }
    
   return StrToRet; 
}
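
One thing to keep in mind with the routine above: the remove path returns memory obtained from malloc, while the replace path returns the caller's own buffer, so whoever calls it has to know which case applies before freeing anything. A possible alternative, shown here only as a sketch and not tested as a parallel routine, is to compact the buffer in place so that no allocation happens at all. The name removeNonPrintableInPlace is hypothetical, and the 32..126 inclusive range is an assumption.

```cpp
#include <cstddef>

// Hypothetical variant, not the routine posted above.
// Removes bytes outside 32..126 by compacting the buffer in place,
// so nothing is allocated and nothing needs to be freed.
char *removeNonPrintableInPlace(char *str)
{
    if (str == NULL) {
        return NULL;
    }
    std::size_t write = 0;
    for (std::size_t read = 0; str[read] != '\0'; ++read) {
        const unsigned char u = static_cast<unsigned char>(str[read]);
        if (u >= 32 && u <= 126) {
            str[write++] = str[read];   // keep printable bytes, shifting them left
        }
    }
    str[write] = '\0';
    return str;
}
```

Like the replace path, this needs writable memory, so it must not be called on a string literal.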
Joshy George