Control Character

Posted: Tue Feb 24, 2004 6:01 pm
by mmkhan
I have a set of validations to perform on a sequential file. One of them is removing control characters, i.e. non-printable characters. Here is what I am trying to do.

stage1 -----> Transformation -----> stage2
SeqFile                             SeqFile

In the Columns I just specify a single column of type VarChar, and since I specify the UNIX newline character as the line terminator, each complete row/line ends up in that one column. In the transformation in between, I drag and drop the single column to the output file stage, again as a single column. While doing this transformation I want to remove all the control characters from my input file. Is there a routine or method that can be called in this transformation which takes my input with the control characters and returns it without them?
If there is no such method, what would be the best possible solution?
Thanks

Posted: Tue Feb 24, 2004 7:39 pm
by ray.wurlod
Preprocessing the file with the stream editor sed is probably the most efficient way to accomplish removal of non-printing characters.

Within a BASIC Transformer stage you could take advantage of Oconv() with the "MCP" conversion specification, which converts non-printing characters to periods (for non-American speakers, this is the "." character), and compare the result to the original.
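For readers who want to see the idea outside DataStage, here is a minimal Python sketch of the two approaches: masking non-printing characters with a period (roughly what the "MCP" conversion is described as doing) and dropping them outright (what a sed preprocessing step would do). The function names are made up for illustration, and the definition of "non-printing" as any code outside 32-126 is an assumption, not a statement of what Oconv() treats as non-printing.

```python
import re

# Assumption: "non-printing" means any character code outside 32-126.
NON_PRINTING = re.compile(r"[^\x20-\x7e]")


def mask_non_printing(text: str) -> str:
    """Replace each non-printing character with a period (MCP-like masking)."""
    return NON_PRINTING.sub(".", text)


def strip_non_printing(text: str) -> str:
    """Remove non-printing characters altogether."""
    return NON_PRINTING.sub("", text)


# Ctrl-C is code 3 and Ctrl-X is code 24.
row = "John\x03Smith\x181-234"
print(mask_non_printing(row))   # John.Smith.1-234
print(strip_non_printing(row))  # JohnSmith1-234
```

Masking keeps the string length unchanged, which is what makes the compare-to-original trick possible; stripping is what you want if the goal is a clean output file.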

It didn't work

Posted: Tue Feb 24, 2004 8:08 pm
by mmkhan
Ray, I tried Oconv("John^CSmith^X1-234", "MCP") in my transformation stage (e.g. Oconv(DSLink1.row, "MCP")), but it gives me an error. I am wondering whether I am using the right stage. You mentioned the BASIC Transformer, but on the stages palette I see only Transformer. When I enter the above expression the column turns red, which means an error.
Regards

Posted: Wed Feb 25, 2004 12:21 am
by ray.wurlod
You specified Parallel.
For the parallel canvas at 7.x there are two Transformer stage types, called the Transformer stage and the BASIC Transformer stage.
You need the latter. You'll find it in the Processing group on the Palette or in your Repository browser.

I just can't find it

Posted: Wed Feb 25, 2004 11:02 am
by vzmz
Hi,
In my documentation I see the BASIC Transformer described, but for some reason I just can't see it in the Processing group on the Palette. I right-clicked on the group, chose Customization, and under Customize I see all the available stages under Processing, but not the BASIC one. Do I have to load it from a specific location?
Thanks

Posted: Wed Feb 25, 2004 3:02 pm
by ray.wurlod
No, it should be there automatically.
In the Repository browser (tree view) open the Parallel branch. Under there you should see a Processing category. Expand that and you should find it.

Thanks

Posted: Wed Feb 25, 2004 4:35 pm
by mmkhan
Thanks, I did get it. Thanks a lot.
One last thing: is there a possibility that I can check whether I have a non-printable character in the string? If yes, then I should be able to take an action.

Posted: Wed Feb 25, 2004 7:35 pm
by ray.wurlod
If inlink.colname <> Oconv(inlink.colname,"MCP")
Then you have a non-printing character
Else you don't have a non-printing character
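The same comparison can be sketched in Python to show why it works: a non-printing character is never itself a period, so masking changes the string if and only if at least one such character is present. The helper names and the 32-126 printable range are assumptions for illustration only.

```python
def has_non_printing(text: str) -> bool:
    """True if masking non-printing characters would change the string."""
    # Assumption: printable means character codes 32-126.
    masked = "".join(ch if 32 <= ord(ch) <= 126 else "." for ch in text)
    return masked != text


def split_rows(rows):
    """Route rows to a 'good' or 'bad' list, like a pair of constrained links."""
    good, bad = [], []
    for row in rows:
        (bad if has_non_printing(row) else good).append(row)
    return good, bad


good, bad = split_rows(["plain text.", "tab\there", "bell\x07"])
print(good)  # ['plain text.']
print(bad)   # ['tab\there', 'bell\x07']
```

Note that a row that already contains periods is still classed as clean, because masking leaves existing periods untouched.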

Thanks a lot ray

Posted: Thu Feb 26, 2004 11:06 am
by mmkhan
It worked, thanks.
I am almost there. At the end of this validation I have to generate a report saying how many control characters were found and how many records contained control characters.
Can I declare a variable that is incremented whenever I find a control character, or is there some after-routine which will count the number of records in the good file and the bad file?
Thanks

Posted: Thu Feb 26, 2004 12:19 pm
by mhester
Constrain the input stream on what Ray gave you. This will give you the number of rows with control characters (rows processed on the link).

To get the number of control characters you can use either the LEN or BYTELEN functions. Use BYTELEN if you have NLS enabled and LEN if not (BYTELEN works the same as LEN if NLS is not enabled).

I'm not going to outline exactly what you should do since what I have given you should be enough to get your creative juices flowing.

I have used this type of logic before and it works pretty well.

Regards,

Michael Hester

Posted: Thu Feb 26, 2004 3:13 pm
by ray.wurlod
Determining whether there is a control character (the second requirement) is the same logic you already have.
To count how many there are, you will need a routine. If you're not using NLS, something like the following.

Code:

FUNCTION CountControlChars(TheString)
Ans = 0
* Count characters whose code is outside the range 32-127
* (note: DEL, code 127, falls inside this range and is not counted)
CharCount = Len(TheString)
For i = 1 To CharCount
   AsciiCode = Seq(TheString[i,1])
   If AsciiCode < 32 Or AsciiCode > 127 Then Ans += 1
Next i
RETURN(Ans)
Use the Routine in the derivation of the output column ControlCharCount.
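To illustrate how the per-row count feeds the report the original poster asked for, here is a small Python sketch that mirrors the routine's rule (codes below 32 or above 127) and aggregates it over a set of rows. The function names, the dictionary keys, and the sample rows are hypothetical; inside DataStage you would get the row totals from link row counts instead.

```python
def count_control_chars(text: str) -> int:
    """Count characters with code < 32 or > 127, the same rule as the routine."""
    return sum(1 for ch in text if ord(ch) < 32 or ord(ch) > 127)


def summarize(rows):
    """Build the report figures: rows seen, rows affected, characters found."""
    counts = [count_control_chars(row) for row in rows]
    return {
        "rows_total": len(rows),
        "rows_with_control": sum(1 for c in counts if c > 0),
        "control_chars": sum(counts),
    }


# DEL (code 127) is not counted, exactly as in the routine's range test.
rows = ["clean", "a\x01b\x02", "\x7f", "x\x1fy"]
print(summarize(rows))
# {'rows_total': 4, 'rows_with_control': 2, 'control_chars': 3}
```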

If you do have NLS enabled, then the solution will depend on whether you truly mean control characters (those in the C0 and C1 control sets) or whether you mean any non-printing character. You need to determine, from your locale setting or current character map, which characters are in the class you need.
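The difference between the two classes can be seen in a short Python sketch using Unicode general categories. This is only an analogy: Python's Unicode database stands in for whatever your NLS map and locale define, and the function names are made up. Category "Cc" covers the C0 and C1 control sets (plus DEL), while "non-printing" is broader and also includes things like format characters.

```python
import unicodedata


def count_true_controls(text: str) -> int:
    """Count characters in Unicode category Cc (C0 and C1 controls, and DEL)."""
    return sum(1 for ch in text if unicodedata.category(ch) == "Cc")


def count_non_printing(text: str) -> int:
    """Count every character Python regards as non-printable."""
    return sum(1 for ch in text if not ch.isprintable())


# NUL is a C0 control, U+0085 is a C1 control, and U+200B (zero width space)
# is a format character: non-printing, but not a control character.
sample = "a\x00b\x85c\u200bd"
print(count_true_controls(sample))  # 2
print(count_non_printing(sample))   # 3
```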