Control Character
I have a set of validations to perform on a sequential file. One of them is removing control characters, i.e. non-printable characters. Here is what I am trying to do.
stage1 -----> Transformation -----> stage2
SeqFile                             SeqFile
Under Columns I specify a single column of type VarChar, and since I specify the UNIX newline character as the line terminator, the complete row/line arrives as that one value. As you can see, in the transformation in between I drag and drop the single column to the output file stage, again with a single column. While doing this transformation I want to remove all the control characters in my input file. Is there a routine or method that can be called in this transformation that takes my input with the control characters and returns it without them?
If there is no such method, what would be the best possible solution?
Thanks
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Preprocessing the file with the stream editor sed is probably the most efficient way to accomplish removal of non-printing characters.
Within a BASIC Transformer stage you could take advantage of Oconv() with the "MCP" conversion specification, which converts non-printing characters to periods (for non-American speakers, this is the "." character), and then compare the result to the original.
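To make the two ideas above concrete, here is a minimal sketch in Python rather than sed or DataStage BASIC. It assumes "printable" means ASCII codes 32 through 126; Oconv() with "MCP" and any NLS map in use may classify characters differently. The function names are hypothetical and exist only for this illustration.

```python
def is_printable_ascii(ch: str) -> bool:
    """Assumption: 'printable' means ASCII codes 32 through 126."""
    return 32 <= ord(ch) <= 126


def mask_nonprinting(text: str) -> str:
    """Replace each non-printing character with a period, one for one."""
    return "".join(ch if is_printable_ascii(ch) else "." for ch in text)


def strip_nonprinting(text: str) -> str:
    """Drop non-printing characters entirely."""
    return "".join(ch for ch in text if is_printable_ascii(ch))


sample = "John\x03Smith\x181-234"   # Ctrl-C and Ctrl-X embedded in the value
masked = mask_nonprinting(sample)    # periods where the control characters were
stripped = strip_nonprinting(sample)  # control characters removed
has_nonprinting = masked != sample    # the "compare to the original" test
```

Masking keeps the string length unchanged, which is why comparing the masked copy with the original is enough to detect a non-printing character; stripping is what a preprocessing step would do before the file reaches the job.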
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
It didn't work
Ray, I tried Oconv("John^CSmith^X1-234", "MCP") in my Transformer stage (e.g. Oconv(DSLink1.row, "MCP")), but it gives me an error. I was wondering whether I am using the right stage. You mentioned the BASIC Transformer, but on the stage palette I see only Transformer. When I enter the expression above, the column turns red, which means an error.
Regards
You specified Parallel.
For the parallel canvas at 7.x there are two Transformer stage types, called the Transformer stage and the BASIC Transformer stage.
You need the latter. You'll find it in the Processing group on the Palette or in your Repository browser.
I just can't find it
Hi,
In my documentation I see the BASIC Transformer described, but for some reason I can't see it in the Processing group on the Palette. I right-clicked on the group, chose Customization, and in the Customize dialog I see all the available stages under Processing, but not the BASIC one. Do I have to load it from a specific location?
Thanks
No, it should be there automatically.
In the Repository browser (tree view) open the Parallel branch. Under there you should see a Processing category. Expand that and you should find it.
Thanks
Thanks, I found it. Thanks a lot.
One last thing: is there a way to check whether a string contains a non-printable character? If it does, I should be able to take an action.
Thanks a lot ray
ray.wurlod wrote:
If inlink.colname <> Oconv(inlink.colname,"MCP")
Then you have a non-printing character
Else you don't have a non-printing character

It worked, thanks.
I am almost there. At the end of this validation I have to generate a report saying how many control characters were found and how many records contained control characters.
Can I declare a variable that is incremented whenever I find a control character, or is there some kind of after routine that will count the number of records in a good file and a bad file?
Thanks
Constrain the input stream on what Ray gave you. This will give you the number of rows with control characters (rows processed on the link).
To get the number of control characters you can use either the LEN or the BYTELEN function. Use BYTELEN if you have NLS enabled and LEN if not (BYTELEN works the same as LEN if NLS is not enabled).
I'm not going to outline exactly what you should do since what I have given you should be enough to get your creative juices flowing.
I have used this type of logic before and it works pretty well.
Regards,
Michael Hester
Mike Hester
mhester@petra-ps.com
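One possible reading of the length-based hint above, sketched in Python rather than DataStage: the number of control characters in a row is the length of the row minus the length of a copy with those characters removed, and the report tallies rows with and without them. This is an assumption about the intended logic, not the poster's actual job design, and the function names are hypothetical. Printable is again assumed to mean ASCII 32 through 126.

```python
def count_control_chars(line: str) -> int:
    """Length difference between the line and its printable-only copy."""
    printable_only = "".join(ch for ch in line if 32 <= ord(ch) <= 126)
    return len(line) - len(printable_only)


def summarize(lines):
    """Return (rows_with_control_chars, clean_rows, total_control_chars)."""
    bad_rows = clean_rows = total_chars = 0
    for line in lines:
        found = count_control_chars(line)
        if found:
            bad_rows += 1
            total_chars += found
        else:
            clean_rows += 1
    return bad_rows, clean_rows, total_chars


# Two rows contain control characters (2 + 1), one row is clean.
report = summarize(["John\x03Smith\x181-234", "Jane Doe 5-678", "tab\there"])
```

In the job itself the row counts would come from the constrained links, as described above; the sketch only shows how the three report figures relate to each other.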
Determining whether there is a control character (the second requirement) is the same logic you already have.
To count how many, you will need a routine. If you're not using NLS, something like the following.
Code:
      FUNCTION CountControlChars(TheString)
      Ans = 0
      * Printable ASCII is 32-126; code 127 (DEL) is also a control character
      CharCount = Len(TheString)
      For i = 1 To CharCount
         * Seq() returns the numeric code of the i-th character
         AsciiCode = Seq(TheString[i,1])
         If AsciiCode < 32 Or AsciiCode > 126 Then Ans += 1
      Next i
      RETURN(Ans)
Use the routine in the derivation of the output column ControlCharCount.
If you do have NLS enabled, then the solution will depend on whether you truly mean control characters (those in the C0 and C1 control sets) or whether you mean any non-printing character. You need to determine, from your locale setting or current character map, which characters are in the class you need.
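The difference between "control characters" and "any non-printing character" can be illustrated with a short Python sketch. It relies on Python's built-in Unicode database as a stand-in: general category Cc covers the C0 and C1 control sets, while str.isprintable() also rejects other invisible characters. A DataStage NLS map or locale may draw these lines differently, so treat this as an assumption-laden illustration only; the function names are hypothetical.

```python
import unicodedata


def count_c0_c1_controls(text: str) -> int:
    """Count characters in Unicode category Cc (the C0 and C1 control sets)."""
    return sum(1 for ch in text if unicodedata.category(ch) == "Cc")


def count_nonprinting(text: str) -> int:
    """Count every character that Python does not consider printable."""
    return sum(1 for ch in text if not ch.isprintable())


# BEL (C0 control), ZERO WIDTH SPACE (invisible, but not a control), NEL (C1 control)
sample = "a\x07b\u200bc\x85"
controls = count_c0_c1_controls(sample)
nonprinting = count_nonprinting(sample)
```

Here the zero width space is counted as non-printing but not as a control character, which is exactly the kind of case where the two definitions give different totals.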