Control Character

mmkhan
Participant
Posts: 12
Joined: Tue Nov 04, 2003 9:44 am

Control Character

Post by mmkhan »

I have a set of validations to perform on a sequential file. One of them is removing control characters, i.e. non-printable characters. Here is what I am trying to do.

stage1 (SeqFile) -----> Transformation -----> stage2 (SeqFile)

In the Columns I specify a single column of type VarChar, and since I specify the UNIX newline character as the line terminator, the complete row/line arrives as that one value. In the transformation in between, I drag and drop the single column to the output file stage, which again has a single column. While doing this transformation I want to remove all the control characters in my input file. Is there a routine or method that can be called in this transformation that takes my input with the control characters and returns it without them?
If there is no such method, what would be the best possible solution?
Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Preprocessing the file with the stream editor sed is probably the most efficient way to accomplish removal of non-printing characters.

Within a BASIC Transformer stage you could take advantage of Oconv() with the "MCP" conversion specification, which converts non-printing characters to periods (for non-American speakers, this is the "." character), and then compare the result to the original.
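
To illustrate the two ideas outside DataStage, here is a rough Python sketch (not DataStage BASIC). It assumes plain 8-bit text in which "printable" means character codes 32 to 126. The helper names mask_nonprinting and strip_nonprinting are made up for the example, and the masking helper only approximates the "convert to periods" behaviour described above; it is not the actual MCP implementation.

```python
# Illustration only: plain Python, not DataStage BASIC.
# Assumption: "printable" means character codes 32..126.

def is_printable(ch: str) -> bool:
    return 32 <= ord(ch) <= 126

def mask_nonprinting(text: str) -> str:
    """Replace each non-printing character with a period."""
    return "".join(ch if is_printable(ch) else "." for ch in text)

def strip_nonprinting(text: str) -> str:
    """Drop non-printing characters entirely."""
    return "".join(ch for ch in text if is_printable(ch))

row = "John\x03Smith\x181-234"   # Ctrl-C and Ctrl-X embedded in the row
print(mask_nonprinting(row))     # John.Smith.1-234
print(strip_nonprinting(row))    # JohnSmith1-234
```

Masking keeps the row length unchanged, which is what makes the comparison with the original useful; stripping is what you want for the cleaned output file.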
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
mmkhan

It didn't work

Post by mmkhan »

Ray, I tried Oconv("John^CSmith^X1-234", "MCP") in my transformation stage (e.g. Oconv(DSLink1.row, "MCP")), but it gives me an error. I was wondering whether I am using the right stage: you mentioned the BASIC Transformer, but on the stages palette I see only Transformer. When I enter the above expression the column turns red, which means an error.
Regards
ray.wurlod

Post by ray.wurlod »

You specified Parallel.
For the parallel canvas at 7.x there are two Transformer stage types, called the Transformer stage and the BASIC Transformer stage.
You need the latter. You'll find it in the Processing group on the Palette or in your Repository browser.
vzmz
Participant
Posts: 36
Joined: Sun Nov 23, 2003 12:10 pm
Location: Dallas

I just can't find it

Post by vzmz »

Hi,
In my documentation I see the BASIC Transformer described, but for some reason I just can't see it in the Processing group on the Palette. I right-clicked on the group and chose Customization, and under Customize I see all the available stages under Processing, but not the BASIC one. Do I have to load it from a specific location?
Thanks
ray.wurlod

Post by ray.wurlod »

No, it should be there automatically.
In the Repository browser (tree view) open the Parallel branch. Under there you should see a Processing category. Expand that and you should find it.
mmkhan

Thanks

Post by mmkhan »

Thanks, I did find it. Thanks a lot.
One last thing: is there a way to check whether I have a non-printable character in the string? If so, I should be able to take an action.
ray.wurlod

Post by ray.wurlod »

If inlink.colname <> Oconv(inlink.colname,"MCP")
Then you have a non-printing character
Else you don't have a non-printing character
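
For anyone who wants to try the same comparison outside DataStage, a small Python sketch of the idea follows. It is not DataStage BASIC; it assumes "printable" means codes 32 to 126, and the names mask_nonprinting and has_nonprinting are hypothetical.

```python
# Illustration only: plain Python, not DataStage BASIC.
# Assumption: "printable" means character codes 32..126.

def mask_nonprinting(text: str) -> str:
    """Replace each non-printing character with a period."""
    return "".join(ch if 32 <= ord(ch) <= 126 else "." for ch in text)

def has_nonprinting(text: str) -> bool:
    # The row contains a non-printing character exactly when masking changes it.
    return text != mask_nonprinting(text)

print(has_nonprinting("clean row"))     # False
print(has_nonprinting("bad\x07row"))    # True
```

A row that already contains periods is not a problem, because a period is printable and is left untouched by the masking step.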
mmkhan

Thanks a lot ray

Post by mmkhan »

It worked, thanks.
I am almost there. At the end of this validation I have to generate a report saying how many control characters were found and how many records contained control characters.
Can I declare a variable that is incremented whenever I find a control character, or is there some after routine that will count the number of records in the good file and the bad file?
Thanks
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ

Post by mhester »

Constrain the input stream on what Ray gave you. This will give you the number of rows with control characters (rows processed on the link).

To get the number of control characters you can use either the LEN or the BYTELEN function. Use BYTELEN if you have NLS enabled and LEN if not (BYTELEN works the same as LEN if NLS is not enabled).

I'm not going to outline exactly what you should do since what I have given you should be enough to get your creative juices flowing.

I have used this type of logic before and it works pretty well.
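
One possible reading of the length hint, sketched in Python rather than DataStage BASIC: the number of control characters in a row is the length of the row minus the length of the row with those characters removed. The sketch assumes "printable" means codes 32 to 126, and the names count_nonprinting and summarize are made up for the example.

```python
# Illustration only: plain Python, not DataStage BASIC.
# Assumption: "printable" means character codes 32..126.

def count_nonprinting(text: str) -> int:
    # Row length minus the length of the row with non-printing characters removed.
    cleaned = "".join(ch for ch in text if 32 <= ord(ch) <= 126)
    return len(text) - len(cleaned)

def summarize(rows):
    """Return (rows_with_control_chars, total_control_chars)."""
    bad_rows = 0
    total_chars = 0
    for row in rows:
        n = count_nonprinting(row)
        if n:
            bad_rows += 1
            total_chars += n
    return bad_rows, total_chars

sample = ["clean row", "one\x07bell", "two\x01\x02chars"]
print(summarize(sample))   # (2, 3)
```

The two numbers correspond to the two figures asked for in the report: records containing control characters, and control characters found overall.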

Regards,

Michael Hester
ray.wurlod

Post by ray.wurlod »

Determining whether there is a control character (the second requirement) is the same logic you already have.
To count how many there are, you will need a routine. If you're not using NLS, something like the following should do.

Code:

FUNCTION CountControlChars(TheString)
* Returns the number of characters in TheString that fall outside
* the printable ASCII range 32-126 (code 127 is DEL, a control character).
Ans = 0
CharCount = Len(TheString)
For i = 1 To CharCount
   * Seq() returns the numeric code of a single character
   AsciiCode = Seq(TheString[i,1])
   If AsciiCode < 32 Or AsciiCode > 126 Then Ans += 1
Next i
RETURN(Ans)

Use the Routine in the derivation of the output column ControlCharCount.

If you do have NLS enabled, then the solution will depend on whether you truly mean control characters (those in the C0 and C1 control sets) or whether you mean any non-printing character. You need to determine, from your locale setting or current character map, which characters are in the class you need.
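
To make the distinction concrete, here is a Python sketch (not DataStage BASIC, and using Python's built-in Unicode tables rather than a DataStage NLS map, so treat it only as an illustration). Unicode general category "Cc" covers the C0 and C1 control sets plus DEL, while "non-printing" is a wider class. The function names are hypothetical.

```python
# Illustration only: plain Python, not DataStage BASIC or an NLS map.
import unicodedata

def is_control(ch: str) -> bool:
    # Category "Cc": C0 (U+0000-U+001F), DEL (U+007F) and C1 (U+0080-U+009F).
    return unicodedata.category(ch) == "Cc"

def count_control(text: str) -> int:
    return sum(1 for ch in text if is_control(ch))

def count_nonprintable(text: str) -> int:
    # str.isprintable() is False for control, format and separator
    # characters other than the ASCII space: a wider class than "Cc".
    return sum(1 for ch in text if not ch.isprintable())

# BEL (C0), NEL (C1) and a zero-width space (a format character, not a control)
sample = "a\x07b\u0085c\u200bd"
print(count_control(sample))        # 2
print(count_nonprintable(sample))   # 3
```

The zero-width space is counted only by the second function, which is exactly the kind of difference you need to decide on before writing the NLS version of the routine.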