Page 1 of 1

How to remove control characters from source data

Posted: Sun Mar 27, 2005 8:07 pm
by Ananda
Hi,

I am running a DataStage Server job, it is failing at the record where there are control characters.

Earlier when I figured out the Line Feed control characters I used Exchange Function in Transformer Stage and could resolve the problem by replacing it with a space.

Exchange(InputData,'Oa','20')

But now in my data there are other control characters which I am not able to figure out which control character it is.
I could search the same in vi editor by giving the following search pattern : /\\$

Please tell me which control character it is and how should I handle the same in Transformer stage.

Thanks in advance
Regards
Anand

Posted: Mon Mar 28, 2005 12:40 am
by ray.wurlod
I'd be more inclined to use the UNIX tr command as a filter in a Sequential File stage, assuming that's what your source is. Use the tr class for characters in Control Set 0 (those with ASCII codes below 32) and translate them all to space or "".

If you must do it in a Transformer stage, construct a stage variable. Let's call it svControlSet0. Initialize this using and expression concatenating all the characters in Control Set 0, for example

Code: Select all

Char(1) : Char(2) : ... : Char(31)
(you will need 31 Char() functions in all). Do not include Char(0) - it will have no effect.

Create another stage variable. Let's call it svSpaces. Initialize this using the function Space(31).

In your derivation expression use the Convert() function (it's more efficient than 31 Change(), Exchange() or Ereplace() functions in this case), for example:

Code: Select all

Convert(svControlSet0, svSpaces, inlink.Columnname)

How to remove control characters from source data

Posted: Sat Apr 23, 2005 4:52 pm
by Ananda
I could replace the control characters present in my input file with a space, except for the Line Feed control character (^J).
In my input sequential file I had incorporated a Line Feed character in the first column. But the job got aborted with the following log:

controlcharacter..Sequential_File_0.DSLink3: read_delimited() - row 1, column description, required column missing.

I found that instead of having sequential file stage as my input, if the input is directly from ODBC Stage (input table), then Exchange() works just fine to replace the Line Feed Characters with a space.

Exchange(Input.column,'Oa','20')

Any inputs on how to replace Line Feed characters from an input text file.

Thanks
anand