How to remove control characters from source data

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
Ananda
Participant
Posts: 29
Joined: Mon Sep 20, 2004 12:05 am

How to remove control characters from source data

Post by Ananda »

Hi,

I am running a DataStage Server job, it is failing at the record where there are control characters.

Earlier when I figured out the Line Feed control characters I used Exchange Function in Transformer Stage and could resolve the problem by replacing it with a space.

Exchange(InputData,'Oa','20')

But now in my data there are other control characters which I am not able to figure out which control character it is.
I could search the same in vi editor by giving the following search pattern : /\\$

Please tell me which control character it is and how should I handle the same in Transformer stage.

Thanks in advance
Regards
Anand
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I'd be more inclined to use the UNIX tr command as a filter in a Sequential File stage, assuming that's what your source is. Use the tr class for characters in Control Set 0 (those with ASCII codes below 32) and translate them all to space or "".

If you must do it in a Transformer stage, construct a stage variable. Let's call it svControlSet0. Initialize this using and expression concatenating all the characters in Control Set 0, for example

Code: Select all

Char(1) : Char(2) : ... : Char(31)
(you will need 31 Char() functions in all). Do not include Char(0) - it will have no effect.

Create another stage variable. Let's call it svSpaces. Initialize this using the function Space(31).

In your derivation expression use the Convert() function (it's more efficient than 31 Change(), Exchange() or Ereplace() functions in this case), for example:

Code: Select all

Convert(svControlSet0, svSpaces, inlink.Columnname)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Ananda
Participant
Posts: 29
Joined: Mon Sep 20, 2004 12:05 am

How to remove control characters from source data

Post by Ananda »

I could replace the control characters present in my input file with a space, except for the Line Feed control character (^J).
In my input sequential file I had incorporated a Line Feed character in the first column. But the job got aborted with the following log:

controlcharacter..Sequential_File_0.DSLink3: read_delimited() - row 1, column description, required column missing.

I found that instead of having sequential file stage as my input, if the input is directly from ODBC Stage (input table), then Exchange() works just fine to replace the Line Feed Characters with a space.

Exchange(Input.column,'Oa','20')

Any inputs on how to replace Line Feed characters from an input text file.

Thanks
anand
Post Reply