Replace SUB character in sequential file

gagan8877
Premium Member
Posts: 77
Joined: Mon Jun 19, 2006 1:30 pm

Replace SUB character in sequential file

Post by gagan8877 »

I am reading fixed-length sequential files with a Sequential File stage, using Read Method = File Pattern to read multiple files. Final Delimiter = END, Field Delimiter = None.

Getting the following warnings:

SRC_pharmacy_inputfile,0: Missing record delimiter "\r\n", saw EOF instead
SRC_pharmacy_inputfile,0: Import warning at record 48.
SRC_pharmacy_inputfile,0: Input buffer overrun at field "ADJCOD", at offset: 0

Tried the following properties:

Record Delimiter = UNIX Newline
Record Delimiter String = DOS Format.

Neither helped :(

Tried rejecting the record - no change, as expected - the rejected record is empty. I don't want to demote the warning.

The file has a substitute character at the end (octal 032 / hex 1A), visible as SUB in Notepad++.
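
One way to confirm where the SUB bytes sit, as a minimal sketch: it assumes a POSIX-style shell with tail, od, tr and wc available, and it builds a throwaway sample file rather than touching the real input.

```shell
# Throwaway sample standing in for the vendor file: two CR LF records plus a trailing SUB.
sample=$(mktemp)
printf 'REC1\r\nREC2\r\n\032' > "$sample"

# Hex value of the last byte; 1a means the file ends with SUB.
last_byte=$(tail -c 1 "$sample" | od -An -tx1 | tr -d ' \n')
echo "last byte: $last_byte"

# Number of SUB bytes anywhere in the file (tr -cd keeps only the listed byte).
sub_count=$(tr -cd '\032' < "$sample" | wc -c | tr -d ' ')
echo "SUB bytes: $sub_count"
```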

Tried sed 's/'`echo "\032"`'//g' filename in Exec Cmd activity before reading the file, as mentioned in one of the posts:

http://dsxchange.com/viewtopic.php?t=12 ... 4483455cf6

but I still get the warning and the character remains in the file. DS EE 8 is on Windows, so some might wonder why I am using sed. But in the Execute Command activity ls, rm and mv all work, so I thought this should too.
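
Two details of the sed command above may matter here; treat this as a sketch, not a diagnosis. sed writes the edited text to standard output and leaves the file alone unless the output is redirected, and how echo treats "\032" differs between shells, while printf interprets octal escapes consistently. The sample file is made up for illustration.

```shell
# Made-up sample: two CR LF records followed by a SUB byte.
f=$(mktemp)
printf 'REC1\r\nREC2\r\n\032' > "$f"

# Build the SUB byte with printf instead of echo.
sub=$(printf '\032')

# sed only prints the edited text; count SUB bytes in its output and in the file.
cleaned_count=$(sed "s/$sub//g" "$f" | tr -cd '\032' | wc -c | tr -d ' ')
file_count=$(tr -cd '\032' < "$f" | wc -c | tr -d ' ')
echo "SUB bytes in sed output: $cleaned_count, still in file: $file_count"
```

The file on disk only changes if the output is captured and moved back over it.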

Another interesting find:
If I change Read Method to Specific File and set the Filter property to sed, there are no warnings. But then I would have to loop through the files to read them all.
Even if the sed replaces some other character, e.g. 'a' with 'aa', there are no warnings, regardless of whether the file contains the substitute character at the end.

Questions:

1. Is there a way to eliminate substitute character from the file before reading it with Read Method = File Pattern?

2. Why does Filter remove the warning even without replacing the substitute character?

Thanks
vjonnala1516
Participant
Posts: 18
Joined: Fri Jan 04, 2008 5:28 am
Location: Bangalore

Post by vjonnala1516 »

Hi,

Please use the tr -d command in the Sequential File stage; that could resolve the problem. To verify, run the same command in UNIX:

tr -d '\r' < oldfile > newfile

thanks..
VJ
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The last line of at least one of your source files is missing its line terminator. The "eof" that was encountered unexpectedly is the operating system's "end of file" marker.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
gagan8877

tr -d

Post by gagan8877 »

vjonnala1516 wrote:hi ,

please use tr -d command in the sequential file stage..that could resolve the problem.. for the verification check the same command in unix using the following command.

tr -d '\r' <oldfile> newfile.

thanks..
1. So are you saying I can use tr -d in the file pattern property of Sequential File Stage?

2. How can we remove the eof character in the same file instead of outputting to a new file, i.e. instead of:

tr -d '\r' <oldfile> newfile

I tried

tr -d '\r' <oldfile> oldfile

and the oldfile's contents were deleted.

Thanks
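
On the emptied file: the shell opens and truncates the output file before tr starts reading, so input and output cannot share a name. A common workaround, sketched here with hypothetical file names (pharmacy_*.txt) and assuming a POSIX-style shell, is to write to a temporary file and move it back over the original, once per file matched by the pattern.

```shell
# Hypothetical input files in a scratch directory.
workdir=$(mktemp -d)
printf 'REC1\r\nREC2\r\n\032' > "$workdir/pharmacy_1.txt"
printf 'REC3\r\n\032' > "$workdir/pharmacy_2.txt"

for f in "$workdir"/pharmacy_*.txt; do
    # Write to a temporary file first; redirecting straight back to "$f"
    # would truncate it before tr has read anything.
    tr -d '\032' < "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done
```

The same loop works for tr -d '\r' if carriage returns are the bytes to drop.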
ray.wurlod

Post by ray.wurlod »

You cannot remove the eof character - the operating system needs this to tell it when to stop reading the file.

Make sure that every line in the file, including the last line, has a correct line terminator. Your metadata expects it to.
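
A rough sketch of adding a missing terminator, assuming the record delimiter is CR LF as in the warning text and that a POSIX-style shell is available; the sample file is invented. If the file also ends with a SUB byte, that byte would need stripping first.

```shell
# Invented sample whose last record has no terminator.
f=$(mktemp)
printf 'REC1\r\nREC2' > "$f"

# Compare the final two bytes with CR LF (hex 0d 0a) and append them if absent.
tail_hex=$(tail -c 2 "$f" | od -An -tx1 | tr -d ' \n')
if [ "$tail_hex" != "0d0a" ]; then
    printf '\r\n' >> "$f"
fi
```

Running it a second time leaves the file unchanged, because the check then finds the terminator.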
gagan8877

Post by gagan8877 »

ray.wurlod wrote:You can not remove the eof character - the operating system needs this to tell it when to stop reading the file.

Make sure that every line in the file, including the last line, has a correct line terminator. Your metadata expects it to.
Hi Ray

Unfortunately I don't have control over that - the file comes from a third party vendor and we can't change the terminator.

Please let me rephrase the question - how can we strip off the '\032' character in the file?

If I use the Filter with sed, I start getting:

"Conversion error calling conversion routine decimal_from_string data may have been lost" warnings.

After I remove the filter, the above goes away. A Filter with tr -d gives no warnings and works OK with the sample file I have. But because the warning caused by the sed filter was unexpected, I don't want to use it.

Instead I want to remove the \032 with tr -d in an Execute Command activity before reading, and I want to remove it in the same file without outputting to a second one. How can I do that?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

The way you remove things from a file without outputting to a second is while reading it, via the Filter option in the Sequential File stage; whatever you can do from an Execute Command stage you can do there.
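
To illustrate the idea, here is a sketch only. It assumes the Filter command receives the file on standard input and that its standard output is what the stage reads; the sample file is made up.

```shell
# Made-up sample ending in a SUB byte.
f=$(mktemp)
printf 'REC1\r\nREC2\r\n\032' > "$f"

# A filter cleans the stream as it is read; the file on disk is not modified.
filtered_size=$(tr -d '\032' < "$f" | wc -c | tr -d ' ')
original_size=$(wc -c < "$f" | tr -d ' ')
echo "bytes on disk: $original_size, bytes after filter: $filtered_size"
```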
-craig

"You can never have too many knives" -- Logan Nine Fingers