Replace SUB character in sequential file

Posted: Sat Nov 14, 2009 4:23 pm
by gagan8877
Reading from a fixed-length Sequential File stage using Read Method = File Pattern to read multiple files. Final Delimiter = END, Field Delimiter = None.

Getting the following warnings:

SRC_pharmacy_inputfile,0: Missing record delimiter "\r\n", saw EOF instead
SRC_pharmacy_inputfile,0: Import warning at record 48.
SRC_pharmacy_inputfile,0: Input buffer overrun at field "ADJCOD", at offset: 0

Tried the following properties:

Record Delimiter = UNIX Newline
Record Delimiter String = DOS Format

Neither helped :(

Tried rejecting the record: no change, as expected, since the rejected record is empty. I don't want to demote the warning.

The file has a substitute character at the end (octal 032 / hex 1A), visible as SUB in Notepad++.

Tried sed 's/'`echo "\032"`'//g' filename in an Execute Command activity before reading the file, as mentioned in one of the posts:

http://dsxchange.com/viewtopic.php?t=12 ... 4483455cf6

but I still get the warning and the character remains in the file. DS EE 8 is on Windows, so some might wonder why I am using sed. But ls, rm and mv all work in an Execute Command activity, so I thought this should too.

Another interesting find:
If I change Read Method to Specific File and set the Filter property to sed, there are no warnings. But then I would have to loop through the files to read them all.
Even if the sed command replaces some other character, such as 'a' with 'aa', there are no warnings, regardless of whether the file contains the substitute character at the end.

Questions:

1. Is there a way to eliminate substitute character from the file before reading it with Read Method = File Pattern?

2. Why does Filter remove the warning even without replacing the substitute character?

Thanks

Posted: Sat Nov 14, 2009 7:33 pm
by vjonnala1516
Hi,

Please use the tr -d command in the Sequential File stage; that could resolve the problem. To verify, try the same command in UNIX:

tr -d '\r' < oldfile > newfile

Thanks.

Posted: Sat Nov 14, 2009 11:33 pm
by ray.wurlod
The last line of at least one of your source files is missing its line terminator. The "eof" that was encountered unexpectedly is the operating system's "end of file" marker.
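
One way to see how a given file actually ends is to dump its last byte. The following is a minimal sketch, not part of the original posts: the helper name last_byte_octal and the sample file are made up for illustration, and it assumes a POSIX-style shell with tail, od and tr available on the server.

```shell
# Hypothetical helper: print the last byte of a file as a 3-digit octal value.
last_byte_octal() {
    tail -c 1 "$1" | od -An -to1 | tr -d ' \n'
}

# Made-up sample: two DOS-terminated records followed by a SUB character.
printf 'REC0001\r\nREC0002\r\n\032' > sample_sub.dat

last_byte_octal sample_sub.dat   # prints 032
```

A file that ends with a proper line terminator would report 012 (line feed) instead.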

tr -d

Posted: Sun Nov 15, 2009 4:53 pm
by gagan8877
vjonnala1516 wrote:
Hi,

Please use the tr -d command in the Sequential File stage; that could resolve the problem. To verify, try the same command in UNIX:

tr -d '\r' < oldfile > newfile

Thanks.

1. So are you saying I can use tr -d in the File Pattern property of the Sequential File stage?

2. How can we remove the eof character in the same file instead of writing to a new file, i.e. instead of:

tr -d '\r' < oldfile > newfile

I tried

tr -d '\r' < oldfile > oldfile

and the contents of oldfile were deleted.

Thanks
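
The file is emptied because the shell opens and truncates the output redirection target before tr starts reading, so there is nothing left to read. A common workaround, sketched here with a made-up sample file and a hypothetical temporary name oldfile.tmp, is to write to a temporary file and then move it over the original:

```shell
# Made-up sample data with DOS line endings.
printf 'REC0001\r\nREC0002\r\n' > oldfile

# Write the cleaned copy under a temporary name, then replace the
# original only if tr succeeded.
tr -d '\r' < oldfile > oldfile.tmp && mv oldfile.tmp oldfile
```

The end result is a single file with the original name, even though a second file exists briefly.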

Posted: Sun Nov 15, 2009 4:57 pm
by ray.wurlod
You cannot remove the eof character - the operating system needs this to tell it when to stop reading the file.

Make sure that every line in the file, including the last line, has a correct line terminator. Your metadata expects it to.

Posted: Sun Nov 15, 2009 5:17 pm
by gagan8877
ray.wurlod wrote:
You cannot remove the eof character - the operating system needs this to tell it when to stop reading the file.

Make sure that every line in the file, including the last line, has a correct line terminator. Your metadata expects it to.

Hi Ray

Unfortunately I don't have control over that - the file comes from a third-party vendor and we can't change the terminator.

Please let me rephrase the question - how can we strip off the '\032' character in the file?

If I use the Filter with sed s/, I start getting:

"Conversion error calling conversion routine decimal_from_string data may have been lost" warnings.

After I remove the filter, the warning goes away. A Filter with tr -d produces no warnings and works fine with the sample file I have. But because the warning caused by the sed filter was unexpected, I would rather not use the Filter.

Instead I want to remove the \032 with tr -d in an Execute Command activity before reading, and I want to remove it in the same file without writing to a second one. How can I do that?
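
For reference, a pre-processing step of this kind could look roughly like the sketch below. It is an illustration only: the directory sub_demo and the pharmacy_*.dat file names are invented, and it assumes the shell behind the Execute Command activity supports POSIX for loops and redirection. tr cannot edit a file in place, so each file is rewritten through a short-lived temporary file.

```shell
# Made-up sample files standing in for the vendor files.
mkdir -p sub_demo
printf 'REC0001\r\nREC0002\r\n\032' > sub_demo/pharmacy_1.dat
printf 'REC0003\r\n\032' > sub_demo/pharmacy_2.dat

# Strip every SUB (octal 032) byte from each file matching the pattern.
for f in sub_demo/pharmacy_*.dat; do
    tr -d '\032' < "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done
```

Note that this deletes every 032 byte in the file, not only a trailing one, which is only safe if the character never appears in the record data.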

Posted: Sun Nov 15, 2009 6:30 pm
by chulett
The way you remove things from a file without outputting to a second is to do it while reading, via the Filter option in the Sequential File stage. Whatever you can do from an Execute Command stage you can do there.
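
As a rough illustration of that idea, assuming the Filter command receives the file on standard input and hands its standard output to the stage, the command below is simulated outside DataStage with a made-up sample file:

```shell
# Made-up sample: one DOS-terminated record followed by a SUB character.
printf 'REC0001\r\n\032' > vendor_sample.dat

# Candidate Filter command: tr -d '\032'
# Simulated here by feeding the file on stdin and counting the bytes
# that come out on stdout.
tr -d '\032' < vendor_sample.dat | wc -c

# The file on disk is not modified by the filter.
wc -c < vendor_sample.dat
```

The filtered stream is 9 bytes while the file on disk stays at 10, so the SUB character is dropped only from what the stage reads.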