Page 1 of 2

Handling ^M

Posted: Wed Oct 19, 2005 12:44 pm
by dsrules
Hi,
We are trying to load data from a DB2 table after some transformations. The problem we are facing is the data when exported from the DB2 table to the unix box using the DB2 export utility is
2,"Failure to submit Audited ^M
Financials with report of ^M
Internal Controls"
3,"Failure to submit Audited^M
Financials with report of^M
Internal Control"
Trying to read this through a Seq Stage with meta data defined for 2 columns as CHAR 2 and CHAR 255 leads to errors because of the CRLF. When reading through the DB2 table itself I replaced all the special characters using

Code: Select all

EREPLACE(EREPLACE(OCONV(EREPLACE(DSLink20.DESC_OTHR_RSN,'.','|||'),"MCP"),'.',' '),'|||','.')
Its a pretty bad hack. Just wondering how we can take care of the ^M in the flat file lying on Unix.
Looks like we have Text area where end users are entering the values into it and adding new line which is going as a field into DB2.

Thanks

Posted: Wed Oct 19, 2005 12:53 pm
by kcbland
Change the radio button on the Seq stage to "DOS" instead of Unix.

Or run dostounix on every export file when finished.

Posted: Wed Oct 19, 2005 1:16 pm
by ArndW
Doesn't clicking on the column attribute 'column contains embedded terminators' take care of this problem?

Posted: Wed Oct 19, 2005 4:03 pm
by roy
Hi,
I prefer Ken's sugestion for the dos radio button in the file format (just for stating the fact it is just that)
But hey that's only my humble opinion :)

Posted: Wed Oct 19, 2005 5:38 pm
by dsrules
All the options have been tried, DOS, UNIX. Using 'column contains embedded terminators' takes care of the problem of reading it, but the removal of ^M has to be done. The code which was posted basically strips all the non printable chars to '.', before doing that I am converting any valid '.' to a unique string, '|||' in this case and again converting the '|||' back to '.'
A little more digging reveals we can take care of it in UNIX by the following commands

Code: Select all

sed 's/.$//' infile.txt > outfile.txt
or by converting all the ^M followed by a carraige return to nothing and all the ^M by a carriage return.

Code: Select all

:%s/^M$//g
:%s/^M/ /g 
And yeah we can use dos2unix tool to prepare the file before ETL.

Thanks

dsrules

Posted: Wed Oct 19, 2005 5:49 pm
by ray.wurlod

Code: Select all

Convert(Char(13), "", InLink.TheString)
Even more efficient would be to initialize a stage variable, let's call it svCtrlM, to Char(13) and not derive it for every row. This way the Char(13) function is evaluated only once.

Code: Select all

Convert(svCtrlM, "", InLink.TheString)
But I like the preprocessing technique best of all. That's less work for DataStage to do, and sed is remarkably efficient. Why not just use the sed command as a filter in a Sequential File stage, then you don't have to muck about with redirection; the Sequential File stage will read stdout of the sed commmand.

Posted: Wed Oct 19, 2005 6:05 pm
by dsrules
Thanks Ray. I will try the

Code: Select all

Convert(Char(13), "", InLink.TheString)

tomorrow.
Also what do you feel about the code I posted , we can modify it to strip all the non printable chars to nulls rather than space, but the CR was also being stripped off and we were getting concatenated words. It works for anything else though

Thanks

dsrules

Posted: Wed Oct 19, 2005 6:17 pm
by ray.wurlod
Can you convert the residual ^M to \n (explicit newline)? Or explicit space?

Posted: Wed Oct 19, 2005 6:39 pm
by dsrules
yeah we can. I was thinking more of a generic solution for such EBCDIC ASCII, non printable characters problems, like

How about checking for high end bit is true then we strip the character to null, what would be the repercussions !

Posted: Wed Oct 19, 2005 7:18 pm
by ray.wurlod
Probably nothing. Most of the characters that appear to concern you appear to be in control set 0 (Char(1) through Char(31)), none of which has its high bit set.

If they're EBCDIC, you need to convert to ASCII (or Unicode) before you can do anything sensible with them in DataStage.

Posted: Wed Oct 19, 2005 7:28 pm
by dsrules
Ah ! I forgot to mention the lower end (1 to 31) ! :)

dsrules

Posted: Wed Oct 19, 2005 9:10 pm
by kcbland
dos2unix supports streaming input and output I believe so it's really an easy choice. It would take care of those trailing ^M's on the file. As to high/low chars, well, that's a different story.

Still don't like that DOS radio button solution, eh?

Posted: Wed Oct 19, 2005 9:26 pm
by chulett
Really - why go through all of these shenanigans to remove something that a simple change of record terminator in the sequential file stage will automatically handle for you? :?

Posted: Sun Oct 23, 2005 4:30 pm
by roy
That was what I tried saying Craig :)
In any case if you can't do it for any reason putting a simple filter command with sed should resolve this as well

Posted: Sun Oct 23, 2005 8:06 pm
by kcbland
Why use sed when dos2unix does the trick without fuss?

http://linuxcommand.org/man_pages/dos2unix1.html