Handling ^M

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

dsrules
Premium Member
Premium Member
Posts: 76
Joined: Sun Nov 28, 2004 8:56 pm

Handling ^M

Post by dsrules »

Hi,
We are trying to load data from a DB2 table after some transformations. The problem we are facing is the data when exported from the DB2 table to the unix box using the DB2 export utility is
2,"Failure to submit Audited ^M
Financials with report of ^M
Internal Controls"
3,"Failure to submit Audited^M
Financials with report of^M
Internal Control"
Trying to read this through a Seq Stage with meta data defined for 2 columns as CHAR 2 and CHAR 255 leads to errors because of the CRLF. When reading through the DB2 table itself I replaced all the special characters using

Code: Select all

EREPLACE(EREPLACE(OCONV(EREPLACE(DSLink20.DESC_OTHR_RSN,'.','|||'),"MCP"),'.',' '),'|||','.')
Its a pretty bad hack. Just wondering how we can take care of the ^M in the flat file lying on Unix.
Looks like we have Text area where end users are entering the values into it and adding new line which is going as a field into DB2.

Thanks
______________________________________
"Everytime I close the door on reality, it comes in through the windows." - Jennifer Yane
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Change the radio button on the Seq stage to "DOS" instead of Unix.

Or run dostounix on every export file when finished.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Doesn't clicking on the column attribute 'column contains embedded terminators' take care of this problem?
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
I prefer Ken's sugestion for the dos radio button in the file format (just for stating the fact it is just that)
But hey that's only my humble opinion :)
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
Image
dsrules
Premium Member
Premium Member
Posts: 76
Joined: Sun Nov 28, 2004 8:56 pm

Post by dsrules »

All the options have been tried, DOS, UNIX. Using 'column contains embedded terminators' takes care of the problem of reading it, but the removal of ^M has to be done. The code which was posted basically strips all the non printable chars to '.', before doing that I am converting any valid '.' to a unique string, '|||' in this case and again converting the '|||' back to '.'
A little more digging reveals we can take care of it in UNIX by the following commands

Code: Select all

sed 's/.$//' infile.txt > outfile.txt
or by converting all the ^M followed by a carraige return to nothing and all the ^M by a carriage return.

Code: Select all

:%s/^M$//g
:%s/^M/ /g 
And yeah we can use dos2unix tool to prepare the file before ETL.

Thanks

dsrules
______________________________________
"Everytime I close the door on reality, it comes in through the windows." - Jennifer Yane
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Code: Select all

Convert(Char(13), "", InLink.TheString)
Even more efficient would be to initialize a stage variable, let's call it svCtrlM, to Char(13) and not derive it for every row. This way the Char(13) function is evaluated only once.

Code: Select all

Convert(svCtrlM, "", InLink.TheString)
But I like the preprocessing technique best of all. That's less work for DataStage to do, and sed is remarkably efficient. Why not just use the sed command as a filter in a Sequential File stage, then you don't have to muck about with redirection; the Sequential File stage will read stdout of the sed commmand.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
dsrules
Premium Member
Premium Member
Posts: 76
Joined: Sun Nov 28, 2004 8:56 pm

Post by dsrules »

Thanks Ray. I will try the

Code: Select all

Convert(Char(13), "", InLink.TheString)

tomorrow.
Also what do you feel about the code I posted , we can modify it to strip all the non printable chars to nulls rather than space, but the CR was also being stripped off and we were getting concatenated words. It works for anything else though

Thanks

dsrules
______________________________________
"Everytime I close the door on reality, it comes in through the windows." - Jennifer Yane
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Can you convert the residual ^M to \n (explicit newline)? Or explicit space?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
dsrules
Premium Member
Premium Member
Posts: 76
Joined: Sun Nov 28, 2004 8:56 pm

Post by dsrules »

yeah we can. I was thinking more of a generic solution for such EBCDIC ASCII, non printable characters problems, like

How about checking for high end bit is true then we strip the character to null, what would be the repercussions !
______________________________________
"Everytime I close the door on reality, it comes in through the windows." - Jennifer Yane
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Probably nothing. Most of the characters that appear to concern you appear to be in control set 0 (Char(1) through Char(31)), none of which has its high bit set.

If they're EBCDIC, you need to convert to ASCII (or Unicode) before you can do anything sensible with them in DataStage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
dsrules
Premium Member
Premium Member
Posts: 76
Joined: Sun Nov 28, 2004 8:56 pm

Post by dsrules »

Ah ! I forgot to mention the lower end (1 to 31) ! :)

dsrules
______________________________________
"Everytime I close the door on reality, it comes in through the windows." - Jennifer Yane
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

dos2unix supports streaming input and output I believe so it's really an easy choice. It would take care of those trailing ^M's on the file. As to high/low chars, well, that's a different story.

Still don't like that DOS radio button solution, eh?
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Really - why go through all of these shenanigans to remove something that a simple change of record terminator in the sequential file stage will automatically handle for you? :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

That was what I tried saying Craig :)
In any case if you can't do it for any reason putting a simple filter command with sed should resolve this as well
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
Image
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Why use sed when dos2unix does the trick without fuss?

http://linuxcommand.org/man_pages/dos2unix1.html
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Post Reply