incoming .csv file with embedded line breaks

Posted: Thu Jul 14, 2011 8:16 am
by anval52
Hello,
my incoming .csv file has fields with embedded line breaks, enclosed in double quotes. It looks like:

Account|User ID|Notes
1234567|123456|"This is a test note on account 1234567."

2345678|123456|"Hello, this is Toly from ABC company.

I am calling to let you know that we have registered the disputes for you
on the following invoices:

Invoice Date Invoice # Open Amount Due Date

Do you have any further information that might help us with these disputes?

Thank you
Toly"


I cannot read this file when a note contains line breaks and continues on the following lines. Is there any way to read it?
Your help is greatly appreciated.
Toly

Posted: Thu Jul 14, 2011 12:35 pm
by arunkumarmm
I assume that each record in your file ends with a " followed by the line terminator. If so, and if your file has DOS terminators, try reading the whole file using a Folder stage and passing it to a Transformer that replaces every " plus DOS line terminator with " plus UNIX line terminator. Pass the result to another Transformer where you replace all the remaining DOS terminators with a space, comma, pipe or whatever, and write the output to a file. Now read that file, defining it as having UNIX line terminators. This should give you each record on a single line.
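
Outside DataStage, the two-pass replacement described above can be sketched in Python. This is only an illustration under the same assumptions (DOS terminators, every record ending with a double quote); the function name and the `filler` parameter are made up for the example. Note that a header line that does not end with a double quote would be merged into the first record.

```python
def flatten_embedded_breaks(raw, filler=" "):
    """Join records that span several lines into single lines.

    Assumes CRLF (DOS) terminators and that every real record
    ends with a double quote immediately before its terminator.
    """
    # Pass 1: a quote followed by CRLF marks a real record end,
    # so turn it into a quote followed by a UNIX terminator (LF).
    pass1 = raw.replace('"\r\n', '"\n')
    # Pass 2: any CRLF still left is embedded in a note; replace it.
    return pass1.replace("\r\n", filler)


if __name__ == "__main__":
    sample = '1|2|"a"\r\n3|4|"b\r\nc"\r\n'
    print(flatten_embedded_breaks(sample))
```

The result can then be read as a file with UNIX line terminators, one record per line.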

Let me know if I did not understand your requirement properly.

Posted: Thu Jul 14, 2011 12:59 pm
by chulett
In a Server job you can scroll the Sequential File stage over and make use of the "Contains Terminator" property for that column.

Posted: Thu Jul 14, 2011 3:17 pm
by ray.wurlod
What Craig suggested is done in the grid on the Columns tab.

It requires that the strings containing the newline characters are quoted.
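
The same rule (a line break is only treated as data when it sits inside a quoted string) is how most CSV parsers behave. As a rough illustration outside DataStage, here is a minimal Python sketch using the standard `csv` module on data shaped like the sample in the first post; the function name is hypothetical and the sample text is shortened.

```python
import csv
import io


def read_pipe_delimited(text):
    """Parse pipe-delimited text whose quoted fields may span lines."""
    # newline="" stops Python from translating line endings, so the
    # csv module sees the embedded breaks exactly as they are.
    stream = io.StringIO(text, newline="")
    reader = csv.reader(stream, delimiter="|", quotechar='"')
    # Blank lines between records come back as empty rows; skip them.
    return [row for row in reader if row]


if __name__ == "__main__":
    sample = (
        "Account|User ID|Notes\n"
        '1234567|123456|"This is a test note."\n'
        "\n"
        '2345678|123456|"Hello,\n\nThank you\nToly"\n'
    )
    for row in read_pipe_delimited(sample):
        print(row)
```

Blank lines inside a quoted note are kept as part of the field; only blank lines between records are dropped.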

Posted: Fri Jul 15, 2011 8:43 am
by anval52
Thank you for your suggestions.
As this file comes to us from a vendor via sFTP from a mainframe, I inserted an additional step on the mainframe to reformat the input file so that all rows have the same record layout before it is sent to the server for DataStage.
toly

Posted: Fri Jul 15, 2011 8:49 am
by chulett
As noted, you didn't need to do this as the file is perfectly readable as is (or "as was") in DataStage.