incoming .csv file with embedded line breaks

Post questions here related to DataStage Server Edition in areas such as Server job design, DS Basic, Routines, Job Sequences, etc.


anval52
Participant
Posts: 4
Joined: Tue May 05, 2009 12:42 pm

Post by anval52 »

Hello,
My incoming .csv file (pipe-delimited) has line breaks embedded in fields enclosed in double quotes. It looks like this:

Account|User ID|Notes
1234567|123456|"This is a test note on account 1234567."

2345678|123456|"Hello, this is Toly from ABC company.

I am calling to let you know that we have registered the disputes for you
on the following invoices:

Invoice Date Invoice # Open Amount Due Date

Do you have any further information that might help us with these disputes?

Thank you
Toly"


I cannot read this file when a note contains line breaks and continues onto the next line. Is there any way to read it?
Your help is greatly appreciated.
Toly
arunkumarmm
Participant
Posts: 246
Joined: Mon Jun 30, 2008 3:22 am
Location: New York

Post by arunkumarmm »

I assume that each record in your file ends with a double quote (") followed by the line terminator. If so, and your file uses DOS (CR/LF) terminators, try this: read the whole file with a Folder stage and pass it to a Transformer that replaces every " followed by a DOS terminator with " followed by a UNIX (LF) terminator. Pass the result to a second Transformer that replaces all remaining DOS terminators with a space, comma, pipe, or whatever suits you, and write it out to a file. Then read that file with its line terminator defined as UNIX. Each record should now be on a single line.
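A minimal Python sketch of the same two-pass replacement, done outside DataStage, assuming (as the reply does) that every record ends with `"` plus CR/LF and that any other CR/LF is an embedded break inside a quoted field. The function name `normalize_terminators` and its `has_header` flag are hypothetical. One caveat the sketch handles explicitly: the header line `Account|User ID|Notes` does not end in a quote, so the quote-based rule would otherwise glue it onto the first record.

```python
def normalize_terminators(data: str, has_header: bool = True) -> str:
    """Keep record-ending CR/LFs as LF; turn embedded CR/LFs into spaces.

    Assumption: every data record ends with '"' + CR/LF, and any other
    CR/LF is a line break embedded inside a quoted field.
    """
    header = ""
    if has_header:
        # The header does not end in '"', so set it aside before
        # applying the quote-based rule.
        first_break = data.find("\r\n")
        if first_break != -1:
            header = data[:first_break] + "\n"
            data = data[first_break + 2:]
    # Pass 1: mark real record ends by turning '"' + CR/LF into '"' + LF.
    data = data.replace('"\r\n', '"\n')
    # Pass 2: every CR/LF still left is embedded; replace it with a space.
    data = data.replace("\r\n", " ")
    return header + data
```

The output can then be read with LF as the line terminator. Note that a note whose text has a `"` immediately before an embedded break would be misread by this rule, since it looks like a record end.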

Let me know if I have misunderstood your requirement.
Arun
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

In a Server job, you can scroll across the column grid of the Sequential File stage and set the "Contains Terminator" property for that column.
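To illustrate the general idea behind a column that may contain the terminator, here is a small quote-aware record splitter in Python: a line break ends a record only when it falls outside double quotes. This is a sketch of the technique, not DataStage's implementation; `split_records` is a hypothetical name.

```python
from typing import List


def split_records(text: str, quote: str = '"') -> List[str]:
    """Split text into records; a newline ends a record only outside quotes."""
    records: List[str] = []
    current: List[str] = []
    in_quotes = False
    for ch in text:
        if ch == quote:
            # Flip state on every quote; a doubled "" escape flips twice,
            # so it leaves the state unchanged.
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == "\n" and not in_quotes:
            # Terminator outside quotes: the record is complete.
            record = "".join(current)
            if record.endswith("\r"):
                # Drop the CR of a DOS (CR/LF) terminator.
                record = record[:-1]
            records.append(record)
            current = []
        else:
            # Ordinary character, or a newline inside a quoted field.
            current.append(ch)
    if current:
        # Last record without a trailing terminator.
        records.append("".join(current))
    return records
```

Embedded breaks are kept as data inside the record, which is why the field has to be quoted for this to work.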
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

What Craig suggested is done in the grid on the Columns tab.

It requires that the strings containing the newline characters be quoted.
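The quoting requirement can be seen with Python's standard `csv` module, which likewise treats line breaks inside quoted fields as data. The sample below is a shortened version of the one in the original post; `read_rows`, `QUOTED`, and `UNQUOTED` are hypothetical names. With quotes, the multi-line note stays in one row; with the quotes removed, it spills into extra rows.

```python
import csv
import io
from typing import List

# Shortened sample modeled on the original post: pipe-delimited, Notes quoted.
QUOTED = (
    "Account|User ID|Notes\n"
    '1234567|123456|"This is a test note on account 1234567."\n'
    '2345678|123456|"Hello, this is Toly from ABC company.\n'
    "Thank you\n"
    'Toly"\n'
)

# The same data with the quotes stripped from the notes.
UNQUOTED = QUOTED.replace('"', "")


def read_rows(text: str) -> List[List[str]]:
    # newline="" leaves line endings untranslated so the csv reader
    # decides which newlines end a row and which belong to a field.
    return list(csv.reader(io.StringIO(text, newline=""), delimiter="|"))
```

Here `read_rows(QUOTED)` yields 3 rows, with the whole multi-line note in the third field of the last row, while `read_rows(UNQUOTED)` yields 5 rows because the note's continuation lines become rows of their own.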
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.

Post by anval52 »

Thank you for your suggestions.
Because this file comes to us from a vendor via sFTP from a mainframe, I added a step on the mainframe that reformats the input file so that every row has the same record layout before it is sent to the DataStage server.
Toly

Post by chulett »

As noted, you didn't need to do this as the file is perfectly readable as is (or "as was") in DataStage.
-craig