Page 1 of 1

Sequential File (CSV) and quoting/escaping

Posted: Tue Jul 20, 2010 12:26 pm
by DSUser2000
I tried to export a CSV file. I've set the delimiter to , and the quote char to ". If there's a " in the string supplied from the source, Datastage seems to not handle it correctly: The file is then read wrong (for example with view data) and in my opinion it is made wrong (no escaping done). However, I can't simply change the quote char to another one as I can't make sure that this character isn't in the input.

Doesn't Datastage support escaping of the quote chars? For example, Excel escapes the double quote with another one. So if you have a field with a

Code: Select all

a"a
you get

Code: Select all

"a""a"
out of Excel CSV export. But Datastage just exports

Code: Select all

"a"a"
On import, neither the Datastage output nor the Excel output is correctly read into Datastage. Is there any solution?

Posted: Tue Jul 20, 2010 12:53 pm
by anbu
In your case you have to do preprocessing before using the CSV file in sequential stage.

Use this command to remove quotes around a string and reduce two consecutive double quotes to single double quote and remove Quote setting from Field defaults

Code: Select all

sed 's/"\([^,]*\)",/\1,/g;s/""/"/g'
Or Remove Quote from Field defaults and remove the quotes from your input field which has quotes in transformer

Posted: Tue Jul 20, 2010 2:13 pm
by kris007
Did you try specifying the quote character specifically at the column level? Can you try doing that? I remember doing something similar but can't recollect the exact steps.

Posted: Tue Jul 20, 2010 2:36 pm
by anbu
I have tried setting quote at column level. But that didn't work. If you remember let us know.

Posted: Tue Jul 20, 2010 3:42 pm
by ray.wurlod
Parallel jobs (at least up to the current version) are notoriously bad at handling embedded quote characters. Server jobs manage it much better.

Posted: Wed Jul 21, 2010 2:35 am
by PhilHibbs
anbu wrote:Use this command to remove quotes around a string and reduce two consecutive double quotes to single double quote and remove Quote setting from Field defaults

Code: Select all

sed 's/"\([^,]*\)",/\1,/g;s/""/"/g'
Or Remove Quote from Field defaults and remove the quotes from your input field which has quotes in transformer
CSV cannot be parsed by a regular expression. The following is perfectly valid CSV:

Code: Select all

"Nail, rusty",6"
Of course, you may be expecting a reliable subset of the myriad quirks that various CSV implementations generate, in which case the above regex may be acceptable. Otherwise, if you want to preproces your file, you could use a Perl script with the module Text::CSV as a preprocessor.

Posted: Wed Jul 21, 2010 6:09 am
by chulett
Ah... Phil and his rusty six inch nail. I was just about to link them to your old post on the subject. :wink:

Posted: Wed Jul 21, 2010 8:40 am
by anbu
Agreed to Phil that sed wont work if the string has embedded commas. Thanks for the info about Text::CSV in Perl