Sequential File (CSV) and quoting/escaping

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
DSUser2000
Participant
Posts: 42
Joined: Tue Oct 20, 2009 8:36 am

Sequential File (CSV) and quoting/escaping

Post by DSUser2000 »

I tried to export a CSV file. I've set the delimiter to , and the quote char to ". If there's a " in the string supplied from the source, Datastage seems to not handle it correctly: The file is then read wrong (for example with view data) and in my opinion it is made wrong (no escaping done). However, I can't simply change the quote char to another one as I can't make sure that this character isn't in the input.

Doesn't Datastage support escaping of the quote chars? For example, Excel escapes the double quote with another one. So if you have a field with a

Code: Select all

a"a
you get

Code: Select all

"a""a"
out of Excel CSV export. But Datastage just exports

Code: Select all

"a"a"
On import, neither the Datastage output nor the Excel output is correctly read into Datastage. Is there any solution?
anbu
Premium Member
Premium Member
Posts: 596
Joined: Sat Feb 18, 2006 2:25 am
Location: india

Post by anbu »

In your case you have to do preprocessing before using the CSV file in sequential stage.

Use this command to remove quotes around a string and reduce two consecutive double quotes to single double quote and remove Quote setting from Field defaults

Code: Select all

sed 's/"\([^,]*\)",/\1,/g;s/""/"/g'
Or Remove Quote from Field defaults and remove the quotes from your input field which has quotes in transformer
You are the creator of your destiny - Swami Vivekananda
kris007
Charter Member
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

Did you try specifying the quote character specifically at the column level? Can you try doing that? I remember doing something similar but can't recollect the exact steps.
Kris

Where's the "Any" key?-Homer Simpson
anbu
Premium Member
Premium Member
Posts: 596
Joined: Sat Feb 18, 2006 2:25 am
Location: india

Post by anbu »

I have tried setting quote at column level. But that didn't work. If you remember let us know.
You are the creator of your destiny - Swami Vivekananda
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Parallel jobs (at least up to the current version) are notoriously bad at handling embedded quote characters. Server jobs manage it much better.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
PhilHibbs
Premium Member
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK
Contact:

Post by PhilHibbs »

anbu wrote:Use this command to remove quotes around a string and reduce two consecutive double quotes to single double quote and remove Quote setting from Field defaults

Code: Select all

sed 's/"\([^,]*\)",/\1,/g;s/""/"/g'
Or Remove Quote from Field defaults and remove the quotes from your input field which has quotes in transformer
CSV cannot be parsed by a regular expression. The following is perfectly valid CSV:

Code: Select all

"Nail, rusty",6"
Of course, you may be expecting a reliable subset of the myriad quirks that various CSV implementations generate, in which case the above regex may be acceptable. Otherwise, if you want to preproces your file, you could use a Perl script with the module Text::CSV as a preprocessor.
Phil Hibbs | Capgemini
Technical Consultant
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Ah... Phil and his rusty six inch nail. I was just about to link them to your old post on the subject. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
anbu
Premium Member
Premium Member
Posts: 596
Joined: Sat Feb 18, 2006 2:25 am
Location: india

Post by anbu »

Agreed to Phil that sed wont work if the string has embedded commas. Thanks for the info about Text::CSV in Perl
You are the creator of your destiny - Swami Vivekananda
Post Reply