Double quotes in data

aramachandra
Participant
Posts: 55
Joined: Tue Sep 20, 2005 10:58 am

Double quotes in data

Post by aramachandra »

Hi All

My sample file, qtest.csv, on Unix has two lines (although it is named CSV, it is actually delimited by |):


"arvind"|"tes"ting"
"newtest"|"testing"


I am trying to read it via a Sequential File stage in a PX job.

The record delimiter is set to Unix newline.

Delimiter is set to |
Quote is set to double


When I view the data in the Sequential File stage, I can only see the following:

arvind|tes
newtest|testing


In other words, how can I tell DataStage that each field is enclosed in double quotes (and can contain a double quote as part of the data) and that fields are delimited by |?

I want to avoid manually massaging the data outside DataStage to replace " in the data with some other character(s).

Am I missing something basic here?

Has someone else run into this problem before?

arvind
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

The stage is behaving exactly how you asked it to behave. The very first double quote it finds after the opening quote is where the field ends. I don't see any other way than massaging the file beforehand, or setting the quote character to none and handling it within the job.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Your strings are badly formed. You can read the fields with the quote character set to none, and then trim the leading and trailing quote characters in a Transformer stage, as DSguru2B suggests.
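
To make the trimming logic concrete, here is a minimal sketch in Python rather than DataStage (the function names are made up for illustration). It splits on the delimiter first, as the stage would with quote set to none, then removes one leading and one trailing double quote from each field. It assumes the data itself never contains the delimiter.

```python
def strip_outer_quotes(field):
    """Remove one leading and one trailing double quote, if both are present."""
    if len(field) >= 2 and field.startswith('"') and field.endswith('"'):
        return field[1:-1]
    return field


def parse_line(line, delimiter="|"):
    """Split on the delimiter (quote = none), then trim each field."""
    fields = line.rstrip("\r\n").split(delimiter)
    return [strip_outer_quotes(f) for f in fields]


if __name__ == "__main__":
    # Prints ['arvind', 'tes"ting']; the embedded quote is kept.
    print(parse_line('"arvind"|"tes"ting"'))
```

Any quote in the middle of a field is left alone, because only the outermost pair is removed.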
aramachandra
Participant
Posts: 55
Joined: Tue Sep 20, 2005 10:58 am

Thanks for your suggestions

Post by aramachandra »

Hi

Thanks for your responses

Actually, in our situation we use | as the delimiter and can also get pipes in our data, so I cannot set the quote to none.

Setting the quote attribute to double is what allows us to have pipes in the data.

So it looks like I have to either transform the data file before it reaches DataStage or change the delimiter to something other than a pipe.
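
A rough sketch of such a pre-processing step, in Python, is shown here for illustration only. It assumes every field is quoted, that the three-character sequence "|" never occurs inside a field, and that a control character such as \x01 (a hypothetical choice) never occurs in the data. The function names are invented for this example.

```python
def split_quoted_line(line, delimiter="|", quote='"'):
    """Split a line whose fields are all quoted, treating only the
    quote-delimiter-quote sequence as a field boundary."""
    text = line.rstrip("\r\n")
    if len(text) < 2 or not (text.startswith(quote) and text.endswith(quote)):
        raise ValueError("line is not fully quoted: %r" % text)
    inner = text[1:-1]  # drop the opening and closing quote of the record
    return inner.split(quote + delimiter + quote)


def rewrite_line(line, new_delimiter="\x01"):
    """Re-emit the fields unquoted, joined by a delimiter assumed to be
    absent from the data."""
    fields = split_quoted_line(line)
    if any(new_delimiter in f for f in fields):
        raise ValueError("new delimiter occurs in the data")
    return new_delimiter.join(fields)


if __name__ == "__main__":
    for raw in ['"arvind"|"tes"ting"', '"a|b"|"newtest"']:
        print(split_quoted_line(raw))
```

The rewritten file could then be read with quote set to none and the new delimiter, since both pipes and double quotes are plain data at that point.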

Thanks again for your help

Please consider this post closed.
arvind
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

As its originator, you should mark it Resolved.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
jstrobel
Participant
Posts: 14
Joined: Thu Aug 23, 2007 2:07 pm
Location: Chicago

Post by jstrobel »

I just ran into the same situation, although in my case the embedded double quote is escaped by another double quote (which, by the way, is the usual CSV convention).

http://en.wikipedia.org/wiki/Comma-separated_values

DSEE does not seem to be able to handle an escaped double quote, as in:

"1234","1XYEU74GB6UA51754","1599","Great car, New 18"" Chrome Wheels!","60606"

If you set the quote to none, then you have to ensure there are no embedded commas in text fields (which is a long shot).
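
For comparison, and as one possible way to pre-process such a file outside DataStage, Python's standard csv module reads the doubled-quote convention directly. This is only a sketch; the helper name parse_csv_text is made up for the example.

```python
import csv
import io

SAMPLE = (
    '"1234","1XYEU74GB6UA51754","1599",'
    '"Great car, New 18"" Chrome Wheels!","60606"\n'
)


def parse_csv_text(text):
    """Parse CSV text in which an embedded quote is written as two quotes."""
    reader = csv.reader(
        io.StringIO(text), delimiter=",", quotechar='"', doublequote=True
    )
    return list(reader)


if __name__ == "__main__":
    rows = parse_csv_text(SAMPLE)
    # Prints: Great car, New 18" Chrome Wheels!
    print(rows[0][3])
```

The parsed rows could then be written back out with a delimiter that does not occur in the data, so that the file can be read with quote set to none.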