Reading CSV file with comma and double quotes in Datastage

syedmuhammedmehdi
Participant
Posts: 43
Joined: Wed Feb 12, 2014 12:34 pm
Location: Hyderabad, India

Post by syedmuhammedmehdi »

I'm receiving an input CSV file with records like 1,"2,2.1",3,"4,4.1". This sample record has 4 columns: the first column is 1, the second is 2,2.1, the third is 3, and the fourth is 4,4.1. That is, a column whose value contains a comma is enclosed in double quotes, and this can happen on any column. Can anyone please suggest a simple way to read this kind of CSV file?

Another requirement: if a column value itself contains double quotes, each of those quotes is doubled inside the enclosing quotes. For example, if the fifth column has the value 5,"5.1", then the whole record will look like 1,"2,2.1",3,"4,4.1","5,""5.1""".
SyedMuhammadMehdi
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I'm curious - have you tried to read it yet? If so, what happened? The double-quotes should 'hide' the extra commas and treat them as data. If a Parallel job is having issues, try a Server job as they can handle flat files of that nature better. If for some reason that is not an option, try the Sequential File stage inside a Server Shared Container in your Parallel job.
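
For illustration only: the snippet below is plain Python, not DataStage, and is not taken from any of the posts. It is a minimal sketch of the quoting convention described above, where double quotes hide embedded commas and a doubled quote inside a quoted field stands for one literal quote. The sample records are the ones from the question; the helper name `parse_record` is made up for this example.

```python
import csv
import io


def parse_record(line):
    """Parse a single CSV record (hypothetical helper, for illustration only)."""
    # csv.reader defaults to delimiter=',', quotechar='"', doublequote=True,
    # which matches the convention described in the question.
    reader = csv.reader(io.StringIO(line))
    return next(reader)


four_cols = parse_record('1,"2,2.1",3,"4,4.1"')
five_cols = parse_record('1,"2,2.1",3,"4,4.1","5,""5.1"""')

print(four_cols)  # ['1', '2,2.1', '3', '4,4.1']
print(five_cols)  # ['1', '2,2.1', '3', '4,4.1', '5,"5.1"']
```

A quote-aware reader yields four and five fields respectively, which is the behavior to look for from whichever DataStage stage ends up reading the file.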
-craig

"You can never have too many knives" -- Logan Nine Fingers
prasson_ibm
Premium Member
Posts: 536
Joined: Thu Oct 11, 2007 1:48 am
Location: Bangalore

Post by prasson_ibm »

Hi,

Try reading the file as comma-delimited, without specifying the double quote as the quote character in the job's file definition.

Later, in a Transformer, you can handle the quote character.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

prasson_ibm wrote:Try to read the file with comma delimited and do not mention double quote in the job in file definition.
So their four columns will read / be parsed as six? Not really sure how that will help. :?
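
To make the six-column point concrete, here is a small sketch in plain Python (an illustration, not DataStage behavior verified on any particular version) of what a comma split with no quote handling does to the sample record from the question. The variable names are made up for this example.

```python
record = '1,"2,2.1",3,"4,4.1"'

# Splitting on every comma ignores the quotes, so each quoted
# column is cut in two and the quote characters stay in the data.
naive_fields = record.split(",")

print(len(naive_fields))  # 6
print(naive_fields)       # ['1', '"2', '2.1"', '3', '"4', '4.1"']
```

Stitching those pieces back together afterwards would require knowing which columns were quoted in each record, and that varies from record to record.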
-craig

"You can never have too many knives" -- Logan Nine Fingers
syedmuhammedmehdi
Participant
Posts: 43
Joined: Wed Feb 12, 2014 12:34 pm
Location: Hyderabad, India

Post by syedmuhammedmehdi »

I tried the unstructured stage, but it does not work for CSV, and I could not get the Sequential File stage in a Parallel job to work for this either. Since you said the Server Sequential File stage will work, I will try it. What I'm still unsure about is how to handle quotes on a particular column, because a column will be quoted in some records and not in others, depending on whether its value contains a comma.
SyedMuhammadMehdi
syedmuhammedmehdi
Participant
Posts: 43
Joined: Wed Feb 12, 2014 12:34 pm
Location: Hyderabad, India

Post by syedmuhammedmehdi »

chulett wrote:
prasson_ibm wrote:Try to read the file with comma delimited and do not mention double quote in the job in file definition.
So their four columns will read / be parsed as six? Not really sure how that will help. :?
Yes, I agree this will not work.
SyedMuhammadMehdi
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

First, I'd suggest you try the Server version of the Sequential File stage and see how it behaves and what you get. Document that here, and then we can see where to go next if you need more help after that.
-craig

"You can never have too many knives" -- Logan Nine Fingers
syedmuhammedmehdi
Participant
Posts: 43
Joined: Wed Feb 12, 2014 12:34 pm
Location: Hyderabad, India

Post by syedmuhammedmehdi »

It is working, thanks.
SyedMuhammadMehdi