Reading CSV file with comma and double quotes in Datastage

Posted: Mon Apr 06, 2015 6:26 am
by syedmuhammedmehdi
I'm receiving an input CSV file with records like 1,"2,2.1",3,"4,4.1". This sample record has 4 columns: the first column is 1, the second is 2,2.1, the third is 3 and the fourth is 4,4.1. In other words, a column whose value contains a comma is enclosed in double quotes, and this can happen on any column. Can anyone suggest a simple way to read this kind of CSV file?

Another requirement: if a column value itself contains double quotes, each of those quotes is escaped with another double quote. For example, if the fifth column has the value 5,"5.1", the whole record will look like 1,"2,2.1",3,"4,4.1","5,""5.1""".
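
The quoting rules described above (quote a field that contains a comma, double any quote inside a quoted field) follow the common CSV convention also described in RFC 4180. As a quick way to check what the parsed columns should look like, here is a minimal sketch in Python using the standard csv module. It is not DataStage code, and the helper name parse_record is made up for illustration.

```python
import csv
import io

def parse_record(line):
    """Parse a single CSV record into a list of column values.

    Assumes comma as the delimiter, double quote as the quote character,
    and a doubled quote ("") as an escaped quote inside a quoted field.
    """
    reader = csv.reader(
        io.StringIO(line),
        delimiter=",",
        quotechar='"',
        doublequote=True,
    )
    return next(reader)

# The five-column sample record from the post above.
record = '1,"2,2.1",3,"4,4.1","5,""5.1"""'
fields = parse_record(record)
print(len(fields), fields)
```

Any stage that honours the same quote character should produce the same five values, with the embedded commas and quotes kept as data.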

Posted: Mon Apr 06, 2015 6:56 am
by chulett
I'm curious - have you tried to read it yet? If so, what happened? The double quotes should 'hide' the extra commas so that they are treated as data. If a Parallel job is having issues, try a Server job, as Server jobs handle flat files of that nature better. If for some reason that is not an option, try the Sequential File stage inside a Server Shared Container in your Parallel job.

Posted: Mon Apr 06, 2015 7:29 am
by prasson_ibm
Hi,

Try reading the file as comma-delimited, and do not specify the double quote in the file definition of the job.

Later, in a Transformer, you can handle the quote character.

Posted: Mon Apr 06, 2015 7:54 am
by chulett
prasson_ibm wrote: Try reading the file as comma-delimited, and do not specify the double quote in the file definition of the job.
So their four columns will read / be parsed as six? Not really sure how that will help. :?
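
To illustrate the point about four columns turning into six, here is a rough sketch in Python of what a plain comma split does when the quote character is ignored. It only imitates the idea and is not how DataStage parses the file internally; the function name naive_split is hypothetical.

```python
def naive_split(line):
    """Split on every comma, ignoring quotes (the approach being questioned)."""
    return line.split(",")

# The four-column sample record from the original question.
record = '1,"2,2.1",3,"4,4.1"'
pieces = naive_split(record)
print(len(pieces), pieces)
```

The quoted values are cut in two and the stray quote characters stay attached to the pieces, so later logic would have to work out which pieces belong together.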

Posted: Mon Apr 06, 2015 1:01 pm
by syedmuhammedmehdi
I tried using the unstructured stage, but it does not work for CSV, and the parallel job Sequential File stage does not work for this at all. Since you said the Server Sequential File stage will work, I will try it. However, I'm wondering how we can handle quotes on a particular column, because a column will be quoted in some records and not in others, depending on whether its value contains a comma.

Posted: Mon Apr 06, 2015 1:03 pm
by syedmuhammedmehdi
chulett wrote:
prasson_ibm wrote: Try reading the file as comma-delimited, and do not specify the double quote in the file definition of the job.
So their four columns will read / be parsed as six? Not really sure how that will help. :?
Yes, I agree this will not work.

Posted: Mon Apr 06, 2015 2:56 pm
by chulett
First, I'd suggest you try the Server version of the Sequential File stage and see how it behaves and what you get. Document that here, and then we can see where to go next if you need more help.

Posted: Tue Apr 07, 2015 4:30 am
by syedmuhammedmehdi
It is working, thanks.