
CSV file with embedded quotes and commas

Posted: Thu Jun 05, 2014 11:49 pm
by eli.nawas_AUS
I have searched Google, this forum, and the documentation, and experimented myself, without finding a solution. If there is no DataStage-native solution, I would like confirmation that that is the case. I do not want a command-line scripting solution to this problem.

It seems the DataStage parallel job file input stage cannot handle general Excel CSV files. For instance, the line below:

30-40 ABC+C,TV,MXN,0.08026,160701,,,SPACE,"20""",,,Cable / Satellite,,"97,253.20","7,805.13",,0,"97,253.20","7,805.13",2,1/6/14,1/12/14,January,"24,313.30","24,313.30","1,951.28","1,951.28",2014
Delimiter for field "Burst_Spend_NNMC_USD" not found; input: {2 0 1 4}, at offset: 368

There are commas embedded in the fields, and also double quotes (escaped Excel-style, by doubling). Setting the fields as double-quoted fails on the embedded quotes; setting them as unquoted fails on the embedded commas. How can I read a line like this correctly?
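For context on the format itself: the line above is well-formed Excel CSV. An embedded quote is escaped by doubling it, and embedded commas sit inside quoted fields. A minimal Python sketch (outside DataStage, not offered as a solution, purely to show how a compliant parser reads it):

Code: Select all
import csv

# The problem line, exactly as posted above. Excel CSV escapes an
# embedded quote by doubling it ("20""" -> 20") and protects embedded
# commas by quoting the whole field ("97,253.20").
line = ('30-40 ABC+C,TV,MXN,0.08026,160701,,,SPACE,"20""",,,'
        'Cable / Satellite,,"97,253.20","7,805.13",,0,"97,253.20",'
        '"7,805.13",2,1/6/14,1/12/14,January,"24,313.30","24,313.30",'
        '"1,951.28","1,951.28",2014')

# csv.reader implements the quote-doubling convention out of the box.
fields = next(csv.reader([line]))
print(len(fields))   # 28 fields
print(fields[8])     # 20"        -- the doubled quote collapses to one
print(fields[13])    # 97,253.20  -- the embedded comma is preserved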

Posted: Fri Jun 06, 2014 5:39 am
by ray.wurlod
Had you searched more diligently, you would have found that the parallel Sequential File stage is really bad at this, whereas the server Sequential File stage is brilliant with these and other aberrations (such as missing columns).

Therefore best advice is to use a server Sequential File stage, either in a server job or in a server Shared Container in a parallel job.

Posted: Mon Jun 09, 2014 1:17 am
by ssnegi
I would suggest that you read all the fields as varchar, so that the " and , characters are read in as well.
Then, in a Transformer stage, use a loop variable to step through the line and convert the " and , characters to '' (empty).
This will remove those characters.
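A rough sketch of that logic (Python here purely to illustrate; the actual implementation would be Transformer loop variables and stage variables):

Code: Select all
# Sketch of the strip-then-split idea: read the whole record as one
# varchar, delete the quote characters, then split on commas. This
# illustrates the suggested logic, not DataStage syntax.
raw = 'SPACE,"20""",,,Cable / Satellite,,"97,253.20"'

stripped = raw.replace('"', '')   # drops ALL quotes, including real data (20")
fields = stripped.split(',')      # embedded commas now split fields apart:
print(fields)   # ['SPACE', '20', '', '', 'Cable / Satellite', '', '97', '253.20']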

Posted: Mon Jun 09, 2014 6:29 am
by chulett
Or just do as Ray suggests. Heck, there may not be anything about this Excel-reading job that really "needs" to be done in Parallel, and a Server job could have it done while PX was still getting its pants on. :wink:

Posted: Mon Jun 09, 2014 5:32 pm
by ray.wurlod
Maybe they don't want to remove the characters which are, after all, part of the data.