Parsing 'Comma' delimited file containing commas in data

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

yaminids
Premium Member
Posts: 387
Joined: Mon Oct 18, 2004 1:04 pm

Parsing 'Comma' delimited file containing commas in data

Post by yaminids »

Friends,

I am trying to design a job which populates a staging table with the data present in a Sequential file. The design itself is very simple:

File ==> Transformer ==> ODBC

The file we receive is comma delimited. The problem is that the data in the file contains commas (e.g. 3,890,454).

After defining the metadata in the Sequential File stage, I can view the data ("View Data"), but when I execute the job it fails with a "Too many columns in the file" error.

Now I'm confused. The fact that I can view the data in the source file means that the stage is able to parse the data.

Am I doing something wrong that is causing the job to fail?

Thanks
Yamini
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

PX is notorious for having problems with commas in the data of a CSV file, even if the fields are quote delimited. You may find you'll need to use a Server job for this task, as it can handle them just fine... assuming they are properly quoted: xxx,"3,890,454",xxx,
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Whoops, right when I clicked on Submit I saw we are talking about a Server job, so... I would wager your fields are not properly quote delimited and you in fact have an invalid CSV file. Can you post an actual example or two so we can see?
-craig

"You can never have too many knives" -- Logan Nine Fingers
yaminids
Premium Member
Posts: 387
Joined: Mon Oct 18, 2004 1:04 pm

Post by yaminids »

Craig,

The data looks like this in the CSV file:
"Jul 31, 2009","956,611","15,907,101","2,577,527","794,461","21,736","696,030","85,138","93,019","$5,757,194","41,808","275,639"

I have chosen , (comma) as the delimiter and (") as the quote character. Again, I'm able to view the data in the source file, but after executing the job the data is all distorted in the destination.

Thanks
Yamini
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

OK, so which is it?

when I execute the job it fails with "Too many columns in the file" error
-or-
after executing the job, the data is all distorted in the destination

You shouldn't have any problems reading that, which is what 'View Data' shows, especially if you can scroll through the entire contents without issue. So any issue with the data being "all distorted" probably comes from how you are handling the data in the job.

What datatype did you use for the columns in the Sequential File stage? Give us details of what the job is doing, transformation-wise, and an example or three of what "distorted" means.
-craig

"You can never have too many knives" -- Logan Nine Fingers
yaminids
Premium Member
Posts: 387
Joined: Mon Oct 18, 2004 1:04 pm

Post by yaminids »

Craig,

Sorry for the confusion. The following are the results of the latest execution:

Scenario 1) Columns in the destination table were defined as 'Numeric'
Result: All the input rows were rejected with a "Non-numeric data was found where numeric required" error

Scenario 2) Modified the datatype in the destination table to 'Varchar2'
Result: All the input rows were successfully loaded into the table

Conclusion (please correct me if I am wrong): the datatype of the columns in the destination table should be 'Varchar' if the input data contains commas.

Thanks
Yamini

P.S. The datatype of the column defined in DataStage doesn't seem to matter as long as the column has the correct datatype defined on the db.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

No, that conclusion is incorrect. From what I recall, the commas shouldn't bother Oracle when loading into a NUMBER field. However, if every record has one column like this: "$5,757,194", it won't like the dollar sign, and that should be your "non-numeric data". And of course a string won't care.

So the first thing to try would be to strip the "$" using Convert("$","",YourField) and see if that works more better.
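
In the Transformer that's just the output column derivation. As a rough sketch with made-up names (Amount isn't from your job), Convert(FromChars, ToChars, String) deletes any character in FromChars that has no counterpart in ToChars, so an empty ToChars simply drops the "$":

Amount = "$5,757,194"            ;* sample value from the file above
Ans = Convert("$", "", Amount)   ;* Ans is now "5,757,194" - dollar sign gone, commas untouched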
-craig

"You can never have too many knives" -- Logan Nine Fingers
yaminids
Premium Member
Posts: 387
Joined: Mon Oct 18, 2004 1:04 pm

Post by yaminids »

It didn't work :( . I even tried removing the commas using the Trim function.
yaminids
Premium Member
Posts: 387
Joined: Mon Oct 18, 2004 1:04 pm

Post by yaminids »

It worked after I made some changes. Apparently we cannot store commas in a numeric column in Oracle. I used the Convert function to strip the commas as well as the '$'.
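
In case it helps anyone later, the derivation was along these lines (the column name here is just a placeholder, and it could equally be done as two separate Convert() calls):

Convert("$,", "", InLink.Amount)   ;* "$5,757,194" becomes "5757194" - both the '$' and the commas are stripped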
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

They certainly wouldn't be stored, but I honestly thought it would ignore them during the process. If I get some spare time I'll double-check from my end.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The "MD" conversion with Iconv() will remove currency symbols AND thousands delimiters.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.