Parsing 'Comma' delimited file containing commas in data
Friends,
I am trying to design a job that populates a staging table with the data in a sequential file. The design itself is very simple:
File ==> Transformer ==> ODBC
The file we receive is comma delimited. The problem is that the data itself contains commas (e.g. 3,890,454).
After defining the metadata in the Sequential File stage, I can view the data ("View Data"), but when I execute the job it fails with a "Too many columns in the file" error.
Now I'm confused. The fact that I can view the data in the source file means the stage is able to parse it.
Am I doing something wrong that is causing the job to fail?
Thanks
Yamini
PX is notorious for having problems with commas in the data of a csv file, even if the fields are quote delimited. You may find you'll need to use a Server job for this task as it can handle them just fine... assuming they are properly quoted: xxx,"3,890,454",xxx,
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
Whoops, right when I clicked on Submit I see we are talking about a Server job, so... I would wager your fields are not properly quote delimited and you in fact have an invalid csv file. Can you post an actual example or two so we can see?
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
Craig,
The data looks like this in the csv file:
"Jul 31, 2009","956,611","15,907,101","2,577,527","794,461","21,736","696,030","85,138","93,019","$5,757,194","41,808","275,639"
I have chosen the comma (,) as the delimiter and the double quote (") as the quote character. Again, I'm able to view the data in the source file, but after executing the job the data is all distorted in the destination.
Thanks
Yamini
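For what it's worth, that sample row should read cleanly with those settings. A quick sketch in Python (outside DataStage, just to show the two parses side by side): a quote-aware parse yields exactly 12 fields, while a naive split on every comma over-splits the record, which is the classic source of a "Too many columns" style error.

import csv
import io

line = ('"Jul 31, 2009","956,611","15,907,101","2,577,527","794,461","21,736",'
        '"696,030","85,138","93,019","$5,757,194","41,808","275,639"\n')

# Quote-aware parse with delimiter ',' and quote character '"',
# matching the stage settings described above.
row = next(csv.reader(io.StringIO(line), delimiter=',', quotechar='"'))
print(len(row))              # 12 fields, as expected

# A naive split ignores the quoting and over-splits the record.
print(len(line.split(',')))  # 27 pieces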
OK, so which is it?
when I execute the job it fails with "Too many columns in the file" error
-or-
after executing the job, the data is all distorted in the destination
You shouldn't have any problems reading that, as the 'View Data' shows, especially if you can scroll through its entire contents without issue. So any issue with the data being "all distorted" would probably come from how you are handling the data in the job.
What datatype did you use for the columns in the Sequential File stage? Give us details of what the job is doing, transformation-wise, and an example or three of what "distorted" means.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
Craig,
Sorry for the confusion. Here are the results of the latest execution:
Scenario 1) Columns in the destination table were defined as 'Numeric'.
Result: All the input rows were rejected with a "Non-numeric data was found where numeric required" error.
Scenario 2) Modified the datatype in the destination table to 'Varchar2'.
Result: All the input rows were successfully loaded into the table.
Conclusion (please correct me if I am wrong): the datatype of the columns in the destination table should be 'Varchar' if the input data contains commas.
Thanks
Yamini
P.S. The datatype of the column defined in DataStage doesn't seem to matter as long as the column has the correct datatype defined on the db.
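A quick way to see why Scenario 1 failed: a comma-grouped value such as "3,890,454" is not valid numeric text on its own, so a strict numeric conversion rejects it, while a string column accepts it unchanged. A sketch in Python, just to illustrate the behaviour (not the actual DataStage or Oracle conversion):

value = "3,890,454"

# A strict numeric parse rejects the grouping commas...
try:
    n = int(value)
except ValueError:
    print("rejected: non-numeric data where numeric required")

# ...but the same text is perfectly fine in a string (Varchar-style) column.
print(value)  # '3,890,454'

# Stripping the separators first lets the numeric parse succeed.
print(int(value.replace(",", "")))  # 3890454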
No, that conclusion is incorrect. From what I recall, the commas shouldn't bother Oracle when loading into a NUMBER field. However, if every record had one column like this: "$5,757,194" it won't like the dollar sign and that should be your "non-numeric data". And of course a string won't care.
So the first thing to try would be to strip the "$" using Convert("$","",YourField) and see if that works better.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers