Parsing 'Comma' delimited file containing commas in data

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

yaminids
Premium Member
Posts: 387
Joined: Mon Oct 18, 2004 1:04 pm

Parsing 'Comma' delimited file containing commas in data

Post by yaminids »

Friends,

I am trying to design a job which populates a staging table with the data present in a Sequential file. The design itself is very simple:

File ==> Transformer ==> ODBC

The file we receive is comma delimited. The problem is that the data in the file contains commas (e.g. 3,890,454).

After defining the metadata in the Sequential File stage, I can view the data ("View Data"), but when I execute the job it fails with a "Too many columns in the file" error.

Now I'm confused. The fact that I can view the data in the source file means that the stage is able to parse the data.

Am I doing something wrong that is causing the job to fail?

Thanks
Yamini
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

PX is notorious for having problems with commas in the data of a CSV file, even if the fields are quote delimited. You may find you'll need to use a Server job for this task, as it can handle them just fine... assuming they are properly quoted: xxx,"3,890,454",xxx,
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Whoops, right when I clicked on Submit I saw we are talking about a Server job, so... I would wager your fields are not properly quote delimited and you in fact have an invalid CSV file. Can you post an actual example or two so we can see?
-craig

"You can never have too many knives" -- Logan Nine Fingers
yaminids
Premium Member
Posts: 387
Joined: Mon Oct 18, 2004 1:04 pm

Post by yaminids »

Craig,

The data looks like this in the CSV file:
"Jul 31, 2009","956,611","15,907,101","2,577,527","794,461","21,736","696,030","85,138","93,019","$5,757,194","41,808","275,639"

I have chosen , (comma) as the delimiter and (") as the quote character. Again, I'm able to view the data in the source file, but after executing the job the data is all distorted in the destination.

Thanks
Yamini
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

OK, so which is it?

when I execute the job it fails with "Too many columns in the file" error
-or-
after executing the job, the data is all distorted in the destination

You shouldn't have any problems reading that, which is what 'View Data' shows, especially if you can scroll through the entire contents without issue. So any issue with the data being "all distorted" probably comes from how you are handling the data in the job.

What datatype did you use for the columns in the Sequential File stage? Give us details of what the job is doing, transformation-wise, and an example or three of what "distorted" means.
-craig

"You can never have too many knives" -- Logan Nine Fingers
yaminids
Premium Member
Posts: 387
Joined: Mon Oct 18, 2004 1:04 pm

Post by yaminids »

Craig,

Sorry for the confusion. The following are the results of the latest execution:

Scenario 1) Columns in the destination table were defined as 'Numeric'
Result: All the input rows were rejected with a "Non-numeric data was found where numeric required" error

Scenario 2) Modified the datatype in the destination table to 'Varchar2'
Result: All the input rows were successfully loaded into the table

Conclusion (please correct me if I am wrong): the datatype of the columns in the destination table should be 'Varchar' if the input data contains commas.

Thanks
Yamini

P.S. The datatype of the column defined in DataStage doesn't seem to matter as long as the column has the correct datatype defined on the db.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

No, that conclusion is incorrect. From what I recall, the commas shouldn't bother Oracle when loading into a NUMBER field. However, if every record has one column like this: "$5,757,194", it won't like the dollar sign, and that should be your "non-numeric data". And of course a string won't care.

So the first thing to try would be to strip the "$" using Convert("$","",YourField) and see if that works more better.
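
In the Transformer that's just the output column derivation. As a rough sketch with made-up names (Amount isn't from your job), Convert(FromChars, ToChars, String) deletes any character in FromChars that has no counterpart in ToChars, so an empty ToChars simply drops the "$":

Amount = "$5,757,194"            ;* sample value from the file above
Ans = Convert("$", "", Amount)   ;* Ans is now "5,757,194" - dollar sign gone, commas untouched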
-craig

"You can never have too many knives" -- Logan Nine Fingers
yaminids
Premium Member
Posts: 387
Joined: Mon Oct 18, 2004 1:04 pm

Post by yaminids »

It didn't work :( . I even tried removing the commas using the Trim function.
yaminids
Premium Member
Posts: 387
Joined: Mon Oct 18, 2004 1:04 pm

Post by yaminids »

It worked after I made some changes. Apparently we cannot store commas in a numeric column in Oracle. I used the Convert function to strip the commas as well as the '$'.
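
In case it helps anyone later, the derivation was along these lines (the column name here is just a placeholder, and it could equally be done as two separate Convert() calls):

Convert("$,", "", InLink.Amount)   ;* "$5,757,194" becomes "5757194" - both the '$' and the commas are stripped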
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

They certainly wouldn't be stored, but I honestly thought it would ignore them during the process. If I get some spare time I'll double-check from my end.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The "MD" conversion with Iconv() will remove currency symbols AND thousands delimiters.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.