CSV Data File with embedded quote

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
datastagedummy
Participant
Posts: 56
Joined: Thu Feb 13, 2003 6:08 pm
Location: USA

CSV Data File with embedded quote

Post by datastagedummy »

I have a comma separated source file which looks like

"10001257", "SUCCINIMIDE (ORDER ON"B")", "KG","S","B","N","A","Y","N","S","","","I","TJB","TJB","N",3,365.230000000,0.000000000,0.000000000,0.000000000,0.000000000,0.000000000,0.000000000,0.000000000,"0101-R032-12501","P052","","",0.000,"0","1",18.669158000,0.000000000,0.000000000,0.000000000,0.000000000,18.669158000,0.000000000,0.000000000,0.000000000,0.000000000,"20021231",18.669158000,"20021231","C","","","Y","C","Y",11415.410000000,0.000000000,"39","217361","O",54.230000000,"","19800101","SPGS237","20011119","T"

If you notice in the second column the text has an embedded quote in it.

In the Sequential file stage that i read this file I have the format set to Delimiter=, (comma) and Quote character = " (quote)

But when I run my job I get the following error

Is there a way to collect this kind of invalid data into a reject file rather than the whole job aborting with a fatal error.

Thanks to all of you
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

What you have is an invalidly-formatted CSV file.

Scroll to the right in your Sequential File stage, and you will discover some extra columns that allow you to specify the rules for handling incomplete rows and for rows containing extraneous data.

Rather than have me type it all here, click on Help and read all about it there. [8D]

You can generate information from the Sequential File stage that causes a subsequent Transformer stage to redirect this row wherever you will.


Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518
datastagedummy
Participant
Posts: 56
Joined: Thu Feb 13, 2003 6:08 pm
Location: USA

Post by datastagedummy »

I did try changing the Incomplete Columns to "Discard and Warn" but it keeps telling me that "The Incomplete Column col2 must have values in ascending order"
I cannot set it to anything other than the "Error"

Like you suggested where would I put the rule to trap my type of errors as I know that it's only the 2nd column in the file that could give me the problem.

Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

These rules form a hierarchy; you can't have a higher-level rule (such as Error) below a lower-level rule (such as Discard). This is all explained in the Help for the SequentialFile stage.
Once you have that right, you can then put your rule in a following Transformer stage.
datastagedummy
Participant
Posts: 56
Joined: Thu Feb 13, 2003 6:08 pm
Location: USA

Post by datastagedummy »

Thanks Manjula but that wouldn't help as a "000" for quote character brings in the " as a text and it crashes if I have , (comma) in the columns.

Anybody else has any ideas ?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Do you have any way to either preprocess the file or have the source correct the improper quoting?

Turning off the quotes should work fine, you would just strip them out in a transform. Unless, as you stated, there was a comma in a quoted string - then you need the quotes so it can tell the difference between field separators and embedded commas. Can't have it both ways. Bottom line seems to be a need to fix the data before it can be properly read.

On your source system, the (ORDER ON"B") should really be (ORDER ON'B') if you are going to dump it to CSV. Either that or quote the embedded quotes - replace each " with "" and I think you'll be ok then. It would take some playing around to get a good solution, what platform are you running on?

Also, you never mentioned what your original error was - probably had what the board thought was tags around it and it ate it. I don't believe your job should have fatally aborted, it should have logged some warnings and not processed the bad rows but only aborted if the warning count exceeded your threshold. What was your fatal error?

-craig
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

To clarify:

Switch off the quote character, by setting it to 000.
All your quotes characters will now be read as part of the data. You may need to fiddle with data types and so on, but there's only one data type ("text") in a text file anyway.

To strip leading and trailing quote characters use the extended form of the Trim function in column derivation expressions in a Transformer stage. For example:
Trim(InLink.ColumnName, '"', 'B')
You may like to store this as a Transform (maybe called StripQuotes), since you'll be using it over and over.
datastagedummy
Participant
Posts: 56
Joined: Thu Feb 13, 2003 6:08 pm
Location: USA

Post by datastagedummy »

Does anybody knows of any document which speaks about the risks of working with different Source System specially CSV.

Would appreciate if you could tell me the Link to that
Post Reply