character separator

dodda · Post by **dodda** » Thu Aug 13, 2009 10:58 am

Hello

I have the below record

"acegfgh";"ghkj;ij12";";dfd"fd";"ghfhdd"gkjdgfd";"dff3##;"

each field is enclosed in double quotes and separated by ; (semi colon)
each field can have extra special characters. i used col import stage to parse the record and wanted to seaparate into individual fileds

under the properties in the col import stage i have mentioned
delimter =;
quote=double

i am not getting the data as desired

i want the data as

col1 col2 col3 col4
acegfgh ghkj;ij12 ;dfd"fd ghfhdd"gkjdgfd

please help

ArndW · Post by **ArndW** » Thu Aug 13, 2009 11:12 am

I think your problem is illustrated by the field

"ghfhdd"gkjdgfd"

In order to be parsed correctly, it would need to be formatted as

"ghfhdd""gkjdgfd"

dodda · Post by **dodda** » Thu Aug 13, 2009 11:19 am

Hello

This kind of data is expected from the file. Is there any optioin

thanks

ArndW · Post by **ArndW** » Thu Aug 13, 2009 11:21 am

No options, since the quote character is used to shift in and shift out of a column; only a double occurrence of the quote character tells the parser that it is part of the data and not a control character.

chulett · Post by **chulett** » Thu Aug 13, 2009 12:50 pm

Couldn't you tell it you have no quotes and then trim only the leading and trailing quotes from the data?

jgreve · Post by **jgreve** » Thu Aug 13, 2009 1:21 pm

chulett wrote:Couldn't you tell it you have no quotes and then trim only the leading and trailing quotes from the data?

Yeah, I thought of that, too.
It would work provided there are no embedded field delims. But if I'm reading the example right, some of the data will break that.

dodda wrote:This kind of data is expected from the file. Is there any optioin

Dodda, let's clean up your example a little to make it easier to see the field boundaries:

original: "acegfgh";"ghkj;ij12";";dfd"fd";"ghfhdd"gkjdgfd";"dff3##;"

Did I get the field boundaries correct?
If so, this is a nasty problem.
I don't think it is machine-solvable.

modified: "A";"B;b";";C"c";"D"d";"E;"
I think they would parse like this:

Code: Select all

  DESIRED-VALUES    QUOTED-VALUES       QUOTES-AS-DATA
|field | value  | |field | value  |   |field | value  |
|------|--------| |------|--------|   |------|--------|
|  1   |  A     | |  1   |  A     |   |  1   | "A"    |
|  2   |  B;b   | |  2   |  B;b   |   |  2   | "B     |
|  3   |  ;C"c  | |  3   |  ;C"*ERR*  |  3   | b"     |
|  4   |  D"d   | |  4   |            |  4   | "      |
|  5   |  E;    | |  5   |            |  5   | C"c"   |
                                      |  6   | "D"d"  |
                                      |  7   | "E     |
                                      |  8   | "      |

QUOTED-VALUES would be parsing fields with delim=; and quote="
This hits a formatting error at *ERR*, and should say
something like: *format error* expected ; after " instead of c*

QUOTES-AS-DATA would be parsing fields just with delim=;
(ignore spaces in modified(2) above)
No format error... but certainly more fields than you want.

Dodda, do you have a constant number of fields per line?
Is it possible that only a small percentage of your lines
have delims & quotes as data?
If so maybe you can pull those out and send the rejects back.
But even then, you won't be able to tell the difference between
false positives and false negatives.

If your data is at all important, I would talk to whoever "expected that kind of data from the file" and ask them to how they would untangle it. If the example you gave is accurate, I will be surprised if they can untangle it.

(This could be one of those times where fixed-width data is a good idea.)

DSXchange

character separator

character separator

The data seems kind of broken - r.e. no quotes....