Hello
I have the below record
"acegfgh";"ghkj;ij12";";dfd"fd";"ghfhdd"gkjdgfd";"dff3##;"
each field is enclosed in double quotes and separated by ; (semi colon)
each field can have extra special characters. i used col import stage to parse the record and wanted to seaparate into individual fileds
under the properties in the col import stage i have mentioned
delimter =;
quote=double
i am not getting the data as desired
i want the data as
col1 col2 col3 col4
acegfgh ghkj;ij12 ;dfd"fd ghfhdd"gkjdgfd
please help
character separator
Moderators: chulett, rschirm, roy
I think your problem is illustrated by the field
In order to be parsed correctly, it would need to be formatted as"ghfhdd"gkjdgfd"
"ghfhdd""gkjdgfd"
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
No options, since the quote character is used to shift in and shift out of a column; only a double occurrence of the quote character tells the parser that it is part of the data and not a control character.
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
The data seems kind of broken - r.e. no quotes....
Yeah, I thought of that, too.chulett wrote:Couldn't you tell it you have no quotes and then trim only the leading and trailing quotes from the data?
It would work provided there are no embedded field delims. But if I'm reading the example right, some of the data will break that.
Dodda, let's clean up your example a little to make it easier to see the field boundaries:dodda wrote:This kind of data is expected from the file. Is there any optioin
original: "acegfgh";"ghkj;ij12";";dfd"fd";"ghfhdd"gkjdgfd";"dff3##;"
Did I get the field boundaries correct?
If so, this is a nasty problem.
I don't think it is machine-solvable.
modified: "A";"B;b";";C"c";"D"d";"E;"
I think they would parse like this:
Code: Select all
DESIRED-VALUES QUOTED-VALUES QUOTES-AS-DATA
|field | value | |field | value | |field | value |
|------|--------| |------|--------| |------|--------|
| 1 | A | | 1 | A | | 1 | "A" |
| 2 | B;b | | 2 | B;b | | 2 | "B |
| 3 | ;C"c | | 3 | ;C"*ERR* | 3 | b" |
| 4 | D"d | | 4 | | 4 | " |
| 5 | E; | | 5 | | 5 | C"c" |
| 6 | "D"d" |
| 7 | "E |
| 8 | " |
QUOTED-VALUES would be parsing fields with delim=; and quote="
This hits a formatting error at *ERR*, and should say
something like: *format error* expected ; after " instead of c*
QUOTES-AS-DATA would be parsing fields just with delim=;
(ignore spaces in modified(2) above)
No format error... but certainly more fields than you want.
Dodda, do you have a constant number of fields per line?
Is it possible that only a small percentage of your lines
have delims & quotes as data?
If so maybe you can pull those out and send the rejects back.
But even then, you won't be able to tell the difference between
false positives and false negatives.
If your data is at all important, I would talk to whoever "expected that kind of data from the file" and ask them to how they would untangle it. If the example you gave is accurate, I will be surprised if they can untangle it.
(This could be one of those times where fixed-width data is a good idea.)