character separator

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
dodda
Premium Member
Premium Member
Posts: 244
Joined: Tue May 29, 2007 11:31 am

character separator

Post by dodda »

Hello

I have the below record

"acegfgh";"ghkj;ij12";";dfd"fd";"ghfhdd"gkjdgfd";"dff3##;"

each field is enclosed in double quotes and separated by ; (semi colon)
each field can have extra special characters. i used col import stage to parse the record and wanted to seaparate into individual fileds

under the properties in the col import stage i have mentioned
delimter =;
quote=double

i am not getting the data as desired

i want the data as

col1 col2 col3 col4
acegfgh ghkj;ij12 ;dfd"fd ghfhdd"gkjdgfd

please help
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

I think your problem is illustrated by the field
"ghfhdd"gkjdgfd"
In order to be parsed correctly, it would need to be formatted as
"ghfhdd""gkjdgfd"
dodda
Premium Member
Premium Member
Posts: 244
Joined: Tue May 29, 2007 11:31 am

Post by dodda »

Hello

This kind of data is expected from the file. Is there any optioin

thanks
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

No options, since the quote character is used to shift in and shift out of a column; only a double occurrence of the quote character tells the parser that it is part of the data and not a control character.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Couldn't you tell it you have no quotes and then trim only the leading and trailing quotes from the data?
-craig

"You can never have too many knives" -- Logan Nine Fingers
jgreve
Premium Member
Premium Member
Posts: 107
Joined: Mon Sep 25, 2006 4:25 pm

The data seems kind of broken - r.e. no quotes....

Post by jgreve »

chulett wrote:Couldn't you tell it you have no quotes and then trim only the leading and trailing quotes from the data?
Yeah, I thought of that, too.
It would work provided there are no embedded field delims. But if I'm reading the example right, some of the data will break that.
dodda wrote:This kind of data is expected from the file. Is there any optioin
Dodda, let's clean up your example a little to make it easier to see the field boundaries:

original: "acegfgh";"ghkj;ij12";";dfd"fd";"ghfhdd"gkjdgfd";"dff3##;"

Did I get the field boundaries correct?
If so, this is a nasty problem.
I don't think it is machine-solvable.

modified: "A";"B;b";";C"c";"D"d";"E;"
I think they would parse like this:

Code: Select all

  DESIRED-VALUES    QUOTED-VALUES       QUOTES-AS-DATA
|field | value  | |field | value  |   |field | value  |
|------|--------| |------|--------|   |------|--------|
|  1   |  A     | |  1   |  A     |   |  1   | "A"    |
|  2   |  B;b   | |  2   |  B;b   |   |  2   | "B     |
|  3   |  ;C"c  | |  3   |  ;C"*ERR*  |  3   | b"     |
|  4   |  D"d   | |  4   |            |  4   | "      |
|  5   |  E;    | |  5   |            |  5   | C"c"   |
                                      |  6   | "D"d"  |
                                      |  7   | "E     |
                                      |  8   | "      |

QUOTED-VALUES would be parsing fields with delim=; and quote="
This hits a formatting error at *ERR*, and should say
something like: *format error* expected ; after " instead of c*

QUOTES-AS-DATA would be parsing fields just with delim=;
(ignore spaces in modified(2) above)
No format error... but certainly more fields than you want.

Dodda, do you have a constant number of fields per line?
Is it possible that only a small percentage of your lines
have delims & quotes as data?
If so maybe you can pull those out and send the rejects back.
But even then, you won't be able to tell the difference between
false positives and false negatives.

If your data is at all important, I would talk to whoever "expected that kind of data from the file" and ask them to how they would untangle it. If the example you gave is accurate, I will be surprised if they can untangle it.

(This could be one of those times where fixed-width data is a good idea.)
Post Reply