confounding problem with QUOTES in schema file

hsahay · Post by **hsahay** » Fri Dec 13, 2013 12:02 pm

Hi

I am at the end of my wits trying to figure this out. So Help me please.

I have a simple variable length text file that uses pipe (|) as field delimiter and double quotes(") as text qualifier.

Here is a sample record -

"DMX MUSIC"|"Ch 01 S"|"OVERTURE to DER FREISCHUTZ (1817-21)"|"KUHN/STAATSKAPELLE DRESDEN/WEBER"|"CARL MARIA VON WEBER: OUVERTUREN"|"cpo"|"10052 1985"|"cpo RECORDS"|"07/01/2012"|"00:00:00"|"cpo RECORDS"|""|"0018950"

I am using the below schema file to read the text file and my job design is simple -

seq.filestage >>>>>>> odbc_connector stage.

//EO DB DMX detail table load
//Creatd On : 12/2013
//Creatd By : EJF
record nullable
{intact, record_delim='\n', final_delim=end,delim='|', quote=none}
(ServiceName:string[max=20];
Channel:string[max=20];
Title:string[max=2000];
Artist:string[max=2000];
Album:string[max=2000];
JunkRecordLableCode:string[max=20];
SuppCatalogID:string[max=200];
JunkRecordLableName:string[max=2000];
JunkPerfSttDt:string[max=20];
Duration:string[max=20];
LableName:string[max=2000];
LableParent:string[max=2000];
PlaylistFrequency:string[max=200]

Note that i am using QUOTE = NONE in the schema file.

If i do that it works and the records are parsed correctly however, as expected, the leading and trailing double quotes become part of the data that is loaded in the table. Which is not what i want. I obviously want the data loaded in the table to be free of those infernal double quotes.

So, i change the QUOTE property in the schema file to quotes=double

But now the sequential file stage starts rejecting the row and seems to be unable to parse it.

I replaced all double quotes in my input file with the caret (^) and changed the schema file to say quotes='^' - same result. Record is rejected.

So the question is what do i need to do, what gods i need to pray to, what animal i need to sacrifice in order for me to specify a QUOTE in the schema file ?

hsahay · Post by **hsahay** » Fri Dec 13, 2013 1:26 pm

ugh ...

Removed INTACT from the schema file and problem solved ....
Marking the thread as resolved.

If anybody has any idea why that solved the problem let me know. The documentation on schema file is very sparse especially on variable length files.

chulett · Post by **chulett** » Fri Dec 13, 2013 4:14 pm

You only use the INTACT keyword for partial schemas. Changing quotes to "double" is what fixed the problem in your subject line, removing the "intact" property is what let the fields be parsed properly. How did you end up with that option in the schema file?

hsahay · Post by **hsahay** » Fri Dec 13, 2013 4:24 pm

Chulette

quote=double and quote='^' both work now that i have removed INTACT.

With INTACT i was getting the following error message (see bold) -

^DMX MUSIC^|^Ch 01 S^|^OVERTURE to DER FREISCHUTZ (1817-21)^|^KUHN/STAATSKAPELLE DRESDEN/WEBER^|^CARL MARIA VON WEBER: OUVERTUREN^|^cpo^|^10052 1985^|^cpo RECORDS^|^07/01/2012^|^00:00:00^|^cpo RECORDS^|^^|^0018950^,Field "_r_" with 'delim=end' did not consume entire input, at offset: 11

I put intact there because we will not be using all the fields in the file (those prefixed with "Junk" in the schema file.

I was trying different options trying to understand how schema files work. I am still not quite clear about how the partial schemas are defined. Some examples will be nice but they are hard to come by.

But for now i am good with what i have working.