Rogue quotation mark appearing mid-file

jackdaw · Post by **jackdaw** » Tue May 20, 2008 6:26 am

The DS job runs successfully, with no warnings in the log, but when I view one of the output (csv) files in DS I get this error: mrg_all_chips_set_1..noDupes_tmds_spi_file.olk_tmdsspi_noDupes_pif_file: read_delimited() - invalid quotes, row 14874 column rfaind = "N"".

Sure enough, the output csv does contain this value. But it isn't present in the input csv, nor in another output that has the same rows plus some duplicates and uses the same file format and line-end characters.

What's going on ??

chulett · Post by **chulett** » Tue May 20, 2008 6:32 am

Hmmm... that's next to impossible to say without eyes on the target. Or more details.

What's different about this link from the other output that doesn't have the issue? What kind of transformations are you doing to this field, any? Do other records populate rfaind with a solitary 'N' without issue? It might also help to describe your job design.

ArndW · Post by **ArndW** » Tue May 20, 2008 6:33 am

What does the input row 14784 look like - does it have doubled double quotes correctly represented? Do you specify the double-quote as the quote character for both input and output?

jackdaw · Post by **jackdaw** » Tue May 20, 2008 6:53 am

It shows (in Textpad - it's a csv file) as:

"N""

when it should be

"N"

All other values output for this column are "N"

The double quote character is specified on the input file and output files.

They all have DOS-style line termination (the problem occurs in the last column of the row), but changing them to UNIX style makes no difference.

I can't view it in DS because of the error.
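When DS refuses to display the file, the offending rows can still be located outside it. The following is a minimal sketch, not anything DS provides: `invalid_quote_rows` is a hypothetical helper that checks each line against the usual CSV quoting rules, and it assumes a comma delimiter, a double-quote quote character, and no line breaks inside quoted fields.

```python
def invalid_quote_rows(lines, delimiter=",", quote='"'):
    """Return 1-based numbers of lines whose quoting is not well-formed CSV.

    Assumption: every record sits on one line (no newlines inside quotes).
    """
    bad = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")   # tolerate DOS or UNIX line endings
        in_quotes = False           # currently inside a quoted field
        field_start = True          # at the first character of a field
        ok = True
        i = 0
        while i < len(line):
            ch = line[i]
            if in_quotes:
                if ch == quote:
                    if i + 1 < len(line) and line[i + 1] == quote:
                        i += 1      # doubled quote stands for one literal quote
                    else:
                        in_quotes = False   # closing quote
                        # only a delimiter or end of line may follow it
                        if i + 1 < len(line) and line[i + 1] != delimiter:
                            ok = False
                            break
            elif ch == quote:
                if not field_start:
                    ok = False      # quote in the middle of an unquoted field
                    break
                in_quotes = True
            field_start = (not in_quotes) and ch == delimiter
            i += 1
        if in_quotes:
            ok = False              # quoted field never closed
        if not ok:
            bad.append(number)
    return bad
```

Passing an open file object (or `f.readlines()`) returns the line numbers to inspect in a text editor.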

Any thoughts ?


jackdaw · Post by **jackdaw** » Tue May 20, 2008 7:09 am

Thanks.

The difference is in the constraint - one has duplicates and the other doesn't.

The duplicates are identified by using stage variables to compare each row's record key with that of the previous row, setting a value of "NoDupe" if it is different and "Dupe" otherwise. The rows with "NoDupe" are written to the file which has the error (B).

The other file (A, the successful one) is constrained differently so that it contains all rows, including duplicates.

The constraints are: A:

upcase(slk_trf_final_rules.pif) <> "PIF"

B:

svNewPif="NoDupe" and  upcase(slk_trf_final_rules.pif) <> "PIF"
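For readers unfamiliar with the stage-variable pattern, the routing described above can be sketched roughly as follows. This is an illustration in Python, not the job's actual logic: the `key` field name and the `route` function are hypothetical, `pif` comes from the constraints, and the input is assumed to be sorted so that duplicates are adjacent.

```python
def route(rows):
    """Split rows into file A (all non-PIF rows) and file B (non-PIF, no duplicates)."""
    prev_key = object()   # sentinel that never equals a real key
    file_a, file_b = [], []
    for row in rows:
        # Equivalent of the stage variable: compare with the previous row's key.
        new_pif = "NoDupe" if row["key"] != prev_key else "Dupe"
        prev_key = row["key"]
        if row["pif"].upper() != "PIF":       # constraint A
            file_a.append(row)
            if new_pif == "NoDupe":           # constraint B adds the NoDupe test
                file_b.append(row)
    return file_a, file_b
```

Neither constraint touches the column values themselves, so on this reading the two outputs should differ only in which rows they contain.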

Why would it occur on one row only ?

Puzzled.


ArndW · Post by **ArndW** » Tue May 20, 2008 7:14 am

Is the value "N"" represented that way in your source or in the target? If the former, then you don't have a well-formed input file and cannot process it correctly in CSV varying-length format. If the latter, then you have exposed a bug in the sequential file write stage. In either case, the representation should be "N""".
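The doubling rule can be checked with any standard CSV library. The sketch below uses Python's `csv` module purely as an illustration of the convention (it says nothing about how DS writes files); `to_csv_field` is a hypothetical helper name.

```python
import csv
import io

def to_csv_field(value):
    """Return one value as a fully quoted CSV field, embedded quotes doubled."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow([value])
    return buf.getvalue()[:-1]   # drop the line terminator added by the writer

# A plain N is written with one quote on each side.
plain = to_csv_field("N")

# An N followed by a literal quote: the embedded quote is doubled,
# then the closing quote follows, giving three quotes at the end.
with_quote = to_csv_field('N"')

# Reading the well-formed field back recovers the original value.
round_trip = next(csv.reader([with_quote]))[0]
```

A field that ends in exactly two quotes, as in the problem row, is neither of these forms: the second quote reads as the first half of a doubled pair, so the field is never closed.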

jackdaw · Post by **jackdaw** » Tue May 20, 2008 7:32 am

Thanks

The source is "N", and the target is "N"", for one row, mid file.

Bizarre ?! What next ?
