DSXchange

Posted: **Mon Apr 01, 2013 11:00 am**

Hi All,
We have a scenario as below:

Source file:

```
"1","AAA","BBB","CCC"
"2","AA",A","BBB","CCC"
```

With double quote (") and comma (,) set as delimiters in the Sequential File stage, the first record is processed successfully. In the second record, however, the data in the second column (`AA",A`) itself contains both a double quote and a comma, so the record is dropped. Is there any way to read records like this in a DS job? I have tried using either " or , alone as the delimiter, but in both cases the data is truncated or the record is dropped. Any help solving this would be appreciated.

Posted: **Mon Apr 01, 2013 11:58 am**

What happens if you change the delimiter to a delimiter string, set it to "," and set quote=double?
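To see why a multi-character delimiter string might help, here is a minimal Python sketch. It only mimics the idea and is not DataStage's actual parser. It strips the outer quotes from each record and splits on the three-character string `","`. The function name `split_on_delimiter_string` is hypothetical, and the sketch assumes every field is quoted and that `","` never occurs inside the data.

```python
from typing import List


def split_on_delimiter_string(record: str, delim: str = '","') -> List[str]:
    """Split a fully quoted record on a multi-character delimiter.

    Assumes the record starts and ends with a double quote and that the
    delimiter string itself never appears inside a field.
    """
    record = record.rstrip("\r\n")
    if len(record) < 2 or not (record.startswith('"') and record.endswith('"')):
        raise ValueError(f"record is not fully quoted: {record!r}")
    # Drop the leading and trailing quote, then split on '","'.
    return record[1:-1].split(delim)


for line in ['"1","AAA","BBB","CCC"', '"2","AA",A","BBB","CCC"']:
    print(split_on_delimiter_string(line))
# ['1', 'AAA', 'BBB', 'CCC']
# ['2', 'AA",A', 'BBB', 'CCC']
```

Both records come out as four fields because the lone `"` and `,` inside `AA",A` never form the full `","` sequence. A field that genuinely contained `","` would still be split incorrectly.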

Posted: **Mon Apr 01, 2013 1:11 pm**

You'd stand a better chance of reading this in a Server job or, failing that, in a Server Shared Container holding that Sequential File stage inside your PX job. It is more forgiving of issues of that nature.

Posted: **Mon Apr 01, 2013 1:34 pm**

Line 2 is badly formed. It has an odd number of quote characters. Demand well-formed data from your provider.
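Counting the quote characters makes the point concrete. When every field is quoted, each field contributes one opening and one closing quote, so a well-formed record has an even total. A quick Python check on the two sample records (the helper names are hypothetical):

```python
def quote_count(record: str) -> int:
    """Number of double-quote characters in a record."""
    return record.count('"')


def has_balanced_quotes(record: str) -> bool:
    # Each quoted field contributes an opening and a closing quote,
    # so a well-formed, fully quoted record has an even total.
    return quote_count(record) % 2 == 0


for record in ['"1","AAA","BBB","CCC"', '"2","AA",A","BBB","CCC"']:
    print(quote_count(record), has_balanced_quotes(record))
# 8 True
# 9 False
```

An even count is necessary but not sufficient: a record with two stray quotes would still pass this check.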

Posted: **Mon Apr 01, 2013 2:35 pm**

One way is to remove all the double quotes in a before-job subroutine and use comma (,) as the delimiter.

Posted: **Mon Apr 01, 2013 2:43 pm**

No, the comma is part of the data in the second field. Doing that would make the second record parse out as five fields instead of four, I'm afraid.
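A short Python illustration using the sample records from the original post: removing every double quote and then splitting on comma gives four fields for the first record but five for the second, because the comma inside `AA",A` turns into a field separator. The function name is hypothetical.

```python
from typing import List


def strip_quotes_and_split(record: str) -> List[str]:
    # Mimics the before-job idea: remove every double quote,
    # then treat each comma as a field separator.
    return record.replace('"', "").split(",")


for record in ['"1","AAA","BBB","CCC"', '"2","AA",A","BBB","CCC"']:
    fields = strip_quotes_and_split(record)
    print(len(fields), fields)
# 4 ['1', 'AAA', 'BBB', 'CCC']
# 5 ['2', 'AA', 'A', 'BBB', 'CCC']
```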

Posted: **Tue Apr 02, 2013 12:20 am**

Ray,

The file is generated by CDC whenever a row is inserted or updated in the source DB2 table. By default, CDC creates files with comma (,) and double quote (") as delimiters. Is there any way to change the delimiter in CDC?

Posted: **Tue Apr 02, 2013 12:51 am**

Gotta be. Research it and let us know.

Posted: **Tue Apr 02, 2013 8:32 am**

Some of our data was apparently coming from web forms with free-form text fields that weren't being sanitized. A shell script was used to parse the file and locate extraneous newlines beforehand. The same approach could be used in this case to locate errant quote characters.
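A pre-processing pass along those lines could flag quotes that are not structural. The sketch below is Python rather than the shell script mentioned above, and it is a heuristic under stated assumptions: every field is quoted and fields are separated by the exact sequence `","`. Under those assumptions, a legitimate quote is either the first or last character of the record or part of a `","` separator; any other quote is reported. The function names are hypothetical.

```python
import re
from typing import Iterable, List, Tuple


def errant_quote_positions(record: str) -> List[int]:
    """Return 0-based positions of double quotes that are not structural.

    Structural quotes are the first and last characters of the record and
    the two quotes inside each '","' separator.
    """
    structural = set()
    if record.startswith('"'):
        structural.add(0)
    if len(record) > 1 and record.endswith('"'):
        structural.add(len(record) - 1)
    for match in re.finditer('","', record):
        structural.add(match.start())    # quote before the comma
        structural.add(match.end() - 1)  # quote after the comma
    return [i for i, ch in enumerate(record) if ch == '"' and i not in structural]


def find_errant_quotes(lines: Iterable[str]) -> List[Tuple[int, List[int]]]:
    """Report (1-based line number, errant positions) for suspect records."""
    report = []
    for lineno, line in enumerate(lines, start=1):
        positions = errant_quote_positions(line.rstrip("\r\n"))
        if positions:
            report.append((lineno, positions))
    return report


sample = ['"1","AAA","BBB","CCC"\n', '"2","AA",A","BBB","CCC"\n']
for lineno, positions in find_errant_quotes(sample):
    print(f"line {lineno}: errant quote at position(s) {positions}")
# line 2: errant quote at position(s) [7]
```

Position 7 is the quote inside `AA",A`. The same function can be fed an open file object so that suspect records are reported before the job runs.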

Posted: **Thu Apr 04, 2013 4:30 am**

Thanks mobashshar for the information.

We have fixed this issue by changing the CDC output delimiter to pipe (|). We opened a PMR with IBM, and they provided a Java program that changes the delimiter to pipe when CDC generates the files. Now it's working fine.
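Why the pipe delimiter helps: once quotes and commas are no longer structural, they can be read as ordinary data. A minimal Python sketch, assuming the CDC output after the change looks like `2|AA",A|BBB|CCC` with no quoting (the exact format produced by IBM's Java program is not shown in the thread) and that `|` never appears inside the data. The function name is hypothetical.

```python
import csv
import io
from typing import List


def parse_pipe_delimited(text: str) -> List[List[str]]:
    # QUOTE_NONE tells the reader to do no special processing of quote
    # characters, so '"' and ',' are kept as ordinary data.
    reader = csv.reader(io.StringIO(text), delimiter="|", quoting=csv.QUOTE_NONE)
    return list(reader)


# Hypothetical pipe-delimited version of the two sample records.
sample = '1|AAA|BBB|CCC\n2|AA",A|BBB|CCC\n'
for row in parse_pipe_delimited(sample):
    print(row)
# ['1', 'AAA', 'BBB', 'CCC']
# ['2', 'AA",A', 'BBB', 'CCC']
```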

Thanks for all your help!!


Delimiter Issue in extracting file
