
Delimiter Issue in extracting file

Posted: Mon Apr 01, 2013 11:00 am
by nveejas
Hi all,
We have the following scenario:

Source file:

"1","AAA","BBB","CCC"
"2","AA",A","BBB","CCC"

In the above records, the 1st record will be processed successfully if we set double quote (") and comma (,) as delimiters in the Sequential File stage. The second record, however, has both a double quote (") and a comma (,) inside the data of the 2nd column (AA",A), so the record is dropped. Is there any way to read these kinds of records through a DS job? I have tried setting either " or , alone as the delimiter, but in both cases the data is truncated or dropped. Kindly requesting your help in solving this.
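To see why the second record is rejected, here is a minimal sketch that uses Python's standard `csv` module as a stand-in for a quote-aware comma parser. It is only an illustration, not the DataStage Sequential File stage, whose exact handling of a stray quote may differ.

```python
import csv

# The two sample records from the post, exactly as they appear in the file.
records = [
    '"1","AAA","BBB","CCC"',
    '"2","AA",A","BBB","CCC"',
]

# Parse each line with comma as the delimiter and " as the quote character.
parsed = [next(csv.reader([line], delimiter=",", quotechar='"')) for line in records]

for fields in parsed:
    print(len(fields), fields)
```

The first record yields four fields, while the second comes back with five, because the parser closes the quoted value "AA" at the embedded comma and treats A" as a separate field. A four-column layout would therefore reject it.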

Posted: Mon Apr 01, 2013 11:58 am
by priyadarshikunal
What happens if you change the delimiter to a delimiter string, set it to "," and set quote = double?
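The idea behind a delimiter string is that the three-character sequence "," separates fields, so a lone quote or a lone comma inside a value no longer splits it. Below is a minimal Python sketch of that splitting logic. It assumes every field is wrapped in double quotes and that no value ever contains the sequence "," itself. The function name `split_on_delimiter_string` is hypothetical, and this is not how DataStage implements the property internally.

```python
from typing import List


def split_on_delimiter_string(line: str, sep: str = '","') -> List[str]:
    """Split a record whose fields are all wrapped in double quotes.

    Assumes each line starts and ends with a double quote and that
    the sequence '","' never occurs inside a value.
    """
    line = line.rstrip("\r\n")
    # Drop the opening quote of the first field and the closing quote of the last.
    if len(line) >= 2 and line.startswith('"') and line.endswith('"'):
        line = line[1:-1]
    return line.split(sep)


for record in ['"1","AAA","BBB","CCC"', '"2","AA",A","BBB","CCC"']:
    print(split_on_delimiter_string(record))
```

With this logic the second record splits into the four fields 2, AA",A, BBB and CCC. The trade-off is that it breaks as soon as a value happens to contain ",", so it is a workaround rather than a substitute for well-formed quoting.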

Posted: Mon Apr 01, 2013 1:11 pm
by chulett
You'd stand a better chance of reading this in a Server job or, failing that, in a Server Shared Container in your PX job with that Sequential File stage in it. It is more forgiving of issues of that nature.

Posted: Mon Apr 01, 2013 1:34 pm
by ray.wurlod
Line 2 is badly-formed. It has an odd number of quote characters. Demand well-formed data from your provider.
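A quick way to spot such rows before the job runs is to count the double quotes on each line. When every quoted field is closed and embedded quotes are doubled (""), the count is always even, so an odd count signals a stray quote. A minimal Python sketch follows; the function name `has_balanced_quotes` is illustrative only.

```python
def has_balanced_quotes(line: str) -> bool:
    """Return True if the line contains an even number of double quotes.

    In a well-formed file where embedded quotes are doubled (""),
    every record has an even count; an odd count signals a stray quote.
    """
    return line.count('"') % 2 == 0


rows = ['"1","AAA","BBB","CCC"', '"2","AA",A","BBB","CCC"']
for row in rows:
    print(has_balanced_quotes(row), row)
```

The first sample row has 8 quotes and passes; the second has 9 and is flagged. An even count does not prove a row is well formed, since two stray quotes cancel out, so this is a cheap first filter rather than a full validator.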

Posted: Mon Apr 01, 2013 2:35 pm
by mobashshar
One way is to remove all the double quotes in a before-job subroutine and use , as the delimiter.

Posted: Mon Apr 01, 2013 2:43 pm
by chulett
No, the comma is part of the data in the second field. Doing that would make the second record parse out as five fields instead of four, I'm afraid.
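The field count is easy to confirm. A short Python sketch that strips every double quote from the second sample record, as the suggested before-job step would, and then splits on commas:

```python
record = '"2","AA",A","BBB","CCC"'

# Remove all double quotes, then split on commas.
stripped = record.replace('"', "")
fields = stripped.split(",")

print(stripped)  # 2,AA,A,BBB,CCC
print(len(fields), fields)
```

The single value AA",A becomes two fields, AA and A, so the record no longer matches a four-column layout.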

@Ray

Posted: Tue Apr 02, 2013 12:20 am
by nveejas
Ray,

The file is generated by CDC when there is an update or insert in the source DB2 table. By default, CDC creates files with comma (,) and double quote (") as delimiters. Is there any way to change the delimiter in CDC?

Posted: Tue Apr 02, 2013 12:51 am
by ray.wurlod
Gotta be. Research it and let us know.

We used a preprocess shell script to strip unwanted newlines

Posted: Tue Apr 02, 2013 8:32 am
by wwilliamson
Some of our data was apparently coming from web forms with free-form text fields that weren't being sanitized. A shell script was used to parse the file and locate extraneous newlines beforehand. The same approach could be used in this case to locate errant quote characters.
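A pre-process step of this kind could be sketched in Python as follows. It assumes a record is incomplete while its running double-quote count is odd (a quoted field broken by an embedded newline) and joins the next physical line onto it with a space. The function name `rejoin_broken_records` and the use of a space as the join character are assumptions for illustration, not the script described above. With a stray quote like the one in this thread, an odd count would instead pull in the following record, so for that case the reported starting line numbers are more useful for locating suspect rows than for repairing them.

```python
from typing import Iterable, Iterator, Tuple


def rejoin_broken_records(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (starting line number, record) pairs.

    A record whose double-quote count is odd is assumed to contain a
    quoted field broken by an embedded newline, so following physical
    lines are appended (joined with a space) until the count is even.
    """
    buffer = ""
    start = 0
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not buffer:
            buffer, start = line, lineno
        else:
            buffer += " " + line
        if buffer.count('"') % 2 == 0:
            yield start, buffer
            buffer = ""
    if buffer:
        # Quotes never balanced before end of file: likely an errant quote.
        yield start, buffer


# Hypothetical sample: the first record was split across two physical lines.
sample = ['"1","free text\n', 'continued","X"\n', '"2","ok","Y"\n']
joined = list(rejoin_broken_records(sample))
for start, record in joined:
    print(start, record)
```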

@ mobashshar

Posted: Thu Apr 04, 2013 4:30 am
by nveejas
Thanks, mobashshar, for the information.

We fixed this issue by changing the CDC output delimiter to pipe (|). We opened a PMR with IBM, and they provided a Java program that changes the delimiter to pipe when CDC generates the files. Now it's working fine.

Thanks for all your help!!