reading from sequential file

Posted: Thu Aug 21, 2008 2:31 pm
by pavan_test
Hi All,

I am trying to read a file through the Sequential File stage. The file is pipe-delimited, with the double quote as the quote character.

My data arrives under the following requirement:
If an attribute value contains both a pipe and double quotes, the entire attribute is enclosed in double quotes, and each double quote that is part of the value is prefixed with another double quote.

Can anyone please suggest how I can read such a file with the Sequential File stage?

Regards
Mark

Re: reading from sequential file

Posted: Thu Aug 21, 2008 2:46 pm
by bolingo
Could you post one line from your file as an example?

Re: reading from sequential file

Posted: Thu Aug 21, 2008 2:55 pm
by pavan_test
Here it is:

It is a pipe-delimited file.
If the data for a column has an embedded pipe in it, the entire column is enclosed in double quotes, such as |"ZZXXT|05157"|. However, this is not my problem.

Records like this one are causing the problem:

XXXX|413862|"ZZXXT|05157"|"ZZXX1|GN""FK""130183333"|062120|

This is the data in one column: |"ZZXX1|GN""FK""130183333"|

Thanks
Mark

Re: reading from sequential file

Posted: Thu Aug 21, 2008 3:17 pm
by bolingo
You could read the file as fixed length if each field has the same length in every record:

XXXX
413862
ZZXXT|05157
ZZXX1|GN""FK""130183333
062120

Re: reading from sequential file

Posted: Thu Aug 21, 2008 3:23 pm
by pavan_test
It is a variable-length record that I receive from my client. Is there any way to remove those embedded double quotes and read the entire file?

Thanks
Mark

Posted: Thu Aug 21, 2008 3:38 pm
by ray.wurlod
Parallel jobs don't handle this. You could solve it by pre-processing the file (for example using sed or awk command) to convert the double double-quote characters to something else, then converting these back to single double-quote character within the job.
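
One way the pre-processing step might look, as a minimal sketch rather than a tested production script. The function name escape_doubled_quotes and the placeholder character ~ are assumptions made for illustration; choose any placeholder that never occurs in your data.

```shell
#!/bin/sh
# Sketch only. escape_doubled_quotes is a hypothetical helper name.
# Assumption: the placeholder character '~' never occurs in the real data.
escape_doubled_quotes() {
    # Replace every pair of adjacent double quotes with the placeholder.
    sed 's/""/~/g'
}

# Demo on the sample record from this thread.
printf '%s\n' 'XXXX|413862|"ZZXXT|05157"|"ZZXX1|GN""FK""130183333"|062120|' \
    | escape_doubled_quotes
```

For the sample record this prints XXXX|413862|"ZZXXT|05157"|"ZZXX1|GN~FK~130183333"|062120|, so the enclosing quotes survive and only the embedded pairs change. The substitution is purely textual: an empty quoted field ("") would also become the placeholder, and a value that begins with an embedded quote would be handled incorrectly, so check the data for those cases first.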

reading from sequential file

Posted: Thu Aug 21, 2008 7:24 pm
by pavan_test
ray.wurlod wrote:Parallel jobs don't handle this. You could solve it by pre-processing the file (for example using sed or awk command) to convert the double double-quote characters to something else, th ...

Can anyone please suggest how I can accomplish this with awk or sed?

Thanks a lot in advance.

Thanks
Mark

Re: reading from sequential file

Posted: Fri Aug 22, 2008 8:46 am
by bolingo
If all 5 of your fields have the same length in every record, you could proceed like this.

Create a schema file like this:

record
{record_delim='\r', record_length=fixed, delim=none}
(
A:nullable string[4] {width=4};
B:nullable string[1] {width=1};
C:nullable string[6] {width=6};
D:nullable string[1] {width=1};
E:nullable string[13] {width=13};
F:nullable string[1] {width=1};
G:nullable string[25] {width=25};
H:nullable string[1] {width=1};
I:nullable string[6] {width=6}
)

Use it in a Sequential File stage, and in the next stage (for example a Transformer stage) keep only the fields A, C, E, G and I. The one-character fields B, D, F and H hold the pipe delimiters.
I think this will work.

Re: reading from sequential file

Posted: Fri Aug 22, 2008 9:30 am
by bolingo
pavan_test wrote:Can anyone please suggest how I can accomplish this with awk or sed?
To replace every " with # using sed, use this command:

$ sed 's/"/#/g' yourfile.txt > newfile.txt

If the content of yourfile.txt is:
XXXX|413862|"ZZXXT|05157"|"ZZXX1|GN""FK""130183333"|062120|
the content of newfile.txt will be:
XXXX|413862|#ZZXXT|05157#|#ZZXX1|GN##FK##130183333#|062120|

This will fix your issue
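
The command above replaces every double quote, including the enclosing ones, so the pipes inside quoted values can no longer be told apart from delimiters by their quoting. As an alternative, here is a hedged awk sketch that tracks whether it is inside a quoted value. The function name requote_to_tabs is hypothetical, and the sketch assumes that tab characters never occur in the data and that no value spans more than one line.

```shell
#!/bin/sh
# Sketch only. requote_to_tabs is a hypothetical helper name.
# Assumptions: tabs never occur in the data; no value contains a newline.
requote_to_tabs() {
    awk '{
        out = ""; inq = 0; n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            if (c == "\"") {
                if (inq && substr($0, i + 1, 1) == "\"") {
                    out = out "\""   # doubled quote inside a quoted value: keep one
                    i++              # skip the second quote of the pair
                } else {
                    inq = !inq       # opening or closing quote: drop it, flip state
                }
            } else if (c == "|" && !inq) {
                out = out "\t"       # delimiter pipe outside quotes becomes a tab
            } else {
                out = out c          # ordinary character, or a pipe inside quotes
            }
        }
        print out
    }'
}

# Demo on the sample record from this thread.
printf '%s\n' 'XXXX|413862|"ZZXXT|05157"|"ZZXX1|GN""FK""130183333"|062120|' \
    | requote_to_tabs
```

The output is tab-delimited with no quote character, so the embedded pipes and the restored single double quotes (as in ZZXX1|GN"FK"130183333) arrive as plain data. The job would then read the file with tab as the delimiter and no quote character, assuming the stage is configured that way.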