reading from sequential file
Posted: Thu Aug 21, 2008 2:31 pm
by pavan_test
Hi All,
I am trying to read a file through the Sequential File stage. The file is pipe-delimited, with the double quote as the quote character.
My data arrives under the following requirement:
If an attribute value contains both a pipe and double quotes, then the entire attribute must be enclosed in double quotes, and each double quote that is part of the value must be prefixed with another double quote.
Can anyone please suggest how I can read such a file with the Sequential File stage?
Regards
Mark
Re: reading from sequential file
Posted: Thu Aug 21, 2008 2:46 pm
by bolingo
Could you post one line from your file here as an example?
Re: reading from sequential file
Posted: Thu Aug 21, 2008 2:55 pm
by pavan_test
Here it is:
It is a pipe-delimited file.
If the data for a column has an embedded pipe in it, the entire column is enclosed in double quotes, such as |"ZZXXT|05157"|. However, this is not my problem.
Records like this one are causing the problem:
XXXX|413862|"ZZXXT|05157"|"ZZXX1|GN""FK""130183333"|062120|
This is the data in one column: |"ZZXX1|GN""FK""130183333"|
Thanks
Mark
Re: reading from sequential file
Posted: Thu Aug 21, 2008 3:17 pm
by bolingo
You could use fixed-length fields if your data always has the same length:
XXXX
413862
ZZXXT|05157
ZZXX1|GN""FK""130183333
062120
Re: reading from sequential file
Posted: Thu Aug 21, 2008 3:23 pm
by pavan_test
It is a variable-length record that I am receiving from my client. Is there any way to remove those embedded double quotes and read the entire file?
Thanks
Mark
Posted: Thu Aug 21, 2008 3:38 pm
by ray.wurlod
Parallel jobs don't handle this. You could solve it by pre-processing the file (for example, using a sed or awk command) to convert the doubled double-quote characters to something else, then converting these back to a single double-quote character within the job.
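One possible way to do the pre-processing described above, as a minimal sketch rather than a tested recipe. The helper name escape_doubled_quotes and the placeholder ~ are assumptions for illustration; choose a character that never occurs in the data. The sketch also assumes that no quoted value starts with an escaped quote and that no field is an empty quoted string (""), because a plain substitution cannot tell those cases apart.

```shell
# Hypothetical helper: replace every doubled double quote ("") with a
# placeholder (~ here, assumed not to occur anywhere in the data).
escape_doubled_quotes() {
  sed 's/""/~/g'
}

# Sample record from this thread.
printf '%s\n' 'XXXX|413862|"ZZXXT|05157"|"ZZXX1|GN""FK""130183333"|062120|' |
  escape_doubled_quotes
# prints: XXXX|413862|"ZZXXT|05157"|"ZZXX1|GN~FK~130183333"|062120|
```

For a whole file, the same helper could be run as escape_doubled_quotes < yourfile.txt > newfile.txt. The field-enclosing quotes are left in place, so the embedded pipes stay protected when the file is read with the double quote as the quote character.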
reading from sequential file
Posted: Thu Aug 21, 2008 7:24 pm
by pavan_test
ray.wurlod wrote: Parallel jobs don't handle this. You could solve it by pre-processing the file (for example using sed or awk command) to convert the double double-quote characters to something else, th ...
Can anyone please suggest how I can accomplish this with awk or sed?
Thanks a lot in advance.
Thanks
Mark
Re: reading from sequential file
Posted: Fri Aug 22, 2008 8:46 am
by bolingo
If all five of your fields have the same length in every record,
you could proceed like this:
Create a schema file like this:
record
{record_delim='\r', record_length=fixed, delim=none}
(
A:nullable string[4] {width=4};
B:nullable string[1] {width=1};
C:nullable string[6] {width=6};
D:nullable string[1] {width=1};
E:nullable string[13] {width=13};
F:nullable string[1] {width=1};
G:nullable string[25] {width=25};
H:nullable string[1] {width=1};
I:nullable string[6] {width=6}
)
Then use it in a Sequential File stage. In the next stage (for example, a Transformer stage), keep only the fields A, C, E, G and I.
I think this will work.
Re: reading from sequential file
Posted: Fri Aug 22, 2008 9:30 am
by bolingo
pavan_test wrote: ray.wurlod wrote: Parallel jobs don't handle this. You could solve it by pre-processing the file (for example using sed or awk command) to convert the double double-quote characters to something else, th ...
Can anyone please suggest how I can accomplish this with awk or sed?
Thanks a lot in advance.
Thanks
Mark
To replace every " with # using sed,
use this command:
$ sed 's/"/#/g' yourfile.txt > newfile.txt
If the content of yourfile.txt is:
XXXX|413862|"ZZXXT|05157"|"ZZXX1|GN""FK""130183333"|062120|
the content of newfile.txt will be:
XXXX|413862|#ZZXXT|05157#|#ZZXX1|GN##FK##130183333#|062120|
This will fix your issue
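
Note that replacing every quote also removes the quotes that enclose a field, so a pipe-delimited reader would still split #ZZXXT|05157# at the embedded pipe. A quote-aware alternative is sketched below with awk, under stated assumptions: the function name requote_fields and the output delimiter ; are hypothetical, the new delimiter is assumed never to occur in the data, and each record is assumed to fit on one line. It walks each record character by character, drops the enclosing quotes, collapses each doubled quote to a single one, and rewrites only the pipes that are outside quotes.

```shell
# Hypothetical helper: $1 is the new field delimiter (assumed absent from the data).
requote_fields() {
  awk -v out="$1" '
  {
    result = ""; in_quotes = 0; n = length($0)
    for (i = 1; i <= n; i++) {
      c = substr($0, i, 1)
      if (in_quotes) {
        if (c == "\"") {
          if (substr($0, i + 1, 1) == "\"") {
            result = result "\""   # doubled quote: keep one literal quote
            i++                    # and skip its partner
          } else {
            in_quotes = 0          # closing quote of the field
          }
        } else {
          result = result c        # anything else inside quotes, pipes included
        }
      } else if (c == "\"") {
        in_quotes = 1              # opening quote of a field
      } else if (c == "|") {
        result = result out        # pipe outside quotes: a real field separator
      } else {
        result = result c
      }
    }
    print result
  }'
}

# Sample record from this thread.
printf '%s\n' 'XXXX|413862|"ZZXXT|05157"|"ZZXX1|GN""FK""130183333"|062120|' |
  requote_fields ';'
# prints: XXXX;413862;ZZXXT|05157;ZZXX1|GN"FK"130183333;062120;
```

The resulting file could then be read with the new delimiter and no quote character, since the embedded pipes and quotes are now ordinary data.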