Reading from a sequential file

pavan_test · Post by **pavan_test** » Thu Aug 21, 2008 2:31 pm

Hi All,

I am trying to read a file through the Sequential File stage. The file is pipe-delimited, with the double quote as the quote character.

My data arrives under the following requirement:
If an attribute value contains both a pipe and double quotes, the entire attribute is enclosed in double quotes, and each double quote that is part of the value is prefixed with another double quote.

Can anyone please suggest how I can read such a file with the Sequential File stage?

Regards
Mark

bolingo · Post by **bolingo** » Thu Aug 21, 2008 2:46 pm

Could you post one line from your file here as an example?

pavan_test · Post by **pavan_test** » Thu Aug 21, 2008 2:55 pm

bolingo · Post by **bolingo** » Thu Aug 21, 2008 3:17 pm

You could use fixed-length records if your data always has the same length.

XXXX
413862
ZZXXT|05157
ZZXX1|GN""FK""130183333
062120

pavan_test · Post by **pavan_test** » Thu Aug 21, 2008 3:23 pm

It is a variable-length record that I receive from my client. Is there any way to remove those embedded double quotes and read the entire file?

Thanks
Mark

ray.wurlod · Post by **ray.wurlod** » Thu Aug 21, 2008 3:38 pm

Parallel jobs don't handle this. You could solve it by pre-processing the file (for example, with a sed or awk command) to convert the doubled double-quote characters to something else, then converting these back to a single double-quote character within the job.
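
A minimal sketch of that pre-processing idea, written in Python rather than sed or awk because the standard csv module already understands the doubled-quote convention. It assumes the file really follows the quoting rule from the first post (a field with embedded quotes is itself enclosed in quotes). The placeholder `~DQ~` and the function name are hypothetical; pick a placeholder that cannot occur in the real data. The output contains no doubled quotes, and the job would then convert the placeholder back to a double quote.

```python
import csv
import io

# Hypothetical placeholder; assumed never to occur in the real data.
PLACEHOLDER = "~DQ~"


def mask_embedded_quotes(text, placeholder=PLACEHOLDER):
    """Rewrite pipe-delimited text so that no field value contains a double quote.

    Embedded quotes (written as "" inside a quoted field) are replaced with
    the placeholder. Fields that contain a pipe stay enclosed in quotes.
    """
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter="|",
        quotechar='"',
        doublequote=True,
    )
    out = io.StringIO(newline="")
    writer = csv.writer(
        out,
        delimiter="|",
        quotechar='"',
        quoting=csv.QUOTE_MINIMAL,
        lineterminator="\n",
    )
    for row in reader:
        # After parsing, each embedded quote is a single " character.
        writer.writerow([field.replace('"', placeholder) for field in row])
    return out.getvalue()
```

For example, the line `A|"x""y|z"|B` would be rewritten as `A|"x~DQ~y|z"|B`. A plain global replace of `""` (the obvious sed one-liner) is shorter, but it can misread empty quoted fields and fields that begin or end with an embedded quote, which is why this sketch parses the fields first.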

pavan_test · Post by **pavan_test** » Thu Aug 21, 2008 7:24 pm

[quote="ray.wurlod"]Parallel jobs don't handle this. You could solve it by pre-processing the file (for example using [b]sed [/b]or [b]awk [/b]command) to convert the double double-quote characters to something else, th ...[/quote]

Can anyone please suggest how I can accomplish this with awk or sed?

Thanks a lot in advance.

Thanks
Mark

bolingo · Post by **bolingo** » Fri Aug 22, 2008 8:46 am

If all 5 fields have the same length in every record, you could proceed like this.

Create a schema file like this:

record
{record_delim='\r', record_length=fixed, delim=none}
(
A:nullable string[4] {width=4};
B:nullable string[1] {width=1};
C:nullable string[6] {width=6};
D:nullable string[1] {width=1};
E:nullable string[13] {width=13};
F:nullable string[1] {width=1};
G:nullable string[25] {width=25};
H:nullable string[1] {width=1};
I:nullable string[6] {width=6}
)

Use it in a Sequential File stage; in the next stage (for example, a Transformer stage), keep only the fields A, C, E, G and I.
I think this will work.
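
To illustrate what the fixed-width schema does, here is a rough Python sketch that slices a record at the same widths and keeps only A, C, E, G and I (the single-character fields B, D, F and H hold the pipe delimiters). This is only an illustration, not DataStage code; the function name is hypothetical, and it assumes every record is padded to the full 58 characters.

```python
# Field names and widths copied from the schema in this post.
FIELDS = [
    ("A", 4), ("B", 1), ("C", 6), ("D", 1), ("E", 13),
    ("F", 1), ("G", 25), ("H", 1), ("I", 6),
]
# B, D, F and H only hold the delimiter, so they are dropped.
KEEP = ("A", "C", "E", "G", "I")


def split_fixed(record):
    """Slice one fixed-width record and return only the data fields."""
    values = {}
    pos = 0
    for name, width in FIELDS:
        values[name] = record[pos:pos + width]
        pos += width
    return {name: values[name] for name in KEEP}
```

Because the slicing is purely positional, quotes and pipes inside a field are never interpreted. It only works if the field widths never vary.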

bolingo · Post by **bolingo** » Fri Aug 22, 2008 9:30 am

pavan_test wrote:
ray.wurlod wrote:Parallel jobs don't handle this. You could solve it by pre-processing the file (for example using sed or awk command) to convert the double double-quote characters to something else, th ...
can anyone please suggest me how do i accomplish this with awk or sed.

thanks a lot in advance.

Thanks
Mark

DSXchange