reading from sequential file

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

pavan_test
Premium Member
Posts: 263
Joined: Fri Sep 23, 2005 6:49 am

reading from sequential file

Post by pavan_test »

Hi All,

I am trying to read a file through a Sequential File stage. The file is pipe-delimited, with the double quote as the quote character.

My data arrives under the following requirement:
If an attribute value must contain both a pipe and double quotes then the entire attribute should be enclosed in double quotes and each double quote that is a part of the value should be prefixed with another double quote.
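
As an illustration of the quoting rule just described, here is a minimal sketch of how a producer might encode one value. The function name `quote_field` is hypothetical, and the sketch assumes that any value containing a pipe is wrapped in double quotes, with each embedded double quote doubled; how the client's real extract handles other cases is not known from this thread.

```shell
# Hypothetical helper: encode one field value under the rule described above.
# Assumption: only values containing a pipe are wrapped in double quotes.
quote_field() {
    case $1 in
        *'|'*)
            # Double every embedded quote, then wrap the whole value in quotes.
            printf '"%s"\n' "$(printf '%s\n' "$1" | sed 's/"/""/g')"
            ;;
        *)
            # No pipe: pass the value through unchanged.
            printf '%s\n' "$1"
            ;;
    esac
}

quote_field 'ZZXX1|GN"FK"130183333'
```

For the value `ZZXX1|GN"FK"130183333` this prints `"ZZXX1|GN""FK""130183333"`, which is the shape of the column that causes trouble when the file is read.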

Can anyone please suggest how I can read such a file with the Sequential File stage?

Regards
Mark
bolingo
Premium Member
Posts: 22
Joined: Fri Nov 24, 2006 5:19 am

Re: reading from sequential file

Post by bolingo »

Could you post one line from your file here as an example?
pavan_test
Premium Member
Posts: 263
Joined: Fri Sep 23, 2005 6:49 am

Re: reading from sequential file

Post by pavan_test »

Here it is:

It is a pipe-delimited file.
If the data for a column has an embedded pipe in it, then the entire column is enclosed in double quotes, such as |"ZZXXT|05157"|. However, this is not my problem.

Records like this one are causing the problem:

XXXX|413862|"ZZXXT|05157"|"ZZXX1|GN""FK""130183333"|062120|

This is the data in one column: |"ZZXX1|GN""FK""130183333"|

Thanks
Mark
bolingo
Premium Member
Posts: 22
Joined: Fri Nov 24, 2006 5:19 am

Re: reading from sequential file

Post by bolingo »

You could use a fixed-length format if each field always has the same length:

XXXX
413862
ZZXXT|05157
ZZXX1|GN""FK""130183333
062120
pavan_test
Premium Member
Posts: 263
Joined: Fri Sep 23, 2005 6:49 am

Re: reading from sequential file

Post by pavan_test »

It is a variable-length record that I receive from my client. Is there any way to remove those embedded double quotes and read the entire file?

Thanks
Mark
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Parallel jobs don't handle this. You could solve it by pre-processing the file (for example using sed or awk command) to convert the double double-quote characters to something else, then converting these back to single double-quote character within the job.
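
A minimal sketch of the pre-processing step described above, using sed. The placeholder character `~` and the function name `escape_doubled_quotes` are assumptions for illustration; choose a placeholder that never occurs in the real data.

```shell
# Assumption: '~' never occurs in the real data; pick another placeholder if it does.
PLACEHOLDER='~'

# Convert each doubled double-quote ("") to the placeholder.
# Caveat: this plain substitution also rewrites an empty quoted field (|""|).
escape_doubled_quotes() {
    sed "s/\"\"/${PLACEHOLDER}/g"
}

# Sample record from this thread.
printf '%s\n' 'XXXX|413862|"ZZXXT|05157"|"ZZXX1|GN""FK""130183333"|062120|' |
    escape_doubled_quotes
```

For the sample record this prints `XXXX|413862|"ZZXXT|05157"|"ZZXX1|GN~FK~130183333"|062120|`. The enclosing quotes are left in place, so the embedded pipes stay protected, and the placeholder can be turned back into a double quote inside the job.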
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
pavan_test
Premium Member
Posts: 263
Joined: Fri Sep 23, 2005 6:49 am

reading from sequential file

Post by pavan_test »

ray.wurlod wrote:Parallel jobs don't handle this. You could solve it by pre-processing the file (for example using sed or awk command) to convert the double double-quote characters to something else, th ...

Can anyone please suggest how I can accomplish this with awk or sed?

Thanks a lot in advance.

Thanks
Mark
bolingo
Premium Member
Posts: 22
Joined: Fri Nov 24, 2006 5:19 am

Re: reading from sequential file

Post by bolingo »

If all five of your fields have the same length in every record, you could proceed like this.

Create a schema file like this:

record
{record_delim='\r', record_length=fixed, delim=none}
(
A:nullable string[4] {width=4};
B:nullable string[1] {width=1};
C:nullable string[6] {width=6};
D:nullable string[1] {width=1};
E:nullable string[13] {width=13};
F:nullable string[1] {width=1};
G:nullable string[25] {width=25};
H:nullable string[1] {width=1};
I:nullable string[6] {width=6}
)

Use it in a Sequential File stage. In the next stage (for example, a Transformer stage), keep only the fields A, C, E, G and I; the one-character fields B, D, F and H hold the pipe delimiters.
I think this will work.
bolingo
Premium Member
Posts: 22
Joined: Fri Nov 24, 2006 5:19 am

Re: reading from sequential file

Post by bolingo »

pavan_test wrote:
ray.wurlod wrote:Parallel jobs don't handle this. You could solve it by pre-processing the file (for example using sed or awk command) to convert the double double-quote characters to something else, th ...
can anyone please suggest me how do i accomplish this with awk or sed.

thanks a lot in advance.

Thanks
Mark
To replace every " with # using sed, use this command:

$ sed 's/"/#/g' yourfile.txt > newfile.txt

If the content of yourfile.txt is:
XXXX|413862|"ZZXXT|05157"|"ZZXX1|GN""FK""130183333"|062120|
the content of newfile.txt will be:
XXXX|413862|#ZZXXT|05157#|#ZZXX1|GN##FK##130183333#|062120|

This will fix your issue
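
Note that the command above replaces every double quote, including the ones that enclose a field, so pipes inside a quoted field are no longer protected from the delimiter. As a hedged alternative, the awk sketch below walks each line character by character and replaces only the doubled quotes that appear inside a quoted field. The placeholder `~` and the function name `escape_embedded_quotes` are assumptions for illustration, and the sketch assumes a record never spans more than one line.

```shell
# Hypothetical helper: replace only escaped quotes ("") inside quoted fields.
# Assumptions: '~' never occurs in the data; each record sits on one line.
escape_embedded_quotes() {
    awk -v ph='~' '
    {
        out = ""; inq = 0; n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            if (c == "\"") {
                if (inq && substr($0, i + 1, 1) == "\"") {
                    out = out ph    # escaped quote inside a quoted field
                    i++             # skip the second quote of the pair
                } else {
                    inq = !inq      # opening or closing quote of a field
                    out = out c
                }
            } else {
                out = out c
            }
        }
        print out
    }'
}

printf '%s\n' 'XXXX|413862|"ZZXXT|05157"|"ZZXX1|GN""FK""130183333"|062120|' |
    escape_embedded_quotes
```

For the sample record this prints `XXXX|413862|"ZZXXT|05157"|"ZZXX1|GN~FK~130183333"|062120|`. Because it tracks whether it is inside a quoted field, an empty quoted field such as `|""|` is left untouched, which a plain `s/""/~/g` substitution would not do.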