Reading from Variable length text file

Gen1715 · Post by **Gen1715** » Fri Jan 04, 2013 4:31 am

Say I have a variable length file -

NA,ASSESSMENT,197436,860,2012-09-18 16:05:47,195652,ASSESSMENT,OPRTNL,PENDING,2011-12-31,Beth,Norris,XZFXB1,,12,,,,RTNG_DLR,,109517
NA,ASSESSMENT,218152,860,,193893,ASSESSMENT,OPRTNL,APPROVED,2011-09-30,Wing
NA,ASSESSMENT,220790,860,2012-09-20 04:32:23,195809,,OPRTNL,PENDING,2011-12-31,Winnie,Jin,109604
NA,ASSESSMENT,222046,860,2012-09-05 22:35:20,193902,ASSESSMENT,,APPROVED,2012-03-31,Sherry,Feng,cz7mdr,Overriding with the group ,3,,,,RTNG_DLR,381,108779
NA,,196563,860,2012-09-05 19:26:38,193891,ASSESSMENT,OPRTNL,APPROVED,2012-05-31,Wing,Che,,,,RTNG_DLR,381,108774

I need to read only the 3rd, 6th, and 9th columns. Can this be done in the Sequential File stage alone (I mean without using any other stage), so that the output of the stage is only the 3rd, 6th, and 9th columns? What would be the minimum stage design?

prasson_ibm · Post by **prasson_ibm** » Fri Jan 04, 2013 5:02 am

Hi,
Yes, it is possible. Design a job with a Sequential File stage as the source, and on the Columns tab define a single Varchar(2000) column.
Remove the delimiter options from the Sequential File stage so that the stage reads each entire row as a single string.
In a Transformer, apply the Field() function to get the 3rd, 6th, and 9th fields.

Thanks
Prasoon

chulett · Post by **chulett** » Fri Jan 04, 2013 8:30 am

Gen1715 wrote:Can this be done only in sequential file

No, not directly. You could however leverage an O/S command like awk in the Filter property to pass in only those three columns to the stage.

ray.wurlod · Post by **ray.wurlod** » Fri Jan 04, 2013 2:27 pm

Another choice is to define the structure completely in metadata, and apply the Drop On Import property to all fields that you don't require to be read.

rameshrr3 · Post by **rameshrr3** » Fri Jan 04, 2013 4:17 pm

You can also use the External Source stage.

Command/Program :

Code: Select all

cat textfile.dat | awk -F',' '{ print $3","$6","$9 }'

Read this into a single long varchar field and parse it with a Column Import stage or a Transformer with the Field() function.

ray.wurlod · Post by **ray.wurlod** » Sat Jan 05, 2013 1:20 am

... or the same as a Filter command in the Sequential File stage.
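
As a rough sketch of what such a filter could look like: a Filter command reads the file on standard input and writes to standard output, so the `cat` is not needed. The function name `extract_cols` and the two sample records (taken from the original post) are only for illustration, and the exact behavior of the Filter property should be checked against your DataStage version.

```shell
# Illustrative filter: reads comma-delimited records on stdin and
# writes only fields 3, 6 and 9, comma-separated, to stdout.
# The name extract_cols is hypothetical; in the stage you would
# supply just the awk command itself.
extract_cols() {
    awk -F',' -v OFS=',' '{ print $3, $6, $9 }'
}

# Two records of different lengths from the sample in the first post.
result=$(printf '%s\n' \
    'NA,ASSESSMENT,218152,860,,193893,ASSESSMENT,OPRTNL,APPROVED,2011-09-30,Wing' \
    'NA,,196563,860,2012-09-05 19:26:38,193891,ASSESSMENT,OPRTNL,APPROVED,2012-05-31,Wing,Che,,,,RTNG_DLR,381,108774' \
    | extract_cols)

printf '%s\n' "$result"
```

Records with fewer than nine fields would simply produce empty values for the missing positions, since awk treats a non-existent field as an empty string.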

Isn't technology wonderful?

chulett · Post by **chulett** » Sat Jan 05, 2013 9:20 am

ray.wurlod wrote:Another choice is to define the structure completely in metadata, and apply the Drop On Import property to all fields that you don't require to be read.

Interesting. Wouldn't that technically be "that you don't require to be output" rather than "read", as all columns must actually be read by the stage? Technically.

ray.wurlod · Post by **ray.wurlod** » Sat Jan 05, 2013 3:19 pm

Yes. Technically.

For such is the nature of a sequential access file - you must read past every byte to get to the next byte.

chulett · Post by **chulett** » Sat Jan 05, 2013 3:20 pm

Exactamundo.

ray.wurlod · Post by **ray.wurlod** » Sat Jan 05, 2013 3:22 pm

But of course that's also true for the cat command.

chulett · Post by **chulett** » Sat Jan 05, 2013 3:45 pm

Of course... everything must be read, it's the nature of sequential media.

Gen1715 · Post by **Gen1715** » Tue Jan 08, 2013 2:32 am

chulett wrote:
Gen1715 wrote:Can this be done only in sequential file
No, not directly. You could however leverage an O/S command like awk in the Filter property to pass in only those three column ...

Nice approach

That makes complete sense, thanks for spending time on the problem. I have one more query now, though.

Say I have a text file without any delimiter, which was supposed to arrive as fixed length but now deviates and contains some variable-length records.

The approach I followed earlier to read this file was:
SEQ --> Copy --> SEQ
In the first Sequential File stage I read the record and provided a start position for each column in the column-level properties, then used a Copy stage to remove the columns that were not required.

Now, with the same approach, consider the following sample file.

Sample file (say the maximum record length is 10 characters):
ABCDEFGHIJ
ABC
ABCDIJ

With this sample, only the 1st record, in which the start positions for all columns are valid, can be read; the remaining two records are dropped.

Can I somehow read this kind of variable-length text file in the Sequential File stage?

ray.wurlod · Post by **ray.wurlod** » Tue Jan 08, 2013 3:43 pm

The Sequential File stage for server jobs is much better equipped for handling missing columns.

Get this working in a Server job then, if you feel it's necessary for other reasons, encapsulate the server Sequential File stage and its output link in a server Shared Container, which you can use in a parallel job.
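
If staying in a parallel job is preferred, one possible alternative (an assumption, not something confirmed in this thread) is to pad short records to the full width with a Filter command, so that every column start position exists. This sketch assumes that data is only ever missing from the end of a record; a record such as ABCDIJ, where middle columns may be missing, would be padded but still misaligned. The function name `pad_records` is hypothetical.

```shell
# Illustrative filter: right-pad every record with spaces to 10 characters
# (the maximum record length in the sample above).
pad_records() {
    awk '{ printf "%-10s\n", $0 }'
}

# The three sample records from the question.
padded=$(printf 'ABCDEFGHIJ\nABC\nABCDIJ\n' | pad_records)

# Show each padded record between brackets so the trailing spaces are visible.
printf '%s\n' "$padded" | awk '{ print "[" $0 "]" }'
```

Records already at full length pass through unchanged, and the padded columns arrive as spaces, which the job would then need to treat as missing values.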
