Page 1 of 1

Reading from Variable length text file

Posted: Fri Jan 04, 2013 4:31 am
by Gen1715
Say I have a variable length file -

NA,ASSESSMENT,197436,860,2012-09-18 16:05:47,195652,ASSESSMENT,OPRTNL,PENDING,2011-12-31,Beth,Norris,XZFXB1,,12,,,,RTNG_DLR,,109517
NA,ASSESSMENT,218152,860,,193893,ASSESSMENT,OPRTNL,APPROVED,2011-09-30,Wing
NA,ASSESSMENT,220790,860,2012-09-20 04:32:23,195809,,OPRTNL,PENDING,2011-12-31,Winnie,Jin,109604
NA,ASSESSMENT,222046,860,2012-09-05 22:35:20,193902,ASSESSMENT,,APPROVED,2012-03-31,Sherry,Feng,cz7mdr,Overriding with the group ,3,,,,RTNG_DLR,381,108779
NA,,196563,860,2012-09-05 19:26:38,193891,ASSESSMENT,OPRTNL,APPROVED,2012-05-31,Wing,Che,,,,RTNG_DLR,381,108774

And need to read 3rd, 6th, and 9th column. Can this be done only in sequential file (I mean without using any other stage), i.e, output of Seq file are only 3,6 n 9th column. what will be minimum stg design.

Re: Reading from Variable length text file

Posted: Fri Jan 04, 2013 5:02 am
by prasson_ibm
Hi,
Yes it is possible.Design a job with source as Sequential file stage and in column tab define Varchar(2000).
Remove delimiter options from seqential file stage,so stage will read entire row as a one single string.
In transformer apply field function to get 3ed,6th and 9th field.

Thanks
Prasoon

Re: Reading from Variable length text file

Posted: Fri Jan 04, 2013 8:30 am
by chulett
Gen1715 wrote:Can this be done only in sequential file
No, not directly. You could however leverage an O/S command like awk in the Filter property to pass in only those three columns to the stage.

Posted: Fri Jan 04, 2013 2:27 pm
by ray.wurlod
Another choice is to define the structure completely in metadata, and apply the Drop On Import property to all fields that you don't require to be read.

Posted: Fri Jan 04, 2013 4:17 pm
by rameshrr3
You can also use external source stage

Command/Program :

Code: Select all

cat textfile.dat | awk -F',' '{ print $3","$6","$9 }'
Read this into a single long varchar field and parse it with a column Import stage or Transformer with Field() function .

Posted: Sat Jan 05, 2013 1:20 am
by ray.wurlod
... or the same as a Filter command in the Sequential File stage.

Isn't technology wonderful?

Posted: Sat Jan 05, 2013 9:20 am
by chulett
ray.wurlod wrote:Another choice is to define the structure completely in metadata, and apply the Drop On Import property to all fields that you don't require to be read.
Interesting. Wouldn't that technically be "that you don't require to be output" rather than "read" as all columns must actually be read by the stage. Technically. :wink:

Posted: Sat Jan 05, 2013 3:19 pm
by ray.wurlod
Yes. Technically.

For such is the nature of a sequential access file - you must read past every byte to get to the next byte.

Posted: Sat Jan 05, 2013 3:20 pm
by chulett
Exactamundo.

Posted: Sat Jan 05, 2013 3:22 pm
by ray.wurlod
But of course that's also true for the cat command.

Posted: Sat Jan 05, 2013 3:45 pm
by chulett
Of course... everything must be read, it's the nature of sequential media.

Re: Reading from Variable length text file

Posted: Tue Jan 08, 2013 2:32 am
by Gen1715
chulett wrote:
Gen1715 wrote:Can this be done only in sequential file
No, not directly. You could however leverage an O/S command like awk in the Filter property to pass in only those three column ...
Nice approach :) That makes complete sense, thanks for spending time on problem, Though I have one more query now.. :)
Say I have a text file without any delimiter - which was suppose to come as fixed length, and now having some deviation and have some variable length records.

Earlier approach that I followed to read this file was -
SEQ --> Copy --> SEQ
In 1st Seq file read record and provided start position for each column at Column level property. and used copy stage to remove unrequired columns.

Now with the same approach following sample file -

Sample File - Say Max length is 10 records
ABCDEFGHIJ
ABC
ABCDIJ

can read only 1st column in which start position for all records are defined, remaining two records will be dropped.

Can somehow I incorporate reading this kind of variable length text file in Sequential file stage?

Posted: Tue Jan 08, 2013 3:43 pm
by ray.wurlod
The Sequential File stage for server jobs is much better equipped for handling missing columns.

Get this working in a Server job then, if you feel it's necessary for other reasons, encapsulate the server Sequential File stage and its output link in a server Shared Container, which you can use in a parallel job.