Reading from Variable length text file
Moderators: chulett, rschirm, roy
Reading from Variable length text file
Say I have a variable length file -
NA,ASSESSMENT,197436,860,2012-09-18 16:05:47,195652,ASSESSMENT,OPRTNL,PENDING,2011-12-31,Beth,Norris,XZFXB1,,12,,,,RTNG_DLR,,109517
NA,ASSESSMENT,218152,860,,193893,ASSESSMENT,OPRTNL,APPROVED,2011-09-30,Wing
NA,ASSESSMENT,220790,860,2012-09-20 04:32:23,195809,,OPRTNL,PENDING,2011-12-31,Winnie,Jin,109604
NA,ASSESSMENT,222046,860,2012-09-05 22:35:20,193902,ASSESSMENT,,APPROVED,2012-03-31,Sherry,Feng,cz7mdr,Overriding with the group ,3,,,,RTNG_DLR,381,108779
NA,,196563,860,2012-09-05 19:26:38,193891,ASSESSMENT,OPRTNL,APPROVED,2012-05-31,Wing,Che,,,,RTNG_DLR,381,108774
And need to read 3rd, 6th, and 9th column. Can this be done only in sequential file (I mean without using any other stage), i.e, output of Seq file are only 3,6 n 9th column. what will be minimum stg design.
NA,ASSESSMENT,197436,860,2012-09-18 16:05:47,195652,ASSESSMENT,OPRTNL,PENDING,2011-12-31,Beth,Norris,XZFXB1,,12,,,,RTNG_DLR,,109517
NA,ASSESSMENT,218152,860,,193893,ASSESSMENT,OPRTNL,APPROVED,2011-09-30,Wing
NA,ASSESSMENT,220790,860,2012-09-20 04:32:23,195809,,OPRTNL,PENDING,2011-12-31,Winnie,Jin,109604
NA,ASSESSMENT,222046,860,2012-09-05 22:35:20,193902,ASSESSMENT,,APPROVED,2012-03-31,Sherry,Feng,cz7mdr,Overriding with the group ,3,,,,RTNG_DLR,381,108779
NA,,196563,860,2012-09-05 19:26:38,193891,ASSESSMENT,OPRTNL,APPROVED,2012-05-31,Wing,Che,,,,RTNG_DLR,381,108774
And need to read 3rd, 6th, and 9th column. Can this be done only in sequential file (I mean without using any other stage), i.e, output of Seq file are only 3,6 n 9th column. what will be minimum stg design.
Gaurav Martha
-
- Premium Member
- Posts: 536
- Joined: Thu Oct 11, 2007 1:48 am
- Location: Bangalore
Re: Reading from Variable length text file
Hi,
Yes it is possible.Design a job with source as Sequential file stage and in column tab define Varchar(2000).
Remove delimiter options from seqential file stage,so stage will read entire row as a one single string.
In transformer apply field function to get 3ed,6th and 9th field.
Thanks
Prasoon
Yes it is possible.Design a job with source as Sequential file stage and in column tab define Varchar(2000).
Remove delimiter options from seqential file stage,so stage will read entire row as a one single string.
In transformer apply field function to get 3ed,6th and 9th field.
Thanks
Prasoon
Re: Reading from Variable length text file
No, not directly. You could however leverage an O/S command like awk in the Filter property to pass in only those three columns to the stage.Gen1715 wrote:Can this be done only in sequential file
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
You can also use external source stage
Command/Program :
Read this into a single long varchar field and parse it with a column Import stage or Transformer with Field() function .
Command/Program :
Code: Select all
cat textfile.dat | awk -F',' '{ print $3","$6","$9 }'
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Interesting. Wouldn't that technically be "that you don't require to be output" rather than "read" as all columns must actually be read by the stage. Technically.ray.wurlod wrote:Another choice is to define the structure completely in metadata, and apply the Drop On Import property to all fields that you don't require to be read.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Re: Reading from Variable length text file
Nice approach That makes complete sense, thanks for spending time on problem, Though I have one more query now..chulett wrote:No, not directly. You could however leverage an O/S command like awk in the Filter property to pass in only those three column ...Gen1715 wrote:Can this be done only in sequential file
Say I have a text file without any delimiter - which was suppose to come as fixed length, and now having some deviation and have some variable length records.
Earlier approach that I followed to read this file was -
SEQ --> Copy --> SEQ
In 1st Seq file read record and provided start position for each column at Column level property. and used copy stage to remove unrequired columns.
Now with the same approach following sample file -
Sample File - Say Max length is 10 records
ABCDEFGHIJ
ABC
ABCDIJ
can read only 1st column in which start position for all records are defined, remaining two records will be dropped.
Can somehow I incorporate reading this kind of variable length text file in Sequential file stage?
Gaurav Martha
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
The Sequential File stage for server jobs is much better equipped for handling missing columns.
Get this working in a Server job then, if you feel it's necessary for other reasons, encapsulate the server Sequential File stage and its output link in a server Shared Container, which you can use in a parallel job.
Get this working in a Server job then, if you feel it's necessary for other reasons, encapsulate the server Sequential File stage and its output link in a server Shared Container, which you can use in a parallel job.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.