Reading from Variable length text file

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
Gen1715
Participant
Posts: 25
Joined: Mon Nov 05, 2012 5:42 am
Location: India

Reading from Variable length text file

Post by Gen1715 »

Say I have a variable length file -

NA,ASSESSMENT,197436,860,2012-09-18 16:05:47,195652,ASSESSMENT,OPRTNL,PENDING,2011-12-31,Beth,Norris,XZFXB1,,12,,,,RTNG_DLR,,109517
NA,ASSESSMENT,218152,860,,193893,ASSESSMENT,OPRTNL,APPROVED,2011-09-30,Wing
NA,ASSESSMENT,220790,860,2012-09-20 04:32:23,195809,,OPRTNL,PENDING,2011-12-31,Winnie,Jin,109604
NA,ASSESSMENT,222046,860,2012-09-05 22:35:20,193902,ASSESSMENT,,APPROVED,2012-03-31,Sherry,Feng,cz7mdr,Overriding with the group ,3,,,,RTNG_DLR,381,108779
NA,,196563,860,2012-09-05 19:26:38,193891,ASSESSMENT,OPRTNL,APPROVED,2012-05-31,Wing,Che,,,,RTNG_DLR,381,108774

And need to read 3rd, 6th, and 9th column. Can this be done only in sequential file (I mean without using any other stage), i.e, output of Seq file are only 3,6 n 9th column. what will be minimum stg design.
Gaurav Martha
prasson_ibm
Premium Member
Premium Member
Posts: 536
Joined: Thu Oct 11, 2007 1:48 am
Location: Bangalore

Re: Reading from Variable length text file

Post by prasson_ibm »

Hi,
Yes it is possible.Design a job with source as Sequential file stage and in column tab define Varchar(2000).
Remove delimiter options from seqential file stage,so stage will read entire row as a one single string.
In transformer apply field function to get 3ed,6th and 9th field.

Thanks
Prasoon
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: Reading from Variable length text file

Post by chulett »

Gen1715 wrote:Can this be done only in sequential file
No, not directly. You could however leverage an O/S command like awk in the Filter property to pass in only those three columns to the stage.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Another choice is to define the structure completely in metadata, and apply the Drop On Import property to all fields that you don't require to be read.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rameshrr3
Premium Member
Premium Member
Posts: 609
Joined: Mon May 10, 2004 3:32 am
Location: BRENTWOOD, TN

Post by rameshrr3 »

You can also use external source stage

Command/Program :

Code: Select all

cat textfile.dat | awk -F',' '{ print $3","$6","$9 }'
Read this into a single long varchar field and parse it with a column Import stage or Transformer with Field() function .
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

... or the same as a Filter command in the Sequential File stage.

Isn't technology wonderful?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

ray.wurlod wrote:Another choice is to define the structure completely in metadata, and apply the Drop On Import property to all fields that you don't require to be read.
Interesting. Wouldn't that technically be "that you don't require to be output" rather than "read" as all columns must actually be read by the stage. Technically. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Yes. Technically.

For such is the nature of a sequential access file - you must read past every byte to get to the next byte.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Exactamundo.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

But of course that's also true for the cat command.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Of course... everything must be read, it's the nature of sequential media.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Gen1715
Participant
Posts: 25
Joined: Mon Nov 05, 2012 5:42 am
Location: India

Re: Reading from Variable length text file

Post by Gen1715 »

chulett wrote:
Gen1715 wrote:Can this be done only in sequential file
No, not directly. You could however leverage an O/S command like awk in the Filter property to pass in only those three column ...
Nice approach :) That makes complete sense, thanks for spending time on problem, Though I have one more query now.. :)
Say I have a text file without any delimiter - which was suppose to come as fixed length, and now having some deviation and have some variable length records.

Earlier approach that I followed to read this file was -
SEQ --> Copy --> SEQ
In 1st Seq file read record and provided start position for each column at Column level property. and used copy stage to remove unrequired columns.

Now with the same approach following sample file -

Sample File - Say Max length is 10 records
ABCDEFGHIJ
ABC
ABCDIJ

can read only 1st column in which start position for all records are defined, remaining two records will be dropped.

Can somehow I incorporate reading this kind of variable length text file in Sequential file stage?
Gaurav Martha
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The Sequential File stage for server jobs is much better equipped for handling missing columns.

Get this working in a Server job then, if you feel it's necessary for other reasons, encapsulate the server Sequential File stage and its output link in a server Shared Container, which you can use in a parallel job.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply