Variable record length file read

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
harshada
Premium Member
Premium Member
Posts: 92
Joined: Tue May 29, 2007 8:40 am

Variable record length file read

Post by harshada »

Hi,

I have a requirement where I have to read a file having variable record length.

My input is like given below:

Date Published: 01/12/2006
"PD","ENO","Status","Surname-Su","Forenam","01/01/2008 ","Franchise Flag","Separate ","1 Separ 2","Address 3","Address
4","Separate ","Separate ","Separate ","Separate ","Separate ","PO576OD ","Opt Out"
"AA","1","","Sur","abc","def"
"AA","2","","Surname"
Date Published: 01/12/2006
"PD","ENO","Status","Surname-Su","Forenam","01/01/2008 ","Franchise Flag","Separate ","1 Separ 2","Address 3","Address
4","Separate ","Separate ","Separate ","Separate ","Separate ","PO576OD ","Opt Out"
"AA","1","","Sur"
"AA","2","","Surname"


There are 2 headers. Min number of fields are 4 and 6 max fields in each record in the body.Headers repeat after some records which is not fixed exactly. I just need the file body. Headers are easily removed using a filter. I also removed the "" in data using the filter.I am reading each record in single column and then splitting the data in each record depending on the delimiter.Then I was able to retrieve the data in all 6 fields using the transformer functions(field function). But the problem comes when there is a ',' as a part of data as i was using the delimiter in the field function as ','.

So I believe I should not remove " " in the filter. But I am unable to split the data for each of the 6 fields after that.

Can anyone help me seperate out each field's data?

Thanks in advance.
Harshada
dsnovice
Participant
Posts: 38
Joined: Thu Jul 22, 2004 11:56 pm
Location: Mclean, VA
Contact:

Post by dsnovice »

You can do multiple things.. one is when you read the file using sed replace '","' the combination of Quotes, comma and quotes to a different delimiter like ~ or tab. then you can use your field function to separate the records. Remove the remaining Quotes " etc..

or if you do not want to use sed you can create a routine as given by DSGuru Ereplace and convert the delimiters.
http://dsxchange.com/viewtopic.php?t=10 ... 8045672805

Also use convert function to replace quotes etc.. Good luck,

HTH

a novice
Post Reply