How to handle reading files with variable-length records?

MMarietta
Participant
Posts: 6
Joined: Tue Sep 04, 2007 7:39 am

How to handle reading files with variable-length records?

Post by MMarietta »

I have a file with 10 fields. When all the fields are present, the record is 100 bytes. Sometimes the last field is missing, sometimes the last seven fields are missing, and everything in between.

I don't care that these fields are missing, but it's causing problems for the Sequential File stage.

What can I do to read this file and process the data that does exist?
vivekgadwal
Premium Member
Posts: 457
Joined: Tue Sep 25, 2007 4:05 pm

Re: How to handle reading files with variable-length records?

Post by vivekgadwal »

MMarietta wrote: I have a file with 10 fields. When all the fields are present, the record is 100 bytes. Sometimes the last field is missing, sometimes the last seven fields are missing, and everything in between.

I don't care that these fields are missing, but it's causing problems for the Sequential File stage.

What can I do to read this file and process the data that does exist?
If it is not a fixed-width file, what is the delimiter between the fields? The Sequential File stage separates fields according to the delimiter you specify, so if the trailing fields are empty but their delimiters are still present in the file, the stage reads NULLs for those missing fields.
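For illustration, here is a hypothetical comma-delimited sample (the values are made up) showing the distinction that matters:

    A,B,C,D,E,F,G,H,I,J   <- all 10 fields present
    A,B,C,,,,,,,          <- fields 4-10 empty, but all 9 delimiters present
    A,B,C                 <- fields 4-10 and their delimiters missing entirely

The second line still parses as 10 fields (seven of them NULL); the third line is what breaks a fixed 10-field import.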
Vivek Gadwal

Experience is what you get when you didn't get what you wanted
sud
Premium Member
Posts: 366
Joined: Fri Dec 02, 2005 5:00 am
Location: Here I Am

Re: How to handle reading files with variable-length records?

Post by sud »

Not sure exactly what you are trying to do, but there are two broad ways of tackling this:

1> Create a custom stage: find out how many delimiters are present in each record and pass the data out accordingly. Plus, you can do a lot more.

2> Read each record into a single column (the whole record), then count the delimiters to work out what to do; all of this logic goes inside a Transformer (see the sketch below).

Approach 1 gives you the liberty to do literally whatever you want with the data.
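A minimal sketch of approach 2, assuming the whole record is read into a single VarChar column that arrives in the Transformer as InLink.RecLine (the link name, column names and pipe delimiter are all made up); Count(), Field() and SetNull() are standard parallel Transformer functions:

    NumFields = Count(InLink.RecLine, "|") + 1
    Field01   = Field(InLink.RecLine, "|", 1)
    Field08   = If Count(InLink.RecLine, "|") >= 7 Then Field(InLink.RecLine, "|", 8) Else SetNull()

A derivation like Field08 only attempts the extraction when enough delimiters exist, so short records no longer break the job; the target column must be defined as nullable for SetNull() to work.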
It took me fifteen years to discover I had no talent for ETL, but I couldn't give it up because by that time I was too famous.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Are the fields AND their intermediate delimiters missing, or are all the delimiters there and the fields between them empty? You handle the two cases differently.

In the first case, read each line (record) of the file as a single VarChar field and parse it subsequently, for example in a Transformer.

In the second case, adjust the individual field properties so that "" is recognized as representing NULL. Perhaps also provide a default value with which the NULL can be replaced.
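As a sketch of that second case, the null_field property on a nullable field tells the import to treat an empty string as NULL. A hypothetical schema with three columns (your names, types and delimiter will differ):

    record {delim=',', final_delim=end}
    (
        Field01: string[max=30];
        Field02: nullable string[max=30] {null_field=''};
        Field03: nullable string[max=30] {null_field=''};
    )

The equivalent in the Sequential File stage GUI is each column's "Null field value" property, set to an empty string.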
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.