Page 1 of 1

How to handle reading files with variable lengthed records?

Posted: Thu Jan 03, 2008 10:07 am
by MMarietta
I have a file with 10 fields. When the fields exist the file record is 100 bytes. Sometimes the last field is missing, sometimes the last 7 fields are missing and everything in between.

I don't care if these are missing but it's causing problems for the Sequential file stage.

What can I do to read this file and process data if it exists?

Re: How to handle reading files with variable lengthed recor

Posted: Thu Jan 03, 2008 11:01 am
by vivekgadwal
MMarietta wrote:I have a file with 10 fields. When the fields exist the file record is 100 bytes. Sometimes the last field is missing, sometimes the last 7 fields are missing and everything in between.

I don't care if these are missing but it's causing problems for the Sequential file stage.

What can I do to read this file and process data if it exists?
If it is not a fixed width file, then what is the delimiter between the fields in your file? Sequential file stage differentiates fields according to the delimiter that you specify. If the fields are missing, you still would have the delimiters in the file and the stage picks up NULLS for those missing fields.

Re: How to handle reading files with variable lengthed recor

Posted: Thu Jan 03, 2008 2:09 pm
by sud
Not sure exactly what you are trying to do, but there can be two broad ways of tackling this.

1> Create a custom stage --- find out how many delimiters are there and accordingly pass out the data. plus you can do a lot more ...

2> Read each record into a single column (whole record) and then counting the number of delimiters figure out what to do ... all this logic goes inside a transformer

Approach 1 gives you the liberty to do literally whatever you want to do with the data.

Posted: Thu Jan 03, 2008 3:41 pm
by ray.wurlod
Are the fields AND their intermediate delimiters missing, or are all the delimiters there and the fields between them empty? You handle the two cases differently.

In the first case read each line (record) in the file as a single VarChar field and parse it subsequently.

In the second case adjust the individual field properties so that "" is recognized as representing NULL. Perhaps also provide a default value with which this can be replaced.