Page 1 of 1

handling a (sort of) fixed length file

Posted: Mon Jun 16, 2008 2:29 pm
by hobocamp
Wondering if anyone has run into anything similar to this -

We need to bring in a new vendor file through ETL. It's basically a fixed length file, with the exception that any empty or null fields have been replaced by a tilde (~) in a one-byte field. So of course a tilde could represent anything from 1 to X bytes. Any suggestions would be appreciated.

Thanks in advance.

Tom

Posted: Mon Jun 16, 2008 4:33 pm
by ray.wurlod
Then you have to treat it as not fixed width. Read it as a single VarChar column, and parse it within a Transformer stage. Or pre-process the file to insert delimiter characters, using either a shell script or a BASIC routine.

Posted: Tue Jun 17, 2008 7:18 am
by hobocamp
Ray -

Thanks for your response.

I thought it might involve reading each row as a single string.

As far as parsing within a transformer, do you have any suggestions how this might be done? I'm having trouble coming up with how I would keep my place in the row, and match that to the file schema. For example, once the first tilda is found, the record length is shortened by FieldLength -1, and each successive tilda (if any) will do the same thing. (Sorry if this description is fuzzy - I'm having almost as much trouble describing it as thinking about it.)

Thanks again.
Tom

Posted: Tue Jun 17, 2008 7:33 am
by chulett
So... you've got records with no field delimiters but not all fields are fully populated? Who comes up with crap formats like this? :evil:

I don't see how you would do something like this outside of a C or C++ program. Or, I guess, a BASIC routine that does the same, but I imagine that would be... slower. Stream in a character at a time and decide what to do with it based on record number, current position and if it is a tilde or not. I'd probably try to build a 'normal' delimited file (or record in a routine) from that output and then process it... well, normally.

I know what my first reaction would be if someone tried to pawn this off on me - forget it. Push back. Insist on a proper fixed-width file with no 'sort of' involved or get a proper delimited file. Easier said then done, I know, but I would certainly explore that as my first 'option'. :wink:

Posted: Tue Jun 17, 2008 7:34 am
by ArndW
Unless you have a column separator character you won't be able to do this without a lot of work. You would then need to know the starting position of each column and match that to the position of the tilde; then insert the appropriate number of dummy characters to shift everything to the right of the tilde to the correct position. If multiple tildes exist per line you would need to put this into a loop.

Posted: Tue Jun 17, 2008 12:53 pm
by hobocamp
Thanks everyone for your suggestions. I was able to come up with a Basic routine to handle the file. I use a counter as a positional reference to keep track of where I am in the file. It's not really reusable (we have several of this file type and I'll have to customize the routine for each file), but at least it works!

Thanks again.

I'll mark this thread as Resolved.

Tom