Fixed width, variable length ASCII file

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

asyed
Participant
Posts: 16
Joined: Sun Dec 12, 2010 10:24 pm
Location: Hyderabad, India

Fixed width, variable length ASCII file

Post by asyed »

Hi,

I have to read a sequential file whose records could be either 500 bytes or 800 bytes long. All fields are char fields.

Is there a way to implement this in a single DataStage job, such as specifying 800 bytes and having it check for a newline at either the 500 or 800 byte position?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Hard to say but the first thing that comes to mind is read it as a single 800 byte string, then check the actual length of each record. From there you can go down either a 500 or 800 byte path to parse it appropriately.

Clarify something for grins - is the 500 layout a subset of the 800 or are they different? Meaning, are the first 500 the same between the two and one just carries an extra trailing 300?
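To sketch that idea outside of DataStage (this is illustrative Python, not job code, and the field names and offsets are hypothetical):

```python
# Read each record as one string, check its actual length, then parse it
# with the layout that matches. Offsets below are made up for illustration;
# substitute your real field positions.

LAYOUT_500 = [("cust_id", 0, 10), ("name", 10, 500)]        # (field, start, end) -- hypothetical
LAYOUT_800 = LAYOUT_500 + [("extra", 500, 800)]             # 800 = same 500 plus trailing 300

def parse(record: str) -> dict:
    """Branch on record length and slice out fields accordingly."""
    layout = LAYOUT_500 if len(record) == 500 else LAYOUT_800
    return {name: record[start:end].rstrip() for name, start, end in layout}

short_row = parse("A" * 500)   # parsed with the 500-byte layout
long_row = parse("B" * 800)    # parsed with the 800-byte layout
```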
-craig

"You can never have too many knives" -- Logan Nine Fingers
FranklinE
Premium Member
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

Your situation is a basic scenario for multiple record types in one file. The COBOL convention (absent the different record lengths) is that the first column is a record-type indicator.

Craig's suggestion works on its own. If you had something other than length to indicate the different record type, you might use CFF with your logic set to the record type.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
asyed
Participant
Posts: 16
Joined: Sun Dec 12, 2010 10:24 pm
Location: Hyderabad, India

Post by asyed »

Clarify something for grins - is the 500 layout a subset of the 800 or are they different? Meaning, are the first 500 the same between the two and one just carries an extra trailing 300?
Yes, the 500 byte layout is a subset of the 800 byte layout.
If you had something other than length to indicate the different record type, you might use CFF with your logic set to the record type.


The length is the only indicator for the record type.

Is there any way other than parsing the entire record? It is quite a huge file with a large number of fields. [800 byte / 500 byte was an example; we might have more bytes per record.]
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You could try defining the first 500 as individual fields and then just leave the last 300 (or whatever) as an optional post-read parse. Or just read it as a single string and parse it later using the Column Export stage (unless, of course, it's the Column Import stage that goes 'one to many' - I never remember which dang one is which without looking).

Got to be parsed regardless and I don't believe you'll find a critical difference between the Sequential File stage doing that versus the other stage. It's probably the same operator under the covers.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Read a single line as VarChar(800) and manage its length and its parsing in a downstream Transformer stage.

This gives the beneficial side effect that your reading is a simple stream (which is as fast as possible) and that your parsing is done in parallel.
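Roughly, the downstream Transformer logic Ray describes might look like this (a Python sketch under assumed field positions, not actual DataStage code):

```python
# The reader emits each line as a single VarChar(800)-style string; the
# "transformer" step inspects len(record) to route and parse it. Field
# names and offsets are hypothetical.

def transform(record: str):
    """Mimic a downstream Transformer: route by length, then slice fields."""
    n = len(record)
    if n not in (500, 800):
        return ("reject", None)                      # unexpected length -> reject link
    fields = {
        "key":  record[0:10].rstrip(),               # hypothetical offsets
        "body": record[10:500].rstrip(),
    }
    if n == 800:
        fields["extra"] = record[500:800].rstrip()   # trailing 300 bytes only
    return ("ok", fields)

status, row = transform("K" * 10 + " " * 490)        # a 500-byte record
```

Because the length check and the slicing happen after the read, the file itself can be streamed in as fast as possible, and the parsing work can run in parallel.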
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
asyed
Participant
Posts: 16
Joined: Sun Dec 12, 2010 10:24 pm
Location: Hyderabad, India

Post by asyed »

Hi
You could try defining the first 500 as individual fields and then just leave the last 300 (or whatever) as an optional post-read parse.
Could you tell me how to specify an "optional post-read parse"? If I add a field [VarChar 300] after the 500 bytes, the job drops records.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Which specific records are being dropped?

Keep in mind it was just a thought... me, I'd stick with reading the file as a single string and do the parsing inside the job.
-craig

"You can never have too many knives" -- Logan Nine Fingers