How to process Variable Length File

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

rohit.agarwalin
Participant
Posts: 11
Joined: Mon Feb 04, 2013 8:03 am

Post by rohit.agarwalin »

Hi,

I need to read a variable-length file (ASCII, with packed decimal fields in each record type) that has no record delimiter.
The file contains several record types, each with a different length. Each record starts with a 2-byte field (RDW) holding the record length, followed by the record data of that length, plus padding at the end so that the total record length is a multiple of 4.

e.g. [RDW (2-byte field) = 12][12-byte record][2-byte padding]
Since the record length including the RDW is 14 bytes, 2 bytes of padding bring it up to 16, a multiple of 4.
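To make the arithmetic in the example explicit, here is a minimal Python sketch. It assumes, as in the example, that the RDW value counts only the data bytes (not the 2-byte RDW itself) and that the padding rounds RDW + data up to the next multiple of 4. `padding_for` and `padded_length` are illustrative names, not DataStage functions.

```python
RDW_SIZE = 2  # the RDW is a 2-byte length field
ALIGN = 4     # RDW + data + padding must be a multiple of 4 bytes


def padding_for(data_len: int) -> int:
    """Padding bytes that follow a record whose RDW value is data_len."""
    # Python's % always returns a non-negative result for a positive modulus,
    # so -(n) % 4 is the distance from n up to the next multiple of 4.
    return -(RDW_SIZE + data_len) % ALIGN


def padded_length(data_len: int) -> int:
    """Total bytes one record occupies in the file: RDW + data + padding."""
    return RDW_SIZE + data_len + padding_for(data_len)


print(padding_for(12), padded_length(12))  # 2 16
```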

Could anyone suggest how to read this file? Can I read it using the CFF stage, or do I need to write a program to read it?

I have gone through various topics in this forum but did not find the answer I am looking for. If I missed a post that solves this, please send me the link.

Thanks,
Rohit
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Well... the variable-length record part, with packed decimals and a record length field at the front, would work in the CFF stage, up until the 'padded to a multiple of 4' part, which puts a damper on things. I'm thinking you'd have to go the 'BuildOp' route (or some other custom solution) here, but I'm curious what others think.
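If you do go the custom route, the core logic is small. Below is a hypothetical Python pre-processor sketch (not BuildOp code) that walks the byte stream: read the 2-byte length, take that many data bytes, then skip the padding. Assumptions: the RDW is an unsigned 2-byte binary integer, big-endian by default (a guess; the byte order of the real file is not stated), its value excludes the RDW itself, and records are padded to a multiple of 4.

```python
from typing import Iterator

RDW_SIZE = 2  # length prefix size in bytes
ALIGN = 4     # each record (RDW + data + padding) is a multiple of this


def split_records(buf: bytes, byteorder: str = "big") -> Iterator[bytes]:
    """Yield the data part of each record, dropping the RDW and the padding."""
    pos = 0
    while pos < len(buf):
        if pos + RDW_SIZE > len(buf):
            raise ValueError(f"truncated RDW at offset {pos}")
        data_len = int.from_bytes(buf[pos:pos + RDW_SIZE], byteorder)
        start = pos + RDW_SIZE
        end = start + data_len
        if end > len(buf):
            raise ValueError(f"record at offset {pos} runs past end of data")
        yield buf[start:end]
        # Skip the padding that rounds RDW + data up to a multiple of ALIGN.
        pos = end + (-(RDW_SIZE + data_len) % ALIGN)


sample = (b"\x00\x0c" + b"ABCDEFGHIJKL" + b"\x00\x00"  # 12 data bytes + 2 pad
          + b"\x00\x06" + b"XYZ123")                   # 6 data bytes, no pad
print(list(split_records(sample)))  # [b'ABCDEFGHIJKL', b'XYZ123']
```

For a real file you would read it in binary mode (e.g. `open(path, "rb").read()`) and pass the bytes in; how the resulting records are then handed to DataStage is left open here.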

How will you recognize each of the 'record types' in the file? Does each have a unique record length, or is there some record type identifier field?
-craig

"You can never have too many knives" -- Logan Nine Fingers
rohit.agarwalin
Participant
Posts: 11
Joined: Mon Feb 04, 2013 8:03 am

Post by rohit.agarwalin »

The record type is recognised by the value in the RDW field: each record type has a different length, and therefore a different RDW value.
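Since each record type has its own length, the RDW value itself can act as the record type identifier. A minimal sketch of that dispatch; the lengths and type names in `RECORD_TYPES` are hypothetical placeholders for the real file's layouts.

```python
# Hypothetical RDW values (data lengths) mapped to record type names.
RECORD_TYPES = {
    12: "HEADER",
    40: "DETAIL",
    20: "TRAILER",
}


def record_type(data_len: int) -> str:
    """Return the record type implied by an RDW value, or fail loudly."""
    try:
        return RECORD_TYPES[data_len]
    except KeyError:
        raise ValueError(f"unknown record length {data_len}") from None


print(record_type(40))  # DETAIL
```

Failing on an unknown length, rather than skipping it, matters with this kind of file: with no record delimiter, one misread length would throw off every record after it.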
There is no record delimiter.

I have a workaround working in the CFF stage with a plain text test file. Next I will repeat the test with hex and packed values, since in the real file the RDW holds the record length as a binary (hex) value.
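For the hex/packed test, two decoding steps are involved: the RDW as a 2-byte binary integer, and the packed decimal (COMP-3) fields, where each byte holds two decimal digits and the last nibble is the sign. A minimal Python sketch, assuming big-endian byte order for the RDW and the usual IBM sign nibbles (B or D negative, anything else treated as positive here); `rdw_length` and `unpack_comp3` are illustrative names.

```python
from decimal import Decimal


def rdw_length(raw: bytes, byteorder: str = "big") -> int:
    """Decode the 2-byte binary RDW into an integer record length.

    Assumes big-endian by default; pass "little" if the file turns out otherwise.
    """
    if len(raw) != 2:
        raise ValueError("RDW must be exactly 2 bytes")
    return int.from_bytes(raw, byteorder)


def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode a packed decimal (COMP-3) field into a Decimal.

    Every nibble is a BCD digit except the low nibble of the last byte, which
    is the sign: 0xB or 0xD means negative; anything else is treated as positive.
    `scale` is the number of implied decimal places.
    """
    if not raw:
        raise ValueError("empty packed field")
    digits = []
    for b in raw[:-1]:
        digits.extend((b >> 4, b & 0x0F))
    digits.append(raw[-1] >> 4)
    if any(d > 9 for d in digits):
        raise ValueError(f"invalid packed digit in {raw.hex()}")
    value = int("".join(str(d) for d in digits))
    if (raw[-1] & 0x0F) in (0x0B, 0x0D):
        value = -value
    return Decimal(value).scaleb(-scale)


print(rdw_length(b"\x00\x0c"))                 # 12
print(unpack_comp3(b"\x12\x34\x5c", scale=2))  # 123.45
print(unpack_comp3(b"\x12\x34\x5d"))           # -12345
```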

For each record type we know how many padding bytes are added, so I will declare them as part of the record layout.
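Declaring the padding in the layout amounts to appending a filler field whose size follows from the record length. A sketch of that calculation, assuming the 2-byte RDW is declared as the first field of each layout; `with_filler` and the field names and sizes in the example are hypothetical.

```python
from typing import List, Tuple

RDW_SIZE = 2
ALIGN = 4

Field = Tuple[str, int]  # (field name, length in bytes)


def with_filler(fields: List[Field]) -> List[Field]:
    """Return RDW + fields, plus a FILLER field if padding is needed."""
    data_len = sum(length for _, length in fields)
    pad = -(RDW_SIZE + data_len) % ALIGN
    layout = [("RDW", RDW_SIZE)] + list(fields)
    if pad:
        layout.append(("FILLER", pad))
    return layout


# Hypothetical 12-byte record: a 2-byte code and two 5-byte packed amounts.
print(with_filler([("REC_CODE", 2), ("AMOUNT_1", 5), ("AMOUNT_2", 5)]))
```

This prints a layout ending in `('FILLER', 2)`, i.e. 16 bytes in total, matching the 12-byte example in the first post.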

I will let you know once the next test succeeds.

Thanks for your help. Please let me know if you have a better way to read the file; if my test does not work, I will be back with the same question.
Thanks & Regards,
Rohit
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

You are on the right track, as the CFF stage allows multiple record types, each possibly containing a different number of columns and having a different length.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

However, the length noted in the field is *not* the actual length, as the record is then "padded to a multiple of 4" if needed... that seems like an issue to me.
-craig
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

The "RDW" column is used to determine the record type; depending on that, a given record layout can be defined and used in the CFF stage, and that layout will consist of some number of packed fields.