Fixed width, variable length ASCII file

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

asyed
Participant
Posts: 16
Joined: Sun Dec 12, 2010 10:24 pm
Location: Hyderabad, India

Fixed width, variable length ASCII file

Post by asyed »

Hi,

I have to read a sequential file whose records could be either 500 bytes or 800 bytes long. All fields are char fields.

Is there a way to implement this in a single DataStage job, such as specifying 800 bytes and having it check for a newline at either the 500 or 800 byte position?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Hard to say but the first thing that comes to mind is read it as a single 800 byte string, then check the actual length of each record. From there you can go down either a 500 or 800 byte path to parse it appropriately.

Clarify something for grins - is the 500 layout a subset of the 800 or are they different? Meaning, are the first 500 the same between the two and one just carries an extra trailing 300?
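To sketch that idea outside of DataStage (this is illustrative Python, not job code, and the field names and offsets are hypothetical):

```python
# Read each record as one string, check its actual length, then parse it
# with the layout that matches. Offsets below are made up for illustration;
# substitute your real field positions.

LAYOUT_500 = [("cust_id", 0, 10), ("name", 10, 500)]        # (field, start, end) -- hypothetical
LAYOUT_800 = LAYOUT_500 + [("extra", 500, 800)]             # 800 = same 500 plus trailing 300

def parse(record: str) -> dict:
    """Branch on record length and slice out fields accordingly."""
    layout = LAYOUT_500 if len(record) == 500 else LAYOUT_800
    return {name: record[start:end].rstrip() for name, start, end in layout}

short_row = parse("A" * 500)   # parsed with the 500-byte layout
long_row = parse("B" * 800)    # parsed with the 800-byte layout
```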
-craig

"You can never have too many knives" -- Logan Nine Fingers
FranklinE
Premium Member
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

Your situation is a basic scenario for multiple record types in one file. The COBOL convention (absent the different record lengths) is that the first column is a record-type indicator.

Craig's suggestion works on its own. If you had something other than length to indicate the different record type, you might use CFF with your logic set to the record type.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
asyed
Participant
Posts: 16
Joined: Sun Dec 12, 2010 10:24 pm
Location: Hyderabad, India

Post by asyed »

Clarify something for grins - is the 500 layout a subset of the 800 or are they different? Meaning, are the first 500 the same between the two and one just carries an extra trailing 300?
Yes, the 500 byte layout is a subset of the 800 byte layout.
If you had something other than length to indicate the different record type, you might use CFF with your logic set to the record type.


The length is the only indicator for the record type.

Is there any way other than parsing the entire record? It is quite a huge file with a large number of fields. [800 byte / 500 byte was an example; we might have more bytes per record.]
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You could try defining the first 500 as individual fields and then just leave the last 300 (or whatever) as an optional post-read parse. Or just read it as a single string and parse it later using the Column Export stage (unless, of course, it's the Column Import stage that goes 'one to many' - I never remember which dang one is which without looking).

Got to be parsed regardless and I don't believe you'll find a critical difference between the Sequential File stage doing that versus the other stage. It's probably the same operator under the covers.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Read a single line as VarChar(800) and manage its length and its parsing in a downstream Transformer stage.

This gives the beneficial side effect that your reading is a simple stream (which is as fast as possible) and that your parsing is done in parallel.
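Roughly, the downstream Transformer logic Ray describes might look like this (a Python sketch under assumed field positions, not actual DataStage code):

```python
# The reader emits each line as a single VarChar(800)-style string; the
# "transformer" step inspects len(record) to route and parse it. Field
# names and offsets are hypothetical.

def transform(record: str):
    """Mimic a downstream Transformer: route by length, then slice fields."""
    n = len(record)
    if n not in (500, 800):
        return ("reject", None)                      # unexpected length -> reject link
    fields = {
        "key":  record[0:10].rstrip(),               # hypothetical offsets
        "body": record[10:500].rstrip(),
    }
    if n == 800:
        fields["extra"] = record[500:800].rstrip()   # trailing 300 bytes only
    return ("ok", fields)

status, row = transform("K" * 10 + " " * 490)        # a 500-byte record
```

Because the length check and the slicing happen after the read, the file itself can be streamed in as fast as possible, and the parsing work can run in parallel.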
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
asyed
Participant
Posts: 16
Joined: Sun Dec 12, 2010 10:24 pm
Location: Hyderabad, India

Post by asyed »

Hi
You could try defining the first 500 as individual fields and then just leave the last 300 (or whatever) as an optional post-read parse.
Could you tell me how to specify an "optional post-read parse"? If I add a field [VarChar 300] after the 500 bytes, the job drops records.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Which specific records are being dropped?

Keep in mind it was just a thought... me, I'd stick with reading the file as a single string and do the parsing inside the job.
-craig

"You can never have too many knives" -- Logan Nine Fingers