Hi,
I have to read a sequential file whose records could be either 500 bytes or 800 bytes. All fields are char fields.
Is there a way to implement this in a single DataStage job, such as specifying 800 bytes and checking for a newline at either 500 or 800?
Fixed-width, variable-length ASCII file
Hard to say, but the first thing that comes to mind is to read it as a single 800 byte string, then check the actual length of each record. From there you can go down either a 500 or 800 byte path to parse it appropriately.
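To illustrate the idea (this is just a sketch in Python, not DataStage code): read each record as one string, check its length, and route it down the matching path. The file path and the two record lengths are taken from the question; everything else is illustrative.

```python
# Sketch: read each record as a single string, check its actual
# length, and route it to a 500-byte or 800-byte parse path.
def route_records(path):
    short_recs, long_recs = [], []
    with open(path, "r", newline="") as f:
        for line in f:
            rec = line.rstrip("\n")
            if len(rec) == 500:
                short_recs.append(rec)   # 500-byte path
            elif len(rec) == 800:
                long_recs.append(rec)    # 800-byte path
            else:
                raise ValueError(f"unexpected record length {len(rec)}")
    return short_recs, long_recs
```

In the actual job, the two branches would be Transformer constraints on `Len()` feeding two output links.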
Clarify something for grins - is the 500 layout a subset of the 800 or are they different? Meaning, are the first 500 the same between the two and one just carries an extra trailing 300?
-craig
"You can never have too many knives" -- Logan Nine Fingers
Your situation is a basic scenario for multiple record types in one file. The COBOL convention (absent the different record lengths) is that the first column is a record-type indicator.
Craig's suggestion works on its own. If you had something other than length to indicate the different record type, you might use CFF with your logic set to the record type.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson
Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
Clarify something for grins - is the 500 layout a subset of the 800 or are they different? Meaning, are the first 500 the same between the two and one just carries an extra trailing 300?
Yes, the 500 byte layout is a subset of the 800 byte layout.
If you had something other than length to indicate the different record type, you might use CFF with your logic set to the record type.
The length is the only indicator for the record type.
Is there any way other than parsing the entire record, as it is quite a huge file with a large number of fields? [800 bytes and 500 bytes were just examples; we might have more bytes per record.]
You could try defining the first 500 as individual fields and then just leave the last 300 (or whatever) as an optional post-read parse. Or just read it as a single string and parse it later using the Column Export stage (unless, of course, it's the Column Import stage that goes 'one to many' - I never remember which dang one is which without looking).
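A rough sketch of that 'one string in, many columns out' parse, in Python. The field names and widths here are entirely hypothetical (the real widths would come from your record definition); only the 500/800 split is from the thread.

```python
# Hypothetical layout: widths must sum to 500 for the common part.
FIELDS_500 = [("cust_id", 10), ("name", 40), ("filler", 450)]
# The optional trailing 300 bytes, present only on 800-byte records.
EXTRA_300 = [("notes", 300)]

def parse_fixed_width(record, layout):
    # Walk the layout, slicing each field out of the record string.
    out, pos = {}, 0
    for name, width in layout:
        out[name] = record[pos:pos + width]
        pos += width
    return out

def parse_record(record):
    # Include the extra fields only when the record carries them.
    layout = FIELDS_500 + (EXTRA_300 if len(record) == 800 else [])
    return parse_fixed_width(record, layout)
```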
Got to be parsed regardless and I don't believe you'll find a critical difference between the Sequential File stage doing that versus the other stage. It's probably the same operator under the covers.
-craig
"You can never have too many knives" -- Logan Nine Fingers
Read a single line as VarChar(800) and manage its length and its parsing in a downstream Transformer stage.
This gives the beneficial side effect that your reading is a simple stream (which is as fast as possible) and that your parsing is done in parallel.
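As a loose analogy for that pattern (again Python, not DataStage): the read stays a single sequential stream, while the length check and substring work are farmed out to parallel workers, as a parallel Transformer stage would do.

```python
# Analogy for 'simple stream read, parallel parse': split each record
# into its common first 500 bytes and any trailing 300 extra bytes.
from concurrent.futures import ThreadPoolExecutor

def split_record(rec):
    # First 500 bytes are common to both layouts; the rest is extra.
    return rec[:500], rec[500:]

def parse_stream(records):
    with ThreadPoolExecutor() as ex:
        return list(ex.map(split_record, records))
```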
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Hi
You could try defining the first 500 as individual fields and then just leave the last 300 (or whatever) as an optional post-read parse.
Could you tell me how to specify an "optional post-read parse"? If I add a field [varchar 300] after the first 500 bytes, the job drops records.