Design issue another spin

Archive of postings to DataStageUsers@Oliver.com. This forum is intended only as a reference and cannot be posted to.

Moderators: chulett, rschirm

Locked
admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

Design issue another spin

Post by admin »

This is a topic for an orphaned message.
admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

Post by admin »

Here is some food for thought:

Technique #1 - assumption is that the header/detail rows have a key
relationship

Using the Complex Flat File stage you can have multiple output links. Each
link represents a record type, and the CFF stage supports variable record
lengths if there is a line terminator. The header records are written to
a hash file. The lower-level detail rows are passed into a Transformer stage,
which then does a lookup for the corresponding header row. At that
point you can do whatever you wish with the now-combined header/detail
row (this handles many-to-one). This is done entirely within DataStage and
does not require bit fiddling. Gets real ugly if you have lots of record
types, though.

                   ------> [hash_header]
                  /              |
               hdr           lookup_hdr
                /                |
[CFFStage]--------detail---->----+---------->Target

The engine will automatically load the hash file first and then process
the detail rows. This turns out to actually run pretty fast in
DataStage 4.2 with write cache / load-to-memory turned on.
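A rough Python sketch of the Technique #1 flow, using hypothetical record layouts (the key and payload fields are assumptions, not the actual file format). A dict stands in for the hash file, and the loop stands in for the Transformer doing the per-detail lookup:

```python
# Hypothetical header and detail rows sharing a key, as split out by the
# CFF stage onto two output links.
headers = [("H1", "ACME Corp"), ("H2", "Globex")]
details = [("H1", "line 1"), ("H1", "line 2"), ("H2", "line 3")]

# Load the "hash file" first, just as the engine does before the detail pass.
hash_header = {key: data for key, data in headers}

# For each detail row, look up its header and emit the combined row.
target = []
for key, detail_data in details:
    header_data = hash_header.get(key)   # lookup via the key relationship
    if header_data is not None:
        target.append((key, header_data, detail_data))
```

Note how the many-to-one case falls out naturally: every detail row that shares a key picks up the same header.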


Technique #2 - assumption is that you do not have a key relationship between
header and detail (they are associated only by the fact that the header is
followed by its detail records)

Ray's method: process each row as a single column and store the header in a
stage variable. Keeping track of group breaks also needs to be managed
within the stage variables. Requires substring processing. Does not require
lookups. Gets real ugly if you have lots of record types and/or lots of
columns.
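A minimal sketch of the stage-variable approach, assuming a hypothetical layout where the first character marks the record type ('H' = header, 'D' = detail) and the rest of the line is the payload. The `current_header` variable plays the role of the stage variable that persists between rows:

```python
# Each row arrives as a single column; record type is found by substring.
rows = ["H|ACME Corp", "D|line 1", "D|line 2", "H|Globex", "D|line 3"]

current_header = None   # the "stage variable"
target = []
for row in rows:
    rec_type, payload = row[0], row[2:]   # substring processing
    if rec_type == "H":
        current_header = payload          # group break: remember the new header
    else:
        target.append((current_header, payload))
```

No lookup is needed because ordering alone associates each detail with the most recent header, which is exactly what makes this technique work without a key.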

I am interested in comments regarding a third technique which would satisfy
both #1 and #2 without the lookups or the extra substringing. I built a
prototype stage which takes multiple inputs from the Complex Flat File stage
and auto-maps the output link columns to the input link(s) (using the column
name), as well as handling the header/detail relationship such that for every
detail row the header row is produced. This seems to be not only a much
cleaner development approach, but performance is also improved. Unmapped
source columns are noted in the log. Unmapped target columns (where no input
link has a corresponding column name) cause the stage to generate a fatal
log event. The only problems are if the data has a header row but no detail
rows (without a key relationship), and if there are duplicate column names
across the input links (it grabs the first name it finds).

           /-----------header-------->| This stage reads header records |
[CFFStage]                            | when the detail has a group     |-->Target
           \-----------detail-------->| break or if it's the first row  |
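A sketch of the prototype's mapping rule, under assumed column names (`customer`, `item`, and the link layouts are all illustrative). Output columns are matched to input columns by name, first match wins, which also mirrors the duplicate-name caveat above:

```python
def automap(header, detail, output_columns):
    """Map output columns by name from the input rows; unmapped target is fatal."""
    out = {}
    for col in output_columns:
        for source in (header, detail):   # first matching column name wins
            if col in source:
                out[col] = source[col]
                break
        else:
            # no input link has this column: the stage's fatal log event
            raise ValueError(f"unmapped target column: {col}")
    return out

header = {"hdr_id": "H1", "customer": "ACME Corp"}
details = [{"hdr_id": "H1", "item": "line 1"},
           {"hdr_id": "H1", "item": "line 2"}]

# For every detail row the header row is produced, merged by column name.
target = [automap(header, d, ["customer", "item"]) for d in details]
```

The appeal is that the job design carries no lookup and no substringing; the column names themselves drive the mapping.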


I could envision an enhanced version which could support multiple levels
(more than two) using the same concepts.

Lastly, in the past I have had to deal with true variable-length records
without termination. I processed these by creating a fixed-block sequential
file reader routine and calling it from a stage variable, crawling through
the file block by block. I'm wondering how many folks out there might have
this type of challenge?
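A sketch of that block-by-block crawl, under an assumed layout where each record begins with a 4-character ASCII length field that includes its own length (the block size and length-field format are illustrative, not the original routine's):

```python
import io

BLOCK_SIZE = 16  # illustrative fixed block size

def read_records(stream):
    """Crawl a stream in fixed-size blocks, peeling off unterminated records."""
    buf = b""
    records = []
    while True:
        block = stream.read(BLOCK_SIZE)       # fixed-block sequential read
        if not block:
            break
        buf += block
        # Peel off as many complete records as the buffer now holds.
        while len(buf) >= 4:
            rec_len = int(buf[:4])            # length field, inclusive of itself
            if len(buf) < rec_len:
                break                         # record continues in the next block
            records.append(buf[4:rec_len].decode())
            buf = buf[rec_len:]
    return records

# Two records with no terminators: lengths 9 and 13 include the length field.
records = read_records(io.BytesIO(b"0009hello0013datastage"))
```

The carry-over buffer is the key point: a record that straddles a block boundary is simply completed on the next read.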

-Allen