
Line Splitup

Posted: Thu Jul 15, 2004 3:18 am
by badri
Hi,

I have a requirement as in the scenario below.

I need to split the source data into groups of 100 records and place a header record above each group, as shown below:

H...A...B..C....D..............................................1 (sequence number)
L1.........................E1.....F1.....G1......X............2
L2.........................E2.....F2.....G2......Y............3
.
.
L100.....................E100......F100.....G100.....X.........101

Here H is the header record, L1, L2 ... L100 are line items, and A, B, C, D, E, F, G are fields.

Now, I need to provide a separate summary line for all X records and a separate summary line for all Y records as given below.


H...A...B..C....D..............................................1 (sequence number)
L1.........................E1.....F1.....G1......X............2
.
L100.....................E100......F100.....G100.....X.........101

S1-------------------------------------------------------------- (summary line 1)

L2.........................E2.....F2.....G2......Y............3
.
.
S2--------------------------------------------------------------(summary line 2)

Note: each summary line's sequence number should be the maximum sequence number of the line items in its group (the X records or the Y records).

Posted: Thu Jul 15, 2004 5:47 am
by chulett
Is there a question in here somewhere? :?

You haven't given us any clues about your source data, only (apparently) what you need to do with it. Do you need help with the "line splitup" itself, or with a different part? Off the cuff...

* You can probably use the Row Splitter stage to help tell the difference between the header and detail records.

* Use constraints to run links off for the 'X' and 'Y' rows, each to an Aggregator stage to get your group totals.

* Write the output to separate flat files and then combine and sort them after the job using O/S scripts.
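As a rough illustration of the split-and-aggregate steps outside of DataStage (the field layout, names, and the presence of a numeric amount per detail row are all assumptions, not from your post), the logic might look like:

```python
# Sketch only: split detail rows by their 'X'/'Y' flag, then append a
# summary record per group carrying the group's total and max sequence.

def split_and_summarize(rows):
    """rows: list of (seq, flag, amount) detail records, flag 'X' or 'Y'.
    Returns the X group followed by its summary, then the Y group and its
    summary. Field layout is hypothetical."""
    groups = {"X": [], "Y": []}
    for row in rows:
        groups[row[1]].append(row)

    out = []
    for flag in ("X", "Y"):
        group = groups[flag]
        if not group:
            continue
        out.extend(group)
        total = sum(amount for _, _, amount in group)
        max_seq = max(seq for seq, _, _ in group)  # max sequence of the group
        out.append(("S", flag, total, max_seq))    # summary line
    return out

rows = [(2, "X", 10.0), (3, "Y", 5.0), (4, "X", 7.5)]
print(split_and_summarize(rows))
```

In a server job the two Aggregator stages would play the role of the `sum`/`max` calls, one per constrained link.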

The last step can be a little tricky when you need to generate multiple 'sub total' type records. Creative use of a Hash File target is in order in that case. What I've done in the past is create a key for the hash that allows me to differentiate between header, detail and subtotal records but ensures I can get them out later in the proper sequence. Multiple processing streams can then write to the same hash, and in a second step (or a second job) you pull the records back out using a UV stage to enforce the order, leaving the key behind in the process.
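The key trick above can be sketched in plain Python (the key layout, group number, and type ranks here are hypothetical, just one way to make a string sort produce the header/detail/subtotal order):

```python
# Sketch only: a fixed-width composite key so that a plain sort of the
# keys yields header, then details in sequence, then subtotal per group.
# The hash file is stood in for by an ordinary dict.

# Type ranks: header sorts first within a group, details next, subtotal last.
TYPE_RANK = {"H": 0, "L": 1, "S": 2}

def make_key(group, rec_type, seq):
    """Fixed-width key GGGG-T-SSSSSS; string comparison gives the order."""
    return f"{group:04d}-{TYPE_RANK[rec_type]}-{seq:06d}"

store = {}  # stand-in for the hash file: key -> record
store[make_key(1, "L", 3)] = "L3 detail"
store[make_key(1, "S", 999999)] = "group 1 subtotal"  # subtotal forced last
store[make_key(1, "H", 1)] = "group 1 header"
store[make_key(1, "L", 2)] = "L2 detail"

# Second pass (the UV-stage step): read back in key order, dropping the key.
for key in sorted(store):
    print(store[key])
```

Writes can arrive in any order from any stream; the ordering lives entirely in the key, which is exactly what makes the second-pass extract trivial.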

Hope that helps get the creative juices flowing...