Line Splitup

badri · Post by **badri** » Thu Jul 15, 2004 3:18 am

Hi,

I have a requirement, described in the scenario below.

I need to split the source data into groups of 100 records and place a header record before each group, as shown below:

H...A...B..C....D..............................................1 (sequence number)
L1.........................E1.....F1.....G1......X............2
L2.........................E2.....F2.....G2......Y............3
.
.
L100.....................E100......F100.....G100.....X.........101

Here H is the header record, L1, L2 ... L100 are line items, and A, B, C, D, E, F, G are fields.
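To pin down the numbering in the layout above, here is a rough sketch in plain Python, outside DataStage. It is only an illustration under assumptions: the header content is a placeholder, the names `chunked` and `number_block` are hypothetical, and the sequence number is assumed to restart at 1 for every group of 100.

```python
from itertools import islice

def chunked(rows, size=100):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block

def number_block(block, header="H"):
    """Put a header first (sequence 1), then number the line items from 2.

    Assumption: numbering restarts for each block; the post does not say.
    """
    out = [(1, header)]
    out.extend((seq, row) for seq, row in enumerate(block, start=2))
    return out
```

With 100 line items in a block, the last line item gets sequence number 101, which matches the layout shown.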

Now, I need to provide one summary line for all X records and a separate summary line for all Y records, as shown below:

H...A...B..C....D..............................................1 (sequence number)
L1.........................E1.....F1.....G1......X............2
.
L100.....................E100......F100.....G100.....X.........101

S1-------------------------------------------------------------- (summary line 1)

L2.........................E2.....F2.....G2......Y............3
.
.
S2--------------------------------------------------------------(summary line 2)

Note: The sequence number on each summary line should be the maximum sequence number among the line items in its group (X or Y).

chulett · Post by **chulett** » Thu Jul 15, 2004 5:47 am

Is there a question in here somewhere?

You haven't given us any clues about your source data, only (apparently) what you need to do with it. Do you need help with the "line splitup" or with some other part? Off the cuff...

* You can probably use the Row Splitter stage to help tell the difference between the header and detail records.

* Use constraints to run links off for the 'X' and 'Y' rows, each to an Aggregator stage to get your group totals.

* Write the output to separate flat files and then combine and sort them, after job, using O/S scripts.
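For anyone who wants to see the routing and aggregation idea outside of DataStage, here is a minimal Python sketch. It is not DataStage code: the function name `summarize_by_flag` is hypothetical, each detail row is assumed to be a `(sequence, flag)` pair, and the row count is an assumed example of a group total alongside the maximum sequence number the original post asks for.

```python
def summarize_by_flag(lines):
    """Group detail rows by their flag ('X', 'Y', ...) and aggregate each group.

    lines: iterable of (seq, flag) pairs (assumed shape, for illustration).
    Returns {flag: {"count": n, "max_seq": m}}.
    """
    totals = {}
    for seq, flag in lines:
        # Equivalent of a constraint routing the row to its own aggregator.
        agg = totals.setdefault(flag, {"count": 0, "max_seq": seq})
        agg["count"] += 1
        agg["max_seq"] = max(agg["max_seq"], seq)
    return totals
```

For the rows `(2, "X")`, `(3, "Y")`, `(101, "X")` this yields a maximum sequence of 101 for the X group and 3 for the Y group.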

The last step can be a little tricky when you need to generate multiple 'sub total' type records. Creative use of a Hash File target is in order in that case. What I've done in the past is create a key for the hash that allows me to differentiate between header, detail and subtotal records but ensures I can get them out later in the proper sequence. Then you can have multiple processing streams writing to the same hash and then (in a second step or second job) pull the records out of the hash using a UV stage to enforce an order, leaving the key behind in the process.
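The key trick can be sketched in Python, with a dict standing in for the hash file and `sorted` standing in for the ordered read through the UV stage. The key layout here (block number, group flag, record rank, sequence number) and the names `RANK`, `sort_key` and `ordered_output` are assumptions for illustration, not the exact key I used.

```python
# Rank within a group: header first, then line items, then the summary.
RANK = {"H": 0, "L": 1, "S": 2}

def sort_key(block_no, rec_type, flag, seq):
    """Build a composite key that encodes the final output position."""
    # The header gets an empty group so it sorts ahead of both 'X' and 'Y'.
    group = "" if rec_type == "H" else flag
    return (block_no, group, RANK[rec_type], seq)

def ordered_output(store):
    """Read records back in key order, leaving the key behind."""
    return [store[key] for key in sorted(store)]

# Several streams can write in any order; the key fixes the sequence.
store = {}
store[sort_key(1, "S", "X", 101)] = "S1"
store[sort_key(1, "L", "Y", 3)] = "L2"
store[sort_key(1, "H", "", 1)] = "H"
store[sort_key(1, "L", "X", 2)] = "L1"
store[sort_key(1, "S", "Y", 3)] = "S2"
store[sort_key(1, "L", "X", 101)] = "L100"
```

Reading `ordered_output(store)` returns the header, the X line items, the X summary, then the Y line items and the Y summary, regardless of the order the streams wrote them.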

Hope that helps get the creative juices flowing...