Inserting Sub-Totals in Sequential output

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
jjrbikes
Premium Member
Premium Member
Posts: 117
Joined: Tue Nov 25, 2003 11:09 am
Location: Minneapolis

Inserting Sub-Totals in Sequential output

Post by jjrbikes »

There's gotta be a way...
For simplicity sake, let's say I'm reading in a table that houses n records. Of those n records, when the value in field-A changes the records processed prior to that change need to be sub-totaled. This will happen 3-4 times in processing the data from the table. What I want to do is read in the records, do the necessary transformation(s) and then, in the sequential output I want to write out the records processed before the first change followed by a sub-total record, then write out the next group of records followed by a sub-total record and so on until all records have been processed. Following the last sub-total record I want to write out a grand total record.

Can I do all this with stage variables and constraints or do I need to include link-collectors, etc?

Thanks a Bunch!!!

Jennifer
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
Do you mean accumulative subtotal of the same columns ?
Could you give an example of some input rows and your output rquirements?
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
Image
rasi
Participant
Posts: 464
Joined: Fri Oct 25, 2002 1:33 am
Location: Australia, Sydney

Post by rasi »

Yes this can be easily done in datastage. Use stage variables to hold the sub-total and sort the record before sending it to the transformation so that whenever the field-A changes you can display the total and reset the subtotal. Also declare a separate stage variable for Grand Total.

Thanks
Siva
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Your biggest problem is that a DataStage transformer does not let you create a row that didn't exist in your input file. For example you cannot feed one row into your transformer down one link and send two rows down an output link. So while it is very easy to create your subtotal values it is very hard to write them out. If DataStage reads in 200 rows it really wants to write out 200 rows and not 210 with 10 sub total rows.

What you can do is write your output rows down one link and your subtotal rows down another link. They would end up in two files. Your trigger for writing out a sub total row is when a new group is found.

To get them in one file you would join those two output links together using a link collector and write them out to one file, hopefully in the right order! Make your subtotal the first output link from the transformer so it is collected by the link and appended after the group it belongs to before the new group is written out.

For this to work the link collector will be expecting your subtotal row to have the same metadata format as your other rows.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Provided the input file is sorted, these techniques are OK. Thinking slightly outside the square, another approach is to use the ODBC driver for text files and generate the sub-totals with a grouping query. Of course this does not insert them after the detail rows in the output; however there's probably something clever you could do with a Link Collector stage whose collection algorithm is set to Sort/Merge.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
jjrbikes
Premium Member
Premium Member
Posts: 117
Joined: Tue Nov 25, 2003 11:09 am
Location: Minneapolis

Post by jjrbikes »

Thanks for all the ideas.

Roy - here is a simplified sample of my input:

LNAME FNAME STATUS AMT OWED
Goodhue Alison New 0
Queen Dairy New 13.23
Arc Joan Res 27.19
Hill James Res 36
Smith Sam Res 93.78
Truman Harry Res 84.5
Harris Mary Vis 73.27
McDonald Sally Vis 56.75
Murray Ann Vis 11.91


and the OUTPUT we'd like to generate would look like this:

LNAME FNAME STATUS AMT OWED
Goodhue Alison New 0
Queen Dairy New 13.23
SubTotal For Status New = 13.23
Arc Joan Res 27.19
Hill James Res 36
Smith Sam Res 93.78
Truman Harry Res 84.5
SubTotal For Status Res = 241.47
Harris Mary Vis 73.27
McDonald Sally Vis 56.75
Murray Ann Vis 11.91
SubTotal For Status Vis = 141.93
Grand Total For All Status' = 396.63

If I can do this with Stage Variables - how the heck do I set those up??
Thanks again Everyone!!

Jennifer
chucksmith
Premium Member
Premium Member
Posts: 385
Joined: Wed Jun 16, 2004 12:43 pm
Location: Virginia, USA
Contact:

Post by chucksmith »

The stage variable are easy:

Code: Select all

PrevTotal = Total
ThisStage = STATUS
NewStatus = (If ThisStatus <> LastStatus Then @TRUE Else @FALSE)
LastStatus = ThisStatus
Total = (If NewStatus Then 0 Else Total + AMT)
When NewStatus is true and @INROWNUM is not 1 then PrevTotal should hold your accumulated value.

Perhaps consider writing to a hash file with key columns of RowNumber and RowType. RowType 0 is a detail, RowType 1 is a summary. Once full, dump the hash file to a sequential file with a selection clause of:

Code: Select all

@ID = @ID BY RowNumber BY RowType
jjrbikes
Premium Member
Premium Member
Posts: 117
Joined: Tue Nov 25, 2003 11:09 am
Location: Minneapolis

Post by jjrbikes »

Sorry for the delay...
Thanks Chuck! I'll give this a try and see what happens!!

Jennifer
chucksmith
Premium Member
Premium Member
Posts: 385
Joined: Wed Jun 16, 2004 12:43 pm
Location: Virginia, USA
Contact:

Post by chucksmith »

Hi Jennifer,

With these stage variables, they are are tracking the subtotal of the rows that came before the current row. This works great except for the last subtotal. The simple fix is to have a second output link from the transform to and aggregator. In the aggregator, DO NOT GROUP, but set the derivation of all of the columns to LAST. Both the transform and the aggregator write to the same hash file. The transform writes detail rows and all but the last subtotal row, and the aggregator will write the last subtotal row.

It sounds harder than it is.

Code: Select all

SEQ -----------> XFR -----------> HASH ----------> XFR -----------> SEQ
                   |                                ^
                   |                                |
                   V                                |
                  AGGR -----------------------------+
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

UniVerse SQL has a BREAK ON clause that allows you to capture summary rows after each set of detail rows.
Using an intermediate UV stage, insert all your rows into the table with regular inserts.
On the output side, which will be feeding your text file, use user-defined SQL something like this:

Code: Select all

SELECT LNAME, FNAME, BREAK ON STATUS, TOTAL AMT_OWED
FROM TempUVtable
ORDER BY STATUS, LNAME;
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
peternolan9
Participant
Posts: 214
Joined: Mon Feb 23, 2004 2:10 am
Location: Dublin, Ireland
Contact:

Re: Inserting Sub-Totals in Sequential output

Post by peternolan9 »

jjrbikes wrote:There's gotta be a way...
For simplicity sake, let's say I'm reading in a table that houses n records. Of those n records, when the value in field-A changes the records processed prior to that change need to be sub-totaled. This will happen 3-4 times in processing the data from the table. What I want to do is read in the records, do the necessary transformation(s) and then, in the sequential output I want to write out the records processed before the first change followed by a sub-total record, then write out the next group of records followed by a sub-total record and so on until all records have been processed. Following the last sub-total record I want to write out a grand total record.

Can I do all this with stage variables and constraints or do I need to include link-collectors, etc?

Thanks a Bunch!!!

Jennifer
Hi Jennifer,
just my 2c...

I would consider it an unworkable/not advisable practice to try an mix your subtotals and your detail records in the one file.

For example, in later processing you then need to know which rows are subtotals and all your subsequent code needs to know this.....This is very likely one reason why DS/INFA struggle to do what you are asking as a 'single file'.

It is a much more workable practice to write your detailed level rows out and then, if you need subtotals for some rows, sort/sum or reprocess the file to get the subtotals you need.....if the reason for the subtotals is that you actually want them in your database and available to users they you would be well advised to create summary level fact tables or even multi-level summary fact tables (if your volumes are large) so that the subtotals and levels you need are readily available...
Best Regards
Peter Nolan
www.peternolan.com
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Ahh, the 1NF mind!

You can label the breakpoints. Should have done that.

Code: Select all

SELECT LNAME, FNAME, BREAK ON "'Sub-total 'V" STATUS, TOTAL AMT_OWED 
FROM TempUVtable 
ORDER BY STATUS, LNAME;
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply