Inserting Sub-Totals in Sequential output
Moderators: chulett, rschirm, roy
Inserting Sub-Totals in Sequential output
There's gotta be a way...
For simplicity sake, let's say I'm reading in a table that houses n records. Of those n records, when the value in field-A changes the records processed prior to that change need to be sub-totaled. This will happen 3-4 times in processing the data from the table. What I want to do is read in the records, do the necessary transformation(s) and then, in the sequential output I want to write out the records processed before the first change followed by a sub-total record, then write out the next group of records followed by a sub-total record and so on until all records have been processed. Following the last sub-total record I want to write out a grand total record.
Can I do all this with stage variables and constraints or do I need to include link-collectors, etc?
Thanks a Bunch!!!
Jennifer
For simplicity sake, let's say I'm reading in a table that houses n records. Of those n records, when the value in field-A changes the records processed prior to that change need to be sub-totaled. This will happen 3-4 times in processing the data from the table. What I want to do is read in the records, do the necessary transformation(s) and then, in the sequential output I want to write out the records processed before the first change followed by a sub-total record, then write out the next group of records followed by a sub-total record and so on until all records have been processed. Following the last sub-total record I want to write out a grand total record.
Can I do all this with stage variables and constraints or do I need to include link-collectors, etc?
Thanks a Bunch!!!
Jennifer
Hi,
Do you mean accumulative subtotal of the same columns ?
Could you give an example of some input rows and your output rquirements?
Do you mean accumulative subtotal of the same columns ?
Could you give an example of some input rows and your output rquirements?
Roy R.
Time is money but when you don't have money time is all you can afford.
Search before posting:)
Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
![Image](http://www.worldcommunitygrid.org/images/logo.gif)
Time is money but when you don't have money time is all you can afford.
Search before posting:)
Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
![Image](http://www.worldcommunitygrid.org/images/logo.gif)
-
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
- Contact:
Your biggest problem is that a DataStage transformer does not let you create a row that didn't exist in your input file. For example you cannot feed one row into your transformer down one link and send two rows down an output link. So while it is very easy to create your subtotal values it is very hard to write them out. If DataStage reads in 200 rows it really wants to write out 200 rows and not 210 with 10 sub total rows.
What you can do is write your output rows down one link and your subtotal rows down another link. They would end up in two files. Your trigger for writing out a sub total row is when a new group is found.
To get them in one file you would join those two output links together using a link collector and write them out to one file, hopefully in the right order! Make your subtotal the first output link from the transformer so it is collected by the link and appended after the group it belongs to before the new group is written out.
For this to work the link collector will be expecting your subtotal row to have the same metadata format as your other rows.
What you can do is write your output rows down one link and your subtotal rows down another link. They would end up in two files. Your trigger for writing out a sub total row is when a new group is found.
To get them in one file you would join those two output links together using a link collector and write them out to one file, hopefully in the right order! Make your subtotal the first output link from the transformer so it is collected by the link and appended after the group it belongs to before the new group is written out.
For this to work the link collector will be expecting your subtotal row to have the same metadata format as your other rows.
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Provided the input file is sorted, these techniques are OK. Thinking slightly outside the square, another approach is to use the ODBC driver for text files and generate the sub-totals with a grouping query. Of course this does not insert them after the detail rows in the output; however there's probably something clever you could do with a Link Collector stage whose collection algorithm is set to Sort/Merge.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Thanks for all the ideas.
Roy - here is a simplified sample of my input:
LNAME FNAME STATUS AMT OWED
Goodhue Alison New 0
Queen Dairy New 13.23
Arc Joan Res 27.19
Hill James Res 36
Smith Sam Res 93.78
Truman Harry Res 84.5
Harris Mary Vis 73.27
McDonald Sally Vis 56.75
Murray Ann Vis 11.91
and the OUTPUT we'd like to generate would look like this:
LNAME FNAME STATUS AMT OWED
Goodhue Alison New 0
Queen Dairy New 13.23
SubTotal For Status New = 13.23
Arc Joan Res 27.19
Hill James Res 36
Smith Sam Res 93.78
Truman Harry Res 84.5
SubTotal For Status Res = 241.47
Harris Mary Vis 73.27
McDonald Sally Vis 56.75
Murray Ann Vis 11.91
SubTotal For Status Vis = 141.93
Grand Total For All Status' = 396.63
If I can do this with Stage Variables - how the heck do I set those up??
Thanks again Everyone!!
Jennifer
Roy - here is a simplified sample of my input:
LNAME FNAME STATUS AMT OWED
Goodhue Alison New 0
Queen Dairy New 13.23
Arc Joan Res 27.19
Hill James Res 36
Smith Sam Res 93.78
Truman Harry Res 84.5
Harris Mary Vis 73.27
McDonald Sally Vis 56.75
Murray Ann Vis 11.91
and the OUTPUT we'd like to generate would look like this:
LNAME FNAME STATUS AMT OWED
Goodhue Alison New 0
Queen Dairy New 13.23
SubTotal For Status New = 13.23
Arc Joan Res 27.19
Hill James Res 36
Smith Sam Res 93.78
Truman Harry Res 84.5
SubTotal For Status Res = 241.47
Harris Mary Vis 73.27
McDonald Sally Vis 56.75
Murray Ann Vis 11.91
SubTotal For Status Vis = 141.93
Grand Total For All Status' = 396.63
If I can do this with Stage Variables - how the heck do I set those up??
Thanks again Everyone!!
Jennifer
-
- Premium Member
- Posts: 385
- Joined: Wed Jun 16, 2004 12:43 pm
- Location: Virginia, USA
- Contact:
The stage variable are easy:
When NewStatus is true and @INROWNUM is not 1 then PrevTotal should hold your accumulated value.
Perhaps consider writing to a hash file with key columns of RowNumber and RowType. RowType 0 is a detail, RowType 1 is a summary. Once full, dump the hash file to a sequential file with a selection clause of:
Code: Select all
PrevTotal = Total
ThisStage = STATUS
NewStatus = (If ThisStatus <> LastStatus Then @TRUE Else @FALSE)
LastStatus = ThisStatus
Total = (If NewStatus Then 0 Else Total + AMT)
Perhaps consider writing to a hash file with key columns of RowNumber and RowType. RowType 0 is a detail, RowType 1 is a summary. Once full, dump the hash file to a sequential file with a selection clause of:
Code: Select all
@ID = @ID BY RowNumber BY RowType
Chuck Smith
www.anotheritco.com
www.anotheritco.com
-
- Premium Member
- Posts: 385
- Joined: Wed Jun 16, 2004 12:43 pm
- Location: Virginia, USA
- Contact:
Hi Jennifer,
With these stage variables, they are are tracking the subtotal of the rows that came before the current row. This works great except for the last subtotal. The simple fix is to have a second output link from the transform to and aggregator. In the aggregator, DO NOT GROUP, but set the derivation of all of the columns to LAST. Both the transform and the aggregator write to the same hash file. The transform writes detail rows and all but the last subtotal row, and the aggregator will write the last subtotal row.
It sounds harder than it is.
With these stage variables, they are are tracking the subtotal of the rows that came before the current row. This works great except for the last subtotal. The simple fix is to have a second output link from the transform to and aggregator. In the aggregator, DO NOT GROUP, but set the derivation of all of the columns to LAST. Both the transform and the aggregator write to the same hash file. The transform writes detail rows and all but the last subtotal row, and the aggregator will write the last subtotal row.
It sounds harder than it is.
Code: Select all
SEQ -----------> XFR -----------> HASH ----------> XFR -----------> SEQ
| ^
| |
V |
AGGR -----------------------------+
Chuck Smith
www.anotheritco.com
www.anotheritco.com
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
UniVerse SQL has a BREAK ON clause that allows you to capture summary rows after each set of detail rows.
Using an intermediate UV stage, insert all your rows into the table with regular inserts.
On the output side, which will be feeding your text file, use user-defined SQL something like this:
Using an intermediate UV stage, insert all your rows into the table with regular inserts.
On the output side, which will be feeding your text file, use user-defined SQL something like this:
Code: Select all
SELECT LNAME, FNAME, BREAK ON STATUS, TOTAL AMT_OWED
FROM TempUVtable
ORDER BY STATUS, LNAME;
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 214
- Joined: Mon Feb 23, 2004 2:10 am
- Location: Dublin, Ireland
- Contact:
Re: Inserting Sub-Totals in Sequential output
Hi Jennifer,jjrbikes wrote:There's gotta be a way...
For simplicity sake, let's say I'm reading in a table that houses n records. Of those n records, when the value in field-A changes the records processed prior to that change need to be sub-totaled. This will happen 3-4 times in processing the data from the table. What I want to do is read in the records, do the necessary transformation(s) and then, in the sequential output I want to write out the records processed before the first change followed by a sub-total record, then write out the next group of records followed by a sub-total record and so on until all records have been processed. Following the last sub-total record I want to write out a grand total record.
Can I do all this with stage variables and constraints or do I need to include link-collectors, etc?
Thanks a Bunch!!!
Jennifer
just my 2c...
I would consider it an unworkable/not advisable practice to try an mix your subtotals and your detail records in the one file.
For example, in later processing you then need to know which rows are subtotals and all your subsequent code needs to know this.....This is very likely one reason why DS/INFA struggle to do what you are asking as a 'single file'.
It is a much more workable practice to write your detailed level rows out and then, if you need subtotals for some rows, sort/sum or reprocess the file to get the subtotals you need.....if the reason for the subtotals is that you actually want them in your database and available to users they you would be well advised to create summary level fact tables or even multi-level summary fact tables (if your volumes are large) so that the subtotals and levels you need are readily available...
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Ahh, the 1NF mind!
You can label the breakpoints. Should have done that.
You can label the breakpoints. Should have done that.
Code: Select all
SELECT LNAME, FNAME, BREAK ON "'Sub-total 'V" STATUS, TOTAL AMT_OWED
FROM TempUVtable
ORDER BY STATUS, LNAME;
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.