Is there a better way? Adding / Combining Rows

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Is there a better way? Adding / Combining Rows

Post by asorrell »

Ok - to start with, I have a solution for this particular problem, I just want to know if there is a BETTER solution.

I'm getting a daily feed that contains sales amounts that are tagged with time-of-sale in 15 minute increments. A simplified look at a row of data looks something like this:

Simplified Input:

Store Date____ Time___ Amount
0001 20060418 093000 $157.00

The required XML output format needs amounts summed per 15-minute segment and then output in a single row per store, per date, with 96 comma-separated values: an empty string if no sales were submitted for a 15-minute period, or the sum of that timeslot's sales amounts. For example:

Required Output:

Store Date___ Summed Sales Per 15-Minute Increments
0001 20060418 ,,,,,,,,,,$300.00,$456.00,$950.00,,,$340.00,,,,,,,, (etc)

Here's my current solution:

1) Job1 sorts and aggregates data for each 15-minute increment. It outputs these aggregates into a temp file (call it the actual_sales file). It also uses a de-duplicator to produce a separate file containing each valid store / sales-date combination.

2) I have a static sequential file containing the 96 possible valid timeslots for each day (from 001500 to 240000). A post-processing script for Job1 takes the file with valid store / sales-date combinations and uses the static increment file to create 96 rows for each store / sales-date combination (call it the possible_sales file).

3) Job2 takes the possible_sales file and performs a left outer join with the actual_sales file. This gives me 96 records per store/day combo, with nulls for rows that had no sales in that time slot and valid amounts when sales occurred.

4) Post-processing scripts for Job2 then use awk to take the first entry for each store/date, strip off the time increment, and concatenate the sales amounts (null or otherwise) from the 95 following records. This is stuffed into a final sequential file. (A rough Python sketch of steps 2-4 follows this list.)

5) Job3 then takes this output and formats it for XML - which is another story for another day...
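For anyone who wants to see the logic outside DataStage, here is a minimal Python sketch of steps 2-4. The sample rows, field layout, and dollar formatting are hypothetical; it just shows the densify-then-concatenate idea:

# Steps 2-4 in miniature: build the 96 possible timeslots, left-outer-join
# the aggregated sales against them, and emit one comma-separated row per
# store/date. Sample data below is made up for illustration.
actual_sales = [
    ("0001", "20060418", "023000", "$300.00"),
    ("0001", "20060418", "024500", "$456.00"),
    ("0001", "20060418", "030000", "$950.00"),
]

# Step 2: the 96 valid timeslots per day, 001500 through 240000.
timeslots = [f"{m // 60:02d}{m % 60:02d}00" for m in range(15, 1441, 15)]

# Step 3: index the aggregates so missing slots fall out as empty strings.
sales = {(s, d, t): amt for (s, d, t, amt) in actual_sales}
store_dates = sorted({(s, d) for (s, d, _, _) in actual_sales})

# Step 4: one row per store/date with 96 comma-separated slot values.
for store, date in store_dates:
    print(store, date, ",".join(sales.get((store, date, t), "") for t in timeslots))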

I had a very limited amount of time to find a solution, and this was the first one I thought of (inelegant as it was). It works, though I'm a bit dissatisfied with its heavy reliance on single-threaded scripts. I was just wondering if there is a better solution to the two major issues:

1) Creating missing rows
2) Combining elements from rows

I tried various parallel stages and at this point I'm stumped. Any thoughts?
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

All PX, or can you use a database (Oracle maybe) to facilitate?
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
emma
Premium Member
Posts: 95
Joined: Fri Aug 08, 2003 10:30 am
Location: Montreal

Post by emma »

Did you try the switch stage?

input_file ---> Filter_Store ---> Switch_Slot ---> Funnel_sequence_mode ---> Transformer ---> output_file

or

input_file ---> Switch_Slot ---> Funnel_sequence_mode ---> Transformer ---> output_file
Thanks,
Emma
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Is there a better way? Adding / Combining Rows

Post by asorrell »

Ken - Due to some external constraints here I'm trying to do this in PX / local processing only w/o a database.

Emma - I'll play around with switch this evening and see if it works - thanks for the suggestions.
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
jasper
Participant
Posts: 111
Joined: Mon May 06, 2002 1:25 am
Location: Belgium

Post by jasper »

Something we did (but with a lot fewer fields):

In a transformer, create the record with all the fields, so 96 instances of logic like: if time=001500 then amount else NULL for field amount0015; if time=003000 then amount else NULL for field amount0030; ...

After this, do an aggregation where you sum all these fields (which is really just to collapse everything into one record, since it sums one value with a lot of nulls).

We're doing this with good performance for 12 columns; I don't know about 96. But I feel it will look better in the job.
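In Python terms the idea is roughly this (field names and sample rows are illustrative only):

from collections import defaultdict

timeslots = [f"{m // 60:02d}{m % 60:02d}00" for m in range(15, 1441, 15)]
slot_index = {t: i for i, t in enumerate(timeslots)}

rows = [
    ("0001", "20060418", "093000", 157.00),
    ("0001", "20060418", "093000", 143.00),
]

# Transformer step: spread each row into a 96-wide record where only the
# matching slot column carries the amount. Aggregator step: summing per
# store/date collapses those sparse records into one.
pivot = defaultdict(lambda: [None] * len(timeslots))
for store, date, time, amount in rows:
    cells = pivot[(store, date)]
    i = slot_index[time]
    cells[i] = (cells[i] or 0.0) + amount

for (store, date), cells in sorted(pivot.items()):
    print(store, date, ",".join("" if c is None else f"${c:.2f}" for c in cells))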
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Found a better way! Combining Rows / Consolidating Fields

Post by asorrell »

It turns out there IS a better way. After giving it some thought, we found a way to use a transformer stage to consolidate the rows.

First of all, the consolidation process requires that each "group" (i.e., store / date / increments) be contiguous and in sorted order. To do that I set the transformer to run in sequential mode and sorted the input by store, then date, then increment.

I then created several stage variables with the following names and properties:

svNewStoreDate, VarChar(1), default="1"
svCompletedRecord, VarChar(2000), default=""
svBuildingRecord, VarChar(2000), default=""
svWriteFlag, VarChar(3), default="No"
svStore, VarChar(4), default="0000"
svDate, VarChar(8), default="00000000"

The variables are then derived as follows in the transformer. Please note: the order is VERY important, as these must be evaluated in the proper sequence for this to work:


svNewStoreDate: If (incoming.Store <> svStore) or (incoming.Date <> svDate) Then 1 Else 0

svCompletedRecord: If svNewStoreDate then svBuildingRecord Else ''

svBuildingRecord: If svNewStoreDate then incoming.Store:"|":incoming.Date:"|": incoming.IncrementSalesAmount Else svBuildingRecord:",":incoming.IncrementSalesAmount

svWriteFlag: If svNewStoreDate and (svCompletedRecord <> '') then 'Yes' Else 'No'

svStore: incoming.Store

svDate: incoming.Date


The output link for the transformer has the following constraint in it:
svWriteFlag="Yes"

and it outputs svCompletedRecord to an outgoing VarChar(2000) column


In essence - this transformer notes when either the store or date changes. When they DO change - it realizes the record it has been building is completed, and should be written out (unless it is empty because it is the first record), so it turns the WriteFlag on to allow the constraint to output the record. If the store/date match what it saw on the previous record, it grabs the field it needs and appends it to the record it is building.

The only thing to note is that a dummy record (I used 9999|99999999|999999) must be appended to the very end of the input file to "kick out" the very last record being constructed.
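If it helps to see the evaluation order outside the transformer, here is a Python sketch of the same stage-variable logic, including the dummy trailer (sample rows are made up):

# Mirrors the top-to-bottom stage-variable evaluation, assuming input
# sorted by store, then date, then increment.
rows = [
    ("0001", "20060418", "$300.00"),
    ("0001", "20060418", "$456.00"),
    ("0002", "20060418", "$950.00"),
    ("9999", "99999999", "999999"),   # dummy record to flush the last group
]

sv_store, sv_date = "0000", "00000000"   # svStore / svDate defaults
building = ""                            # svBuildingRecord

for store, date, amount in rows:
    new_group = (store != sv_store) or (date != sv_date)   # svNewStoreDate
    completed = building if new_group else ""              # svCompletedRecord
    if new_group:
        building = f"{store}|{date}|{amount}"              # start a new record
    else:
        building = f"{building},{amount}"                  # append this slot
    if new_group and completed:                            # svWriteFlag = "Yes"
        print(completed)                                   # constraint lets the row out
    sv_store, sv_date = store, date                        # remember the keys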

It runs about two orders of magnitude faster than the awk script I was using (even in "single stream" mode). Processing time for a few thousand records dropped from 20 minutes (ugh!) to less than 15 seconds.

Hope this comes in handy!
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
jasper
Participant
Posts: 111
Joined: Mon May 06, 2002 1:25 am
Location: Belgium

Post by jasper »

If you place a Sort stage, part of this logic can be done in there. I'm not sure about the exact option name, but there is an option for a key-change column. It will be set to 1 if the previous record had a different key than the current one.
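In Python terms the key-change column behaves roughly like this (assuming rows already sorted by store and date):

rows = [("0001", "20060418"), ("0001", "20060418"), ("0002", "20060418")]

prev = None
for row in rows:
    key_change = 1 if row != prev else 0   # 1 on the first row of each key group
    print(row, key_change)                 # could replace the svNewStoreDate derivation
    prev = row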