How to Handle Overlapping Priorities in Data Values

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
myukassign
Premium Member
Premium Member
Posts: 238
Joined: Fri Jul 25, 2008 8:55 am

How to Handle Overlapping Priorities in Data Values

Post by myukassign »

Hi ,

I have a output to achieve for which my experience in datastage is not helping much. I request if anyone can help me.

My Input
-----------

Code: Select all

FILE 1

PID        ORG    DC             Stdate           EndDate            Price

P1         10       1           1-MAR-10          30-OCT-10          100


FILE 2

PID        ORG    DC             Stdate           EndDate            Price

P1         10       1           22-MAR-10          1-OCT-10          98
MY OUTPUT SHOULD BE
-----------------------------

Code: Select all

PID         ORG    DC             Stdate           EndDate            Price

P1         10       1           1-MAR-10          21-MAR-10          100
P1         10       1           22-MAR-10         01-OCT-10          98
P1         10       1           2-OCT-10          30-OCT-10          100
Please suggest logic to do that.

Rules here are.

The records in the second file should always get priority. The first file start and end date and price should adjust according to a record in the second file. Please note that in my output, the dates/period of price is closing before the start date in second file

The keys of the table / file are PID, ORG & DC


Any ideas to do this .....
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

Interesting problem. Four questions:

1) I assume it is a many to many relationship? There can be multiple records in both File 1 and File 2 with the same keys?

2) What do you do if a price in File 1 is completely overlaid (start to finish) by a higher priority price in file 2? Eliminate the record?

3) Do any of the start / end dates in File 2 overlap for identical keys? (I assume not.)

4) Do you have complete coverage of all dates or would their possibly be gaps with no prices (for example holidays)?
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
myukassign
Premium Member
Premium Member
Posts: 238
Joined: Fri Jul 25, 2008 8:55 am

Post by myukassign »

asorrell wrote:Interesting problem. Four questions:

1) I assume it is a many to many relationship? There can be multiple records in both File 1 and File 2 with the same keys?

2) What do you do if a price in File 1 is completely overlaid (start to finish) by a higher priority price in file 2? Eliminate the record?

3) Do any of the start / end dates in File 2 overlap for identical keys? (I assume not.)

4) Do you have complete coverage of all dates or would their possibly be gaps with no prices (for example holidays)?
1. Yes...

2. yes...

3. possible

4. No there wont be gaps......
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

Because of the "many to many" relationships and the multiple overlays that are possible, I'm thinking the best way to do this might be to build a lookup table from the data, and then use that to re-create your date ranges. This is a complex operation because some of it really would be easier to do in a programming language, but I think it can be done.. At a very high level, here's an overview of the steps...

1) make one pass through file one - storing prices in a lookup table via:
PID, Org, DC, "inclusive dates" - which means if you have an entry from May 1 through May 5, there would be five entries in the table (one for each day).

2) Make a pass through file two, doing the same - which means that in many cases you are overlaying entries for file one.

3) Then use a row generator to generate rows for each valid PID, ORG, DC, date combination and lookup the price.

4) Feed the output to a transformer that uses save variables to store the current PID, Org, DC, StartDate and Price. It should also have a flag that indicates a "price change" (ie - current incoming price and stored price don't match). A constraint on the output link should only output a "record" when the price changes. When the price changes, output the saved values with the previous day as the "end date". Then store the new incoming values and keep looking for the next price change. The constraint should also not output any date ranges where the price is "null" - just in case there are gaps.

You'll also have to have a "trailer" of some sort to close out the last range.

Even with this overview, this will be difficult to do and will require an extensive knowledge of DataStage. Not an easy task...

Also - because you have marked this as a "parallel" job - I'm going to move this to the parallel forum in a couple of days.
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Probably a simpler solution could be to

1.) combine both files into a single one
2.) ordered by key, fromDate and toDate
3.) append an extra dummy (end-of-file) record (if parallel, then 1 for each node)
4.) Using stage variables, compare current record with previous record and write out previous record when there is a change in the key.
By this way, you can ensure correct start and end dates are provided

Give it a shot and respond if you have any issues.
myukassign
Premium Member
Premium Member
Posts: 238
Joined: Fri Jul 25, 2008 8:55 am

Post by myukassign »

Sainath.Srinivasan wrote:Probably a simpler solution could be to

1.) combine both files into a single one
2.) ordered by key, fromDate and toDate
3.) append an extra dummy (end-of-file) record (if parallel, then 1 for each node)
4.) Using stage variables, compare current record with previous record and write out previous record when there is a change in the key.
By this way, you can ensure correct start and end dates are provided

Give it a shot and respond if you have any issues.
Can you please elaborate step 3. I did not understand.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You need something to drive out the last group change and a known "EOD" record will allow that. This capability will be built natively into "a future" (the next?) release.
-craig

"You can never have too many knives" -- Logan Nine Fingers
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

The solution posted by Sainath.Srinivasan is not as simple as he proposed. It would require VERY careful tracking of both File 1 and File 2 current prices in order to determine the correct output in overlapping dates.

For example:

File 1 has prices of: May 1 - May 10, $100 and May 11 - May 20, $50
File 2 has price of May 5 - May 15, $75

Should result in required output of
May 1 - May 4 $100
May 5 - May 15, $75
May 16 - May 20, $50

Also (2nd example)

File 1 has price of June 1 - June 20, $100
File 2 has prices of June 5 - June 10, $75 and June 15 - June 18, $50

Should result in required output of

June 1 - June 4, $100
June 5 - June 10, $75
June 11 - June 14, $100
June 15 - June 18, $50
June 19 - June 20, $100

Doing that using just saved variables might be do-able, but would be fairly difficult, requiring numerous "state" variables to keep track of active and inactive prices. I was trying to post something that might be simpler to implement, but in truth, there probably isn't an "easy" solution to this, it will be tough either way.

Note - I still plan on moving this thread to parallel forum - and will probably change the title to "Handling Overlapping Priorities in Data Values". The current title about "Extrapolation" isn't relevant, there is no extrapolation going on here - everything is spelled out.
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
myukassign
Premium Member
Premium Member
Posts: 238
Joined: Fri Jul 25, 2008 8:55 am

Post by myukassign »

I was doing Unix shell scripting to achive the same. It's in the final stage. So I will use an exec command stge n sequncer to call my Unix script wo do the same for me.

It was really helpful. Thanks a lot for everyone who supported me.

I am marking this post as 'Resolved with workaround"

Thanks.
swades
Premium Member
Premium Member
Posts: 323
Joined: Mon Dec 04, 2006 11:52 pm

Post by swades »

I have DS job design ready for this issue. Where I can upload the Job DSX :?:
Post Reply