Hello All
I am having a problem calculating values based on previous row values while Hash partitioning. I would appreciate any advise.
I have account-wise daily data and am trying to create monthly aggregates:
acctid, calid, val1, val2,val3,closing - this is the layout of my data
there are roughly about 2000 different accounts and 1500 cal ids for each account.
For each account, closing of day1 becomes opening of day2 and so on. I am using stage variables for getting the previous day closing as current day's opening
and I have calculations based on closing, opening and val1,val2, val3 etc.
The job has
an enterprise stage for sourcing the data,
a lookup stage to lookup the calid,
transformer stage where the above calculations are done
and a remove duplicates stage to output only the last day of month's values.
The job works fine and I get the right results when I run all the stages in sequential mode. But when I partition the transformer in Hash partition and sort on the acctid and calid (I also tried sorting on acctid, calid in the SQL statement and Hash partitioning just on acctid as well) I do not get the right previous days values. I am thinking I need all rows of one acctid to be processed on one node and in the increasing order of calid.
Any inputs as to where I am wrong and how to rectify it, would be greatly appreciated.
Thanks
tsanan
calculating values based on previous row values
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
What's the key of the reference table for the lookup? What partitioning algorithm are you using for the reference input link?
I suspect you may be missing some calid lookups if you're not hash partitioning on calid on both inputs to the Lookup stage (or Entire partitioning on calid on the reference input).
Given that you logic works in sequential mode, I'd guess there's nothing wrong with your logic.
I suspect you may be missing some calid lookups if you're not hash partitioning on calid on both inputs to the Lookup stage (or Entire partitioning on calid on the reference input).
Given that you logic works in sequential mode, I'd guess there's nothing wrong with your logic.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Thanks for your response Ray.
Yes, Ray it works in sequential mode and it also works on single node. When I run in parallel I am not getting the right previous values.
I was wondering if the O/S has anything to do with it.
The key to the lookup is calendarid which is a surrogate id and am looking up for the calendar year and month.ray.wurlod wrote:What's the key of the reference table for the lookup? What partitioning algorithm are you using for the reference input link.
I have been testing it with 158 rows and see the same number 158 on the reference link as well.ray.wurlod wrote:I suspect you may be missing some calid lookups if you're not hash partitioning on calid on both inputs to the Lookup stage (or Entire partitioning on calid on the reference input)..
ray.wurlod wrote:Given that you logic works in sequential mode, I'd guess there's nothing wrong with your logic.
Yes, Ray it works in sequential mode and it also works on single node. When I run in parallel I am not getting the right previous values.
I was wondering if the O/S has anything to do with it.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
I had been trying different ways:ray.wurlod wrote:How are the data partitioned on both inputs to the Lookup stage?
1. Used Order by actid,calid clause in the enterprise stage which is a stream input to the lookup stage. The reference lookup being Parallel execution mode and Preserve partition set to Propogate.
The lookup stage - Parallel, HAsh Partition (actid,calid) and Propogate.
2. Also tried without the order by clause and a sort on the the Lookup stage.
I am having the logic for obtaining the previos values and the calculations on the transformer stage which is the next one. I am having a hash partition and sort on actid (tried with actid and calid aswell).
I am a little confused as to what HAsh Partitioning I should use - Actid alone or Actid,Calid to be able to process all rows pertaining one actid on one node at the same time having the rows in the calid order.
Appreciate all the help I get from dsxchange - though I have been lurking and benefitting from the searches I often do, I never took the time to come out to even thank all of you.
Thanks
tsanan
Re: calculating values based on previous row values
tsanan wrote: The job works fine and I get the right results when I run all the stages in sequential mode. But when I partition the transformer in Hash partition and sort on the acctid and calid (I also tried sorting on acctid, calid in the SQL statement and Hash partitioning just on acctid as well) I do not get the right previous days values. I am thinking I need all rows of one acctid to be processed on one node and in the increasing order of calid.
Any inputs as to where I am wrong and how to rectify it, would be greatly appreciated.
I was able to resolve this. What I did was:
Use Order by and not sort in any of the subsequent stages. I just did a Hash partition by acctid and now get right values.
Thanks for your inputs, Ray.
Thanks
tsanan