calculating values based on previous row values
Posted: Mon Jan 29, 2007 2:20 pm
Hello All
I am having a problem calculating values based on previous row values while Hash partitioning. I would appreciate any advise.
I have account-wise daily data and am trying to create monthly aggregates:
acctid, calid, val1, val2,val3,closing - this is the layout of my data
there are roughly about 2000 different accounts and 1500 cal ids for each account.
For each account, closing of day1 becomes opening of day2 and so on. I am using stage variables for getting the previous day closing as current day's opening
and I have calculations based on closing, opening and val1,val2, val3 etc.
The job has
an enterprise stage for sourcing the data,
a lookup stage to lookup the calid,
transformer stage where the above calculations are done
and a remove duplicates stage to output only the last day of month's values.
The job works fine and I get the right results when I run all the stages in sequential mode. But when I partition the transformer in Hash partition and sort on the acctid and calid (I also tried sorting on acctid, calid in the SQL statement and Hash partitioning just on acctid as well) I do not get the right previous days values. I am thinking I need all rows of one acctid to be processed on one node and in the increasing order of calid.
Any inputs as to where I am wrong and how to rectify it, would be greatly appreciated.
Thanks
tsanan
I am having a problem calculating values based on previous row values while Hash partitioning. I would appreciate any advise.
I have account-wise daily data and am trying to create monthly aggregates:
acctid, calid, val1, val2,val3,closing - this is the layout of my data
there are roughly about 2000 different accounts and 1500 cal ids for each account.
For each account, closing of day1 becomes opening of day2 and so on. I am using stage variables for getting the previous day closing as current day's opening
and I have calculations based on closing, opening and val1,val2, val3 etc.
The job has
an enterprise stage for sourcing the data,
a lookup stage to lookup the calid,
transformer stage where the above calculations are done
and a remove duplicates stage to output only the last day of month's values.
The job works fine and I get the right results when I run all the stages in sequential mode. But when I partition the transformer in Hash partition and sort on the acctid and calid (I also tried sorting on acctid, calid in the SQL statement and Hash partitioning just on acctid as well) I do not get the right previous days values. I am thinking I need all rows of one acctid to be processed on one node and in the increasing order of calid.
![Embarassed :oops:](./images/smilies/icon_redface.gif)
Any inputs as to where I am wrong and how to rectify it, would be greatly appreciated.
Thanks
tsanan