DSXchange

Posted: **Mon Jan 29, 2007 2:20 pm**

Hello All

I am having a problem calculating values based on previous row values while Hash partitioning. I would appreciate any advise.

I have account-wise daily data and am trying to create monthly aggregates:

acctid, calid, val1, val2,val3,closing - this is the layout of my data

there are roughly about 2000 different accounts and 1500 cal ids for each account.

For each account, closing of day1 becomes opening of day2 and so on. I am using stage variables for getting the previous day closing as current day's opening
and I have calculations based on closing, opening and val1,val2, val3 etc.

The job has
an enterprise stage for sourcing the data,
a lookup stage to lookup the calid,
transformer stage where the above calculations are done
and a remove duplicates stage to output only the last day of month's values.

The job works fine and I get the right results when I run all the stages in sequential mode. But when I partition the transformer in Hash partition and sort on the acctid and calid (I also tried sorting on acctid, calid in the SQL statement and Hash partitioning just on acctid as well) I do not get the right previous days values. I am thinking I need all rows of one acctid to be processed on one node and in the increasing order of calid.

Any inputs as to where I am wrong and how to rectify it, would be greatly appreciated.

Thanks
tsanan

Posted: **Mon Jan 29, 2007 4:07 pm**

What's the key of the reference table for the lookup? What partitioning algorithm are you using for the reference input link?

I suspect you may be missing some calid lookups if you're not hash partitioning on calid on both inputs to the Lookup stage (or Entire partitioning on calid on the reference input).

Given that you logic works in sequential mode, I'd guess there's nothing wrong with your logic.

Posted: **Mon Jan 29, 2007 4:28 pm**

Thanks for your response Ray.

ray.wurlod wrote:What's the key of the reference table for the lookup? What partitioning algorithm are you using for the reference input link.

The key to the lookup is calendarid which is a surrogate id and am looking up for the calendar year and month.

ray.wurlod wrote:I suspect you may be missing some calid lookups if you're not hash partitioning on calid on both inputs to the Lookup stage (or Entire partitioning on calid on the reference input)..

I have been testing it with 158 rows and see the same number 158 on the reference link as well.

ray.wurlod wrote:Given that you logic works in sequential mode, I'd guess there's nothing wrong with your logic.

Yes, Ray it works in sequential mode and it also works on single node. When I run in parallel I am not getting the right previous values.
I was wondering if the O/S has anything to do with it.

Posted: **Mon Jan 29, 2007 4:39 pm**

How are the data partitioned on both inputs to the Lookup stage?

Posted: **Tue Jan 30, 2007 8:19 am**

ray.wurlod wrote:How are the data partitioned on both inputs to the Lookup stage?

I had been trying different ways:

1. Used Order by actid,calid clause in the enterprise stage which is a stream input to the lookup stage. The reference lookup being Parallel execution mode and Preserve partition set to Propogate.
The lookup stage - Parallel, HAsh Partition (actid,calid) and Propogate.

2. Also tried without the order by clause and a sort on the the Lookup stage.

I am having the logic for obtaining the previos values and the calculations on the transformer stage which is the next one. I am having a hash partition and sort on actid (tried with actid and calid aswell).

I am a little confused as to what HAsh Partitioning I should use - Actid alone or Actid,Calid to be able to process all rows pertaining one actid on one node at the same time having the rows in the calid order.

Appreciate all the help I get from dsxchange - though I have been lurking and benefitting from the searches I often do, I never took the time to come out to even thank all of you.

Thanks
tsanan

Posted: **Tue Jan 30, 2007 12:21 pm**

tsanan wrote: The job works fine and I get the right results when I run all the stages in sequential mode. But when I partition the transformer in Hash partition and sort on the acctid and calid (I also tried sorting on acctid, calid in the SQL statement and Hash partitioning just on acctid as well) I do not get the right previous days values. I am thinking I need all rows of one acctid to be processed on one node and in the increasing order of calid.

Any inputs as to where I am wrong and how to rectify it, would be greatly appreciated.

I was able to resolve this. What I did was:
Use Order by and not sort in any of the subsequent stages. I just did a Hash partition by acctid and now get right values.

Thanks for your inputs, Ray.

Thanks
tsanan

Posted: **Tue Jan 30, 2007 12:23 pm**

You can mark your post as "Resolved" then.

DSXchange

calculating values based on previous row values

calculating values based on previous row values

Re: calculating values based on previous row values