Median Calculation

Posted: Wed Jun 20, 2012 12:14 pm
by hiral.chauhan
Hello Datastage experts,

This post is a continuation of this thread: viewtopic.php?t=89090

That thread is quite old and I did not get any response there, so I have started a new topic.

I am trying to calculate median using the stage variables described by VMcBurney in the post mentioned above.

For the even case, I have the two rows that will give me the final median value. But I am not able to use the "IsEvenMedian" and "EvenMedianFirstValue" stage variables correctly to calculate the median, because on the row where IsEvenMedian is TRUE, EvenMedianFirstValue is FALSE. In short, their values are set on two different rows.

My question is:

Can I calculate EvenMedianFirstValue and EvenMedianValue in one transformer?

I appreciate your valuable time and inputs.

Thanks,
Hiral

Posted: Wed Jun 20, 2012 4:57 pm
by ray.wurlod
There is a technique for changing stage variables only when you need to; namely to assign the stage variable to itself when no change is required.

Code:

If condition_met Then calculated_value Else svMyVariable  -->  svMyVariable

Because the condition might not be met in row #1, the stage variable must be initialized.
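
The hold-the-previous-value pattern can be sketched outside DataStage. The Python below is only an illustration of the idea, not transformer syntax: `hold_last`, `condition`, `compute` and `initial` are hypothetical names, and the loop stands in for the transformer processing one row at a time.

```python
def hold_last(rows, condition, compute, initial=None):
    """Mimic a stage variable that changes only when condition(row) is true."""
    sv = initial  # initial value, used for rows seen before the first match
    seen = []
    for row in rows:
        # Else-branch assigns the variable to itself, so the old value survives.
        sv = compute(row) if condition(row) else sv
        seen.append(sv)
    return seen


# Hypothetical data: update only on rows greater than 5, otherwise keep the last value.
values = hold_last([3, 8, 1, 10, 2],
                   condition=lambda r: r > 5,
                   compute=lambda r: r * 2,
                   initial=0)
print(values)
```

Without the `initial` argument the first entry would be `None`, which is the reason for initializing the stage variable.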

Posted: Wed Jun 20, 2012 5:00 pm
by Kryt0n
I can only guess that you are missing how stage variables work: they do not reset as the transformer loops through the rows.
As such, IsEvenMedian will be true once you hit the first row after the midpoint of an even number of records (e.g. 5th of 8 records, 6th of 10, etc.), else false. EvenMedianFirstValue will be set at the midpoint (e.g. 4th of 8, 5th of 10).

All in all, yes, they are done in one transformer but while processing different rows.

What I think should change is that the EvenMedianFirstValue variable should be declared after EvenMedianValue, so that its value from the previous row still exists when the subsequent row is processed.
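
The effect of declaration order can be simulated in Python. This is a rough sketch under stated assumptions, not the stage variable definitions from the earlier thread: the input is assumed to be already sorted, the total row count is assumed to be known on every row, and `median_with_stage_variables`, `sv_median` and `sv_first_value` are hypothetical stand-ins for the transformer and its stage variables.

```python
def median_with_stage_variables(sorted_values):
    """Simulate stage variables evaluated top to bottom on each row."""
    n = len(sorted_values)
    is_even = n % 2 == 0
    sv_first_value = 0.0  # stands in for EvenMedianFirstValue
    sv_median = None      # stands in for EvenMedianValue
    for row_num, value in enumerate(sorted_values, start=1):
        # Declared FIRST: still sees sv_first_value as set on the previous row.
        if is_even and row_num == n // 2 + 1:
            sv_median = (sv_first_value + value) / 2
        elif not is_even and row_num == (n + 1) // 2:
            sv_median = float(value)
        # Declared SECOND: resets on every row except the midpoint row.
        sv_first_value = float(value) if is_even and row_num == n // 2 else 0.0
    return sv_median


even_result = median_with_stage_variables([1, 3, 5, 7, 9, 11, 13, 15])
odd_result = median_with_stage_variables([1, 3, 5])
print(even_result, odd_result)
```

If the two assignments inside the loop were swapped, `sv_first_value` would already be reset to 0 on the 5th of 8 rows and the even median would come out wrong, which is the ordering issue described above.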

Posted: Thu Jun 28, 2012 12:09 pm
by hiral.chauhan
I am sorry I have not been able to respond sooner, but many, many thanks for your expert advice, Ray and Kryt0n!

I don't know what I would do without your help! :)

Yes, I was missing the fact that stage variables do not reset as the transformer loops through the rows. Doing what Ray suggested SOLVED my problem.

I think this may be a very inefficient way of doing it, but I am running the transformer (where all my stage variables are) in Sequential mode. There was no other way that I could think of to ensure that EvenMedianFirstValue was calculated BEFORE EvenMedian and that EvenMedianFirstValue was never reset.

Is there a way I can run the transformer in parallel AND ensure EvenMedianFirstValue is calculated before EvenMedian?

Posted: Thu Jun 28, 2012 4:35 pm
by Kryt0n
Do you have a key upon which you are calculating your median? If so, partition by the key. If your median is against the full data set, then you don't have a choice but to run sequentially.

EvenMedianFirstValue will be calculated before EvenMedianValue because it will have been determined on the row before. The only reason you declare it after is that it will reset on the EvenMedian row, but you want its value before it gets reset.
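
The reason key partitioning is enough can be shown with a small Python sketch. It is only an analogy: grouping rows by key stands in for partitioning on that key, the standard library `statistics.median` replaces the stage variable logic, and `median_per_key` and the sample rows are made up for illustration.

```python
from collections import defaultdict
from statistics import median


def median_per_key(rows):
    """Group (key, value) rows by key, then take the median within each group."""
    groups = defaultdict(list)
    for key, value in rows:
        # All rows for one key land in the same group, as they would in one partition.
        groups[key].append(value)
    return {key: median(values) for key, values in groups.items()}


# Hypothetical rows: key "A" has an even count, key "B" an odd count.
rows = [("A", 4), ("B", 10), ("A", 6), ("B", 30), ("B", 20), ("A", 1), ("A", 9)]
result = median_per_key(rows)
print(result)
```

Each group can be processed independently of the others, which is what allows parallel execution. A median over the full data set has only one group, so there is nothing to split.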

Posted: Tue Aug 14, 2012 10:25 am
by hiral.chauhan
Thank you all for your valuable input!