Median Calculation

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

hiral.chauhan
Premium Member
Posts: 45
Joined: Fri Nov 07, 2008 12:22 pm

Median Calculation

Post by hiral.chauhan »

Hello Datastage experts,

This post is a continuation of this thread: viewtopic.php?t=89090

That thread is very old and I did not get a response there, so I have posted a new topic.

I am trying to calculate median using the stage variables described by VMcBurney in the post mentioned above.

For the even case, I have the two rows that together give the final median value. However, I cannot use the "IsEvenMedian" and "EvenMedianFirstValue" stage variables correctly to calculate the median, because on the row where IsEvenMedian is TRUE, EvenMedianFirstValue is FALSE. In short, their values fall on two different rows.

My question is:

Can I calculate EvenMedianFirstValue and EvenMedianValue in one transformer?

I appreciate your valuable time and inputs.

Thanks,
Hiral
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

There is a technique for changing stage variables only when you need to: assign the stage variable to itself when no change is required.

Code:

If condition_met Then calculated_value Else svMyVariable  -->  svMyVariable
Because the condition might not be met in row #1, the stage variable must be initialized.
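The hold-your-value behaviour can be imitated outside DataStage. The following is only a rough Python sketch of the idea, not DataStage code: the function name run_transformer, its parameters, and the sample rows are all made up for illustration.

```python
def run_transformer(rows, condition, calculate, initial_value):
    """Imitate one stage variable that is evaluated once per row."""
    # The initial value is what the variable holds until the condition is first met.
    sv_my_variable = initial_value
    outputs = []
    for row in rows:
        # If condition_met Then calculated_value Else svMyVariable
        sv_my_variable = calculate(row) if condition(row) else sv_my_variable
        outputs.append(sv_my_variable)
    return outputs


# Hypothetical data: recalculate only for values above 6, otherwise keep the old value.
held = run_transformer(
    rows=[3, 8, 5, 12, 7],
    condition=lambda value: value > 6,
    calculate=lambda value: value * 10,
    initial_value=0,
)
print(held)
```

The first row does not meet the condition, so the output for that row is the initial value. That is why the initialization matters.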
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Kryt0n
Participant
Posts: 584
Joined: Wed Jun 22, 2005 7:28 pm

Post by Kryt0n »

I can only guess that you are overlooking how stage variables behave: they do not reset as the Transformer loops through the rows.
As such, IsEvenMedian will be true once you hit the first row after the midpoint of an even number of records (e.g. the 5th of 8 records, the 6th of 10, etc.), and false otherwise. EvenMedianFirstValue will be set at the midpoint (e.g. the 4th of 8, the 5th of 10).

All in all, yes, they are done in one transformer but while processing different rows.

What I think should change is that the EvenMedianFirstValue variable should be declared after EvenMedianValue, so that the value it was given on the previous row still exists when EvenMedianValue is evaluated on the subsequent row.
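To make the ordering point concrete, here is a small Python imitation of the row-by-row logic. It is a sketch under assumptions, not the derivations from the earlier thread: it assumes the rows arrive sorted, that the total row count is already known on every row, and that EvenMedianFirstValue is reset to 0 on every row other than the midpoint. The function and variable names are chosen for illustration.

```python
def median_with_stage_variables(sorted_values):
    """Imitate a Transformer that sees sorted rows one at a time."""
    total_rows = len(sorted_values)  # assumed to be known on every row
    is_even_count = total_rows % 2 == 0

    # Stage variables keep their values from one row to the next.
    row_number = 0
    even_median_first_value = 0.0
    median = None

    for value in sorted_values:
        row_number += 1
        if is_even_count:
            # IsEvenMedian: true on the first row after the midpoint (5th of 8, 6th of 10).
            is_even_median = row_number == total_rows // 2 + 1
            # EvenMedianValue is evaluated first, so it still sees what
            # EvenMedianFirstValue was given on the previous row.
            if is_even_median:
                median = (even_median_first_value + value) / 2
            # EvenMedianFirstValue is evaluated last: set on the midpoint row
            # (4th of 8, 5th of 10) and reset on every other row.
            even_median_first_value = value if row_number == total_rows // 2 else 0.0
        elif row_number == (total_rows + 1) // 2:
            median = value  # odd count: the single middle row is the median
    return median


print(median_with_stage_variables([1, 2, 3, 4, 5, 6, 7, 8]))
```

If the last two assignments inside the even branch were swapped, EvenMedianFirstValue would already have been reset on the 5th row before the average was taken, which is the problem the declaration order avoids.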
hiral.chauhan
Premium Member
Posts: 45
Joined: Fri Nov 07, 2008 12:22 pm

Post by hiral.chauhan »

I am sorry I have not been able to respond sooner, but many thanks for your expert advice, Ray and Kryt0n!

I don't know what I would do without your help! :)

Yes, I was overlooking the fact that stage variables do not reset as the Transformer loops through the rows. Doing what Ray suggested solved my problem.

I think this may be a very inefficient way of doing it, but I am running the transformer (where all my stage variables are) in Sequential mode. I could not think of any other way to ensure that EvenMedianFirstValue was calculated BEFORE EvenMedian and that EvenMedianFirstValue was never reset.

Is there a way I can run the transformer in parallel AND still ensure that EvenMedianFirstValue is calculated before EvenMedian?
Thanks,
Hiral Chauhan
Kryt0n
Participant
Posts: 584
Joined: Wed Jun 22, 2005 7:28 pm

Post by Kryt0n »

Do you have a key upon which you are calculating your median? If so, partition by the key. If your median is against the full data set, then you don't have a choice but to run sequentially.

EvenMedianFirstValue will be calculated before EvenMedianValue because it was determined on the row before. The only reason you declare it after is that it will reset on the EvenMedian row, and you want its value before it gets reset.
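As a rough illustration of why partitioning by the key is safe, the Python sketch below sends every row with the same key to the same partition, so each partition can work out its medians without seeing the others. It is not DataStage code: the helper names and sample rows are made up, and the standard library's statistics.median stands in for the stage-variable logic.

```python
from collections import defaultdict
from statistics import median


def hash_partition(rows, partition_count):
    """Send every row with the same key to the same partition."""
    partitions = [[] for _ in range(partition_count)]
    for key, value in rows:
        partitions[hash(key) % partition_count].append((key, value))
    return partitions


def medians_in_partition(partition):
    """Compute one median per key using only the rows in this partition."""
    values_by_key = defaultdict(list)
    for key, value in partition:
        values_by_key[key].append(value)
    return {key: median(values) for key, values in values_by_key.items()}


# Hypothetical (key, value) rows.
rows = [("A", 1), ("B", 10), ("A", 3), ("B", 20), ("A", 5), ("B", 30), ("B", 40)]

result = {}
for partition in hash_partition(rows, partition_count=2):
    # Each partition is independent, so this loop could run in parallel.
    result.update(medians_in_partition(partition))
print(result)
```

With no key, all rows belong to one group, so there is nothing to split on and the calculation has to see every row in one place.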
hiral.chauhan
Premium Member
Posts: 45
Joined: Fri Nov 07, 2008 12:22 pm

Post by hiral.chauhan »

Thank you all for your valuable input!
Thanks,
Hiral Chauhan