
Regarding M and u probability?

Posted: Thu Jul 27, 2006 3:54 pm
by gsym
Hi everybody,

I'd like to get a clearer picture of the m and u probabilities used in the match stage. Can you help me out?

Thanks

Posted: Fri Jul 28, 2006 4:31 am
by ray.wurlod
These are discussed in some detail in the QualityStage Essentials class offered by IBM. Have you done this class?
One (the m probability) is essentially a measure of the probability of a match, while the other (the u probability) is a measure of the probability that, if a match is found, it might have occurred by chance.

So if we set M probability as .9?

Posted: Fri Jul 28, 2006 11:45 am
by gsym
Hi,

So if we set the M probability to .9, are we telling QS to find data that is a 90% match?

Thank you

Re: Regarding M and u probability?

Posted: Mon Jan 22, 2007 1:35 am
by wannabexpert
Suppose that in the match process you examine a record pair that is a matched pair. The probability that a given field agrees in such a pair is called the m probability.
For example, if in a sample of 100 matched record pairs the field disagrees in 20 of them, then the m probability for this field is 0.8 (1 - 0.2).


The u probability is the probability that a field agrees given that the record pair being examined is an unmatched pair.
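
To make the two definitions concrete, here is a minimal Python sketch that estimates m and u for one column from a hand-labelled sample of record pairs. The function names and the unmatched-pair counts are made up for illustration, and this is not how QualityStage itself is configured. The log2 weights follow the usual Fellegi-Sunter formulation of probabilistic matching; I am assuming, not asserting, that the product uses the same form.

```python
import math

def estimate_m_u(pairs):
    """Estimate m and u for one column.

    pairs: iterable of (is_match, column_agrees) booleans, one per
    reviewed record pair (hypothetical hand-labelled sample).
    """
    match_total = match_agree = 0
    nonmatch_total = nonmatch_agree = 0
    for is_match, agrees in pairs:
        if is_match:
            match_total += 1
            match_agree += int(agrees)
        else:
            nonmatch_total += 1
            nonmatch_agree += int(agrees)
    m = match_agree / match_total        # P(column agrees | pair is a match)
    u = nonmatch_agree / nonmatch_total  # P(column agrees | pair is not a match)
    return m, u

def field_weights(m, u):
    """Fellegi-Sunter style weights (assumed form, base-2 logs)."""
    agreement = math.log2(m / u)
    disagreement = math.log2((1 - m) / (1 - u))
    return agreement, disagreement

# 100 matched pairs where the column disagrees in 20 (the example above),
# plus 1000 unmatched pairs where it agrees in 10 (invented numbers).
sample = ([(True, True)] * 80 + [(True, False)] * 20
          + [(False, True)] * 10 + [(False, False)] * 990)

m, u = estimate_m_u(sample)
agree_w, disagree_w = field_weights(m, u)
print(f"m={m:.2f} u={u:.2f} agree={agree_w:.2f} disagree={disagree_w:.2f}")
```

So an m of 0.8 does not mean "find records that are 80% similar". It says that when two records really are the same entity, this column still disagrees about 20% of the time, which limits how much a disagreement should count against the pair.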

Posted: Sat Nov 10, 2007 5:37 pm
by abc123
The DS8 User's Guide says the following:

1) The m probability reflects the "error rate" of the column. What does "error rate" mean? In a Reference Match, we are comparing two columns, not one. I am assuming that they are talking about the data column. Also, does "error rate" mean blanks or nulls in the column?

2) The guide also says, "You set the probability that the column agrees provided that the record pair is a match". How can you set the "probability" that it agrees? For example, if we have "Ave" and "Avenue", how can you determine the probability that they will match? I really don't understand the context in which "probability" is being used here.

3) The guide also says, "The u probability is the probability that the column agrees at random". What does "random" mean? For ColumnX, any row to any row?

There is very little information available anywhere on these.

Posted: Sat Nov 10, 2007 11:41 pm
by ray.wurlod
Error rate is probably most easily understood as "the likelihood of error" - to say that you are 90% confident in a result is the same as admitting that there is a 10% probability that the result is not correct.

The M probability is the confidence that, if a match is found, it is a match. It is not error rate; it is more accurately described as (1.0 - error rate). Your confidence in a match is related to the rarity value within the set of values in that particular domain. For example, getting a match on an uncommon value (like Wurlod) gives you more confidence than getting a match on a more common value (like Smith).

U probability allows the uncertainty of a match to be quantified to some extent. For example, if you do get a match on two values (whether from one data source or two), there is a certain small probability that this match occurred purely by chance. U probability attempts to put an overall number on this "match occurred purely by chance" probability.
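
One rough way to put a number on "agreed purely by chance" is to ask how often two rows picked at random from the column would carry the same value. The sketch below does that in Python from value frequencies. The surname counts are invented, the function name is hypothetical, and I am not claiming this is how QualityStage derives u; it only illustrates the idea, including why Smith is weaker evidence than Wurlod.

```python
from collections import Counter

def chance_agreement(values):
    """Probability that two rows drawn at random (with replacement)
    from `values` agree: the sum of squared relative frequencies."""
    counts = Counter(values)
    n = len(values)
    return sum((c / n) ** 2 for c in counts.values())

# Invented surname column: one very common value, one very rare one.
surnames = ["Smith"] * 50 + ["Jones"] * 30 + ["Brown"] * 19 + ["Wurlod"]

u_overall = chance_agreement(surnames)

# Share of random pairs that agree on each particular value.
n = len(surnames)
per_value = {name: (c / n) ** 2 for name, c in Counter(surnames).items()}

print(f"overall chance agreement: {u_overall:.4f}")
for name, p in per_value.items():
    print(f"  both rows are {name!r}: {p:.4f}")
```

With these made-up counts, a random pair agrees on Smith far more often than on Wurlod, so an agreement on Wurlod is much less likely to be a coincidence.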