Regarding M and u probability?

Infosphere's Quality Product

Moderators: chulett, rschirm

gsym
Charter Member
Posts: 118
Joined: Thu Feb 02, 2006 3:05 pm

Regarding M and u probability?

Post by gsym »

Hi everybody,

I would like to get a clearer picture of the m and u probabilities used in the Match stage. Can you guys help me out?

Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

These are discussed in some detail in the QualityStage Essentials class offered by IBM. Have you taken this class?
The m probability is essentially a measure of the probability of a match, while the u probability is a measure of the probability that, if a match is found, it might have occurred by chance.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
gsym
Charter Member
Posts: 118
Joined: Thu Feb 02, 2006 3:05 pm

So if we set the m probability to 0.9?

Post by gsym »

Hi,

So if we set the m probability to 0.9, are we telling QS (QualityStage) to find data that is a 90% match?

Thank you
wannabexpert
Participant
Posts: 13
Joined: Mon Sep 11, 2006 8:01 am

Re: Regarding M and u probability?

Post by wannabexpert »

Suppose that in the match process you examine a record pair that is a matched pair. The probability that a given field agrees, given that the pair is a match, is called the m probability.
For example, in a sample of 100 matched record pairs, if 20 pairs disagree on the field, then the m probability for this variable is 0.8 (1 - 0.2).
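The counting in that example can be sketched in a few lines of Python. This is a hypothetical helper, not a QualityStage API: it assumes you already have a sample of record pairs known (for example, from clerical review) to be true matches, plus a flag saying whether the field agrees on each pair.

```python
# Hypothetical sketch: estimate the m probability for one field from a
# reviewed sample of record pairs that are known to be true matches.
def m_probability(field_agrees: list[bool]) -> float:
    """Return the fraction of known-matched pairs on which the field agrees."""
    if not field_agrees:
        raise ValueError("need at least one matched pair")
    return sum(field_agrees) / len(field_agrees)


# The example from the post: 100 matched pairs, 20 of which disagree on the field.
sample = [True] * 80 + [False] * 20
m = m_probability(sample)
print(m)  # 0.8
```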


The u probability is the probability that a field agrees given that the record pair being examined is an unmatched pair.
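To see how m and u are typically used together, here is a sketch assuming the classic Fellegi-Sunter record-linkage weights: an agreement weight of log2(m/u) and a disagreement weight of log2((1 - m)/(1 - u)). The function name and the example u value of 0.01 are illustrative assumptions, not taken from the QualityStage documentation.

```python
import math


# Sketch of Fellegi-Sunter style field weights (an assumed model, not
# confirmed QualityStage internals).
def field_weights(m: float, u: float) -> tuple[float, float]:
    """Return (agreement_weight, disagreement_weight) for one field."""
    if not (0.0 < m < 1.0 and 0.0 < u < 1.0):
        raise ValueError("m and u must both lie strictly between 0 and 1")
    agree = math.log2(m / u)                      # added when the field agrees
    disagree = math.log2((1.0 - m) / (1.0 - u))   # added (negative) when it disagrees
    return agree, disagree


agree, disagree = field_weights(m=0.9, u=0.01)
print(f"agree {agree:+.2f}, disagree {disagree:+.2f}")  # agree +6.49, disagree -3.31
```

Under this model, setting m to 0.9 does not ask for "90% matching" data; it states that the field agrees on 90% of true matches, and a pair's composite weight is the sum of its per-field weights.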
abc123
Premium Member
Posts: 605
Joined: Fri Aug 25, 2006 8:24 am

Post by abc123 »

The DS8 User's Guide says the following:

1) The guide says the m probability reflects the "error rate" of the column. What does "error rate" mean? In a Reference Match we are comparing two columns, not one; I am assuming they mean the data column. Also, does "error rate" mean blanks or nulls in the column?

2) The guide also says, "You set the probability that the column agrees provided that the record pair is a match". How can you set the "probability" that it agrees? For example, if we have "Ave" and "Avenue", how can you determine the probability that they will match? I really don't understand the context in which "probability" is being used here.

3) The guide also says, "The u probability is the probability that the column agrees at random". What does "random" mean? For ColumnX, any row compared with any row?
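One way to read "agrees at random" is exactly the any-row-to-any-row comparison asked about in question 3: pair rows at random and count how often the column agrees. This is a minimal sketch, assuming plain string equality as the comparison and assuming that randomly drawn pairs are almost all non-matches (true for realistically large files); `estimate_u` and the sample surnames are hypothetical.

```python
import random


# Hypothetical sketch: estimate the u probability for one column by pairing
# rows at random and measuring how often the column agrees by chance.
def estimate_u(left: list[str], right: list[str],
               n_pairs: int = 10_000, seed: int = 0) -> float:
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    agree = 0
    for _ in range(n_pairs):
        a = rng.choice(left)
        b = rng.choice(right)
        if a == b:  # simple exact comparison; real matchers use fuzzier ones
            agree += 1
    return agree / n_pairs


surnames = ["SMITH"] * 50 + ["JONES"] * 30 + ["WURLOD"] * 20
u = estimate_u(surnames, surnames)
print(f"estimated u = {u:.3f}")  # close to 0.5**2 + 0.3**2 + 0.2**2 = 0.38
```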

There is very little information available anywhere on these.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Error rate is probably most easily understood as "the likelihood of error": to say that you are 90% confident in a result is the same as admitting that there is a 10% probability that the result is not correct.

The m probability is the confidence that, if a match is found, it is a true match. It is not the error rate; it is more accurately described as (1.0 - error rate). Your confidence in a match is related to the rarity of the value within the set of values in that particular domain: for example, getting a match on an uncommon value (like Wurlod) gives you more confidence than getting a match on a more common value (like Smith).

The u probability allows the uncertainty of a match to be quantified to some extent. For example, if you do get a match on two values (whether from one data source or two), there is a small probability that this match occurred purely by chance. The u probability attempts to put an overall number on this "match occurred purely by chance" probability.
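The Smith/Wurlod point can be made concrete with value frequencies. This sketch assumes the frequency-based approach common in Fellegi-Sunter matching: the chance that two randomly chosen records share a value v is roughly p(v) squared, where p(v) is v's relative frequency in the column, and that value-specific u then feeds the agreement weight log2(m/u). The surname counts and m = 0.9 are made-up illustration values, and `value_u` is a hypothetical helper.

```python
import math
from collections import Counter


# Hypothetical sketch: value-specific chance agreement from value frequencies,
# assuming records are drawn at random (with replacement) from one column.
def value_u(column: list[str]) -> dict[str, float]:
    counts = Counter(column)
    total = len(column)
    return {value: (n / total) ** 2 for value, n in counts.items()}


surnames = ["SMITH"] * 50 + ["JONES"] * 30 + ["WURLOD"] * 20
u_by_value = value_u(surnames)
m = 0.9  # assumed m probability for the surname field

# Rarer values have a smaller chance of agreeing at random, so an agreement
# on them earns a larger weight.
for name, u in sorted(u_by_value.items(), key=lambda kv: kv[1]):
    print(f"{name:7} u={u:.4f} weight={math.log2(m / u):+.2f}")
```

Summing the per-value figures gives the overall chance-agreement rate for the column (0.38 for these counts), so the value-specific view refines a single column-wide u rather than replacing it.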