m-prob and u-prob in QS

kkumardatastage · Post by **kkumardatastage** » Mon Dec 07, 2009 5:30 pm

Hi

Please can anyone help me with the following:
1) What is meant by m-prob and u-prob in the QualityStage Match Designer? I know the default values we need to give, but what effect do they have on the data? If u have an example, that would be great.
2) What are the minimum and maximum cutoffs we can use for Match and Clerical?
3) What should Parm 1 be? I know that for an exact match the Parm value should be 900 (for names), but what should the Parm value be for DOB (can u give me an example)?
4) The Match Designer shows the Agree Weight and Disagree Weight, but is there any way to display the weights for each individual column? (The statistics show u the percentages, but I need to check the scores for the columns.)

Please can you help me with these questions.

Thanks
k

ray.wurlod · Post by **ray.wurlod** » Mon Dec 07, 2009 6:38 pm

Tell you what, you research in the QualityStage User Guide the answers to your questions and post back here, and we'll help to clarify any remaining doubts.

When you post back, don't address your question to U (one of our posters) specifically - U doesn't check in all that often. Note, perhaps, that the second person personal pronoun in English is spelled "you", and that we strive for a professional standard of written English here on DSXchange. There is no need for SMS-style abbreviations; you are not limited to 140 characters.

JRodriguez · Post by **JRodriguez** » Wed Dec 09, 2009 3:44 pm

m and u-Prob

- m-Prob reflects the error rate for the column: it is the probability that the column agrees given that the record pair is a match, which is roughly one minus the column's error rate. In layman's terms, this is the probability that two columns that should match end up matching.

- u-Prob is the probability that the column agrees given that the record pair does not match. In layman's terms, this is the probability that two columns that shouldn't match end up matching anyway, that is, agreement by chance.
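To show what these two probabilities do to the data, here is a minimal sketch of how they turn into agreement and disagreement weights. It assumes the standard probabilistic record-linkage formulas, log2(m/u) for agreement and log2((1-m)/(1-u)) for disagreement; the function names and the sample values 0.9 and 0.01 are made up for illustration and are not taken from any match specification.

```python
import math


def agreement_weight(m: float, u: float) -> float:
    """Weight contributed when the column agrees: log2(m / u)."""
    return math.log2(m / u)


def disagreement_weight(m: float, u: float) -> float:
    """Weight contributed when the column disagrees (normally negative)."""
    return math.log2((1 - m) / (1 - u))


# Hypothetical values: the column agrees in 90% of true matches (m = 0.9)
# and agrees by chance in 1% of non-matching pairs (u = 0.01).
m_prob, u_prob = 0.9, 0.01
agree = agreement_weight(m_prob, u_prob)
disagree = disagreement_weight(m_prob, u_prob)
print(f"agree: {agree:+.2f}, disagree: {disagree:+.2f}")
```

With these sample values the column adds roughly +6.5 when it agrees and about -3.3 when it disagrees. A smaller u-Prob (a column whose values rarely agree by chance) raises the agreement weight, and a higher m-Prob (a more reliable column) makes a disagreement more costly.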

Match and Clerical Cut off values

Zero should be the minimum; negative values don't make sense. The maximum depends on your data and on how many tokens the record contains. Each token in the record adds to or subtracts from the composite weight depending on its m and u-Prob, so a fixed maximum cut-off value doesn't exist.
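A rough sketch of why no fixed maximum exists: the composite weight is the sum of the per-column weights, so its upper bound changes with the columns in the match specification and their probabilities. The column names and (m, u) pairs below are hypothetical, and the weight formulas are the standard log2 ones assumed for illustration.

```python
import math

# Hypothetical (m, u) pairs per column; not from a real match specification.
COLUMNS = {
    "first_name": (0.9, 0.01),
    "last_name": (0.9, 0.005),
    "dob": (0.95, 0.001),
}


def column_weight(m: float, u: float, agrees: bool) -> float:
    """Agreement weight if the column agrees, disagreement weight otherwise."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def composite_weight(agreement: dict) -> float:
    """Sum the per-column weights for one record pair."""
    return sum(
        column_weight(*COLUMNS[name], agrees)
        for name, agrees in agreement.items()
    )


# The highest possible composite weight: every column agrees.
max_weight = composite_weight({name: True for name in COLUMNS})
# A pair that agrees on the names but not on the date of birth.
partial = composite_weight({"first_name": True, "last_name": True, "dob": False})
print(f"max: {max_weight:.2f}, names agree but DOB differs: {partial:.2f}")
```

Adding a column to `COLUMNS`, or changing any m or u value, moves `max_weight`, which is why the cut-offs have to be tuned for each match specification.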

Normally I use zero (0) as the match, duplicate and clerical cut-off value as a starting point for researching the final cut-off values for the data and match specification. The final set of cut-off values should be taken from the histogram, which helps you determine, graphically and with sample data, at which composite weight level the records no longer match. You need very good knowledge of the data, or somebody from the business side to help you with the task. The same process is used for clericals.
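The idea of reading cut-offs off a histogram can be sketched outside the tool as well. The snippet below bins a handful of made-up composite weights and classifies pairs against two candidate cut-offs. It assumes that a pair at or above the match cut-off is a match and a pair between the two cut-offs goes to clerical review; the sample weights and the cut-offs 15 and 5 are invented for illustration.

```python
from collections import Counter


def classify(weight: float, match_cutoff: float, clerical_cutoff: float) -> str:
    """Assign a record pair to match, clerical review, or nonmatch."""
    if weight >= match_cutoff:
        return "match"
    if weight >= clerical_cutoff:
        return "clerical"
    return "nonmatch"


def weight_histogram(weights, bin_width=5):
    """Count pairs per composite-weight bin, keyed by the bin's lower edge."""
    return Counter(int(w // bin_width) * bin_width for w in weights)


# Made-up composite weights for a sample of candidate pairs.
sample = [-6.2, -4.8, -1.0, 2.5, 3.1, 9.7, 14.2, 18.9, 21.3, 22.0]
hist = weight_histogram(sample)
for lower in sorted(hist):
    print(f"{lower:>4} to {lower + 5:>3}: {'#' * hist[lower]}")

# Try candidate cut-offs, then review the pairs near each boundary by hand.
labels = [classify(w, match_cutoff=15, clerical_cutoff=5) for w in sample]
```

The pairs that land just above and just below each candidate cut-off are the ones worth reviewing with somebody who knows the data before the values are fixed.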

Param1

- Param1 has a different meaning in different match comparisons, so it depends on the match algorithm that you are using

If you are matching names or material descriptions, you should allow room for misspellings, typos, transpositions and other common errors. When you are matching DOB you don't allow that kind of error, so you use a different comparison.
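As an illustration only, the difference between an error-tolerant comparison and an exact one might look like the sketch below. Python's `difflib.SequenceMatcher` stands in for the real comparison, which is not QualityStage's own algorithm; the 0 to 900 scale simply mirrors the 900 = exact match mentioned in the question, and the threshold of 800 is a hypothetical value.

```python
from difflib import SequenceMatcher


def fuzzy_score(a: str, b: str) -> int:
    """Similarity scaled to 0..900, where 900 means identical strings.

    difflib is a stand-in here, not the algorithm QualityStage uses.
    """
    ratio = SequenceMatcher(None, a.upper(), b.upper()).ratio()
    return round(900 * ratio)


def names_agree(a: str, b: str, threshold: int = 800) -> bool:
    """Tolerant comparison: small typos still count as agreement."""
    return fuzzy_score(a, b) >= threshold


def dob_agree(a: str, b: str) -> bool:
    """Exact comparison: any difference counts as disagreement."""
    return a == b


print(names_agree("JOHNSON", "JOHNSTON"))    # near-identical names
print(dob_agree("1980-01-02", "1980-02-01"))  # transposed day and month
```

Raising the threshold towards 900 makes the name comparison stricter, and at 900 only identical strings agree, which is the behaviour wanted for a date of birth.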