Regarding Quality stage Match Concept

Infosphere's Quality Product

Moderators: chulett, rschirm

thesri
Participant
Posts: 6
Joined: Mon Oct 09, 2006 11:12 pm
Location: Bangalore

Post by thesri »

In the Match stage, how can I assign a weight to every field, and what is a composite weight? Are there any samples that would help me? Thanks in advance.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Welcome aboard. :D

While you CAN assign weight to every field in the Match stage, you don't want to. Let QualityStage calculate the weights based on the frequency distributions in the data, which you may have reported upon in an investigation.

There are agreement and disagreement weights calculated. You can bias these by altering them based upon external knowledge (typically about the general population compared to the sample upon which the frequencies were calculated). The agreement and disagreement weights are calculated for every field (unless excluded specifically from analysis) and reflect the "information content" - how rare the value is in its domain.
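
To make the "information content" idea concrete, here is a minimal sketch using the standard probabilistic record-linkage (Fellegi-Sunter) formulas. The function names and the m and u values are assumptions made up for illustration; this is not QualityStage's API, and the product's exact calculation may differ.

```python
import math

def agreement_weight(m, u):
    """Weight contributed when a field agrees: log2(m / u)."""
    return math.log2(m / u)

def disagreement_weight(m, u):
    """Weight contributed when a field disagrees: log2((1 - m) / (1 - u))."""
    return math.log2((1 - m) / (1 - u))

# Hypothetical probabilities (assumptions, not taken from any real data):
#   m = P(field agrees | the two records really match)
#   u = P(field agrees | the two records do not match)
# For a frequency-based weight, u is roughly the relative frequency of the value.
m = 0.9
common_u = 0.05    # a common value, e.g. a very frequent surname
rare_u = 0.0005    # a rare value

print(agreement_weight(m, common_u))     # smaller positive weight
print(agreement_weight(m, rare_u))       # larger positive weight: rarer value
print(disagreement_weight(m, common_u))  # negative weight
```

The rarer the value, the smaller u is, so agreeing on it earns a larger agreement weight.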

The agreement and disagreement weights are summed across the non-excluded fields in each record to yield the aggregate weight. It is the aggregate weights that determine which are masters, which are duplicates and which are residuals during the match.
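
The summation can be sketched as follows. This is only an illustration of the arithmetic: the field names and the per-field weights are invented, and `composite_weight` is a hypothetical helper, not something QualityStage exposes.

```python
def composite_weight(field_weights, agreements):
    """Sum the agreement or disagreement weight of each compared field.

    field_weights: {field: (agreement_weight, disagreement_weight)}
    agreements:    {field: True if the two records agree on that field}
    Fields missing from `agreements` are treated as excluded and skipped.
    """
    total = 0.0
    for field, (agree_w, disagree_w) in field_weights.items():
        if field not in agreements:
            continue  # excluded from the comparison
        total += agree_w if agreements[field] else disagree_w
    return total

# Hypothetical per-field weights (assumed values, for illustration only).
weights = {
    "surname": (6.0, -3.0),
    "postcode": (4.5, -2.0),
    "gender": (1.0, -1.5),
}
agreements = {"surname": True, "postcode": True, "gender": False}

print(composite_weight(weights, agreements))  # 6.0 + 4.5 - 1.5 = 9.0
```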

You can set cutoff points that govern the decision whether a pair of records is a match or not, based on the aggregate weight of the putative duplicate.
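
A minimal sketch of how two cutoffs could turn an aggregate weight into a decision. The cutoff values, the three outcome labels and the `classify` function are assumptions for illustration; in practice you would choose the cutoffs by reviewing the weight distribution from your own match runs.

```python
def classify(weight, match_cutoff, clerical_cutoff):
    """Classify a record pair by its aggregate weight.

    At or above match_cutoff      -> "match"
    At or above clerical_cutoff   -> "clerical" (needs manual review)
    Below clerical_cutoff         -> "nonmatch"
    """
    if clerical_cutoff > match_cutoff:
        raise ValueError("clerical cutoff must not exceed match cutoff")
    if weight >= match_cutoff:
        return "match"
    if weight >= clerical_cutoff:
        return "clerical"
    return "nonmatch"

# Hypothetical cutoffs and weights.
for w in (9.0, 5.5, -2.0):
    print(w, classify(w, match_cutoff=8.0, clerical_cutoff=4.0))
```

Setting both cutoffs to the same value removes the clerical band, so every pair is either a match or a nonmatch.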
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Alexander
Participant
Posts: 17
Joined: Fri May 12, 2006 10:10 am
Location: Europe

Post by Alexander »

In my opinion, weights based on frequency distributions should only be used on fields with well-known values, not on fields which accept free text, because the frequency tables will not cover the full range of values and the results can fall into extreme situations.

If you assign a weight to every field yourself, you will lose a potential tool of QS, but on the other hand you will control which weight is given to which record. That can be a great advantage.

You can assign fixed weights as a first step, and then adjust the match to use frequency fields :idea:.

Good luck!