MatchFrequency file Vs. the weightage

Infosphere's Quality Product

Moderators: chulett, rschirm

vimalvik
Premium Member
Posts: 19
Joined: Tue Feb 21, 2006 5:56 am
Location: India

Post by vimalvik »

How does the source volume affect the frequency file generation process, and in turn how does it affect the matching process?

The match frequency file was generated from 100M records, while the source data has about 200M.
vimal.R
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

How are you generating the match frequencies? Are you perhaps on a two-node environment and only looking at figures from one node?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vairus
Participant
Posts: 52
Joined: Thu Feb 07, 2008 8:02 am
Location: Johannesburg

Re: MatchFrequency file Vs. the weightage

Post by vairus »

Hi,

The MatchFrequency file describes how often each value appears in a source column.

If a name appears 100 times in a column, the MatchFrequency file output for that name will be a single row with the number of occurrences and a statistical weight.
So 200M records can generate far fewer output records.
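
As a rough illustration of that collapsing, here is a plain Python sketch. It is not QualityStage's actual frequency file format, and the sample names and counts are made up for the example:

```python
from collections import Counter

# Hypothetical source column; a real one would hold millions of rows.
names = ["SMITH"] * 100 + ["JONES"] * 40 + ["WURLOD"] * 2

# One output row per distinct value, carrying its number of occurrences.
frequency_rows = Counter(names)

for value, count in frequency_rows.most_common():
    print(value, count)

print(len(names), "input rows ->", len(frequency_rows), "frequency rows")
```

The number of frequency rows follows the number of distinct values, not the number of input records.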

In the matching process, less weight is given to a value that occurs many times and more weight to a value that occurs only a few times.
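
To make the idea concrete, here is a minimal Python sketch. The formula log2(total / count) is only an assumed inverse-frequency weight chosen for illustration; it is not necessarily what QualityStage computes, and `agreement_weights` and the sample data are hypothetical:

```python
import math
from collections import Counter

def agreement_weights(values):
    """Return {value: weight} with weight = log2(total / count).

    Assumption: an inverse-frequency weight used for illustration only.
    """
    counts = Counter(values)
    total = sum(counts.values())
    return {v: math.log2(total / c) for v, c in counts.items()}

# Hypothetical source column.
names = ["SMITH"] * 100 + ["JONES"] * 40 + ["WURLOD"] * 2
weights = agreement_weights(names)

# Print from most common (lowest weight) to rarest (highest weight).
for value, w in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{value}: {w:.2f}")
```

In this sketch the weight depends only on a value's relative frequency, so doubling every count leaves the weights unchanged. If the real weights behave similarly, a representative 100M-row sample should give roughly the same weights as the full 200M, but that depends on how the sample was drawn.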

Regards
vimalvik wrote: How does the source volume affect the frequency file generation process, and in turn how does it affect the matching process?

The match frequency file was generated from 100M records, while the source data has about 200M.
vairamuthu