QualityStage Match - 2 Files, multiple records in output

bobbybooda · Post by **bobbybooda** » Wed Apr 26, 2006 8:25 pm

I've set up a Match job which reads in two files. I'm getting plenty of good matches, but the output file contains many copies of the same match. For example, I have two records that match, but QS is putting those same records in the output 25 times (in lieu of the other good match records). Does anyone have any suggestions on what I may have set up incorrectly? Thanks...

ray.wurlod · Post by **ray.wurlod** » Wed Apr 26, 2006 8:56 pm

Can you tell us how your match criteria were defined? Were your blocking fields tight enough? Indeed, did you use an Investigation job to determine the domains and cardinality of candidate blocking fields? And then there are the match fields: were you cautious with these, or did you try to match on all possible fields in a single match pass? Consider changing your design to multiple, simpler passes.
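
To make the point about blocking-field cardinality concrete, here is a minimal sketch in plain Python (not QualityStage syntax, and not a substitute for an Investigation job). It assumes each file is a list of dictionaries, and the field names `zip` and `state` are hypothetical. It counts how many A/B candidate pairs a given blocking field would produce: a low-cardinality field creates large blocks and therefore many more pairs to compare.

```python
from collections import Counter


def candidate_pairs(file_a, file_b, field):
    """Count A/B record pairs that share the same value of a blocking field."""
    a_counts = Counter(rec[field] for rec in file_a)
    b_counts = Counter(rec[field] for rec in file_b)
    # Within each block, every A record is compared with every B record.
    return sum(n * b_counts[value] for value, n in a_counts.items())


# Hypothetical sample data; field names are illustrative assumptions.
file_a = [
    {"zip": "10001", "state": "NY"},
    {"zip": "10002", "state": "NY"},
    {"zip": "94105", "state": "CA"},
]
file_b = [
    {"zip": "10001", "state": "NY"},
    {"zip": "10003", "state": "NY"},
    {"zip": "94105", "state": "CA"},
]

zip_pairs = candidate_pairs(file_a, file_b, "zip")      # tighter blocking field
state_pairs = candidate_pairs(file_a, file_b, "state")  # looser blocking field
print("zip:", zip_pairs, "state:", state_pairs)
```

On real data the gap between a tight and a loose blocking field is usually far larger than in this toy sample, which is one reason to profile candidate fields before committing to a pass design.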

bobbybooda · Post by **bobbybooda** » Thu Apr 27, 2006 8:05 pm

Thanks...I think I solved the "problem" -- I am accustomed to seeing the files that come out of an Undupe. I see that the records from a 2-file match each actually have 2 records (one from fileA, one from FileB). This is correct, right? I was only looking at the first half of each record, so I thought there were multiples of the same record, but they were actually matching to various other records. It is different than what I am used to seeing, but this is the correct outcome of a 2-file match, right? Thanks.

ray.wurlod · Post by **ray.wurlod** » Thu Apr 27, 2006 8:46 pm

That's probably right. You're used to seeing XA for master, DA for duplicate and RA for residual from a single-file (UNDUP) match. The A means "first file". Now you also see B, meaning "second file", the other file in a two-file match (MATCH for one-to-one, GEOMATCH for many-to-one).

This A and B convention is followed consistently in QualityStage; for example when building a custom report, you might say MOVEALL OF A or MOVEALL OF B.
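
The confusion resolved in this thread (the same A record appearing many times because it is paired with different B records) can be checked with a small sketch. This is plain Python under stated assumptions, not QualityStage code: the row layout and the field names `a_id`, `a_name`, `b_id`, `b_name` are hypothetical stand-ins for the A half and B half of each output record.

```python
from collections import defaultdict

# Hypothetical two-file match output: each row carries an A half and a B half.
pairs = [
    {"a_id": "A1", "a_name": "J SMITH", "b_id": "B7", "b_name": "JOHN SMITH"},
    {"a_id": "A1", "a_name": "J SMITH", "b_id": "B9", "b_name": "JON SMITH"},
    {"a_id": "A2", "a_name": "M JONES", "b_id": "B3", "b_name": "MARY JONES"},
]


def partners_by_a(rows):
    """Group the B-side identifiers under each A-side identifier."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["a_id"]].append(row["b_id"])
    return dict(grouped)


def exact_duplicate_rows(rows):
    """Return (a_id, b_id) pairs that occur more than once in the output."""
    seen = set()
    duplicates = []
    for row in rows:
        key = (row["a_id"], row["b_id"])
        if key in seen:
            duplicates.append(key)
        else:
            seen.add(key)
    return duplicates


print(partners_by_a(pairs))
print(exact_duplicate_rows(pairs))
```

If the A half repeats but the list of true duplicates is empty, the output is not repeating a match; each row pairs the same A record with a different B record. Only a repeated (A, B) pair would point to a setup problem.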