QualityStage Match - 2 Files, multiple records in output

Infosphere's Quality Product

Moderators: chulett, rschirm

bobbybooda
Participant
Posts: 11
Joined: Sun Mar 05, 2006 6:18 pm

Post by bobbybooda »

I've set up a Match job which reads in two files. I'm getting plenty of good matches, but the output file contains many copies of the same match. For example, I have two records that are matching, but QS is putting those same records in the output 25 times (in lieu of the other good match records). Does anyone have any suggestions on what I may have set up incorrectly? Thanks...
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Can you tell us how your match criteria were defined? Were your blocking fields tight enough? Indeed, did you use an Investigation job to determine the domains and cardinality of candidate blocking fields? And then there are the match fields: were you cautious with these, or did you try to match on all possible fields in a single match pass? Consider changing your design to multiple, simpler passes.
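
To make the blocking question concrete, here is a rough Python sketch (not QualityStage code, and not a substitute for an Investigation job) of how one might profile a candidate blocking field outside the tool. The function name, the field names `zip` and `surname`, and the sample records are all hypothetical; the idea is only that a low-cardinality blocking field produces large blocks and therefore many candidate pairs.

```python
from collections import Counter

def profile_blocking_field(records_a, records_b, field):
    """Summarise how a candidate blocking field would partition two files.

    Hypothetical helper for illustration only; it does not call QualityStage.
    """
    counts_a = Counter(r[field] for r in records_a)
    counts_b = Counter(r[field] for r in records_b)
    # Within one block, every A record is compared with every B record.
    candidate_pairs = sum(n * counts_b[key] for key, n in counts_a.items())
    return {
        "field": field,
        "cardinality_a": len(counts_a),
        "cardinality_b": len(counts_b),
        "largest_block_a": max(counts_a.values(), default=0),
        "candidate_pairs": candidate_pairs,
    }

# Made-up sample data standing in for the two input files.
file_a = [
    {"zip": "10001", "surname": "SMITH"},
    {"zip": "10001", "surname": "JONES"},
    {"zip": "10002", "surname": "SMITH"},
]
file_b = [
    {"zip": "10001", "surname": "SMITH"},
    {"zip": "10001", "surname": "SMYTH"},
    {"zip": "10003", "surname": "JONES"},
]

for candidate in ("zip", "surname"):
    print(profile_blocking_field(file_a, file_b, candidate))
```

A field whose candidate pair count is close to the full cross product of the two files is probably too loose to block on by itself.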
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
bobbybooda
Participant
Posts: 11
Joined: Sun Mar 05, 2006 6:18 pm

Post by bobbybooda »

Thanks... I think I solved the "problem". I am accustomed to seeing the files that come out of an Undupe. I see that each output record from a 2-file match actually contains 2 records (one from FileA, one from FileB). This is correct, right? I was only looking at the first half of each record, so I thought there were multiples of the same record, but they were actually matching to various other records. It is different from what I am used to seeing, but this is the correct outcome of a 2-file match, right? Thanks.
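
For anyone hitting the same confusion, a small Python sketch of the effect (plain Python, not QualityStage output; the record keys and the pair list are invented for illustration). If each output row is treated as an (A record, B record) pair, the A half can repeat across rows even though every row is a different pair. Whether that actually happens in a given job depends on the match type and the data.

```python
from collections import Counter

# Hypothetical output rows: (key of the file A record, key of the file B record).
matched_pairs = [
    ("A1", "B7"),
    ("A1", "B9"),
    ("A1", "B12"),
    ("A2", "B3"),
]

def summarise_pairs(pairs):
    """Count distinct pairs and list A keys that appear in more than one row."""
    a_counts = Counter(a_key for a_key, _ in pairs)
    distinct_pairs = len(set(pairs))
    repeated_a = {a_key: n for a_key, n in a_counts.items() if n > 1}
    return distinct_pairs, repeated_a

distinct_pairs, repeated_a = summarise_pairs(matched_pairs)
print("distinct pairs:", distinct_pairs)
print("A keys seen in several rows:", repeated_a)
```

Reading only the A half of these rows, A1 looks like it was written three times; reading both halves shows three different matches.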
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

That's probably right. You're used to seeing XA for master, DA for duplicate and RA for residual from a single-file (UNDUP) match. The A means "first file". Now you also see B, meaning "second file", the other file in a two-file match (MATCH for one-to-one, GEOMATCH for many-to-one).

This A and B convention is followed consistently in QualityStage; for example when building a custom report, you might say MOVEALL OF A or MOVEALL OF B.