QualityStage Match - 2 Files, multiple records in output
-
- Participant
- Posts: 11
- Joined: Sun Mar 05, 2006 6:18 pm
I've set up a Match job that reads in two files. I'm getting plenty of good matches, but the output file contains the same match many times over. For example, I have two records that match, but QS is writing those same records to the output 25 times (in lieu of the other good match records). Does anyone have any suggestions on what I may have set up incorrectly? Thanks...
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
Can you tell us how your match criteria were defined? Were your blocking fields tight enough? Indeed, did you use an Investigation job to determine the domains and cardinality of candidate blocking fields? Then there are the match fields: were you cautious with these, or did you try to match on all possible fields in a single match pass? Consider changing your design to multiple, simpler passes.
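To make the cardinality question concrete, here is a rough sketch in plain Python (not QualityStage syntax) of how a candidate blocking field could be profiled outside the tool. The function name `block_profile`, the field names `zip` and `state`, and the sample records are all hypothetical. It assumes each file is already loaded as a list of dictionaries, and that records are only compared when they share the same blocking value.

```python
from collections import Counter

def block_profile(records_a, records_b, field):
    """Summarise how a candidate blocking field would partition two files."""
    counts_a = Counter(r[field] for r in records_a)
    counts_b = Counter(r[field] for r in records_b)
    # Records are only compared within a block, so the number of comparisons
    # is the sum over shared values of (block size in A) * (block size in B).
    pairs = sum(n * counts_b[v] for v, n in counts_a.items() if v in counts_b)
    return {
        "field": field,
        "distinct_a": len(counts_a),
        "distinct_b": len(counts_b),
        "largest_block_a": max(counts_a.values(), default=0),
        "largest_block_b": max(counts_b.values(), default=0),
        "candidate_pairs": pairs,
    }

# Hypothetical sample data, for illustration only.
file_a = [
    {"zip": "10001", "state": "NY"},
    {"zip": "10002", "state": "NY"},
    {"zip": "94105", "state": "CA"},
]
file_b = [
    {"zip": "10001", "state": "NY"},
    {"zip": "10003", "state": "NY"},
    {"zip": "94105", "state": "CA"},
]

zip_profile = block_profile(file_a, file_b, "zip")
state_profile = block_profile(file_a, file_b, "state")
print(zip_profile)
print(state_profile)
```

A low-cardinality field such as the hypothetical `state` produces larger blocks and more candidate pairs than a tighter field such as `zip`, which is one reason loose blocking can yield more pairings than expected.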
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 11
- Joined: Sun Mar 05, 2006 6:18 pm
Thanks... I think I solved the "problem". I am accustomed to seeing the files that come out of an Undupe. I now see that each output record from a 2-file match actually contains 2 records (one from FileA, one from FileB). This is correct, right? I was only looking at the first half of each record, so I thought there were multiples of the same record, but they were actually matching to various other records. It is different from what I am used to seeing, but this is the correct outcome of a 2-file match, right? Thanks.
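For anyone who hits the same confusion, a quick check along these lines can tell true duplicate rows apart from one File A record paired with several different File B records. This is a minimal sketch in plain Python, not a QualityStage feature. The function name `summarise_pairs` and the keys `A1`, `B7` and so on are hypothetical, and it assumes the record key from each half of every output row has already been extracted into a tuple.

```python
from collections import Counter

def summarise_pairs(pairs):
    """Separate exact repeated pairs from A keys that have several B partners."""
    pair_counts = Counter(pairs)
    # The same (A key, B key) pair written more than once is a true duplicate row.
    repeated_pairs = {p: n for p, n in pair_counts.items() if n > 1}
    # Count distinct B partners per A key; iterating a Counter yields each
    # distinct pair once.
    partners_per_a = Counter(a for a, _ in pair_counts)
    multi_partner_a = {a: n for a, n in partners_per_a.items() if n > 1}
    return repeated_pairs, multi_partner_a

# Hypothetical keys taken from the A half and B half of each output row.
output_pairs = [("A1", "B7"), ("A1", "B9"), ("A1", "B12"), ("A2", "B3")]

repeated, multi = summarise_pairs(output_pairs)
print("repeated pairs:", repeated)
print("A keys with several partners:", multi)
```

If the first result is empty and the second is not, the rows only look like duplicates when the A half is read on its own, which is the situation described above.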
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
That's probably right. You're used to seeing XA for master, DA for duplicate and RA for residual from a single-file (UNDUP) match. The A means "first file". Now you also see B, meaning "second file", the other file in a two-file match (MATCH for one-to-one, GEOMATCH for many-to-one).
This A and B convention is followed consistently in QualityStage; for example when building a custom report, you might say MOVEALL OF A or MOVEALL OF B.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.