Duplicate Job Recommendation
Posted: Thu Oct 04, 2012 2:11 am
Hello guys,
This post is not about an issue; I'm looking for suggestions on what you think the best approach is.
We have a table of 60+ million records, and the client needs statistics on the duplicates in this table, and later on wants them removed.
There are four rules to find duplicates on this table:
1. First two parts of full Arabic name, nationality, year of birth and an ID number. The first three columns reside on the same table, but the ID number is on a different table, in a 1-to-many relationship. That is, a person can have multiple IDs of different types. The first three columns must match, and for the ID, at least one of the person's ID records must match for the record to count as a duplicate.
2. First two parts of full English name, nationality, year of birth and mother's Arabic name.
3. Full Arabic name, nationality, year of birth and mother's Arabic name.
4. Full English name, nationality, year of birth and mother's English name.
The four rules above are combined with OR: if any one of them matches, the records are considered duplicates. The last three are fairly straightforward, but the first is a bit challenging.
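To make the intended logic concrete, here is a rough sketch in plain Python (not QualityStage) of how I picture the four rules. It is only an illustration under assumptions: the field names (`name_ar`, `name_en`, `mother_ar`, `mother_en`, `nationality`, `birth_year`) are hypothetical, an ID is assumed to match only when both its type and number are equal, a key with an empty field is assumed never to match, and duplicates are assumed to chain (if A matches B and B matches C, all three fall into one group). Rule 1 is handled by emitting one key per ID row, so a single shared ID is enough.

```python
from collections import defaultdict

def first_two(name):
    """Return the first two whitespace-separated parts of a name."""
    return " ".join(name.split()[:2])

def match_keys(p, ids):
    """Yield one exact-match key per rule; rule 1 yields one key per ID row."""
    base = (p["nationality"], p["birth_year"])
    candidates = [
        ("R1", first_two(p["name_ar"]), *base, id_type, id_number)
        for id_type, id_number in ids
    ]
    candidates += [
        ("R2", first_two(p["name_en"]), *base, p["mother_ar"]),
        ("R3", p["name_ar"], *base, p["mother_ar"]),
        ("R4", p["name_en"], *base, p["mother_en"]),
    ]
    for key in candidates:
        if all(key):  # assumption: a key with an empty field never matches
            yield key

def find_duplicate_groups(persons, person_ids):
    """Group person keys that share at least one rule key (rules are OR-ed)."""
    parent = {pid: pid for pid in persons}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}
    for pid, p in persons.items():
        for key in match_keys(p, person_ids.get(pid, [])):
            if key in first_seen:
                parent[find(pid)] = find(first_seen[key])
            else:
                first_seen[key] = pid

    groups = defaultdict(list)
    for pid in persons:
        groups[find(pid)].append(pid)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)
```

Here `persons` maps a person key to a dict of the hypothetical fields, and `person_ids` maps the same key to a list of `(id_type, id_number)` pairs from the ID table. The sketch uses exact string comparison only; any standardization or fuzzy matching of names would be a separate step.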
If you had such a case, what approach would you take to produce the report using QualityStage?
I would highly appreciate it if you shared your thoughts and experience, and the steps you would take.
Thank you for any input. I'm still kicking my way through QS after years of working with DS.
-Issam