Design Approach Scenario - Grouping Logic
Posted: Fri Nov 18, 2011 8:41 am
Hi All,
Good Morning.
In one of my new project, I have a requirement as below,
We have 2 files (One is Master-Full File & another file with Add/Delete/Update records). These 2 files has the following fields in it.
SSN First_Name Last_Name Middle_Name DOB Generation
The requirement is to group the similar records from both the above files into one group based on different scenarios.
SSN FN LN MN DOB Generation
*********************************************************************************
Valid SSN Populated& EQUAL EQUAL EQUAL EQUAL EQUAL EQUAL
Valid SSN Populated& EQUAL EQUAL EQUAL EQUAL Partial EQUAL
Partially Populated SSN EQUAL EQUAL EQUAL Partial EQUAL
Partially Populated SSN EQUAL Partial EQUAL Partial EQUAL
Valid SSN Populated& EQUAL Different by max 4 characters Partial Different by max 4 characters Partial EQUAL
Partially Populated SSN EQUAL Partial EQUAL Partial Partial
Valid SSN Populated& EQUAL Different by max 4 characters Partial Different by max 4 characters Partial NOT EQUAL
Partially Populated SSN EQUAL Partial EQUAL Partial Not populated
Partially Populated SSN EQUAL Partial EQUAL Partial NOT EQUAL
Not Populated EQUAL Partial EQUAL Partial Partial
Not Populated Different max by 4 characters Partial EQUAL Partial EQUAL
Different SSN EQUAL Partial EQUAL Partial EQUAL
Based on the how close the match of the records are, we need to group it by giving distinct values for each groups. Pls note that there are lots of such scenarios like the above.
I know the above info might not be enough to suggest a good design approach. But I am trying to use the DataStage/Quality stage tools (Match Stage) & functionalities like Soundex etc.
Any thoughts on how we can achieve this/whether I am going in the right direction will be highly appreciated.
Thanks in advance for your inputs.
Freddie
Good Morning.
In one of my new project, I have a requirement as below,
We have 2 files (One is Master-Full File & another file with Add/Delete/Update records). These 2 files has the following fields in it.
SSN First_Name Last_Name Middle_Name DOB Generation
The requirement is to group the similar records from both the above files into one group based on different scenarios.
SSN FN LN MN DOB Generation
*********************************************************************************
Valid SSN Populated& EQUAL EQUAL EQUAL EQUAL EQUAL EQUAL
Valid SSN Populated& EQUAL EQUAL EQUAL EQUAL Partial EQUAL
Partially Populated SSN EQUAL EQUAL EQUAL Partial EQUAL
Partially Populated SSN EQUAL Partial EQUAL Partial EQUAL
Valid SSN Populated& EQUAL Different by max 4 characters Partial Different by max 4 characters Partial EQUAL
Partially Populated SSN EQUAL Partial EQUAL Partial Partial
Valid SSN Populated& EQUAL Different by max 4 characters Partial Different by max 4 characters Partial NOT EQUAL
Partially Populated SSN EQUAL Partial EQUAL Partial Not populated
Partially Populated SSN EQUAL Partial EQUAL Partial NOT EQUAL
Not Populated EQUAL Partial EQUAL Partial Partial
Not Populated Different max by 4 characters Partial EQUAL Partial EQUAL
Different SSN EQUAL Partial EQUAL Partial EQUAL
Based on the how close the match of the records are, we need to group it by giving distinct values for each groups. Pls note that there are lots of such scenarios like the above.
I know the above info might not be enough to suggest a good design approach. But I am trying to use the DataStage/Quality stage tools (Match Stage) & functionalities like Soundex etc.
Any thoughts on how we can achieve this/whether I am going in the right direction will be highly appreciated.
Thanks in advance for your inputs.
Freddie