matching job

Infosphere's Quality Product

suneelchallagali
Participant
Posts: 251
Joined: Tue Dec 02, 2008 3:09 pm

Post by suneelchallagali »

Hi Guys,

The main goal of this job is to compare source and target. I have created the match specification for a reference match.

All the columns are varchar.

In the blocking variables I am using SSN.
In the matching variables I am using first name, SSN and age.

I have set m=.9 and u=.1.

I treat first name as UNCERT and set param1 to 900,
SSN as CHAR,
age as CHAR.

I am getting the exact matches, but I am not getting the 90% matches on first name.

I have set the clerical, match and duplicate cutoffs to 0.

Can you please help me out with this?

Thank you,
suneel
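A minimal sketch of what the m and u probabilities in this question turn into, assuming the standard Fellegi-Sunter log2 weights (m: probability that the field agrees when the records truly match; u: probability that it agrees when they do not). The helper `field_weights` is hypothetical, not part of QualityStage; the composite weight of a record pair is the sum of such per-field weights.

```python
import math


def field_weights(m: float, u: float) -> tuple[float, float]:
    """Return (agreement, disagreement) weights for one matching field.

    Assumes the classic Fellegi-Sunter log2 formulation:
      agreement    = log2(m / u)
      disagreement = log2((1 - m) / (1 - u))
    """
    if not (0.0 < m < 1.0 and 0.0 < u < 1.0):
        raise ValueError("m and u must lie strictly between 0 and 1")
    agree = math.log2(m / u)
    disagree = math.log2((1.0 - m) / (1.0 - u))
    return agree, disagree


# Values from the question: m = 0.9, u = 0.1
agree, disagree = field_weights(0.9, 0.1)
print(f"agreement={agree:.2f} disagreement={disagree:.2f}")  # agreement=3.17 disagreement=-3.17
```

With m=.9 and u=.1 the two weights are symmetric (plus or minus log2 9), so a field that fully agrees adds about as much as a field that fully disagrees takes away.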
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

900 is NOT 90%. 900 is "must match exactly". Read the manual.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
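Ray's point about 900 can be illustrated with a sketch of how an UNCERT comparison is usually described: identical strings get the full agreement weight, a score at or below the param1 threshold gets the full disagreement weight, and scores in between are interpolated linearly. This is a reading of the documentation, not IBM's code; `uncert_weight` is hypothetical, and the 0 to 900 comparison score is taken as an input because UNCERT's own string comparator is not reproduced here.

```python
def uncert_weight(score: float, threshold: float, agree: float, disagree: float) -> float:
    """Hypothetical sketch of how an UNCERT-style comparison turns a score into a weight.

    score     -- string-comparison score on a 0..900 scale (900 = identical strings)
    threshold -- the param1 value (e.g. 850 or 900)
    """
    if not 0 <= threshold <= 900:
        raise ValueError("threshold must be between 0 and 900")
    if score >= 900:          # identical strings: full agreement weight
        return agree
    if score <= threshold:    # at or below the threshold: full disagreement weight
        return disagree
    # Between the threshold and 900: interpolate linearly between the two weights
    fraction = (score - threshold) / (900 - threshold)
    return disagree + fraction * (agree - disagree)


# With param1 = 900, a near-identical name (score 880) still gets full disagreement weight
print(uncert_weight(880, 900, 3.17, -3.17))
# With param1 = 850 the same score earns partial credit
print(uncert_weight(880, 850, 3.17, -3.17))
```

Under this reading, param1 = 900 leaves no room for partial credit at all; lowering it (to 850, say) is what lets near-identical names earn part of the agreement weight, and scores above the midpoint between the threshold and 900 contribute a positive weight.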
suneelchallagali

Post by suneelchallagali »

Hi Ray,

I used 900 only as an example; I mentioned 90% just to explain what I was after. I even tried 850, but I am still not getting the match.
JRodriguez
Premium Member
Posts: 425
Joined: Sat Nov 19, 2005 9:26 am
Location: New York City

Post by JRodriguez »

suneelchallagali

If you are using SSN as a blocking variable, you don't need to include it as a matching variable. The process will only compare records with the same SSN anyway.
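The blocking point can be seen in a small sketch: only records that share a blocking value are ever paired, so within every block an SSN comparison would always agree and add the same weight to every candidate pair without helping to separate them. `blocked_pairs` and the sample records are hypothetical; skipping records with a missing key mirrors what is said further down this thread about nulls in blocking fields.

```python
from collections import defaultdict


def blocked_pairs(source, reference, key):
    """Yield (source_record, reference_record) pairs that share the same blocking value.

    Records whose blocking value is missing (None) are not compared in this pass.
    """
    blocks = defaultdict(list)
    for ref in reference:
        if ref.get(key) is not None:
            blocks[ref[key]].append(ref)
    for src in source:
        value = src.get(key)
        if value is None:
            continue
        # .get() on a defaultdict does not create empty blocks
        for ref in blocks.get(value, []):
            yield src, ref


source = [{"ssn": "111111111", "first_name": "JOHN"}]
reference = [
    {"ssn": "111111111", "first_name": "JON"},
    {"ssn": "222222222", "first_name": "JOHN"},
]
pairs = list(blocked_pairs(source, reference, "ssn"))
# Only the reference record with the same SSN is ever compared,
# so an SSN matching command would agree on every candidate pair.
for src, ref in pairs:
    print(src["first_name"], "vs", ref["first_name"])
```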

Having a cutoff of 0 doesn't mean that records with a composite weight greater than 0 are matches. You should play around in the test environment to find the weight at which pairs are real matches (the graph shows those values), then set your cutoff to that value.
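A sketch of how cutoffs route a pair once its composite weight is known, assuming a pair at or above the match cutoff is a match and a pair at or above the clerical cutoff (but below the match cutoff) goes to clerical review. The duplicate cutoff is left out, and `classify` and the cutoff values in the usage lines are hypothetical.

```python
def classify(composite: float, match_cutoff: float, clerical_cutoff: float) -> str:
    """Assign a record pair to an output based on its composite weight (sketch)."""
    if clerical_cutoff > match_cutoff:
        raise ValueError("clerical cutoff must not exceed the match cutoff")
    if composite >= match_cutoff:
        return "match"
    if composite >= clerical_cutoff:
        return "clerical"
    return "residual"


# Cutoffs at 0 versus cutoffs picked (hypothetically) from the weight histogram
for weight in (-3.17, 2.5, 9.5):
    print(weight, classify(weight, 0, 0), classify(weight, 8, 3))
```

With both cutoffs at 0, any pair whose field weights merely cancel out is already routed as a match, which is why the cutoffs are worth calibrating against the weight distribution rather than left at 0.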

Thanks

Julio R
suneelchallagali

Post by suneelchallagali »

Thank you, JRodriguez.
suneelchallagali

Post by suneelchallagali »

I am getting this error:

REFERENCE_MATCH,0: Fatal error from object MatJoinOp, code 1
REFERENCE_MATCH,0: The runLocally() of the operator failed.
REFERENCE_MATCH,0: Operator terminated abnormally: runLocally did not return APT_StatusO
main_program: APT_PMsectionLeader(3, node3), player 15 - Unexpected exit status 1.
APT_PMsectionLeader(4, node4), player 16 - Unexpected exit status 1.
APT_PMsectionLeader(4, node4), player 20 - Unexpected exit status 1.

Actually, from the output of the reference match I am using only the exact matches and the residuals, as I only need those two outputs.

The following fields are used in the blocking and matching stages.

Blocking stage:
birthdate CHAR
ssn CHAR

Matching stage:

first name UNCERT, param1 = 850
middle name UNCERT, param1 = 900
last name UNCERT, param1 = 850

When I run this job with the reference data, it works fine for a few runs but not for all of them; in the failing runs I get the error message shown above.

So, can anyone please help me out?
JRodriguez

Post by JRodriguez »

suneelchallagali

If you are looking only for exact matches on birthdate and SSN, then do not use any matching fields; the process will dump into the residuals all records that don't match on those two fields.

Regarding the error, it looks like the processes (players on different nodes) are running out of resources, normally memory or temp space.

One way to test whether you are running out of resources is to execute your process in sequential mode.


Julio Rodriguez
ETL Developer by choice

"Sure we have lots of reasons for being rude - But no excuses"
suneelchallagali

Post by suneelchallagali »

Hi Rodriguez,


I have to match records that are exact matches as well as 90% matches. For the 90% match on first_name, last_name and middle_name I have set the probabilities to m=.09 and u=.01 and set param1 to 850, but I am not getting the match even though the names are 90% matches.

The type for First_name, Middle_name and Last_name is UNCERT.
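Assuming the same log2 Fellegi-Sunter formulas as for m=.9 and u=.1 (which give plus or minus 3.17), the probabilities quoted in this post work out as follows. Because m is small, a disagreement on a name field costs almost nothing, so it may be worth checking whether m=.9 rather than m=.09 was intended.

```latex
\begin{aligned}
w_{\text{agree}}    &= \log_2\frac{m}{u}     = \log_2\frac{0.09}{0.01} = \log_2 9 \approx 3.17 \\
w_{\text{disagree}} &= \log_2\frac{1-m}{1-u} = \log_2\frac{0.91}{0.99} \approx -0.12
\end{aligned}
```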
JRodriguez

Post by JRodriguez »

suneelchallagali,

Just to be clear: you need exact matches on SSN and birthdate, and where the SSN or birthdate is missing or different, you would like to get all records that match 90% based on First_name, Middle_name and Last_name?

If so, you should have two passes. One pass has only birthdate and SSN as blocking fields; this pass will give you all the exact matches. Another pass uses a different set of blocking variables, like the NYSIIS of first name and the NYSIIS of last name, with matching variables First_name, Middle_name and Last_name using UNCERT with param1 = 880 (play around to set the cutoff value).
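A rough sketch of the two-pass idea above. Pass 1 blocks on SSN and birthdate; records with a missing key or no exact hit fall through to pass 2, which blocks on name keys and scores the three name fields. Everything here is a stand-in: `crude_key` is NOT NYSIIS (it just takes the first three letters), `difflib.SequenceMatcher` replaces UNCERT's comparator, and the 0.9 threshold on the averaged ratio only loosely corresponds to "90%". All function names, field names and sample records are hypothetical.

```python
from difflib import SequenceMatcher

NAME_FIELDS = ("first_name", "middle_name", "last_name")


def crude_key(name):
    """Hypothetical stand-in for NYSIIS: the first three letters, upper-cased."""
    return (name or "").strip().upper()[:3]


def name_similarity(a, b):
    """Average difflib ratio over the three name fields (stand-in for UNCERT)."""
    ratios = [
        SequenceMatcher(None, (a.get(f) or "").upper(), (b.get(f) or "").upper()).ratio()
        for f in NAME_FIELDS
    ]
    return sum(ratios) / len(ratios)


def two_pass_match(source, reference, name_threshold=0.9):
    matches, residuals, leftovers = [], [], []

    # Pass 1: block on (ssn, birthdate); sharing a block means an exact match.
    by_id = {}
    for ref in reference:
        key = (ref.get("ssn"), ref.get("birthdate"))
        if None not in key:
            by_id.setdefault(key, []).append(ref)
    for src in source:
        key = (src.get("ssn"), src.get("birthdate"))
        hits = by_id.get(key, []) if None not in key else []
        if hits:
            matches.append(("pass1", src, hits[0]))
        else:
            leftovers.append(src)  # missing key or no exact hit: try the next pass

    # Pass 2: block on crude name keys, then keep the best-scoring candidate.
    by_name = {}
    for ref in reference:
        key = (crude_key(ref.get("first_name")), crude_key(ref.get("last_name")))
        by_name.setdefault(key, []).append(ref)
    for src in leftovers:
        key = (crude_key(src.get("first_name")), crude_key(src.get("last_name")))
        best = None
        for ref in by_name.get(key, []):
            score = name_similarity(src, ref)
            if score >= name_threshold and (best is None or score > best[0]):
                best = (score, ref)
        if best is not None:
            matches.append(("pass2", src, best[1]))
        else:
            residuals.append(src)
    return matches, residuals


source = [
    {"ssn": "111111111", "birthdate": "1980-01-01", "first_name": "JOHN", "middle_name": "A", "last_name": "SMITH"},
    {"ssn": None, "birthdate": "1975-05-05", "first_name": "KATHERINE", "middle_name": "M", "last_name": "JONES"},
    {"ssn": "333333333", "birthdate": "1990-09-09", "first_name": "ZED", "middle_name": "", "last_name": "QUILL"},
]
reference = [
    {"ssn": "111111111", "birthdate": "1980-01-01", "first_name": "JON", "middle_name": "A", "last_name": "SMITH"},
    {"ssn": "222222222", "birthdate": "1975-05-05", "first_name": "KATHERYN", "middle_name": "M", "last_name": "JONES"},
]
matches, residuals = two_pass_match(source, reference)
for pass_name, src, ref in matches:
    print(pass_name, src["last_name"], "->", ref["last_name"])
print("residuals:", [r["last_name"] for r in residuals])
```

The record with a null SSN never enters pass 1 but is still picked up by pass 2, which is the point of adding passes with different blocking fields.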

Thanks
Julio Rodriguez
suneelchallagali

how to handle Null in blocking and matching fields

Post by suneelchallagali »

Hi

I am using date of birth and SSN as blocking fields,

and first_name, last_name and middle_name as matching fields.

How do I handle null values in the blocking fields as well as in the matching fields?

If the source data and reference data contain a null SSN, but all the other fields have values and those values are the same, those records must go to the match dataset.

So, can you please help me out?
ray.wurlod

Post by ray.wurlod »

Is null genuine null or simply "missing"? There are ways to handle missing values, among them setting up VARTYPEs. Or you can convert the null values to something else upstream of the Match stage. To what values are your cutoffs set?
suneelchallagali

Post by suneelchallagali »

It is a missing value; I mean blank values in particular fields.
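One way to do the upstream conversion Ray suggested, now that the missing values are known to be blanks, is to normalize every blank or whitespace-only value to a single missing representation before the Match stage (in a DataStage job this would typically be done in an upstream stage such as a Transformer). A minimal Python sketch, with `normalize_missing` as a hypothetical helper and None as the chosen representation; whether the Match stage should then see nulls or some other placeholder is a design choice this sketch does not settle.

```python
def normalize_missing(record, fields):
    """Return a copy of `record` where blank or whitespace-only values in `fields` become None.

    Non-blank string values are trimmed; fields not listed are left untouched.
    """
    cleaned = dict(record)
    for field in fields:
        value = cleaned.get(field)
        if isinstance(value, str):
            value = value.strip()
            cleaned[field] = value if value else None
    return cleaned


raw = {"ssn": "   ", "first_name": " JOHN ", "age": " 30 "}
print(normalize_missing(raw, ["ssn", "first_name"]))
```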
JRodriguez

Post by JRodriguez »

suneelchallagali,

All records having null values in blocking fields are skipped in the current pass, but they remain available for other passes. Just add other passes with different blocking fields to process those records.


Records become matches, duplicates or residuals depending on the composite weight and the cutoff values for the pass. By default, a null value in a matching field adds a zero (0) weight to the composite weight (you can use a different default if you want), so you should set the match, duplicate and clerical cutoffs to proper values.
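A sketch of the default described here, where a missing value on either side of a matching field contributes 0 to the composite weight. Exact string comparison and the weights dictionary are simplifications for brevity; `composite_weight` and `missing_weight` are hypothetical names, not QualityStage settings.

```python
def composite_weight(src, ref, weights, missing_weight=0.0):
    """Sum per-field weights for one record pair (exact comparison only, for brevity).

    weights maps field name -> (agreement_weight, disagreement_weight).
    A missing value (None) on either side contributes `missing_weight` (0 by default).
    """
    total = 0.0
    for field, (agree, disagree) in weights.items():
        a, b = src.get(field), ref.get(field)
        if a is None or b is None:
            total += missing_weight
        elif a == b:
            total += agree
        else:
            total += disagree
    return total


WEIGHTS = {
    "first_name": (3.17, -3.17),
    "middle_name": (3.17, -3.17),
    "last_name": (3.17, -3.17),
}
src = {"first_name": "JOHN", "middle_name": None, "last_name": "SMITH"}
ref = {"first_name": "JOHN", "middle_name": "A", "last_name": "SMITH"}
print(composite_weight(src, ref, WEIGHTS))
```

Because the missing middle name adds 0 rather than a negative disagreement weight, this pair still reaches a composite of 6.34 (two agreements at 3.17 each); whether that clears the match cutoff depends entirely on where the cutoffs are set.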

You also need to specify that fields in the matching commands containing null values need special handling, using a VARTYPE such as CRITICAL MISSINGOK; if not, those records will become residuals.
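A minimal sketch of the CRITICAL / CRITICAL MISSINGOK behavior as described in this reply: a critical field that disagrees rejects the pair outright, and without MISSINGOK a missing value counts against it too, so the record ends up in the residuals. `critical_check` is hypothetical, and exact equality stands in for whatever comparison the matching command actually uses.

```python
def critical_check(src, ref, field, missing_ok):
    """Sketch of a CRITICAL-style rule on one field, as described in this thread.

    Returns False when the pair should be rejected outright (sent to residuals).
    Exact equality stands in for the matching command's real comparison.
    """
    a, b = src.get(field), ref.get(field)
    if a is None or b is None:
        return missing_ok  # CRITICAL alone rejects; MISSINGOK lets the pair through
    return a == b


src = {"ssn": None}
ref = {"ssn": "111111111"}
print("CRITICAL:", critical_check(src, ref, "ssn", missing_ok=False))
print("CRITICAL MISSINGOK:", critical_check(src, ref, "ssn", missing_ok=True))
```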
Julio Rodriguez
suneelchallagali

Post by suneelchallagali »

Thanks a lot for the help, JRodriguez!