Which columns for standardizing names using USNAME rule set?

suryadev · Post by **suryadev** » Tue Feb 01, 2011 10:05 am

In QualityStage, an existing job standardizes the address using the USADDR and USAREA rule sets.
I now need to standardize the name in the same job as well, and I will use the USNAME rule set for that.

When using the USNAME rule set, which columns do I need to use for the standardization: firstname, middlename, or lastname?

Please advise.

ray.wurlod · Post by **ray.wurlod** » Tue Feb 01, 2011 4:07 pm

Use as much information as you have. This might also include title, generational (e.g. JR), suffix, ..., whatever.

suryadev · Post by **suryadev** » Fri Feb 04, 2011 11:53 am

The job in which I am standardizing the name contains a Standardize stage that already standardizes the address and area.

Now, in the same stage, I have to standardize the name using all the available fields (firstname, middlename, lastname and suffix).

In the previous job, the output from the Standardize stage had first name, middle name and last name, and a Transformer added two more fields alongside them: first initial and middle initial.

Which columns do I need to map in the Transformer from the output of the Standardize stage after the name standardization?

Please advise.

JRodriguez · Post by **JRodriguez** » Fri Feb 04, 2011 12:31 pm

Suryadev,

You should use the standardized output columns from the previous job; that should improve the results of your standardization process. Better input, better output.

Basically, to get the most out of the USNAME rule set, you should force the input into the format it expects (lastname, firstname), while taking advantage of the previous standardization effort.

Don't forget to massage and fix the records that land in the UnhandledData_USNAME field from the previous job.
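
As a rough illustration of assembling one name string before it reaches the Standardize stage, here is a minimal Python sketch. The field names (first name, middle name, last name, suffix) come from this thread; the helper `build_name_input` and the exact "LAST, FIRST MIDDLE SUFFIX" layout are assumptions for illustration only, so check the layout in the rule tester before relying on it. In a real job this logic would be a Transformer derivation, not Python.

```python
def build_name_input(first_name, middle_name, last_name, suffix=None):
    """Join name parts into one 'LAST, FIRST MIDDLE SUFFIX' string.

    Hypothetical helper: the layout is an assumption for illustration,
    not a documented USNAME requirement.
    """
    def clean(value):
        # Treat None as empty and collapse runs of whitespace.
        return " ".join((value or "").split())

    last = clean(last_name)
    given_parts = (clean(first_name), clean(middle_name), clean(suffix))
    given = " ".join(part for part in given_parts if part)

    # Only add the comma when both halves are present.
    if last and given:
        return f"{last}, {given}"
    return last or given


print(build_name_input("John", "Q", "Smith", "JR"))
```

Empty or missing parts are skipped, so a record with only a last name yields just that name, with no stray comma.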

suryadev · Post by **suryadev** » Sun Feb 06, 2011 11:42 am

In the first job, which was designed earlier, the name was not standardized. After the Standardize stage, a Transformer stage split the first name into first_name and INItialFirst_name and the middle name into middle_name and InitMiddle_name; the last name was passed through unchanged.

Now I have standardized the name using the USNAME rule set, with all the name-related input fields (first_name, middle_name, last_name, suffix).

The output of the Standardize stage has a lot of fields (match primary fields, unhandled fields and a number of other fields).
Which of these output fields do I need to map from the Standardize stage to the Transformer?
I mapped firstname, middlename, prefix and primary name,
but I cannot find lastname in the output.
Please suggest which fields I need to map to get the standardized names as output.

Thanks

ray.wurlod · Post by **ray.wurlod** » Sun Feb 06, 2011 12:15 pm

Whatever you've got. If it's not in your source, you haven't got it. Try synonyms like LastName, FamilyName, PrimaryName.

suryadev · Post by **suryadev** » Sun Feb 06, 2011 1:43 pm

So, when mapping the output in the stage properties, do I need to map all the fields (match primary fields, unhandled fields and all the others)?

After that, which fields do I need to send to the target: only firstname, middlename and lastname, or all the other fields as well?

stuartjvnorton · Post by **stuartjvnorton** » Sun Feb 06, 2011 5:10 pm

Which extra fields you include will depend on what you plan on doing with the data afterwards.

If you want to do some matching involving the name, then the match fields will be useful, so take them along for as long as you need them.
If you want to improve the standardisation your ruleset does, UnhandledData and UnhandledPattern are both useful, but probably not for taking with you.
If you have unhandled data, it may be safer to just pass on the original values, so you can use UnhandledData to check. Then again, it may not be safer.
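
The UnhandledData check can be sketched outside the tool like this. It is a minimal Python illustration, assuming each record is a dict and that the standardized columns are named `FirstName_USNAME`, `MiddleName_USNAME`, `PrimaryName_USNAME` and `UnhandledData_USNAME`; confirm the actual names in your rule set's DCT file. `choose_name` is a hypothetical helper, and in a job the same logic would sit in a Transformer.

```python
def choose_name(record):
    """Return (first, middle, last), preferring the standardized values.

    Falls back to the original input columns when the rule set left
    anything in UnhandledData. All column names are assumptions.
    """
    unhandled = (record.get("UnhandledData_USNAME") or "").strip()
    if unhandled:
        # Something was not parsed: keep the values as they arrived.
        return (
            record.get("first_name", ""),
            record.get("middle_name", ""),
            record.get("last_name", ""),
        )
    return (
        record.get("FirstName_USNAME", ""),
        record.get("MiddleName_USNAME", ""),
        record.get("PrimaryName_USNAME", ""),
    )


# Made-up sample record, for illustration only.
sample = {
    "first_name": "jon", "middle_name": "q", "last_name": "smith",
    "FirstName_USNAME": "JON", "MiddleName_USNAME": "Q",
    "PrimaryName_USNAME": "SMITH", "UnhandledData_USNAME": "",
}
print(choose_name(sample))
```

Whether falling back is really the safer choice depends on your data, so treat the condition as something to tune after reviewing the unhandled records.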

But don't just blindly take my word for it.
You are in the best position to know what you want to do, or what you actually need.

Read the DCT file and you will see the fields that contain the different parts of the standardised name (the ones you need to have), versus the extra fields used to help with later matching activities or unhandled data etc.
Use the rule tester. It takes 2 seconds to plug a name in, hit the button and see where it goes. You can learn a lot about a ruleset in a few minutes just by trying it out.

Understand your data and a lot of questions answer themselves.

suryadev · Post by **suryadev** » Mon Feb 07, 2011 10:02 am

Yes, I need to do matching involving the name, so the match fields are useful.

After standardizing, I need to create passes in the match specification based on the name.
For this purpose, all the fields have to be carried through to the end of matching.
Is this right?

I checked the DCT file of the USNAME rule set and found BI fields, matching fields and reporting fields.
So do I need only the matching fields?

Also, when selecting the rule set, I included all of the following columns:
(firstname, lastname, middlename, suffixcd)

Thanks

rjdickson · Post by **rjdickson** » Mon Feb 07, 2011 11:52 am

You may want to try to run through the tutorial. It might help answer some of your questions.

suryadev · Post by **suryadev** » Mon Feb 07, 2011 4:33 pm

I read the tutorial, which helped me to some extent.

The output of the first job, which I am designing now, has to be used as the source for a second job in which Match Frequency is used to generate frequencies for the source.

After this, I need to match the fields using passes in a match specification.

Given this, can anyone suggest which fields I need to take as output after standardizing with the USNAME rule set in my first job?

Thanks

stuartjvnorton · Post by **stuartjvnorton** » Mon Feb 07, 2011 4:43 pm

Don't just read it: actually do it.
You'll be surprised how much more you'll learn if you take the extra couple of hours and run through the tutorials properly.
You'll also find that with a bit of knowledge about the tool, you'll ask better questions that lead to quicker answers.

suryadev · Post by **suryadev** » Wed Feb 16, 2011 8:54 am

Thanks for the information.

Can custom rulesets be reused?

ray.wurlod · Post by **ray.wurlod** » Wed Feb 16, 2011 2:41 pm

Custom rule sets can be used as often as necessary. It is better, therefore, if some thought has gone into their construction, particularly about handling conditions as generally as possible.