Which columns for standardizing names using USNAME rule set?

Infosphere's Quality Product

suryadev
Premium Member
Posts: 211
Joined: Sun Jul 11, 2010 7:39 pm

Post by suryadev »

In QualityStage, an existing job standardizes the address using the USADDR and USAREA rule sets.
Now I need to standardize the name in the same job as well, and for that I will use the USNAME rule set.

When using the USNAME rule set, which columns do I need to use for the standardization: firstname, middlename, or lastname?

Please suggest.
Thanks,
Surya
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Use as much information as you have. This might also include title, generational (e.g. JR), suffix, ..., whatever.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
suryadev
Premium Member
Posts: 211
Joined: Sun Jul 11, 2010 7:39 pm

Post by suryadev »

The job in which I am standardizing the name contains a Standardize stage that already standardizes the address and area.

Now, in the same stage, I have to standardize the name using all the available fields (firstname, middlename, lastname, and suffix).

In the previous job, the output of the Standardize stage has first name, middle name, and last name, but two more fields were added alongside them in a Transformer: first initial and middle initial.

Which columns do I need to map in the Transformer from the output of the Standardize stage after the name standardization?

Please suggest.
Thanks,
Surya
JRodriguez
Premium Member
Posts: 425
Joined: Sat Nov 19, 2005 9:26 am
Location: New York City

Post by JRodriguez »

Suryadev,

You will want to use the standardized output columns from the previous job; that should improve the results of your standardization process. Better input, better output.

Basically, to get the most out of the USNAME rule set, you should force the input into the format it expects (lastname, firstname), while at the same time taking advantage of the previous standardization effort.
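
As a rough illustration of forcing separate name columns into a single (lastname, firstname) input, here is a minimal Python sketch. It is not QualityStage code: the function name build_name_input is hypothetical, and placing the middle name and suffix after the first name is an assumption made for the example. In a real job the same concatenation would be done in the job design itself.

```python
def build_name_input(first, middle, last, suffix):
    """Join separate name columns into one 'LAST, FIRST MIDDLE SUFFIX' string.

    The ordering of middle name and suffix is an assumption for illustration.
    """
    # Keep only the parts that are present and non-blank.
    given_parts = [p.strip() for p in (first, middle, suffix) if p and p.strip()]
    given = " ".join(given_parts)
    last = (last or "").strip()
    # Only emit the comma when both the last name and a given part exist.
    if last and given:
        return f"{last}, {given}"
    return last or given


print(build_name_input("John", "A", "Smith", "Jr"))  # Smith, John A Jr
```

Blank or missing parts are skipped, so a record with only a last name comes out without a stray comma.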

Don't forget to massage and fix those records that end up in the UnhandledData_USNAME field from the previous job.
Julio Rodriguez
ETL Developer by choice

"Sure we have lots of reasons for being rude - But no excuses
suryadev
Premium Member
Posts: 211
Joined: Sun Jul 11, 2010 7:39 pm

Post by suryadev »

In the first job, which was designed earlier, the name was not standardized. After the Standardize stage, a Transformer stage split the first name into first_name and INItialFirst_name and the middle name into middle_name and InitMiddle_name; the last name was left unchanged.

Now I am standardizing the name with the USNAME rule set, using all the name-related input fields (first_name, middle_name, last_name, suffix).

The output of the Standardize stage has a lot of fields (match primary fields, unhandled fields, and a number of others).
Which of these output fields do I need to map from the Standardize stage to the Transformer?
I mapped the fields firstname, middlename, prefix, and primary name,
but I cannot find the lastname in the output.
Please suggest which fields I need to map to get the standardized names as output.

Thanks,
Surya
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Whatever you've got. If it's not in your source, you haven't got it. Try synonyms like LastName, FamilyName, PrimaryName.
suryadev
Premium Member
Posts: 211
Joined: Sun Jul 11, 2010 7:39 pm

Post by suryadev »

So, when mapping the output in the stage properties, do I need to map all the fields (match primary fields, unhandled fields, and all the other fields)?

After that, which fields do I need to send to the target: only firstname, middlename, and lastname, or all the other fields as well?
Thanks,
Surya
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

Which extra fields you include will depend on what you plan on doing with the data afterwards.

If you want to do some matching involving the name, then the match fields will be useful, so take them along for as long as you need them.
If you want to improve the standardisation your ruleset does, UnhandledData and UnhandledPattern are both useful, but probably not for taking with you.
If you have unhandled data, it may be safer to just pass on the original values. So you can use UnhandledData to check. Then again, it may not be safer. ;-)
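
To make the "pass on the original values" idea concrete, here is a small Python sketch of the check. It only models the decision, not a QualityStage Transformer. UnhandledData_USNAME is the field mentioned earlier in this thread; the other column names (FirstName_USNAME, MiddleName_USNAME, PrimaryName_USNAME and the lowercase source columns) and the function name are assumptions, so check them against the DCT file of your rule set. Whether falling back is actually the safer choice depends on your data.

```python
def choose_name_values(row):
    """Return standardized name parts, or the originals if anything was unhandled.

    `row` is a plain dict standing in for one output record; all column
    names other than UnhandledData_USNAME are assumed for illustration.
    """
    unhandled = (row.get("UnhandledData_USNAME") or "").strip()
    if unhandled:
        # Something was left unparsed: fall back to the source columns.
        return {
            "first_name": row["first_name"],
            "middle_name": row["middle_name"],
            "last_name": row["last_name"],
        }
    # Fully handled: use the parsed columns from the rule set.
    return {
        "first_name": row["FirstName_USNAME"],
        "middle_name": row["MiddleName_USNAME"],
        "last_name": row["PrimaryName_USNAME"],
    }
```

Counting how often the fallback branch is taken is also a quick way to see how much of the input the rule set is not handling.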

But don't just blindly take my word for it.
You are in the best position to know what you want to do, or what you actually need.

Read the DCT file and you will see the fields that contain the different parts of the standardised name (the ones you need to have), versus the extra fields used to help with later matching activities or unhandled data etc.
Use the rule tester. It takes 2 seconds to plug a name in, hit the button and see where it goes. You can learn a lot about a ruleset in a few minutes just by trying it out.
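
Once you have the list of output columns from the DCT file, a quick way to see which ones are name parts and which are helpers is to group them. The Python sketch below does that by name prefix. The prefixes ("Match", "Unhandled") and the sample column names are assumptions for illustration; the DCT file of your own rule set is the authority on what each field is for.

```python
def group_output_columns(columns):
    """Split rule set output columns into match, unhandled, and other fields.

    Grouping by prefix is an assumed convention, not something the
    rule set guarantees.
    """
    groups = {"match": [], "unhandled": [], "other": []}
    for col in columns:
        if col.startswith("Match"):
            groups["match"].append(col)
        elif col.startswith("Unhandled"):
            groups["unhandled"].append(col)
        else:
            groups["other"].append(col)
    return groups


# Hypothetical column list; replace it with the names from your DCT file.
sample = [
    "FirstName_USNAME",
    "PrimaryName_USNAME",
    "MatchPrimaryName_USNAME",
    "UnhandledData_USNAME",
    "UnhandledPattern_USNAME",
]
for group, names in group_output_columns(sample).items():
    print(group, names)
```

The "match" group is what you would carry into the matching job, while the "unhandled" group is mainly useful for reviewing and improving the standardization.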

Understand your data and a lot of questions answer themselves.
suryadev
Premium Member
Posts: 211
Joined: Sun Jul 11, 2010 7:39 pm

Post by suryadev »

Yes, I need to do matching involving the name, so the match fields are useful.

After standardizing, I need to create passes in the match specification based on the name.
For this purpose, all the fields have to be carried through to the end of matching.
Is this right?

I checked the DCT file of the USNAME rule set and found BI fields, matching fields, and reporting fields.
So do I need only the matching fields?

Also, I included all the name fields as the columns to standardize when selecting the rule set
(firstname, lastname, middlename, suffixcd).

Thanks,
Surya
rjdickson
Participant
Posts: 378
Joined: Mon Jun 16, 2003 5:28 am
Location: Chicago, USA

Post by rjdickson »

You may want to try to run through the tutorial. It might help answer some of your questions.
Regards,
Robert
suryadev
Premium Member
Posts: 211
Joined: Sun Jul 11, 2010 7:39 pm

Post by suryadev »

I read the tutorial, which helped me to some extent.

The output of the first job, which I am designing now, has to be used as the source for a second job in which Match Frequency is used to generate frequencies for the source.

After this, I need to match the fields with passes, using match specifications.

Based on the above, can anyone suggest which fields I need to take as output after standardizing with the USNAME rule set in my first job?


Thanks,
Surya
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

suryadev wrote: I read the tutorial, which helped me to some extent.

Don't just read it: actually do it.
You'll be surprised how much more you'll learn if you take the extra couple of hours and run through the tutorials properly.
You'll also find that with a bit of knowledge about the tool, you'll ask better questions that lead to quicker answers.
suryadev
Premium Member
Posts: 211
Joined: Sun Jul 11, 2010 7:39 pm

Post by suryadev »

Thanks for the information.

Can custom rulesets be reused?
Thanks,
Surya
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Custom rule sets can be used as often as necessary. It is better, therefore, if some thought has gone into their construction, particularly about handling conditions as generally as possible.