Name Standardization issue 7.5 vs 8x

dodda
Premium Member
Posts: 244
Joined: Tue May 29, 2007 11:31 am

Name Standardization issue 7.5 vs 8x

Post by dodda »

Hi all,

I have worked a little with QualityStage 7.5 (I am not an expert).
While standardizing names in 7.5, when I used literals (ZQ) I got MatchFirstName and Soundex results, but when I do the same in version 8 it shows an unhandled pattern.

example:

ZQ MARTIN ZQ K ZQ SMITH
ZQ VENKAT ZQ K ZQ SMITH

The two strings above, when analyzed using the USNAME rule set, parsed correctly (no unhandled pattern) in version 7.5, but the same is not true in version 8.x.

Can someone guide me on what needs to be changed in the USNAME rule set of v8 so that it accepts non-US names?

Thanks in advance
JRodriguez
Premium Member
Posts: 425
Joined: Sat Nov 19, 2005 9:26 am
Location: New York City

Post by JRodriguez »

Dodda,

One way to make it work is to use classification overrides.

Just double-click on the rule set and the Rules Management window will pop up:

Rules Management tool --> Overrides --> Classification

Classify all your names based on your investigation frequency report.
Julio Rodriguez
ETL Developer by choice

"Sure we have lots of reasons for being rude - But no excuses"
dodda
Premium Member
Posts: 244
Joined: Tue May 29, 2007 11:31 am

Post by dodda »

JRodriguez wrote:Dodda,

One way to make it work is to use classification overrides.

Just double-click on the rule set and the Rules Management window will pop up:

Rules Management tool --> Overrides --> Classification

Classify all your names based on your investigation frequency report.

Thanks, Julio, for your reply.
I am pretty sure that I did not use any overrides in my previous project on 7.5. I am not sure that classification overrides will solve my problem, as we would be receiving anonymous names. If we have to classify each name, what about new names that are not yet in the system? I need a direction for solving this problem, because I will be using the standardized names later for matching; if a name turns out to be an unhandled pattern, then matching fails per the logic.


Thanks for your suggestion; I am looking for more.

Thanks, all
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Somewhere, somehow, you are going to have to classify some names as first names. You probably also need to parse the pattern + | I | + into first name, middle name and primary name buckets.
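
To make the idea of parsing + | I | + into buckets concrete, here is a minimal Python sketch. It is a toy model only, not QualityStage pattern-action syntax and not how the USNAME rule set is actually implemented. The classify rule and the dictionary keys (FirstName, MiddleName, PrimaryName, UnhandledData) are assumptions made for illustration.

```python
# Toy model only: this is NOT QualityStage syntax or its real engine.
# Assumed token classes: "I" = single-letter initial, "+" = unclassified word.

def classify(token: str) -> str:
    """Return a one-character class for a token (hypothetical rule)."""
    return "I" if len(token) == 1 and token.isalpha() else "+"


def parse_name(name: str) -> dict:
    """Build the pattern for a name and bucket it if the pattern is known."""
    tokens = name.upper().split()
    pattern = " | ".join(classify(t) for t in tokens)
    if pattern == "+ | I | +":
        # The one pattern handled here: first name, middle initial, primary name.
        first, middle, primary = tokens
        return {
            "pattern": pattern,
            "FirstName": first,
            "MiddleName": middle,
            "PrimaryName": primary,
        }
    # Anything else is left unhandled in this sketch.
    return {"pattern": pattern, "UnhandledData": " ".join(tokens)}


print(parse_name("Venkat K Smith"))
print(parse_name("Venkat Smith"))
```

The point of the sketch is that the rule keys on the pattern, not on the names themselves, so no individual name has to be known in advance.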
Last edited by ray.wurlod on Thu Dec 03, 2009 4:28 pm, edited 1 time in total.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
JRodriguez
Premium Member
Posts: 425
Joined: Sat Nov 19, 2005 9:26 am
Location: New York City

Post by JRodriguez »

Dodda,

Notice that with unhandled pattern overrides you are solving not individual names but patterns, so with one pattern override rule you handle any unclassified name that falls into that pattern. A good trick is to proactively add the most common patterns in your data (found with an Investigation job) as unhandled pattern override rules.

If I know which names are likely to be coming in my data, I would classify them up front instead of waiting for the tool to flag them as unhandled patterns.
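
The difference between the two kinds of override can be sketched in a few lines of Python. This is only an illustration under assumed tables: CLASSIFICATION, PATTERN_OVERRIDES, the class letter F for a known first name, and the role names are all hypothetical, and the real overrides are maintained through the Rules Management tool rather than in code.

```python
# Toy model: hypothetical tables, not real USNAME contents.
CLASSIFICATION = {"MARTIN": "F"}  # name-by-name entries; F = known first name
PATTERN_OVERRIDES = {
    "+I+": ("first", "middle", "primary"),  # unknown word, initial, unknown word
    "FI+": ("first", "middle", "primary"),  # known first name, initial, unknown word
}


def pattern_of(tokens):
    """Classify each token: table lookup, then initial, else unclassified."""
    classes = []
    for t in tokens:
        if t in CLASSIFICATION:
            classes.append(CLASSIFICATION[t])
        elif len(t) == 1 and t.isalpha():
            classes.append("I")
        else:
            classes.append("+")
    return "".join(classes)


def standardize(name):
    """Return a role-to-token mapping, or None for an unhandled pattern."""
    tokens = name.upper().split()
    roles = PATTERN_OVERRIDES.get(pattern_of(tokens))
    if roles is None:
        return None
    return dict(zip(roles, tokens))


print(standardize("Martin K Smith"))  # classified name, pattern FI+
print(standardize("Venkat K Smith"))  # never classified, still covered by +I+
print(standardize("Venkat Smith"))    # pattern ++ has no override here
```

In this sketch a name that was never added to the classification table is still handled, because the +I+ entry applies to every name with that shape.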


I don't remember using just the ZQ literal to identify anonymous names. As a matter of fact, the USNAME rule set will set them to null (you can see it in the Pattern Action File). I remember seeing orphan literals (ZQ) in the USPREP preprocessor output after using metadata delimiters (ZQ Domain ZQ).
dodda
Premium Member
Posts: 244
Joined: Tue May 29, 2007 11:31 am

Post by dodda »

Thanks, guys, for all your input. I am going to try a few of the things suggested here over the weekend and on Monday, and will let you know the results!

Just my observation:
I believe that the +|+ format is handled in the GBNAME rule set. When I send Venkat K Dodda to GBNAME, it actually parses the name, so I am planning to combine the patterns in GBNAME and USNAME after some analysis.

I'll get back in touch with you with my results.

Thanks, all, for your valuable suggestions
dodda
Premium Member
Posts: 244
Joined: Tue May 29, 2007 11:31 am

Post by dodda »

Thanks again all for your suggestions.

I've investigated the names and found the most common patterns and added them to overrides, thus solving the problem.

There are too many names to classify, and we know for sure that we will get new names, so I separated out the most common patterns and added them as overrides.
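
For anyone reading later, the "find the most common patterns" step can be pictured with a small Python sketch. It is only an illustration: in practice the frequency report came from the Investigation job, the sample names below are made up, and pattern_of is a hypothetical stand-in for the rule set's real classification.

```python
from collections import Counter


def pattern_of(name: str) -> str:
    """Hypothetical classifier: single letters are initials (I), other words are +."""
    return " | ".join(
        "I" if len(t) == 1 and t.isalpha() else "+" for t in name.split()
    )


# Made-up sample data standing in for the incoming name column.
names = [
    "MARTIN K SMITH",
    "VENKAT K SMITH",
    "ANIL KUMAR",
    "JOHN Q PUBLIC",
    "MARY ANN LEE",
]

# Count how often each pattern occurs, most frequent first.
counts = Counter(pattern_of(n) for n in names)
for pattern, n in counts.most_common():
    print(f"{n:3d}  {pattern}")
```

The patterns at the top of such a list are the ones worth adding as overrides first, since each one covers every name of that shape, including names not seen yet.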

Thanks guys !