Add 30000 Names to Rule Set Specification as First Name

Infosphere's Quality Product


kemoid
Participant
Posts: 6
Joined: Wed May 02, 2007 3:29 am
Location: cairo


Post by kemoid »

The results of the Standardize stage show that many words were unrecognized. The solution is to add them to the word table through the override windows, using the Classification tab, but this is a manual process, and it is impractical to add more than 30000 words by hand, one row at a time.
So I tried adding them all directly to the USNAME.UCL file, opened in Notepad (the project is called UBE):

E:\IBM\InformationServer\Server\Projects\UBE\Quality\USNAME.UCL
E:\IBM\InformationServer\Clients\Classic\QS_TEMP\USNAME.UCL

When I opened the rule set windows, QualityStage did not recognize the changes to the USNAME.UCL file.

How can I add a batch of words like these
AAB AAB F
AABANY AABANY F
AABAS AABAS F
AABD AABD F
AABDA AABDA F
AABDEE AABDEE F
AABDEEN AABDEEN F
AABDEIN AABDEIN F
AABDEL AABDEL F
AABDEN AABDEN F
AABDIEN AABDIEN F
AABDIN AABDIN F
AABE AABE F
AABED AABED F
AABEDIN AABEDIN F

to the rule set without adding them one by one by hand?
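Producing the 30000 entries themselves is the easy part to automate. The sketch below is a hypothetical helper, not a QualityStage tool: it assumes each entry is "WORD STANDARD CLASS" separated by single spaces, mirroring the sample lines above, and the file names `first_names.txt` and `first_names.cls.txt` are made up for illustration. The generated lines still have to reach the rule set by a route the Designer actually picks up, since editing the .UCL file in Notepad alone did not work.

```python
"""Hypothetical sketch: turn a plain list of first names into classification lines.

Assumption: each output line is "WORD STANDARD CLASS" separated by single
spaces, mirroring the sample entries (e.g. "AAB AAB F"). File names are made up.
"""
from pathlib import Path


def classification_lines(names, token="F"):
    """Return sorted, de-duplicated lines of the form 'NAME NAME <token>'."""
    seen = set()
    lines = []
    for raw in names:
        word = raw.strip().upper()
        # Skip blanks and multi-word entries: a classification entry is one token.
        if len(word.split()) != 1:
            continue
        if word not in seen:
            seen.add(word)
            lines.append(f"{word} {word} {token}")
    return sorted(lines)


if __name__ == "__main__":
    src = Path("first_names.txt")  # hypothetical input, one name per line
    if src.exists():
        names = src.read_text(encoding="utf-8").splitlines()
        out = Path("first_names.cls.txt")  # hypothetical output
        out.write_text("\n".join(classification_lines(names)) + "\n", encoding="utf-8")
```

Upper-casing and de-duplicating up front keeps the table from carrying the same word twice with different capitalization.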

Thanks for your cooperation.
boxtoby
Premium Member
Posts: 138
Joined: Mon Mar 13, 2006 5:11 pm
Location: UK

Post by boxtoby »

What I would do is run a Word Investigation stage over the name column. This produces several files, one of which is effectively a .ucl or .cls file. It looks a bit like this:

AAAAA AAAAA ?

The problem is that for a name like FRED SMITH you will get

FRED FRED ?
SMITH SMITH ?

I imagine you want FRED but not SMITH, so you will need to edit the file unless the forename and surname are in separate columns.

Then change the ? to whatever token you require.

Cut and paste the entries into the .ucl file.

Not perfect, but quicker than typing!
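The "change the ? to your token" step can be scripted instead of done by hand. This is a minimal sketch assuming each investigation line has exactly three whitespace-separated fields ending in `?`, as in the FRED/SMITH example; the real investigation output may carry extra columns, in which case the parsing would need adjusting. The `exclude` set (for example, known surnames) is a hypothetical input you would supply yourself.

```python
def retag_investigation_lines(lines, token="F", exclude=frozenset()):
    """Replace the trailing '?' class with `token`, dropping excluded words.

    Assumption: each line looks like "WORD STANDARD ?" (three fields).
    Lines in any other shape are skipped rather than guessed at.
    """
    out = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3 or parts[2] != "?":
            continue
        # e.g. exclude={"SMITH"} keeps surnames out of a first-name class
        if parts[0] in exclude:
            continue
        out.append(f"{parts[0]} {parts[1]} {token}")
    return out


if __name__ == "__main__":
    sample = ["FRED FRED ?", "SMITH SMITH ?"]
    for entry in retag_investigation_lines(sample, exclude={"SMITH"}):
        print(entry)
```

Filtering by an exclusion set only helps if you already have a surname list; if the forename and surname are in separate columns, running the investigation on the forename column alone avoids the problem.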
Bob Oxtoby
kemoid
Participant
Posts: 6
Joined: Wed May 02, 2007 3:29 am
Location: cairo

Post by kemoid »

Thanks Bob, but this is not what I need.

I open USNAMECOPY.UCL in Notepad, add a word (for example UUU UUU F), and close the file. When I test the word in the rule set test window, it does not work: the word is classified as + (unknown word). But if I add the word in the rule set test window and test again, it is correctly recognized as a first name.

I have more than 30000 words. What are the correct steps for adding them to the rule set so that the test window classifies them correctly?
karim
kemoid
Participant
Posts: 6
Joined: Wed May 02, 2007 3:29 am
Location: cairo

Post by kemoid »

Thank you all, both those who thought about the problem and those who didn't.
The problem has been solved.


I exported only the .UCL file to an XML file, opened it in Notepad, and added the words in the same format, e.g. (JONE JONE F). Then I saved the file and, in QDesigner, chose Import / DataStage Components / OK.
That's it: if you test any of the added words, you'll find it recognized as a first name (F).
karim
mydsworld
Participant
Posts: 321
Joined: Thu Sep 07, 2006 3:55 am

Post by mydsworld »

Hi Kemoid,

Please explain the solution you found for this problem (I have also run into it a couple of times):
1. How did you export the .UCL file?
2. How did you import the modified file to incorporate the change?

I also tried a classification override to solve this, but the problem remained.
kemoid
Participant
Posts: 6
Joined: Wed May 02, 2007 3:29 am
Location: cairo

Post by kemoid »

1. Open the rule set and add a new classification entry to act as a placeholder, e.g. (SSS SSS F).
2. Right-click the UCL file and choose Export / DataStage file (or XML).
3. Open the exported file in Notepad, search for the placeholder word (SSS), and replace it with your own table (make sure your table uses the same format).
4. Choose Import / Import DataStage Components (or XML), select the file, and answer Yes.
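Step 3 is a search-and-replace that is awkward to do by hand with 30000 rows. Below is a minimal Python sketch of it under assumptions the thread does not confirm: the placeholder shows up in the export as its own line "SSS SSS F" (spaces or tabs between columns), the file is UTF-8, and, if the export is XML, `&`, `<` and `>` in the entries must be escaped. The file names are hypothetical; the sketch writes to a new file so the original export stays untouched.

```python
"""Hypothetical sketch of step 3: splice a classification table into an export.

Assumptions: the placeholder is its own line "SSS SSS F", the file is UTF-8,
and XML exports need &, <, > escaped. File names below are made up.
"""
import os
import re
from xml.sax.saxutils import escape

# Indentation, the two column separators, then "F" up to the end of the line.
PLACEHOLDER = re.compile(r"^([ \t]*)SSS([ \t]+)SSS([ \t]+)F[ \t]*(?=\r?$)", re.MULTILINE)


def splice_table(exported_text, entries, xml=True):
    """Replace the single placeholder line with one line per (word, standard, class)."""
    matches = PLACEHOLDER.findall(exported_text)
    if len(matches) != 1:
        raise ValueError(f"expected one placeholder line, found {len(matches)}")
    indent, sep1, sep2 = matches[0]
    newline = "\r\n" if "\r\n" in exported_text else "\n"
    quote = escape if xml else (lambda s: s)
    # Reuse the placeholder's indentation and separators for every new row.
    block = newline.join(
        f"{indent}{quote(w)}{sep1}{quote(s)}{sep2}{quote(c)}" for w, s, c in entries
    )
    # A function replacement stops re.sub from interpreting backslashes in block.
    return PLACEHOLDER.sub(lambda _m: block, exported_text, count=1)


if __name__ == "__main__":
    export_path = "USNAME_export.xml"  # hypothetical exported file
    table_path = "first_names.cls.txt"  # hypothetical "WORD STANDARD CLASS" table
    if os.path.exists(export_path) and os.path.exists(table_path):
        with open(table_path, encoding="utf-8") as f:
            entries = [tuple(line.split()) for line in f if len(line.split()) == 3]
        # newline="" keeps the file's original CRLF/LF line endings intact.
        with open(export_path, encoding="utf-8", newline="") as f:
            text = f.read()
        with open("USNAME_export_edited.xml", "w", encoding="utf-8", newline="") as f:
            f.write(splice_table(text, entries))
```

Requiring exactly one placeholder match is deliberate: if the export stores the entry differently, the script stops with an error instead of silently writing a file that the import in step 4 might reject.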
karim