Which columns for standardizing names using USNAME rule set?
In QualityStage there is a job, designed earlier, in which the address is standardized using the USADDR and USAREA rule sets.
Now I need to standardize the name in the same job as well. For that I will use the USNAME rule set.
When using the USNAME rule set, which columns do I need to use for the standardization: firstname, middlename or lastname?
Please suggest.
Thanks,
Surya
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
The job in which I am standardizing the name contains a Standardize stage that already standardizes address and area.
Now, in the same stage, I have to standardize the name using all the available fields (firstname, middlename, lastname and suffix).
In the previous job the output from the Standardize stage has first name, middle name and last name, and a Transformer adds two more fields alongside these: first initial and middle initial.
Which columns from the output of the Standardize stage do I need to map in the Transformer after the name standardization?
Please suggest.
Thanks,
Surya
-
- Premium Member
- Posts: 425
- Joined: Sat Nov 19, 2005 9:26 am
- Location: New York City
- Contact:
Suryadev,
You should use the standardized output columns from the previous job; that should improve the results of your standardization process. Better input, better output.
Basically, to get the most out of the USNAME rule set you should force the input into the format it expects (lastname, firstname) while taking advantage of the previous standardization effort.
Don't forget to massage and fix the records that arrive in the UnhandledData_USNAME field from the previous job.
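To illustrate the "lastname, firstname" input format suggested above, here is a minimal Python sketch. It is not QualityStage code: inside a job the same concatenation would normally be done in a Transformer derivation or in the Standardize stage's column selection. The function name `build_name_input` and the exact layout of the joined string are assumptions for illustration only, so check what your copy of the rule set expects with the rule tester.

```python
def build_name_input(last_name, first_name, middle_name="", suffix=""):
    """Join separate name columns into one 'LAST, FIRST MIDDLE SUFFIX' string.

    Hypothetical helper: the layout is an assumption based on the
    "(lastname, firstname)" advice, not a documented USNAME requirement.
    """
    def clean(value):
        # Treat None as empty and collapse repeated whitespace.
        return " ".join((value or "").split())

    last = clean(last_name)
    given = " ".join(
        part
        for part in (clean(first_name), clean(middle_name), clean(suffix))
        if part
    )
    if last and given:
        return f"{last}, {given}"
    # Only one side is populated: return whatever is available.
    return last or given


print(build_name_input("Smith", "John", "Q", "Jr"))
```

Empty or missing columns are simply skipped, so a record with no middle name does not leave a stray space or comma in the string handed to the rule set.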
Julio Rodriguez
ETL Developer by choice
"Sure we have lots of reasons for being rude - But no excuses
In the first job, which was designed earlier, the name was not standardized. After the Standardize stage, a Transformer stage split the first name into first_name and INItialFirst_name and the middle name into middle_name and InitMiddle_name; the last name was passed through unchanged.
Now I am standardizing the name with the USNAME rule set, using all the name-related input fields (first_name, middle_name, last_name, suffix).
The output of the Standardize stage has a lot of fields (match primary fields, unhandled fields and a number of other fields).
Which output fields from the Standardize stage do I need to map to the Transformer?
I mapped the fields firstname, middlename, prefix and primary name,
but I cannot find the lastname in the output.
Please suggest which fields I need to map to get the standardized names as output.
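The Transformer mapping described here (name parts plus derived initials) can be sketched in Python as below. This is only an illustration under stated assumptions: the column names `FirstName_USNAME`, `MiddleName_USNAME` and `PrimaryName_USNAME` are assumed from the `UnhandledData_USNAME` naming pattern mentioned earlier in the thread, and the sketch assumes that the surname of an individual ends up in the primary name column. Confirm both against the rule set's DCT file or the rule tester before relying on them. The output names `first_initial` and `middle_initial` are hypothetical.

```python
def map_standardized_name(row):
    """Map (assumed) USNAME output columns to name parts plus initials."""
    first = (row.get("FirstName_USNAME") or "").strip()
    middle = (row.get("MiddleName_USNAME") or "").strip()
    # Assumption: the surname is carried in the primary name column.
    last = (row.get("PrimaryName_USNAME") or "").strip()
    return {
        "first_name": first,
        "first_initial": first[:1],    # empty string when no first name
        "middle_name": middle,
        "middle_initial": middle[:1],  # empty string when no middle name
        "last_name": last,
    }


example = {
    "FirstName_USNAME": "JOHN",
    "MiddleName_USNAME": "",
    "PrimaryName_USNAME": "SMITH",
}
print(map_standardized_name(example))
```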
Thanks
Thanks,
Surya
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
-
- Participant
- Posts: 527
- Joined: Thu Apr 19, 2007 1:25 am
- Location: Melbourne
Which extra fields you include will depend on what you plan on doing with the data afterwards.
If you want to do some matching involving the name, then the match fields will be useful, so take them along for as long as you need them.
If you want to improve the standardisation your ruleset does, UnhandledData and UnhandledPattern are both useful, but probably not for taking with you.
If you have unhandled data, it may be safer to just pass on the original values, so you can use UnhandledData to check. Then again, it may not be safer.
But don't just blindly take my word for it.
You are in the best position to know what you want to do, or what you actually need.
Read the DCT file and you will see the fields that contain the different parts of the standardised name (the ones you need to have), versus the extra fields used to help with later matching activities or unhandled data etc.
Use the rule tester. It takes 2 seconds to plug a name in, hit the button and see where it goes. You can learn a lot about a ruleset in a few minutes just by trying it out.
Understand your data and a lot of questions answer themselves.
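As a rough illustration of the UnhandledData check described above, the following Python sketch passes the original values through whenever the rule set left something unhandled. It is not QualityStage code, and whether the fallback really is safer depends on your data, as noted. The column names other than `UnhandledData_USNAME` (`FirstName_USNAME`, `MiddleName_USNAME`, `PrimaryName_USNAME`) are assumptions to be checked against the DCT file, and `choose_name_fields` is a hypothetical helper.

```python
def choose_name_fields(original, standardized):
    """Return (name_fields, used_standardized) for one record.

    original     -- dict with the source first_name/middle_name/last_name
    standardized -- dict with the (assumed) USNAME output columns
    """
    unhandled = (standardized.get("UnhandledData_USNAME") or "").strip()
    if unhandled:
        # Part of the name was not handled: keep what the source supplied.
        return dict(original), False
    return {
        "first_name": standardized.get("FirstName_USNAME", ""),
        "middle_name": standardized.get("MiddleName_USNAME", ""),
        "last_name": standardized.get("PrimaryName_USNAME", ""),
    }, True


source = {"first_name": "jon", "middle_name": "", "last_name": "smith"}
clean = {
    "FirstName_USNAME": "JON",
    "MiddleName_USNAME": "",
    "PrimaryName_USNAME": "SMITH",
    "UnhandledData_USNAME": "",
}
print(choose_name_fields(source, clean))
```

Returning the flag alongside the values makes it easy to count how many records fell back to the original input, which is a useful measure of how well the rule set fits the data.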
Yes, I need to do matching involving the name, so the match fields are useful.
After standardizing I need to create passes in the match specification based on the name.
For this purpose all the fields have to be carried through to the end of matching.
Is this right?
I checked the DCT file of the USNAME rule set and found BI fields, matching fields and reporting fields.
So do I need only the matching fields?
Also, I included all the name fields as the columns for the rule set when it was selected
(firstname, lastname, middlename, suffixcd).
Thanks
Thanks,
Surya
I read the tutorial, which helped me to some extent.
The output of the first job, which I am designing now, will be the source for a second job in which the Match Frequency stage generates frequencies for the source.
After this I need to match the fields in passes using match specifications.
Based on the above, can anyone suggest which fields I need to take as output after standardizing with the USNAME rule set in my first job?
Thanks
Thanks,
Surya
-
- Participant
- Posts: 527
- Joined: Thu Apr 19, 2007 1:25 am
- Location: Melbourne
suryadev wrote: I read the tutorial which helped me to some extent.
Actually the output for the 1st job which I am designing now has to be used as a source to 2nd job in which match frequency is used to generate frequency for the source.
after this by using match specifications I need to match the fields with passes.
Based on the above situation can anyone suggest me which fields do need to take as output from after standardizing using USNAME rule set in my 1st job.
Thanks
Don't just read it: actually do it.
You'll be surprised how much more you'll learn if you take the extra couple of hours and run through the tutorials properly.
You'll also find that with a bit of knowledge about the tool, you'll ask better questions that lead to quicker answers.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Custom rule sets can be used as often as necessary. It is better, therefore, if some thought has gone into their construction, particularly about handling conditions as generally as possible.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.