Splitting name into title, forename, surname

Posted: Mon Apr 10, 2006 3:35 am
by AlexD
Hi,

I'm probably posting about a fairly common problem, but the posts I've found by searching only really solve it for strings in a consistent format. We have a fullname field that needs to be split into title, forename/initials and surname fields. The problem is that the fullname field contains a variety of name formats,

e.g.
Mr Edward Smith
D Jones
Miss F G H Underhill
Peter Morrison
T.D. Watson
etc.

Obviously, using Field with a space as the delimiter is not going to work effectively for this, and DataStage itself is possibly unsuitable. Is there some code that someone has devised for this scenario, or should I be looking at QualityStage?

Thanks in advance,
Alex

Posted: Mon Apr 10, 2006 4:05 am
by ray.wurlod
This is precisely the task that QualityStage performs, almost out of the box. You can invoke a QualityStage standardization task through a QualityStage stage in a DataStage job.

Posted: Mon Apr 10, 2006 6:16 am
by jhmckeever
ray.wurlod wrote: This is precisely the task that QualityStage performs, almost out of the box. You can invoke a QualityStage standardization task through a QualityStage stage in a DataStage job.
The USNAME/GBNAME rulesets achieve this with little or no customization effort required. You can play with your examples using the Rules Analyzer to see if they meet your needs.

J.

Posted: Tue Apr 11, 2006 2:32 am
by AlexD
Thanks for your responses. We shall look into these areas of QualityStage more closely, but does anyone know a way of doing it in DataStage?

regards,
Alex

Posted: Tue Apr 11, 2006 3:01 am
by ArndW
Alex,

There is no built-in function that does this accurately in DataStage. There are third-party products for name cleansing out there (Trillium Software comes to mind).

In the past I've programmed my own logic in short routines, using spaces, commas, and periods as delimiters and a list of known prefixes and titles to strip out the non-name portions, then taking the last word as the family name and whatever string is left as the first and second names.
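
The approach described above can be sketched roughly as follows. This is only an illustration in Python rather than a DataStage routine; the function name split_name and the KNOWN_TITLES list are assumptions made up for the example, not anything from an actual job, and the title list would need extending for real data.

```python
import re

# Hypothetical list of known titles (an assumption for this sketch);
# extend it to match the prefixes that occur in your own data.
KNOWN_TITLES = {"mr", "mrs", "miss", "ms", "dr", "prof", "sir", "rev"}


def split_name(fullname):
    """Split a free-format name into (title, forename, surname)."""
    # Treat runs of spaces, commas and periods as delimiters,
    # and drop any empty tokens left at the ends.
    tokens = [t for t in re.split(r"[ ,.]+", fullname.strip()) if t]

    # Strip a leading title if the first word is in the known list.
    title = ""
    if tokens and tokens[0].lower() in KNOWN_TITLES:
        title = tokens.pop(0)

    # The last remaining word is taken as the family name.
    surname = tokens.pop() if tokens else ""

    # Whatever is left is the forename(s) or initials.
    forename = " ".join(tokens)
    return title, forename, surname


if __name__ == "__main__":
    for name in ["Mr Edward Smith", "D Jones", "Miss F G H Underhill",
                 "Peter Morrison", "T.D. Watson"]:
        print(name, "->", split_name(name))
```

On the sample names from the first post this gives, for example, ("Miss", "F G H", "Underhill") and ("", "T D", "Watson"). It will still get multi-word surnames (e.g. "van der Berg") and trailing suffixes wrong, which is the kind of case the QualityStage rulesets are meant to handle.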