Page 1 of 1

Parsing Names

Posted: Wed Jul 27, 2011 3:43 pm
by SUMAN CHILUVURI
I need to split this name into 3 fields even if there is no middle name

Names
Broyles,Keith T
Hammond,Edward J
Bullock,Monica
Poston Jr.,Hugh O

Result

Code: Select all

firstname   lastname  middlename
Broyles     Keith     T
Bullock     Monica
I am facing difficulty if the name does not have a middle name
can any one please give me the syntax.

datastage or sql query any thing is ok.

Moderator: subject changed

Posted: Wed Jul 27, 2011 4:34 pm
by chulett
Please choose more meaningful subjects for your posts, one that is actually related to the question being asked.

As to your issue, spell out in words - regardless of the tool used - the process of parsing the name and rules for determining if a middle name is present or not. Once you do, how you would attempt to do this in DataStage should be much clearer.

Posted: Wed Jul 27, 2011 5:19 pm
by ray.wurlod
Does the comma always appear?

How will you handle:
WILLIAMS, Mary-Jane Fiona
WILLIAMS, Mary Jane Fiona
WILLIAMS Mary-Jane
DE WORDE, William
DE WORDE William

Posted: Wed Jul 27, 2011 5:35 pm
by chulett
ray.wurlod wrote:Does the comma always appear?
I certainly hope so. :wink:

Posted: Wed Jul 27, 2011 7:34 pm
by ray.wurlod
It's important to know. Otherwise, what is the family name of DENNIS JAMES or JAMES DENNIS ?

Is there a business rule that explains how to handle names in which the comma is not present?

What if there are two commas? GATES, III, WILLIAM F

What if there are three commas? SCHULTZ, JR, Charles, Admiral

BUSINESS RULES!!!

Posted: Thu Jul 28, 2011 9:53 am
by SUMAN CHILUVURI
Comma always appears after the last name.

Posted: Thu Jul 28, 2011 10:16 am
by FranklinE
This is a delimiter problem, and the root cause is the rules (or lack of them) at the data source.

If the main record delimiter is a comma, parse into last-name and first-middle. If there will always be a single space between the first name and middle name (full or initial), reparse first-middle into first-name and middle-name with space as the delimiter.

You main problem is that you cannot code for humans who are allowed to change their minds about how they enter data. With valid variations in names like compound names with a space in between, you see the problem. It can be solved only by imposing requirements at the data entry point and refusing to accept exceptions.

Posted: Thu Jul 28, 2011 4:48 pm
by ray.wurlod
... or using something like QualityStage to figure it out based on known first names. But even that's not 100%. Can't be, because parents go on inventing new names for their children.