Index Position of Last Occurrence of String

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
jonesv
Premium Member
Premium Member
Posts: 19
Joined: Thu Jul 31, 2003 6:58 am
Location: Canada

Index Position of Last Occurrence of String

Post by jonesv »

Hi

I am looking for a way to return the index position of the last occurrence of a substring in a given string. The requirement I have is to confirm that an email address has at least one period (.) after the @ character in the address. But there could be multiple dots before the @ and after the @. For example, these are all valid addresses:

J.o.h.n.D.o.e@gmail.com
JohnDoe@name.co.uk

etc.

Any advice would be appreciated.

Thanks
ankursaxena.2003
Participant
Posts: 96
Joined: Mon May 14, 2012 1:30 pm

Post by ankursaxena.2003 »

Try to get the string after @ in a Stage Variable and then try to find period.
For example,

SV1 ---> @name.co.uk
SV2 ---> Try to find period in SV1 using Index using Index(SV1,'.',1)

If SV2 has any value other than 0, it means that it has period in it. And if SV2 has value 0, it means that there is no period.
jonesv
Premium Member
Premium Member
Posts: 19
Joined: Thu Jul 31, 2003 6:58 am
Location: Canada

Post by jonesv »

Now why didn't I think of that :D

Makes perfect sense. Thanks!
ankursaxena.2003
Participant
Posts: 96
Joined: Mon May 14, 2012 1:30 pm

Post by ankursaxena.2003 »

Sometimes, the problem is easy but we think too hard and make it complex....
:D
jonesv
Premium Member
Premium Member
Posts: 19
Joined: Thu Jul 31, 2003 6:58 am
Location: Canada

Post by jonesv »

Very true :wink:
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

ankursaxena.2003 wrote:Try to get the string after @ in a Stage Variable and then try to find period.
Just curious what you (jonesv) are using for this - field, I assume.

Also curious how far you need to take the checking - what if is doesn't have an "@" in it? Or has more than one? Doing these kind of validations "manually" can take on a life of their own. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
jonesv
Premium Member
Premium Member
Posts: 19
Joined: Thu Jul 31, 2003 6:58 am
Location: Canada

Post by jonesv »

Just curious what you (jonesv) are using for this - field, I assume.
I don't understand your question. To extract everything after the "@" (after already validating that it does have an "@") I have this derivation is the stage var:

Code: Select all

Right(Trim(lnk_TestDataRead.EMAIL_ID),Len(Trim(lnk_TestDataRead.EMAIL_ID))-Index(Trim(lnk_TestDataRead.EMAIL_ID),"@",1)+1)
Also curious how far you need to take the checking - what if is doesn't have an "@" in it? Or has more than one? Doing these kind of validations "manually" can take on a life of their own.
Understood. But I am only a lowly developer. I am given a spec, and I code to that spec. But I hear ya, and I have explained to the powers that be that there is no way you could code EVERY validation for a valid email address. But they still want these two checks:

1) each email addy has to have an "@"
2) there has to be at least one period after the "@".

C'est la vie.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Sorry, by "field" I meant the Field() function which makes pulling sections out of a "delimited string" quite simple. So, something like:

Code: Select all

Field(lnk_TestDataRead.EMAIL_ID,"@",2,99)
Would get every "field" starting with the second one in the column, considering the "@" as the field delimiter. You wouldn't have the actual "@" in the result but your subsequent tests would still be valid.

And as you noted, my comment on the "life of its own" was from being in the same situation a number of years ago - "validate the email addresses". Umm yah, sure... at a very high level maybe. :roll:
-craig

"You can never have too many knives" -- Logan Nine Fingers
jonesv
Premium Member
Premium Member
Posts: 19
Joined: Thu Jul 31, 2003 6:58 am
Location: Canada

Post by jonesv »

That makes my derivation much simpler. Thanks!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Make the assumption that it's a valid email address, and therefore contains only one "@". If you want to check this, search for the second occurrence of "@".

Code: Select all

If Index(InLink.EmailAddr, "@", 2) Then "invalid address" Else Index(Field(InLink.EmailAddr,"@",2,1),".",1)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Kryt0n
Participant
Posts: 584
Joined: Wed Jun 22, 2005 7:28 pm

Post by Kryt0n »

Or that DCount(field, '@') = 1...
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Kryt0n wrote:Or that DCount(field, '@') = 1...
No, sorry, that won't give the required answer - it will only be true if there are NO @ characters in the string.

Further, it's less efficient to use DCount() than Index() because DCount() must process the entire string whereas Index() can stop as soon as its condition is met.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
jonesv
Premium Member
Premium Member
Posts: 19
Joined: Thu Jul 31, 2003 6:58 am
Location: Canada

Post by jonesv »

Understood. Thanks everyone!
Kryt0n
Participant
Posts: 584
Joined: Wed Jun 22, 2005 7:28 pm

Post by Kryt0n »

I actually meant count rather than dcount (had dcount on my mind due to other thoughts). As for efficiency, I see minimal difference unless the odds on having 2 or more @ symbols in the string is high, since you are always searching for 2 in your index query, it is likely to scan the full string when the majority of records have a valid address.

On top of that an index for the second occurrence with NO @ symbol will attempt to run the Index/Field functions decreasing efficiency
Post Reply