Validating email id field

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
dr46014
Participant
Posts: 59
Joined: Fri Dec 08, 2006 11:09 am
Location: bhubaneswar

Validating email id field

Post by dr46014 »

hi...
in my flat file i m getting a email id field..i want to validate this field before inserting it into table..the email id should be in the format
xyz...@xyz...com or xyz...@xyz...co.xy.i need to write some function in the transform stage..so can u plz suggest me what code i need to write and what functions i need to use
thanks in advance
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Use the Matches function in the transformer. You dont need a routine. Something like

Code: Select all

If in.Link Matches '...@...com' then @True else @False
If you get a 1 that means email is in the format xyz@xyz.com else if you get a 0 that means its not in xyz@xyz.com
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
narasimha
Charter Member
Charter Member
Posts: 1236
Joined: Fri Oct 22, 2004 8:59 am
Location: Staten Island, NY

Post by narasimha »

In continuation to DSguru2B's solution

Code: Select all

If in.Link Matches '...@...' then in.Link else 'Invalid EmailId'
To handle other than .com extensions, to directly passing the valid Email Id's and specifying that the Email Id is invalid if it does not match your format.
Narasimha Kade

Finding answers is simple, all you need to do is come up with the correct questions.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Solutions given thus far do not verify that xyz at the beginning of the address and to the left of the @ symbol is the same as the xyz immediately to the right of the @ symbol. Is this, in fact, part of the requirement?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Can anybody validate this further or if you email these then tell me which ones failed?
Mamu Kim
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

You lost me there Kim. :roll:
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Sorry but wouldn't it be better if you knew these emails are real? Maybe someone has done this. If you email then you should get a return saying undeliverable. Has anybody figured out how to process this? This is taking this issue to the next level. If you have an email address that is kim@hotmail.com does not mean that it is a real email address. If you email to this address and it fails then you need to take it out of your database.

Maybe Walter does this on this web site. If anybody has done this then jump in and share. I think this is a solution a lot of us could benefit from because we all have email address to scrub. No WAVE or CASS files for this?
Mamu Kim
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Hmm. Interesting. Lets think about the logic and how to verifty the validity. But i am sure this is above the requirement of the OP, but a very interesting task. Dont know if its possible though without actually emailing the person, resulting in a lot of non-sense mail to the mailing addresses everytime a validity check is run. It can make a lot of people mad. But then again, this is my take on it. I might be wrong.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Not sure I'd want to be actually emailing anything to see if it's good, there's got to be a better way. One sec... I put 'validating email addresses' into Google and got 1.4 million hits and all kinds of references to packages to do this. Also some fun stuff like this that may help someone architect a solution.
-craig

"You can never have too many knives" -- Logan Nine Fingers
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Dont know if that could be done with DS. But here is what i could come up. Contains validation checks only. Validation checks as follows
  1. -Check for Zero Length/NULL Email Address
    -Makes sure Length of email address is greater than 6 as the shortest email address to be found on the net is of length 6.
    -Checks for space in email address.
    -Check for exactly 1 @ symbol.
    -Makes ure @ symbol is not present in the begining or end of email
    -Checks for valid extension (.COM.EDU.GOV.NET.BIZ.ORG.TV) and ISO Country Codes.
Returns a 1 if the email is valid and a 0 if its invalid (fails all validations listed above)

Heres the routine.

Code: Select all



FUNCTION IsValidEmail(Arg1)


      
      Equate RoutineName To "IsValidEmail"
      in.Email=Trim(Arg1)
      emailLength=Len(in.Email)
      Ans=@FALSE

***Check for Zero Length
      If emailLength = 0 OR ISNULL(in.Email) = 1
      Then
         RETURN(Ans)
      END

***email address is shorter than 6 characters is impossible

      If emailLength < 6
      Then
         RETURN(Ans)
      End

***Check for exactly 1 @ symbol

      symbolCount=COUNT(in.Email, "@")

      If (symbolCount = 0 OR symbolCount > 1 )
      Then
         RETURN(Ans)
      End

***Check for space in email address

      spacePos = INDEX(in.Email, " ",1)
      If (spacePos > 0)
      Then
         RETURN(Ans)
      End

***@ symbol should not be in the begining or end of email
      symbolPosition=INDEX(in.Email, "@", 1)

      If (symbolPosition =1 OR symbolPosition = emailLength)
      Then
         RETURN(Ans)
      End


***Valid extensions including ISO country codes

      mailExt = UPCASE(".":Field(in.Email,".",Count(in.Email,".")+1))

      ISO_CountryCodes = ".COM.EDU.GOV.NET.BIZ.ORG.TV"

      ISO_CountryCodes := ".AF.AL.DZ.As.AD.AO.AI.AQ.AG.AP.AR.AM.AW.AU.AT.AZ.BS.BH.BD.BB.BY"

      ISO_CountryCodes := ".BE.BZ.BJ.BM.BT.BO.BA.BW.BV.BR.IO.BN.BG.BF.MM.BI.KH.CM.CA.CV.KY"


      ISO_CountryCodes := ".CF.TD.CL.CN.CX.CC.CO.KM.CG.CD.CK.CR.CI.HR.CU.CY.CZ.DK.DJ.DM.DO"

      ISO_CountryCodes := ".TP.EC.EG.SV.GQ.ER.EE.ET.FK.FO.FJ.FI.CS.SU.FR.FX.GF.PF.TF.GA.GM.GE.DE"

      ISO_CountryCodes := ".GH.GI.GB.GR.GL.GD.GP.GU.GT.GN.GW.GY.HT.HM.HN.HK.HU.IS.IN.ID.IR.IQ"

      ISO_CountryCodes := ".IE.IL.IT.JM.JP.JO.KZ.KE.KI.KW.KG.LA.LV.LB.LS.LR.LY.LI.LT.LU.MO.MK.MG"


      ISO_CountryCodes := ".MW.MY.MV.ML.MT.MH.MQ.MR.MU.YT.MX.FM.MD.MC.MN.MS.MA.MZ.NA"

      ISO_CountryCodes := ".NR.NP.NL.AN.NT.NC.NZ.NI.NE.NG.NU.NF.KP.MP.NO.OM.PK.PW.PA.PG.PY"


      ISO_CountryCodes := ".PE.PH.PN.PL.PT.PR.QA.RE.RO.RU.RW.GS.SH.KN.LC.PM.ST.VC.SM.SA.SN.SC"


      ISO_CountryCodes := ".SL.SG.SK.SI.SB.SO.ZA.KR.ES.LK.SD.SR.SJ.SZ.SE.CH.SY.TJ.TW.TZ.TH.TG.TK"


      ISO_CountryCodes := ".TO.TT.TN.TR.TM.TC.TV.UG.UA.AE.UK.US.UY.UM.UZ.VU.VA.VE.VN.VG.VI"


      ISO_CountryCodes := ".WF.WS.EH.YE.YU.ZR.ZM.ZW"


      i = INDEX(ISO_CountryCodes, mailExt, 1)

      If (i <= 0)
      Then
         RETURN(Ans)
      End

***Its finally valid
      Ans = @TRUE

I would love for you guys to try and break the code. This way it could be enhanced. Not the best thing around, but this is what i could come up with for now.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Syntactic validation (which can be done) is not the same thing as certification (which, as far as I am aware, can only be done by trial and error, and still requires the destination mail server to be up).

There is nothing in DataStage or QualityStage for performing email address certification.

Does anyone know the email address rules for IPv6?

DSGuru2B, .INFO and .NAME are also valid extensions, and that's without researching further. Your checks for "@" could have been combined, just using the Index() function. Similarly your checks for length could have been combined. Having checked for one of the country extensions such as ".uk" you still need to check that the previous element is a valid extension code. Some countries use ".co" rather than ".com", for example "jolt.co.uk". So you can get "amp.com.au" but "amp.co.nz".
Depending on how you write your new revised function, would it properly handle "billg@microsoft.com.net"? (And, indeed, is this a valid address?) Time to refer to the standards documents!
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Ill definately combine the checks. I'll also add the .NAME and .INFO. Tell me this, isnt there any way where we can open a socket via Basic coding. I was thinking of doing something in those lines to check for a valid DSN. Even if i want to something at the OS level, some companies have there ping functionality blocked, i know mine did, so that wont cut it. Google wasnt much help either except for psuedocode.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

There is a sockets library in UniVerse, but it was added after the split from DataStage. There is no BASIC-callable sockets library for DataStage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Thats what i thought too. Well, guess email format check is all that can be done. [sigh]
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Post Reply