From: yakir on
Thanks man..

"Ron Rosenfeld" wrote:

> On Mon, 24 May 2010 07:21:17 -0500, Chip Pearson <chip(a)cpearson.com> wrote:
>
> >In VBA insert a new module. Then go to the Tools menu, choose
> >References, and scroll down to and check "Microsoft VBScript Regular
> >Expressions 5.5". Then, use code like the following
> >
> >Sub AAA()
> >Dim LastRow As Long
> >Dim RowNdx As Long
> >Dim TopRow As Long
> >Dim WS As Worksheet
> >Dim Pattern As String
> >Dim RegEx As RegExp
> >Const ADDR_COL = "A" '<<<< CHANGE
> >TopRow = 1 '<<<< CHANGE
> >Pattern = "(^[a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}" & _
> > "\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})$"
> >Set RegEx = New RegExp
> >RegEx.Pattern = Pattern
> >Set WS = ActiveSheet
> >With WS
> > LastRow = .Cells(.Rows.Count, ADDR_COL).End(xlUp).Row
> > For RowNdx = LastRow To TopRow Step -1
> > If RegEx.Test(.Cells(RowNdx, ADDR_COL).Text) = False Then
> > .Rows(RowNdx).Delete
> > End If
> > Next RowNdx
> >End With
> >End Sub
> >
> >
> >Change the lines marked with <<<< to the correct values. ADDR_COL
> >should be the column letter of the column containing the email
> >addresses to test. TopRow should be the row number of the top of the
> >range of values to test. The code begins testing the addresses on the
> >row calculated as LastRow and moves upwards until it reaches TopRow,
> >at which point it quits. For each row between LastRow and TopRow
> >(inclusive), if an invalid email address is found in ADDR_COL, that
> >row is deleted.
> >
> >Cordially,
> >Chip Pearson
> >Microsoft Most Valuable Professional,
> > Excel, 1998 - 2010
> >Pearson Software Consulting, LLC
> >www.cpearson.com
> >
> >
> >
> >
>
> It's tough to devise a regex that can comply with the relevant internet
> standards. I think the current one is RFC5322
>
> Your regex excludes some email addresses which I believe are valid, and
> includes some which are invalid.
>
> For example, your regex excludes any TLD's that have more than four characters.
> But the current generic domain list includes .museum & .travel, and there will
> likely be more, longer lettered TLD's in the future.
>
> It also excludes some characters that are valid in email addresses, at least in
> the local part:
>
> ! # $ % & ' * + - / = ? ^ _ ` { | } ~
>
> It allows consecutive dots in the local part of the address, as well as
> allowing them at the start or end of the local part of the address -- and that
> is not valid.
>
> Since the OP is looking for malformed email addresses, I'd be concerned that
> your regex would eliminate some valid one's, and include some invalid one's.
>
> It can be pretty tough to devise something that does not have false positives
> or negatives. If IP addresses don't need to be included, I'd suggest something
> (admittedly incomplete) like (with the case insensitive option (ignorecase)
> equal to true):
>
>
> "^[-\w.%+']+@[A-Z0-9.-]+\.(?:aero|asia|biz|cat|com|coop|edu|gov|info|int|" _
> & "jobs|mil|mobi|museum|name|net|org|pro|tel|travel|[A-Z]{2})$"
>
> which will allow any two character domain name (to allow for the country codes)
> as well as the currently used specific list of generic top-level domains.
>
>
>
> --ron
> .
>
From: Ron Rosenfeld on
On Sat, 29 May 2010 00:15:01 -0700, yakir <yakir(a)discussions.microsoft.com>
wrote:

>Thanks man..

You're welcome. Glad to help.

If you are going to be using my modification, be sure to add a line after
these two in Chip's routine:


Set RegEx = New RegExp
RegEx.Pattern = Pattern

'Add this next line:
RegEx.IgnoreCase = True

--ron
First  |  Prev  | 
Pages: 1 2
Prev: Using Trim in a Userform
Next: Try this