This is an excerpt from the Perl 5 cookbook
quote:
The common patterns that people try to use for this are all quite incorrect. As an example, the address fred&barney@stonehenge.com is valid and deliverable (as of this writing), but most patterns that allegedly match valid mail addresses fail miserably.
RFC-822 documents have a formal specification for what constitutes a syntactically valid mail address. However, complete processing requires recursive parsing of nested comments, something that one single regular expression cannot do. If you first strip off legal comments:
1 while $addr =~ s/\([^()]*\)//g;
You could then in theory use the 6598-byte pattern given on the last page of Mastering Regular Expressions to test for RFC-conformance, but that's still not good enough, for three reasons.
First, not all RFC-valid addresses are deliverable. For example, foo@foo.foo.foo.foo is valid in form, but in practice is not deliverable. Some people try to do DNS lookups for MX records, even trying to connect to the host handling that address's mail to check if it's valid at that site. This is a poor approach because most sites can't do a direct connect to any other site, and even if they could, mail receiving sites increasingly either ignore the SMTP VRFY command or fib about its answer.
Second, some RFC-invalid addresses, in practice, are perfectly deliverable. For example, a lone postmaster is almost certainly deliverable but doesn't pass RFC-822 muster. It doesn't have an @ in it.
Thirdly and most important, just because the address happens to be both valid and deliverable doesn't mean that it's the right one. president@whitehouse.gov, for example, is valid by the RFC and deliverable. But it's very unlikely that would really be the mail address of the person submitting information to your CGI script.
The script at http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/ckaddr.gz makes a valiant (albeit provably imperfect) attempt at doing this incorrectly. It jumps through many hoops, including the RFC-822 regular expression from Mastering Regular Expressions, DNS MX record look-up, and stop lists for naughty words and famous people. But this is still a very weak approach.
Basically, your regexp will probably be fine for 99.9% of cases, but there are possibly some cases it'll reject when it shouldn't.
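If it helps to see why the comment-stripping line in the excerpt is wrapped in a `1 while` loop, here is a minimal sketch. The `strip_comments` name and the sample address are mine, not the cookbook's, and this only handles parenthesized comments; it makes no attempt to protect parentheses inside quoted strings.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the cookbook): wraps the one-liner
# from the excerpt so it can be called on a copy of the address.
sub strip_comments {
    my ($addr) = @_;
    # The pattern only matches a comment with no parentheses inside it,
    # so each pass removes the innermost comments. Repeating until no
    # substitution happens peels nested comments from the inside out.
    1 while $addr =~ s/\([^()]*\)//g;
    return $addr;
}

# Made-up input with a nested comment, for illustration only.
print strip_comments('fred(a (nested) comment)@stonehenge.com'), "\n";
```

A single `s///g` pass would only remove `(nested)` and leave the outer comment behind, which is the "recursive parsing" problem the excerpt mentions. The result still tells you nothing about whether the address is deliverable.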