By acrosman

2008-10-14 14:14:34 8 Comments

Over the years I have slowly developed a regular expression that validates MOST email addresses correctly, assuming they don't use an IP address as the server part.

I use it in several PHP programs, and it works most of the time. However, from time to time I get contacted by someone that is having trouble with a site that uses it, and I end up having to make some adjustment (most recently I realized that I wasn't allowing 4-character TLDs).

What is the best regular expression you have or have seen for validating emails?

I've seen several solutions that use functions that use several shorter expressions, but I'd rather have one long complex expression in a simple function instead of several short expression in a more complex function.


@Carli Beeli 2019-03-11 19:59:13

For Angular2 / Angular7 I use this pattern:

emailPattern = '^[a-zA-Z0-9_.+-][email protected][a-zA-Z0-9-]+[.]+[a-zA-Z0-9-.]+(\\s)*';

private createForm() {
  this.form ={
    email: ['', [Validators.required, Validators.pattern(this.emailPattern)]]

It also allows for extra spaces at the end, which you should truncate before sending it to the backend, but some users, especially on mobile are easy to mistakenly add a space at the end.

@Jonathan Leffler 2019-03-13 07:16:16

I think you'll find that lets through invalid email addresses.

@Carli Beeli 2019-03-14 10:44:04

@JonathanLeffler Thanks for the hint. Do you have any example? How would you fix it?

@Rinke 2012-12-28 21:07:53

Quick answer

Use the following regex for input validation:

([-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?(\.[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?)+

Addresses matched by this regex:

  • have a local part (i.e. the part before the @-sign) that is strictly compliant with RFC 5321/5322,
  • have a domain part (i.e. the part after the @-sign) that is a host name with at least two labels, each of which is at most 63 characters long.

The second constraint is a restriction on RFC 5321/5322.

Elaborate answer

Using a regular expression that recognizes email addresses could be useful in various situations: for example to scan for email addresses in a document, to validate user input, or as an integrity constraint on a data repository.

It should however be noted that if you want to find out if the address actually refers to an existing mailbox, there's no substitute for sending a message to the address. If you only want to check if an address is grammatically correct then you could use a regular expression, but note that ""@[] is a grammatically correct email address that certainly doesn't refer to an existing mailbox.

The syntax of email addresses has been defined in various RFCs, most notably RFC 822 and RFC 5322. RFC 822 should be seen as the "original" standard and RFC 5322 as the latest standard. The syntax defined in RFC 822 is the most lenient and subsequent standards have restricted the syntax further and further, where newer systems or services should recognize obsolete syntax, but never produce it.

In this answer I’ll take “email address” to mean addr-spec as defined in the RFCs (i.e. [email protected], but not "John Doe"<[email protected]>, nor some-group:[email protected],[email protected];).

There's one problem with translating the RFC syntaxes into regexes: the syntaxes are not regular! This is because they allow for optional comments in email addresses that can be infinitely nested, while infinite nesting can't be described by a regular expression. To scan for or validate addresses containing comments you need a parser or more powerful expressions. (Note that languages like Perl have constructs to describe context free grammars in a regex-like way.) In this answer I'll disregard comments and only consider proper regular expressions.

The RFCs define syntaxes for email messages, not for email addresses as such. Addresses may appear in various header fields and this is where they are primarily defined. When they appear in header fields addresses may contain (between lexical tokens) whitespace, comments and even linebreaks. Semantically this has no significance however. By removing this whitespace, etc. from an address you get a semantically equivalent canonical representation. Thus, the canonical representation of first. last (comment) @ [] is [email protected][].

Different syntaxes should be used for different purposes. If you want to scan for email addresses in a (possibly very old) document it may be a good idea to use the syntax as defined in RFC 822. On the other hand, if you want to validate user input you may want to use the syntax as defined in RFC 5322, probably only accepting canonical representations. You should decide which syntax applies to your specific case.

I use POSIX "extended" regular expressions in this answer, assuming an ASCII compatible character set.

RFC 822

I arrived at the following regular expression. I invite everyone to try and break it. If you find any false positives or false negatives, please post them in a comment and I'll try to fix the expression as soon as possible.

([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]))*(\\\r)*")(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]))*(\\\r)*"))*@([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]))*(\\\r)*])(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]))*(\\\r)*]))*

I believe it's fully complient with RFC 822 including the errata. It only recognizes email addresses in their canonical form. For a regex that recognizes (folding) whitespace see the derivation below.

The derivation shows how I arrived at the expression. I list all the relevant grammar rules from the RFC exactly as they appear, followed by the corresponding regex. Where an erratum has been published I give a separate expression for the corrected grammar rule (marked "erratum") and use the updated version as a subexpression in subsequent regular expressions.

As stated in paragraph 3.1.4. of RFC 822 optional linear white space may be inserted between lexical tokens. Where applicable I've expanded the expressions to accommodate this rule and marked the result with "opt-lwsp".

CHAR        =  <any ASCII character>
            =~ .

CTL         =  <any ASCII control character and DEL>
            =~ [\x00-\x1F\x7F]

CR          =  <ASCII CR, carriage return>
            =~ \r

LF          =  <ASCII LF, linefeed>
            =~ \n

SPACE       =  <ASCII SP, space>

HTAB        =  <ASCII HT, horizontal-tab>
            =~ \t

<">         =  <ASCII quote mark>
            =~ "

CRLF        =  CR LF
            =~ \r\n

LWSP-char   =  SPACE / HTAB
            =~ [ \t]

linear-white-space =  1*([CRLF] LWSP-char)
                   =~ ((\r\n)?[ \t])+

specials    =  "(" / ")" / "<" / ">" / "@" /  "," / ";" / ":" / "\" / <"> /  "." / "[" / "]"
            =~ [][()<>@,;:\\".]

quoted-pair =  "\" CHAR
            =~ \\.

qtext       =  <any CHAR excepting <">, "\" & CR, and including linear-white-space>
            =~ [^"\\\r]|((\r\n)?[ \t])+

dtext       =  <any CHAR excluding "[", "]", "\" & CR, & including linear-white-space>
            =~ [^][\\\r]|((\r\n)?[ \t])+

quoted-string  =  <"> *(qtext|quoted-pair) <">
               =~ "([^"\\\r]|((\r\n)?[ \t])|\\.)*"
(erratum)      =~ "(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*"

domain-literal =  "[" *(dtext|quoted-pair) "]"
               =~ \[([^][\\\r]|((\r\n)?[ \t])|\\.)*]
(erratum)      =~ \[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*]

atom        =  1*<any CHAR except specials, SPACE and CTLs>
            =~ [^][()<>@,;:\\". \x00-\x1F\x7F]+

word        =  atom / quoted-string
            =~ [^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*"

domain-ref  =  atom

sub-domain  =  domain-ref / domain-literal
            =~ [^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*]

local-part  =  word *("." word)
            =~ ([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*")(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*"))*
(opt-lwsp)  =~ ([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*")(((\r\n)?[ \t])*\.((\r\n)?[ \t])*([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*"))*

domain      =  sub-domain *("." sub-domain)
            =~ ([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*])(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*]))*
(opt-lwsp)  =~ ([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*])(((\r\n)?[ \t])*\.((\r\n)?[ \t])*([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*]))*

addr-spec   =  local-part "@" domain
            =~ ([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*")(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*"))*@([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*])(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*]))*
(opt-lwsp)  =~ ([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*")((\r\n)?[ \t])*(\.((\r\n)?[ \t])*([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*")((\r\n)?[ \t])*)*@((\r\n)?[ \t])*([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*])(((\r\n)?[ \t])*\.((\r\n)?[ \t])*([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*]))*
(canonical) =~ ([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]))*(\\\r)*")(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]))*(\\\r)*"))*@([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]))*(\\\r)*])(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]))*(\\\r)*]))*

RFC 5322

I arrived at the following regular expression. I invite everyone to try and break it. If you find any false positives or false negatives, please post them in a comment and I'll try to fix the expression as soon as possible.

([-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@([-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*|\[[\t -Z^-~]*])

I believe it's fully complient with RFC 5322 including the errata. It only recognizes email addresses in their canonical form. For a regex that recognizes (folding) whitespace see the derivation below.

The derivation shows how I arrived at the expression. I list all the relevant grammar rules from the RFC exactly as they appear, followed by the corresponding regex. For rules that include semantically irrelevant (folding) whitespace, I give a separate regex marked "(normalized)" that doesn't accept this whitespace.

I ignored all the "obs-" rules from the RFC. This means that the regexes only match email addresses that are strictly RFC 5322 compliant. If you have to match "old" addresses (as the looser grammar including the "obs-" rules does), you can use one of the RFC 822 regexes from the previous paragraph.

VCHAR           =   %x21-7E
                =~  [!-~]

ALPHA           =   %x41-5A / %x61-7A
                =~  [A-Za-z]

DIGIT           =   %x30-39
                =~  [0-9]

HTAB            =   %x09
                =~  \t

CR              =   %x0D
                =~  \r

LF              =   %x0A
                =~  \n

SP              =   %x20

DQUOTE          =   %x22
                =~  "

CRLF            =   CR LF
                =~  \r\n

WSP             =   SP / HTAB
                =~  [\t ]

quoted-pair     =   "\" (VCHAR / WSP)
                =~  \\[\t -~]

FWS             =   ([*WSP CRLF] 1*WSP)
                =~  ([\t ]*\r\n)?[\t ]+

ctext           =   %d33-39 / %d42-91 / %d93-126
                =~  []!-'*-[^-~]

("comment" is left out in the regex)
ccontent        =   ctext / quoted-pair / comment
                =~  []!-'*-[^-~]|(\\[\t -~])

(not regular)
comment         =   "(" *([FWS] ccontent) [FWS] ")"

(is equivalent to FWS when leaving out comments)
CFWS            =   (1*([FWS] comment) [FWS]) / FWS
                =~  ([\t ]*\r\n)?[\t ]+

atext           =   ALPHA / DIGIT / "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "/" / "=" / "?" / "^" / "_" / "`" / "{" / "|" / "}" / "~"
                =~  [-!#-'*+/-9=?A-Z^-~]

dot-atom-text   =   1*atext *("." 1*atext)
                =~  [-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*

dot-atom        =   [CFWS] dot-atom-text [CFWS]
                =~  (([\t ]*\r\n)?[\t ]+)?[-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*(([\t ]*\r\n)?[\t ]+)?
(normalized)    =~  [-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*

qtext           =   %d33 / %d35-91 / %d93-126
                =~  []!#-[^-~]

qcontent        =   qtext / quoted-pair
                =~  []!#-[^-~]|(\\[\t -~])

quoted-string   =   [CFWS] DQUOTE ((1*([FWS] qcontent) [FWS]) / FWS) DQUOTE [CFWS]
                =~  (([\t ]*\r\n)?[\t ]+)?"(((([\t ]*\r\n)?[\t ]+)?([]!#-[^-~]|(\\[\t -~])))+(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?)"(([\t ]*\r\n)?[\t ]+)?
(normalized)    =~  "([]!#-[^-~ \t]|(\\[\t -~]))+"

dtext           =   %d33-90 / %d94-126
                =~  [!-Z^-~]

domain-literal  =   [CFWS] "[" *([FWS] dtext) [FWS] "]" [CFWS]
                =~  (([\t ]*\r\n)?[\t ]+)?\[((([\t ]*\r\n)?[\t ]+)?[!-Z^-~])*(([\t ]*\r\n)?[\t ]+)?](([\t ]*\r\n)?[\t ]+)?
(normalized)    =~  \[[\t -Z^-~]*]

local-part      =   dot-atom / quoted-string
                =~  (([\t ]*\r\n)?[\t ]+)?[-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?"(((([\t ]*\r\n)?[\t ]+)?([]!#-[^-~]|(\\[\t -~])))+(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?)"(([\t ]*\r\n)?[\t ]+)?
(normalized)    =~  [-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+"

domain          =   dot-atom / domain-literal
                =~  (([\t ]*\r\n)?[\t ]+)?[-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?\[((([\t ]*\r\n)?[\t ]+)?[!-Z^-~])*(([\t ]*\r\n)?[\t ]+)?](([\t ]*\r\n)?[\t ]+)?
(normalized)    =~  [-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*|\[[\t -Z^-~]*]

addr-spec       =   local-part "@" domain
                =~  ((([\t ]*\r\n)?[\t ]+)?[-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?"(((([\t ]*\r\n)?[\t ]+)?([]!#-[^-~]|(\\[\t -~])))+(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?)"(([\t ]*\r\n)?[\t ]+)?)@((([\t ]*\r\n)?[\t ]+)?[-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?\[((([\t ]*\r\n)?[\t ]+)?[!-Z^-~])*(([\t ]*\r\n)?[\t ]+)?](([\t ]*\r\n)?[\t ]+)?)
(normalized)    =~  ([-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@([-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*|\[[\t -Z^-~]*])

Note that some sources (notably w3c) claim that RFC 5322 is too strict on the local part (i.e. the part before the @-sign). This is because "..", "a..b" and "a." are not valid dot-atoms, while they may be used as mailbox names. The RFC, however, does allow for local parts like these, except that they have to be quoted. So instead of [email protected] you should write "a..b", which is semantically equivalent.

Further restrictions

SMTP (as defined in RFC 5321) further restricts the set of valid email addresses (or actually: mailbox names). It seems reasonable to impose this stricter grammar, so that the matched email address can actually be used to send an email.

RFC 5321 basically leaves alone the "local" part (i.e. the part before the @-sign), but is stricter on the domain part (i.e. the part after the @-sign). It allows only host names in place of dot-atoms and address literals in place of domain literals.

The grammar presented in RFC 5321 is too lenient when it comes to both host names and IP addresses. I took the liberty of "correcting" the rules in question, using this draft and RFC 1034 as guidelines. Here's the resulting regex.

([-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@([0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?(\.[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?)*|\[((25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3}|IPv6:((((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){6}|::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){5}|[0-9A-Fa-f]{0,4}::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){4}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):)?(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){3}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,2}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){2}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,3}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,4}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,5}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,6}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)|(?!IPv6:)[0-9A-Za-z-]*[0-9A-Za-z]:[!-Z^-~]+)])

Note that depending on the use case you may not want to allow for a "General-address-literal" in your regex. Also note that I used a negative lookahead (?!IPv6:) in the final regex to prevent the "General-address-literal" part to match malformed IPv6 addresses. Some regex processors don't support negative lookahead. Remove the substring |(?!IPv6:)[0-9A-Za-z-]*[0-9A-Za-z]:[!-Z^-~]+ from the regex if you want to take the whole "General-address-literal" part out.

Here's the derivation:

Let-dig         =   ALPHA / DIGIT
                =~  [0-9A-Za-z]

Ldh-str         =   *( ALPHA / DIGIT / "-" ) Let-dig
                =~  [0-9A-Za-z-]*[0-9A-Za-z]

(regex is updated to make sure sub-domains are max. 63 charactes long - RFC 1034 section 3.5)
sub-domain      =   Let-dig [Ldh-str]
                =~  [0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?

Domain          =   sub-domain *("." sub-domain)
                =~  [0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?(\.[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?)*

Snum            =   1*3DIGIT
                =~  [0-9]{1,3}

(suggested replacement for "Snum")
ip4-octet       =   DIGIT / %x31-39 DIGIT / "1" 2DIGIT / "2" %x30-34 DIGIT / "25" %x30-35
                =~  25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9]

IPv4-address-literal    =   Snum 3("."  Snum)
                        =~  [0-9]{1,3}(\.[0-9]{1,3}){3}

(suggested replacement for "IPv4-address-literal")
ip4-address     =   ip4-octet 3("." ip4-octet)
                =~  (25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3}

(suggested replacement for "IPv6-hex")
ip6-h16         =   "0" / ( (%x49-57 / %x65-70 /%x97-102) 0*3(%x48-57 / %x65-70 /%x97-102) )
                =~  0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}

(not from RFC)
ls32            =   ip6-h16 ":" ip6-h16 / ip4-address
                =~  (0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3}

(suggested replacement of "IPv6-addr")
ip6-address     =                                      6(ip6-h16 ":") ls32
                    /                             "::" 5(ip6-h16 ":") ls32
                    / [                 ip6-h16 ] "::" 4(ip6-h16 ":") ls32
                    / [ *1(ip6-h16 ":") ip6-h16 ] "::" 3(ip6-h16 ":") ls32
                    / [ *2(ip6-h16 ":") ip6-h16 ] "::" 2(ip6-h16 ":") ls32
                    / [ *3(ip6-h16 ":") ip6-h16 ] "::"   ip6-h16 ":"  ls32
                    / [ *4(ip6-h16 ":") ip6-h16 ] "::"                ls32
                    / [ *5(ip6-h16 ":") ip6-h16 ] "::"   ip6-h16
                    / [ *6(ip6-h16 ":") ip6-h16 ] "::"
                =~  (((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){6}|::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){5}|[0-9A-Fa-f]{0,4}::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){4}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):)?(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){3}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,2}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){2}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,3}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,4}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,5}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,6}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::

IPv6-address-literal    =   "IPv6:" ip6-address
                        =~  IPv6:((((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){6}|::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){5}|[0-9A-Fa-f]{0,4}::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){4}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):)?(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){3}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,2}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){2}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,3}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,4}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,5}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,6}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)

Standardized-tag        =   Ldh-str
                        =~  [0-9A-Za-z-]*[0-9A-Za-z]

dcontent        =   %d33-90 / %d94-126
                =~  [!-Z^-~]

General-address-literal =   Standardized-tag ":" 1*dcontent
                        =~  [0-9A-Za-z-]*[0-9A-Za-z]:[!-Z^-~]+

address-literal =   "[" ( IPv4-address-literal / IPv6-address-literal / General-address-literal ) "]"
                =~  \[((25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3}|IPv6:((((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){6}|::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){5}|[0-9A-Fa-f]{0,4}::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){4}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):)?(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){3}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,2}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){2}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,3}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,4}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,5}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,6}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)|(?!IPv6:)[0-9A-Za-z-]*[0-9A-Za-z]:[!-Z^-~]+)]

Mailbox         =   Local-part "@" ( Domain / address-literal )
                =~  ([-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@([0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?(\.[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?)*|\[((25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3}|IPv6:((((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){6}|::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){5}|[0-9A-Fa-f]{0,4}::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){4}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):)?(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){3}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,2}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){2}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,3}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,4}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,5}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,6}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)|(?!IPv6:)[0-9A-Za-z-]*[0-9A-Za-z]:[!-Z^-~]+)])

User input validation

A common use case is user input validation, for example on an html form. In that case it's usually reasonable to preclude address-literals and to require at least two labels in the hostname. Taking the improved RFC 5321 regex from the previous section as a basis, the resulting expression would be:

([-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?(\.[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?)+

I do not recommend restricting the local part further, e.g. by precluding quoted strings, since we don't know what kind of mailbox names some hosts allow (like "a..b" or even "a b"

I also do not recommend explicitly validating against a list of literal top-level domains or even imposing length-constraints (remember how ".museum" invalidated [a-z]{2,4}), but if you must:

([-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@([0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?\.)*(net|org|com|info|etc...)

Make sure to keep your regex up-to-date if you decide to go down the path of explicit top-level domain validation.

Further considerations

When only accepting host names in the domain part (after the @-sign), the regexes above accept only labels with at most 63 characters, as they should. However, they don't enforce the fact that the entire host name must be at most 253 characters long (including the dots). Although this constraint is strictly speaking still regular, it's not feasible to make a regex that incorporates this rule.

Another consideration, especially when using the regexes for input validation, is feedback to the user. If a user enters an incorrect address, it would be nice to give a little more feedback than a simple "syntactically incorrect address". With "vanilla" regexes this is not possible.

These two considerations could be addressed by parsing the address. The extra length constraint on host names could in some cases also be addressed by using an extra regex that checks it, and matching the address against both expressions.

None of the regexes in this answer are optimized for performance. If performance is an issue, you should see if (and how) the regex of your choice can be optimized.

@user2350426 2015-06-26 17:30:37

RFC 6532 updates 5322 to allow and include full, clean UTF-8. Additional details here.

@Xavi Montero 2017-05-22 00:35:07

According to wikipedia seems that the local part, when dotted, has a limitation of 64 chars per part, and also the RFC 5322 refers to the dotted local part to be interpretted with the restrictions of the domains. For example arbitrary-long-email-address-should-be-invalid-arbitrary-lon‌​g-email-address-shou‌​ld-be-invalid.and-th‌​e-second-group-also-‌​should-not-be-so-lon‌​g-and-the-second-gro‌​up-also-should-not-b‌​[email protected]‌​m should not validate. I suggest changing the "+" signs in the first group (name before the optional dot) and in the second group (name after the following dots) to {1,64}

@Xavi Montero 2017-05-22 00:39:04

As the comments are limited in size, here is the resulting regex I plan to use, which is the one at the beginning of this answer, plus limitting the size in the local part, plus adding a back-slash prior to the "/" symbol as required by PHP and also in In PHP I use: $emailRegex = '/^([-!#-\'*+\/-9=?A-Z^-~]{1,64}(\.[-!#-\'*+\/-9=?A-Z^-~]{1,‌​64})*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?(\.[0-9A‌​-Za-z]([0-9A-Za-z-]{‌​0,61}[0-9A-Za-z])?)+‌​$/';

@Xavi Montero 2017-05-22 00:48:11

CAUTION: For some reason, StackOverflow adds hidden characters when copying from the rendered markdown. Copy it into the and you'll see black dots there. You have to remove them and correct the string... Maybe if integrated in the answer, there they are correctly copiable. Sorry for the inconvenience. I don't want to add a new answer as this one is the proper one. Also I don't want to directly edit unless the community thinks this should be integrated into it.

@Rinke 2017-05-22 11:21:26

@XaviMontero Thaks for contributing Xavi! Do you have a reference to the RFC stating the 64 character limit on local part labels? If so, I would gladly adjust the answer.

@Savas Adar 2018-11-05 09:11:04

I use this;

^(([^<>()\[\]\\.,;:\[email protected]"]+(\.[^<>()\[\]\\.,;:\[email protected]"]+)*)|(".+"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$

@Naveen Soni 2018-08-04 09:43:48

Short And Easy Syntax Of Regular Expression

"^(?!\.)(""([^""\r\\]|\\[""\r\\])*""|" + @"([-a-z0-9!#$%&'*+/=?^_`{|}~]|(?<!\.)\.)*)(?<!\.)" + @"@[a-z0-9][\w\.-]*[a-z0-9]\.[a-z][a-z\.]*[a-z]$"

Use in your code for validate email.

@Dave Black 2018-07-13 20:27:57

According to RFC 2821 and RFC 2822, the local-part of an email addresses may use any of these ASCII characters:

  1. Uppercase and lowercare letters
  2. The digits 0 through 9
  3. The characters, !#$%&'*+-/=?^_`{|}~
  4. The character "." provided that it is not the first or last character in the local-part.



For one that is RFC 2821, 2822 Compliant you can use:


Email - RFC 2821, 2822 Compliant

@Luna 2015-08-14 12:30:42

The HTML5 spec suggests a simple regex for validating email addresses:

/^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-][email protected][a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/

This intentionally doesn't comply with RFC 5322.

Note: This requirement is a willful violation of RFC 5322, which defines a syntax for e-mail addresses that is simultaneously too strict (before the @ character), too vague (after the @ character), and too lax (allowing comments, whitespace characters, and quoted strings in manners unfamiliar to most users) to be of practical use here.

The total length could also be limited to 254 characters, per RFC 3696 errata 1690.

@Ryan Taylor 2017-11-06 22:13:40

Best answer! Here's a link to the w3 recommendation: This regex is adopted by many browsers.

@Sheridan 2018-03-21 11:47:58

This is SO not the best answer! This pattern matches this wholly invalid address: [email protected]. I would urge caution and much testing before you use it!

@Luna 2018-03-21 12:56:24

@Sheridan, if you think there is an issue with the HTML5 spec you can raise an issue here:

@user743382 2018-04-29 21:50:04

This doesn't add much over and would IMHO be better as an edit of or comment on that.

@Mitch Satchwell 2018-05-16 09:05:41

[email protected] is valid, but for a real world application you may want to enforce a domain extension, all you need to do is change the final * to a + to achieve this (changing that part of the pattern from 0+ to 1+)

@Hany Sakr 2017-09-27 10:41:14

Nice, I converted the code into java to match the compiler

String pattern ="(?:[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+)*|\"(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21\\x23-\\x5b\\x5d-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])*\")@(?:(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\\.)+[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?|\\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-zA-Z0-9-]*[a-zA-Z0-9]:(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21-\\x5a\\x53-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])+)\\])";

@bortzmeyer 2008-10-14 14:26:43

The fully RFC 822 compliant regex is inefficient and obscure because of its length. Fortunately, RFC 822 was superseded twice and the current specification for email addresses is RFC 5322. RFC 5322 leads to a regex that can be understood if studied for a few minutes and is efficient enough for actual use.

One RFC 5322 compliant regex can be found at the top of the page at but uses the IP address pattern that is floating around the internet with a bug that allows 00 for any of the unsigned byte decimal values in a dot-delimited address, which is illegal. The rest of it appears to be consistent with the RFC 5322 grammar and passes several tests using grep -Po, including cases domain names, IP addresses, bad ones, and account names with and without quotes.

Correcting the 00 bug in the IP pattern, we obtain a working and fairly fast regex. (Scrape the rendered version, not the markdown, for actual code.)


Here is diagram of finite state machine for above regexp which is more clear than regexp itself enter image description here

The more sophisticated patterns in Perl and PCRE (regex library used e.g. in PHP) can correctly parse RFC 5322 without a hitch. Python and C# can do that too, but they use a different syntax from those first two. However, if you are forced to use one of the many less powerful pattern-matching languages, then it’s best to use a real parser.

It's also important to understand that validating it per the RFC tells you absolutely nothing about whether that address actually exists at the supplied domain, or whether the person entering the address is its true owner. People sign others up to mailing lists this way all the time. Fixing that requires a fancier kind of validation that involves sending that address a message that includes a confirmation token meant to be entered on the same web page as was the address.

Confirmation tokens are the only way to know you got the address of the person entering it. This is why most mailing lists now use that mechanism to confirm sign-ups. After all, anybody can put down [email protected], and that will even parse as legal, but it isn't likely to be the person at the other end.

For PHP, you should not use the pattern given in Validate an E-Mail Address with PHP, the Right Way from which I quote:

There is some danger that common usage and widespread sloppy coding will establish a de facto standard for e-mail addresses that is more restrictive than the recorded formal standard.

That is no better than all the other non-RFC patterns. It isn’t even smart enough to handle even RFC 822, let alone RFC 5322. This one, however, is.

If you want to get fancy and pedantic, implement a complete state engine. A regular expression can only act as a rudimentary filter. The problem with regular expressions is that telling someone that their perfectly valid e-mail address is invalid (a false positive) because your regular expression can't handle it is just rude and impolite from the user's perspective. A state engine for the purpose can both validate and even correct e-mail addresses that would otherwise be considered invalid as it disassembles the e-mail address according to each RFC. This allows for a potentially more pleasing experience, like

The specified e-mail address '[email protected],com' is invalid. Did you mean '[email protected]'?

See also Validating Email Addresses, including the comments. Or Comparing E-mail Address Validating Regular Expressions.

Regular expression visualization

Debuggex Demo

@Tomalak 2008-10-14 14:33:33

You said "There is no good regular expression." Is this general or specific to e-mail address validation?

@Luk 2008-10-14 16:23:56

@Tomalak: only for email addresses. As bortzmeyer said, the RFC is extremely complicated

@Dominic Sayers 2009-04-08 15:56:16

The linux journal article you mention is factually wrong in several respects. In particular Lovell clearly hasn't read the errata to RFC3696 and repeats some of the errors in the published version of the RFC. More here:

@CMircea 2010-03-22 18:26:42

Jeff Atwood has a lovely regex in this blog post to validate all valid email addresses:

@Nick Presta 2010-07-18 05:57:10

See: for a valid use of regular expressions to validate an email address.

@Zsolti 2010-07-23 20:00:35

These scripts seems to not work with unicode domain names

@tchrist 2010-11-07 20:00:44

If “there is no good regular expression”, then how come this answer seems to have managed it?

@tchrist 2010-11-07 20:05:07

@Zsolti: Is there a spec regarding Unicode domain names? This solution enumerates a set of legit domain text characters in its <dtext> token. That regex is obviously written from an RFC. If there’s an updated one that should have priority, I’m sure it would be trivial to update, just because Abigail’s pattern is so well-written.

@bortzmeyer 2010-11-23 13:01:15

@tchrist Yes, there is a standard for Unicode domain names and it is seven years old (RFC 3490, now RFC 5891 and 5892).

@porges 2011-05-09 09:38:40

@bortzmeyer: Yes, there is a standard for Unicode domain names, but the relevant RFCs for their integration into email are still in EXPERIMENTAL status, and subject to change. RFC5322 hasn't yet been updated to handle them.

@Random832 2011-11-03 17:50:17

The monster regex targets the "address" grammar element, it's perfectly reasonable to require users to type in an "addr-spec", and make display name a separate box. The regex is so large because it has to repeat addr-spec numerous times (and allows folding whitespace, which you could simply require users not to use). Your web form is not an SMTP server, it doesn't have to deal with "groups" or multiple equivalent [due to whitespace and display names] forms of an address. An addr-spec regex that doesn't allow folding whitespace ends up only being a hundred characters or so.

@dolmen 2011-12-17 12:50:44

Most of the RFC5322 is irrelevant to the question. Because the RFC describes also how to format a list of multiple addresses, or some metadata (ex: display name) for a mailbox. Most "huge regexp" in other answers just give nearly full RFC5322 regexps and are so irrelevant too.

@aug 2013-09-08 17:48:49

Here are some offical rules for valid emails:

@Synchro 2014-09-09 16:14:32

Note that the current HTML5 spec includes a regex and ABNF for email-type input validation that is deliberately more restrictive than the original RFCs.

@Synchro 2014-09-09 16:21:33

RFC 5336 has been deprecated in favour of the much shinier RFC 6531 (SMTPUTF8), which has has already been deployed by google in gmail and is shipping in Postfix. Suddenly Unicode email addresses got a lot more interesting.

@moldcraft 2014-10-06 22:48:29

Domain name can contain spaces? the regex matches [email protected] täst . d e

@BananaMan 2014-11-29 14:13:54

With the new domain extensions, shouldn't this be updated since the last part can contain {2, 13} characters now?

@Luna 2015-06-04 11:51:11

Alternatively, follow the HTML5 spec's definition of an email address. It disagrees with the RFC, but agrees with real world usage. It's very easy to validate.

@amandathewebdev 2015-07-06 16:35:29

This is a great answer. I do have a question though, I know my Marketing Director will argue with me over this and stress why we need valid email addresses. I understand his viewpoint from a leads perspective but I also understand it seems impractical and impossible to filter every single potentially invalid email. Could someone just do me a solid and tell me this is really right before I go have a debate with someone who doesn't get it?

@Code Jockey 2015-07-27 13:57:01

RE: Debuggex Demo - maybe I'm ignorant of the spec, not having read it, or maybe this proves the point that regex is not a good way to validate email addresses... but is :; a valid email address? it passes the debuggex hosted expression(?)

@Sander Visser 2016-09-09 15:42:41

Note that php 7.1 has a new e-mail validation implementation. RFC 6531 that supports international e-mail.

@Garret Wilson 2017-05-13 21:32:09

I'm trying to validate [email protected], but this regular expression says it's invalid. It seems to want a . in the domain. But I read RFC 5322 to say domain is dot-atom / domain-literal / obs-domain, and dot-atom is 1*atext *("." 1*atext), which looks to me as if the dot is optional. Am I reading that correctly?

@Garret Wilson 2017-05-13 21:59:00

It doesn't seem to like [email protected][IPv6:2001:DB8::1], either, which says should be valid. I'm testing at . Am I just doing something wrong?

@Segfault 2017-09-28 20:04:06

"perfectly valid e-mail address is invalid (a false positive)" Isn't that a false negative?

@Jonas 2017-12-01 10:12:52

Some valid emails aren't matched in the regex: "very.(),:;<>[]\".VERY.\"[email protected]\\ \"very\".unusual" [email protected] "()<>[]:,;@\\\"!#$%&'-/=?^_`{}| ~.a" [email protected] [email protected][2001:DB8::1] Source:

@Jamie Davis 2018-02-22 19:56:11

This regex is vulnerable to catastrophic backtracking in JavaScript (node/V8). I did not test other languages.

@Alnitak 2018-06-21 08:33:01

The only useful regex for validating email is /@/

@artgrohe 2018-07-15 10:33:58

throws error in my html input pattern attribute...

@sajanyamaha 2018-10-15 15:33:35

will this pass ? '[email protected]' But is there any singlr char domains available or is it even allowed ??

@Rick Gladwin 2019-01-18 02:00:07

The given regex pattern allows single-character top level domains. These are technically allowed (see but none currently exist, meaning that any email that includes a single-character TLD is actually invalid. Since the focus is on email validation specifically, shouldn't the pattern exclude single character TLDs?

@José Ernesto Lara Rodríguez 2019-02-21 21:22:45

Both the regexp and the diagram have an error: in the domain part, one of the character class ranges include the range \x21-\x5a and the range \x53-\x7f, which overlap (5a > 53).

@Jared Beck 2019-04-10 22:36:39

RFC 822, section 6.2.4. specifically and explicitly allows capital letters, but this answer does not. Perhaps the author of this answer intended for their regex to be applied insensitively. If so, that should be made explicit in the body of the answer.

@Simon_Weaver 2017-02-10 00:52:03

Just about every RegEx I've seen - including some used by Microsoft will not allow the following valid email to get through : [email protected]

Just had a real customer with an email address in this format who couldn't place an order.

Here's what I settled on:

  • A minimal Regex that won't have false negatives. Alternatively use the MailAddress constructor with some additional checks (see below):
  • Checking for common typos .cmo or and asking for confirmation Are you sure this is your correct email address. It looks like there may be a mistake. Allow the user to accept what they typed if they are sure.
  • Handling bounces when the email is actually sent and manually verifying them to check for obvious mistakes.

            var email = new MailAddress(str);

            if (email.Host.EndsWith(".cmo"))
                return EmailValidation.PossibleTypo;

            if (!email.Host.EndsWith(".") && email.Host.Contains("."))
                return EmailValidation.OK;
            return EmailValidation.Invalid;

@Kerem Demirer 2017-03-23 21:36:32

This answer is misleading and unrelated to question. Allowing users to enter wrong email is a business decision, question is about validating it with regex.

@Prassd Nidode 2016-12-28 10:17:43

List item

I use this function

function checkmail($value){
        $value = trim($value);
        if( stristr($value,"@") 
            && stristr($value,".") 
            && (strrpos($value, ".") - stripos($value, "@") > 2) 
            && (stripos($value, "@") > 1) 
            && (strlen($value) - strrpos($value, ".") < 6) 
            && (strlen($value) - strrpos($value, ".") > 2) 
            && ($value == preg_replace('/[ ]/', '', $value)) 
            && ($value == preg_replace('/[^A-Za-z0-9\[email protected]!*]/', '', $value))

            return "Invalid Mail-Id";

@FlameStorm 2016-12-13 20:12:17

For me the right way for checking emails is:

  1. Check that symbol @ exists, and before and after it there are some [email protected] symbols: /^[^@][email protected][^@]+$/
  2. Try to send an email to this address with some "activation code".
  3. When the user "activated" his email address, we will see that all is right.

Of course, you can show some warning or tooltip in front-end when user typed "strange" email to help him to avoid common mistakes, like no dot in domain part or spaces in name without quoting and so on. But you must accept the address "[email protected]" if user really want it.

Also, you must remember that email address standard was and can evolute, so you can't just type some "standard-valid" regexp once and for all times. And you must remember that some concrete internet servers can fail some details of common standard and in fact work with own "modified standard".

So, just check @, hint user on frontend and send verification emails on given address.

@Prasad 2012-04-28 09:45:03

If you are fine with accepting empty values (which is not invalid email) and are running PHP 5.2+, I would suggest:

static public function checkEmail($email, $ignore_empty = false) {
        if($ignore_empty && (is_null($email) || $email == ''))
                return true;
        return filter_var($email, FILTER_VALIDATE_EMAIL);

@Andy Lester 2008-10-14 14:42:25

It all depends on how accurate you want to be. For my purposes, where I'm just trying to keep out things like bob @ (spaces in emails) or steve (no domain at all) or [email protected] (no period before .com), I use

/^\[email protected]\S+\.\S+$/

Sure, it will match things that aren't valid email addresses, but it's a matter of playing the 90/10 rule.

@bortzmeyer 2008-10-14 19:30:13

It does not match [email protected] which is a valid and working email address (although probably most mail servers won't accept it or will add

@Andy Lester 2009-02-15 04:50:50

Yes, that's right, it's not RFC-compliant, but that's usually not an issue.

@Richard 2009-03-04 18:26:34

It won't match anything with a three part host name, e.g. and domains).

@Andy Lester 2009-03-06 04:51:56

Yes, it will. I suggest you try it yourself. $ perl -le'print q{[email protected]} =~ /^\[email protected]\S+\.\S+$/ ? q{Y} : q{N}'

@David Thornley 2009-12-17 18:48:17

@Richard: . is included in \S.

@tchrist 2010-11-07 20:16:13

@bortzmeyer: Yeah, well. It doesn’t match postmaster, either, which I am pretty sure is going to be a deliverable address. :)

@JJJ 2012-10-16 06:18:14

\S includes @ so it will also match [email protected]@b.c

@Andy Lester 2012-10-16 16:03:16

JJJ: Yes, it will match a lot of crap. It will match &$*#$(@$0(%))$#.)&*)(*$, too. For me, I'm more concerned with catching the odd fumble-finger typo like [email protected] than I am complete garbage. YMMV.

@Chris Moschini 2014-08-04 21:32:18

@Piskvor 2015-09-24 09:12:51

And another common typo: two consecutive dots in domain name or a comma instead of a dot. ^[^\[email protected]][email protected]([^\[email protected],]+\.)+[^\[email protected],]{2,}$

@lulalala 2015-10-07 15:00:25

@bortzmeyer is there discussion about this kind of address? It's my first time hearing someone mention this kind of address before?

@Pratik C Joshi 2015-12-06 10:47:59

We want explanation about this :) . People come here to see Why it is the way it is. Please consider Regex explanation too! Not everyone is advanced enough to know what you wrote there without explanation. Thanks

@Per G 2019-03-04 07:53:43

In C# i got it working with "^\[email protected]\S+\.\S+$"

@SIslam 2015-09-12 17:58:03

I did not find any that deals with top level domain name but it should be considered.

So for me following worked-


That easily discarded emails like [email protected], [email protected] etc

Domain name can be further edited if needed e.g. specific country domain etc.

@tripleee 2015-09-15 04:32:43

Like pointed out in multiple comments to other answers here already, the list of valid TLDs is growing rapidly. Your "2-letter ccTLD or one of big-6, info, mobi, etc" would have been reasonable five years ago, but no longer works at all reliably.

@user2366842 2016-01-27 19:38:57

Even at time of original writing, this was already invalid by a couple hundred TLD's. As of currently, you're missing a little under 1200 possibilities (and growing at a pretty regular rate) Current list of valid domains:

@Suhaib Janjua 2014-02-06 06:38:33

I always use the below regular expression to validate the email address. This is the best regex I have ever seen to validate email address.


This regular expression I always uses in my Asp.NET Code and I'm pretty satisfied with it.

use this assembly reference

using System.Text.RegularExpressions;

and try the following code, as it is simple and do the work for you.

private bool IsValidEmail(string email) {
    bool isValid = false;
    const string pattern = @"\A(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)\Z";

    isValid = email != "" && Regex.IsMatch(email, pattern);

    // an alternative of the above line is also given and commented
    //if (email == "") {
    //    isValid = false;
    //} else {
    //    // address provided so use the IsMatch Method
    //    // of the Regular Expression object
    //    isValid = Regex.IsMatch(email, pattern);
    return isValid;

this function validates the email string. If the email string is null, it returns false, if the email string is not in a correct format it returns false. It only returns true if the format of the email is valid.

@Ivan Z 2014-03-27 23:07:18

Does this code accept "Håkan.Söderströ[email protected]ö.se" or "试@例子.测试.مثال.آزمایشی" emails?

@Suhaib Janjua 2014-03-28 05:55:35

It's for standard Email Servers with standard characters. In case of non English language one should have to make its own customized ReGex.

@Ivan Z 2014-03-28 09:33:14

For standard English email looks good!

@Evan Carroll 2010-01-27 16:43:48

This regex is from Perl's Email::Valid library. I believe it to be the most accurate, it matches all 822. And, it is based on the regular expression in the O'Reilly book:

Regular expression built using Jeffrey Friedl's example in Mastering Regular Expressions (

$RFC822PAT = <<'EOF';

@Chris McGrath 2013-01-30 22:20:42

O_O you would also need to be a regex master to understand what it is doing

@syp_dino 2015-09-19 14:13:09

I found nice article, which says that the best way to validate e-mail address is that regex expresion: /[email protected]+\..+/i

@Toto 2015-09-25 08:13:24

It doesn't match valid addresses like: [email protected]

@chukko 2016-02-03 11:22:37

It also matches invalid addresses like john [email protected]

@Ondřej Šotek 2015-07-15 22:25:52

For PHP I'm using email address validator from Nette Framework -

/* public static */ function isEmail($value)
    $atom = "[-a-z0-9!#$%&'*+/=?^_`{|}~]"; // RFC 5322 unquoted characters in local-part
    $localPart = "(?:\"(?:[ !\\x23-\\x5B\\x5D-\\x7E]*|\\\\[ -~])+\"|$atom+(?:\\.$atom+)*)"; // quoted or unquoted
    $alpha = "a-z\x80-\xFF"; // superset of IDN
    $domain = "[0-9$alpha](?:[-0-9$alpha]{0,61}[0-9$alpha])?"; // RFC 1034 one domain component
    $topDomain = "[$alpha](?:[-0-9$alpha]{0,17}[$alpha])?";
    return (bool) preg_match("(^[email protected](?:$domain\\.)+$topDomain\\z)i", $value);

@Ramesh Kotkar 2014-12-10 06:20:50

You can use following regular Expression for any email address

^(([^<>()[\]\\.,;:\[email protected]\"]+(\.[^<>()[\]\\.,;:\[email protected]\"]+)*)|(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$


  function checkEmailValidation($email)
       $expression='/^(([^<>()[\]\\.,;:\[email protected]\"]+(\.[^<>()[\]\\.,;:\[email protected]\"]+)*)|(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/';
        if(preg_match($expression, $email))
            return true;
            return false;

For Javascript

 function checkEmailValidation(email)
        var pattern='/^(([^<>()[\]\\.,;:\[email protected]\"]+(\.[^<>()[\]\\.,;:\[email protected]\"]+)*)|(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/';
            return true;
            return false;

@Luna 2015-08-14 12:33:46

if(preg_match($expression, $email)) { return true; } else { return false; } can be simplified to return (bool) preg_match($expression, $email);

@Rajneesh071 2014-11-11 09:17:57

Valid RegEx according to w3 org and wikipedia

[A-Z0-9a-z.!#$%&'*+-/=?^_`{|}~][email protected][A-Za-z0-9.-]+\\.[A-Za-z]{2,4}

e.g. !#$%&'*+-/=?^_`.{|}[email protected]

@givanse 2014-12-03 05:08:23

[email protected] passes, valid, but most likely not very useful

@Brad 2014-12-15 15:36:02

This regex is wrong, and isn't the one recommended on

@Rajneesh071 2014-12-15 21:45:49

Then what is correct ? @Brad

@Brad 2014-12-15 21:48:57

The actual regex on the page you link to for the W3C isn't bad.

@Prasad Bhosale 2014-11-07 11:14:58

Following is the regular expression for validating email address

^[email protected]\w+(\.\w+)+$

@Per Hornshøj-Schierbeck 2008-10-14 14:17:44

I use


Which is the one used in ASP.NET by the RegularExpressionValidator.

@Phrogz 2011-01-19 21:35:17

Boo! My (ill-advised) address of [email protected] is rejected.

@Wayne Whitty 2014-06-16 15:00:35

So basically, it doesn't allow ridiculous email addresses. :)

@Tomasz Szulc 2015-11-19 12:53:37

According to this page there is no domains with just a single character in top level e.g. "something.c", "something.a", here is version that support at least 2 characters: "", "": ^\\w+([-+.']\\w+)*@\\w+([-.]\\w+)*\\.\\w{2,}([-.]\\w+)*$

@Patanjali 2015-11-28 03:13:13

@Wayne Whitty. You have hit upon the primary issue of whether to cater for the vast majority of addresses, or ALL, including ones that nobody would use, except to test email validation.

@Aqib Mumtaz 2015-11-30 11:16:59

@TomaszSzulc extra back slash in your answer is confusing, I just corrected it and 2 chars domains names support is working, ^\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w{2,}([-.]\w+)*$

@Pratik C Joshi 2015-12-06 10:47:46

We want explanation about this :) . People come here to see Why it is the way it is. Please consider Regex explanation too! Not everyone is advanced enough to know what you wrote there without explanation. Thanks

@ganqqwerty 2015-12-07 14:25:11

^\w+([-+.']|\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$ allows stuff like [email protected]

@Simon_Weaver 2017-02-10 00:43:31

this fails on [email protected] which is in fact valid (a customer of ours had a similar address)`

@Voicu 2018-01-04 00:54:46

@Simon_Weaver: Yes, this fails on emails with local-part that ends in -, +, . and '.

@Voicu 2018-01-04 00:57:00

If people choose to provide a + sign at the end of their local-part (which is valid for Gmail for example), why would we restrict that by using this regex?

@zıəs uɐɟəʇs 2014-09-11 14:22:51

As mentioned already, you can't validate an email with a regex. However, here's what we currently use to make sure user-input isn't totally bogus (forgetting the TLD etc.).

This regex will allow IDN domains and special characters (like Umlauts) before and after the @ sign.

/^[\w.+-_][email protected][^.][\w.-]*\.[\w-]{2,63}$/iu

@Dominic Sayers 2009-02-10 16:13:28

[UPDATED] I've collated everything I know about email address validation here:, which now not only validates but also diagnoses problems with email addresses. I agree with many of the comments here that validation is only part of the answer; see my essay at

is_email() remains, as far as I know, the only validator that will tell you definitively whether a given string is a valid email address or not. I've upload a new version at

I collated test cases from Cal Henderson, Dave Child, Phil Haack, Doug Lovell, RFC5322 and RFC 3696. 275 test addresses in all. I ran all these tests against all the free validators I could find.

I'll try to keep this page up-to-date as people enhance their validators. Thanks to Cal, Michael, Dave, Paul and Phil for their help and co-operation in compiling these tests and constructive criticism of my own validator.

People should be aware of the errata against RFC 3696 in particular. Three of the canonical examples are in fact invalid addresses. And the maximum length of an address is 254 or 256 characters, not 320.

@tchrist 2010-11-07 20:11:46

This validator also seems correct. [...time passes...] Hm, looks like it is just RFC 5322, not 3693 or errata thereto.

@bgmCoder 2013-04-09 20:49:29

Very nice. Here we not only get a nice essay, we get a validation tester as well as a library to download. Nice answer!

@Josef 2015-03-25 07:28:30

Your validator doesn't support punycode (RFC 3492). [email protected]öäü.at can be a valid address. (it translates to [email protected])

@Dominic Sayers 2015-04-27 18:19:17

Hi @Josef. You should try to validate [email protected] since this code is about validation, not interpretation. If you'd like to add a punycode translator then I'm happy to accept a pull request at

@Fragment 2014-07-31 09:17:10

Had to mention, that nearly has been added new domain "yandex". Possible emails: [email protected] And also uppercase letters supported, so a bit modified version of acrosman solution is:


@McGaz 2014-08-06 10:46:53

The regular expressions posted in this thread are out of date now because of the new generic top level domains (gTLDs) coming in (e.g. .london, .basketball, .通販). To validate an email address there are two answers (That would be relevant to the vast majority).

  1. As the main answer says - don't use a regular expression, just validate it by sending an email to the address (Catch exceptions for invalid addresses)
  2. Use a very generic regex to at least make sure that they are using an email structure {something}@{something}.{something}. There's no point in going for a detailed regex because you won't catch them all and there'll be a new batch in a few years and you'll have to update your regular expression again.

I have decided to use the regular expression because, unfortunately, some users don't read forms and put the wrong data in the wrong fields. This will at least alert them when they try to put something which isn't an email into the email input field and should save you some time supporting users on email issues.


@TombMedia 2012-12-02 06:15:39

I've been using this touched up version of your regex for a while and it hasn't left me with too many surprises. I've never encountered an apostrophe in an email yet so it doesn't validate that. It does validate Jean+Franç[email protected] and 试@例子.测试.مثال.آزمایشی but not weird abuse of those non alphanumeric characters [email protected].


It does support IP addresses [email protected] but I haven't refined it enough to deal with bogus IP ranges such as 999.999.999.1.

It also supports all the TLDs over 3 characters which stops [email protected] which I think the original let through. I've been beat, there are too many tlds now over 3 characters.

I know acrosman has abandoned his regex but this flavour lives on.

@sunleo 2014-08-25 11:13:00

Java Mail API does magic for us.

     InternetAddress internetAddress = new InternetAddress(email);
     return true;
    catch(Exception ex)
        return false;

I got this from here

@display_name 2014-12-11 07:56:56

Java Mail API is an optional package for use with Java SE platform and is included in the Java EE platform.

@Francisco Costa 2013-10-25 14:19:03


This matches 99.99% of email addresses, including some of the newer top-level-domain extensions, such as info, museum, name, etc. It also allows for emails tied directly to IP addresses.

@Dinesh Devkota 2014-03-15 19:03:32

I used


which includes the capitalized letter as well. You wouldn't even need to use tolowercase in this case.

Related Questions

Sponsored Content

80 Answered Questions

[SOLVED] How to validate an email address in JavaScript?

8 Answered Questions

[SOLVED] Is there a regular expression to detect a valid regular expression?

  • 2008-10-05 17:07:35
  • psytek
  • 101576 View
  • 661 Score
  • 8 Answer
  • Tags:   regex

28 Answered Questions

38 Answered Questions

[SOLVED] C# code to validate email address

50 Answered Questions

[SOLVED] How to replace all occurrences of a string in JavaScript

18 Answered Questions

[SOLVED] How do you use a variable in a regular expression?

  • 2009-01-30 00:11:05
  • JC Grubbs
  • 603792 View
  • 1132 Score
  • 18 Answer
  • Tags:   javascript regex

16 Answered Questions

[SOLVED] How do you access the matched groups in a JavaScript regular expression?

  • 2009-01-11 07:21:20
  • nickf
  • 651169 View
  • 1169 Score
  • 16 Answer
  • Tags:   javascript regex

17 Answered Questions

[SOLVED] What characters are allowed in an email address?

48 Answered Questions

[SOLVED] Validate decimal numbers in JavaScript - IsNumeric()

7 Answered Questions

[SOLVED] What is the maximum length of a valid email address?

Sponsored Content