By Lance Pollard


2010-08-24 22:22:33 8 Comments

I just want to create a regular expression out of any possible string.

var usersString = "Hello?!*`~World()[]";
var expression = new RegExp(RegExp.escape(usersString));
var matches = "Hello".match(expression);

Is there a built-in method for that? If not, what do people use? Ruby has Regexp.escape. I don't feel like I should need to write my own; there's got to be something standard out there. Thanks!

14 comments

@user557597 2019-09-18 01:40:57

There have only ever been, and only ever will be, 12 metacharacters that need to be escaped
to be treated as literals.

It doesn't matter what is done with the escaped string afterwards, whether it is inserted into a balanced
regex wrapper or appended to something else.

Do a string replace using this:

var escaped_string = oldstring.replace( /[\\^$.|?*+()[{]/g, '\\$&' );

@Thomasleveil 2019-10-05 23:36:12

what about ]?
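
On the ] question, a minimal sketch, assuming an engine that supports the u flag. Without the flag, a lone ] or } is tolerated as a literal. With it, both are syntax errors, so the 12-character list above appears sufficient only for non-Unicode patterns. The helper name compiles is made up for illustration.

```javascript
// Hypothetical helper: reports whether a pattern compiles with the given flags.
function compiles(pattern, flags) {
  try {
    new RegExp(pattern, flags);
    return true;
  } catch (e) {
    return false;
  }
}

console.log(compiles(']', ''));     // true: a lone ] is accepted without the u flag
console.log(compiles(']', 'u'));    // false: SyntaxError in Unicode mode
console.log(compiles('}', 'u'));    // false: same for a lone }
console.log(compiles('\\]', 'u'));  // true: escaped, it is fine in both modes
```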

@soheilpro 2019-08-18 03:31:20

Another (much safer) approach is to escape all the characters (and not just a few special ones that we currently know) using the unicode escape format \u{code}:

function escapeRegExp(text) {
    return Array.from(text)
           .map(char => `\\u{${char.codePointAt(0).toString(16)}}`)
           .join('');
}

console.log(escapeRegExp('a.b')); // '\u{61}\u{2e}\u{62}'

Please note that you need to pass the u flag for this method to work:

var expression = new RegExp(escapeRegExp(usersString), 'u');
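
A sketch of how this approach behaves for characters outside the Basic Multilingual Plane. It assumes a code-point-based variant, named escapeByCodePoint here (not a name from the answer). Array.from splits the string by code point, so codePointAt(0) returns the full value, whereas charCodeAt(0) would return only the first surrogate.

```javascript
// Hypothetical variant: escape every code point as \u{...} (requires the u flag).
function escapeByCodePoint(text) {
  return Array.from(text)
    .map(function (ch) {
      return '\\u{' + ch.codePointAt(0).toString(16) + '}';
    })
    .join('');
}

var escaped = escapeByCodePoint('a.🍎');
console.log(escaped); // \u{61}\u{2e}\u{1f34e}

var re = new RegExp('^' + escaped + '$', 'u');
console.log(re.test('a.🍎')); // true
console.log(re.test('ab🍎')); // false: the dot is matched literally
```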

@quietmint 2014-05-13 17:22:50

Mozilla Developer Network's Guide to Regular Expressions provides this escaping function:

function escapeRegExp(string) {
  return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // $& means the whole matched string
}
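
For illustration, here is how that function might be applied to the string from the question. This is only a usage sketch; the test strings are made up.

```javascript
function escapeRegExp(string) {
  return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // $& means the whole matched string
}

var usersString = 'Hello?!*`~World()[]';
var expression = new RegExp(escapeRegExp(usersString));

// The pattern now matches the user's text literally, wherever it occurs.
console.log(expression.test('say Hello?!*`~World()[] now')); // true
console.log(expression.test('Hello'));                       // false
console.log(escapeRegExp('1.5*2'));                          // 1\.5\*2
```

Note that "Hello".match(expression) from the question returns null, since "Hello" does not contain the whole escaped string.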

@Dan Dascalescu 2014-08-02 00:51:59

Why do they escape the =? AFAIK, this would be useful for Perl's lookahead regular expressions (?=), but if you escape the ?, you're good to go.

@quietmint 2014-08-07 16:31:01

@DanDascalescu You're right. The MDN page has been updated and = is no longer included.

@Ravi Gadhia 2016-04-29 09:42:23

escapeRegExp = function(str) {
  if (str == null) return '';
  return String(str).replace(/([.*+?^=!:${}()|[\]\/\\])/g, '\\$1');
};

@daluege 2016-11-12 11:35:51

Nothing should prevent you from just escaping every non-alphanumeric character:

usersString.replace(/(?=\W)/g, '\\');

You lose a certain degree of readability when doing re.toString() but you win a great deal of simplicity (and security).

According to ECMA-262, on the one hand, regular expression "syntax characters" are always non-alphanumeric, such that the result is secure, and special escape sequences (\d, \w, \n) are always alphanumeric such that no false control escapes will be produced.

@Tomas Langkaas 2017-08-10 07:20:08

Simple and effective. I like this much better than the accepted answer. For (really) old browsers, .replace(/[^\w]/g, '\\$&') would work in the same way.

@Alexey Lebedev 2018-02-02 10:29:06

This fails in Unicode mode. For example, new RegExp('🍎'.replace(/(?=\W)/g, '\\'), 'u') throws exception because \W matches each code unit of a surrogate pair separately, resulting in invalid escape codes.
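
A small sketch of that failure, assuming an engine with u-flag support; the helper names are invented for the example. In Unicode mode a backslash may only precede a syntax character (or /), so escaping arbitrary non-word characters can produce patterns that do not compile.

```javascript
// Hypothetical helper: does the pattern compile with the u flag?
function compilesInUnicodeMode(pattern) {
  try {
    new RegExp(pattern, 'u');
    return true;
  } catch (e) {
    return false;
  }
}

// The "escape every non-word character" approach from the answer.
function escapeNonWord(s) {
  return s.replace(/(?=\W)/g, '\\');
}

console.log(compilesInUnicodeMode(escapeNonWord('🍎')));  // false: each surrogate half gets a backslash
console.log(compilesInUnicodeMode(escapeNonWord('a!b'))); // false: \! is not a valid escape with u
console.log(compilesInUnicodeMode(escapeNonWord('a.b'))); // true: \. is a valid escape
console.log(new RegExp(escapeNonWord('a!b')).test('a!b')); // true without the u flag
```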

@Miguel Pynto 2018-03-21 14:34:39

alternative: .replace(/\W/g, "\\$&");

@bashaus 2017-08-01 15:36:47

Rather than only escaping the characters that will cause issues in your regular expression (i.e. a blacklist), why not consider using a whitelist instead? This way each character is considered tainted unless it matches.

For this example, assume the following expression:

RegExp.escape('be || ! be');

This whitelists letters, numbers and spaces:

RegExp.escape = function (string) {
    return string.replace(/([^\w\d\s])/gi, '\\$1');
}

Returns:

"be \|\| \! be"

This may escape characters which do not need to be escaped, but this doesn't hinder your expression (maybe some minor time penalties - but it's worth it for safety).

@Antoine Dusséaux 2017-07-05 17:58:19

XRegExp has an escape function:

XRegExp.escape('Escaped? <.>'); // -> 'Escaped\?\ <\.>'

More on: http://xregexp.com/api/#escape

@gustavohenke 2015-04-17 13:14:44

For anyone using lodash, since v3.0.0 a _.escapeRegExp function is built-in:

_.escapeRegExp('[lodash](https://lodash.com/)');
// → '\[lodash\]\(https:\/\/lodash\.com\/\)'

And, in the event that you don't want to require the full lodash library, you may require just that function!

@Ted Pennings 2015-11-01 07:34:02

there's even an npm package of just this! npmjs.com/package/lodash.escaperegexp

@maddob 2016-07-28 12:39:00

Be aware that lodash's escapeRegExp function also adds \x3 to the beginning of the string; not really sure why.

@Rob Evans 2017-08-31 13:20:51

This imports loads of code that really doesn't need to be there for such a simple thing. Use bobince's answer... works for me and it's so many fewer bytes to load than the lodash version!

@gustavohenke 2017-08-31 13:24:57

@RobEvans my answer starts with "For anyone using lodash", and I even mention that you can require only the escapeRegExp function.

@Rob Evans 2017-08-31 18:03:05

@gustavohenke Sorry I should have been slightly more clear, I included the module linked to in your "just that function" and that is what I was commenting on. If you take a look it's quite a lot of code for what should effectively be a single function with a single regexp in it. Agree if you are already using lodash then it makes sense to use it, but otherwise use the other answer. Sorry for the unclear comment.

@Federico Fissore 2018-05-31 15:10:59

@maddob I cannot see that \x3 you mentioned: my escaped strings are looking good, just what I expect

@Pi Marillion 2015-06-15 17:09:01

Most of the expressions here solve single specific use cases.

That's okay, but I prefer an "always works" approach.

function regExpEscape(literal_string) {
    return literal_string.replace(/[-[\]{}()*+!<=:?.\/\\^$|#\s,]/g, '\\$&');
}

This will "fully escape" a literal string for any of the following uses in regular expressions:

  • Insertion in a regular expression. E.g. new RegExp(regExpEscape(str))
  • Insertion in a character class. E.g. new RegExp('[' + regExpEscape(str) + ']')
  • Insertion in integer count specifier. E.g. new RegExp('x{1,' + regExpEscape(str) + '}')
  • Execution in non-JavaScript regular expression engines.
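
As a rough illustration of the character-class use case (the input 'a-z' is made up): without escaping, the - forms a range, while the escaped version matches only the three literal characters.

```javascript
function regExpEscape(literal_string) {
  return literal_string.replace(/[-[\]{}()*+!<=:?.\/\\^$|#\s,]/g, '\\$&');
}

var chars = 'a-z';
var naive = new RegExp('[' + chars + ']');               // [a-z]: a range
var safe = new RegExp('[' + regExpEscape(chars) + ']');  // [a\-z]: three literals

console.log(naive.test('m')); // true: m falls inside the range a-z
console.log(safe.test('m'));  // false
console.log(safe.test('-'));  // true
```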

Special Characters Covered:

  • -: Creates a character range in a character class.
  • [ / ]: Starts / ends a character class.
  • { / }: Starts / ends a numeration specifier.
  • ( / ): Starts / ends a group.
  • * / + / ?: Specifies repetition type.
  • .: Matches any character.
  • \: Escapes characters, and starts entities.
  • ^: Specifies start of matching zone, and negates matching in a character class.
  • $: Specifies end of matching zone.
  • |: Specifies alternation.
  • #: Specifies comment in free spacing mode.
  • \s: Ignored in free spacing mode.
  • ,: Separates values in numeration specifier.
  • /: Starts or ends expression.
  • :: Completes special group types, and part of Perl-style character classes.
  • !: Negates zero-width group.
  • < / =: Part of zero-width group specifications.

Notes:

  • / is not strictly necessary in any flavor of regular expression. However, it protects in case someone (shudder) does eval("/" + pattern + "/");.
  • , ensures that if the string is meant to be an integer in the numerical specifier, it will properly cause a RegExp compiling error instead of silently compiling wrong.
  • #, and \s do not need to be escaped in JavaScript, but do in many other flavors. They are escaped here in case the regular expression will later be passed to another program.

If you also need to future-proof the regular expression against potential additions to the JavaScript regex engine capabilities, I recommend using the more paranoid:

function regExpEscapeFuture(literal_string) {
    return literal_string.replace(/[^A-Za-z0-9_]/g, '\\$&');
}

This function escapes every character except those explicitly guaranteed not to be used for syntax in future regular expression flavors.


For the truly sanitation-keen, consider this edge case:

var s = '';
new RegExp('(choice1|choice2|' + regExpEscape(s) + ')');

This should compile fine in JavaScript, but will not in some other flavors. If intending to pass to another flavor, the null case of s === '' should be independently checked, like so:

var s = '';
new RegExp('(choice1|choice2' + (s ? '|' + regExpEscape(s) : '') + ')');
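
The same idea can be generalized to a list of choices. This is a sketch only; buildAlternation is a hypothetical helper, and dropping empty entries is one possible policy among several.

```javascript
function regExpEscape(literal_string) {
  return literal_string.replace(/[-[\]{}()*+!<=:?.\/\\^$|#\s,]/g, '\\$&');
}

// Hypothetical helper: anchored alternation over literal choices,
// skipping empty strings so no empty branch is emitted.
function buildAlternation(choices) {
  var parts = choices
    .filter(function (c) { return c !== ''; })
    .map(regExpEscape);
  return new RegExp('^(' + parts.join('|') + ')$');
}

var re = buildAlternation(['choice1', 'choice2', '', 'a|b']);
console.log(re.test('choice2')); // true
console.log(re.test('a|b'));     // true: the | inside a choice is literal
console.log(re.test('a'));       // false
console.log(re.test(''));        // false: the empty choice was dropped
```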

@Dan Dascalescu 2017-07-04 11:32:19

The / doesn't need to be escaped in the [...] character class.

@Qwertiy 2017-09-22 14:01:18

Most of these don't need to be escaped. "Creates a character range in a character class" - you are never in a character class inside of the string. "Specifies comment in free spacing mode, Ignored in free spacing mode" - not supported in JavaScript. "Separates values in numeration specifier" - you are never in a numeration specifier inside of the string. Also, you can't write arbitrary text inside of a numeration specifier. "Starts or ends expression" - no need to escape. Eval is not a case, as it would require much more escaping. [will be continued in the next comment]

@Qwertiy 2017-09-22 14:01:22

"Completes special group types, and part of Perl-style character classes" - seems not available in javascript. "Negates zero-width group, Part of zero-width group specifications" - you never have groups inside of the string.

@Pi Marillion 2017-09-22 20:14:05

@Qwertiy The reason for these extra escapes is to eliminate edge cases which could cause problems in certain use cases. For instance, the user of this function may want to insert the escaped regex string into another regex as part of a group, or even for use in another language besides Javascript. The function does not make assumptions like "I will never be part of a character class", because it's meant to be general. For a more YAGNI approach, see any of the other answers here.

@madprops 2017-10-29 11:43:40

Very good. Why is _ not escaped though? What ensures it probably won't become regex syntax later?

@John 2017-12-07 03:23:57

@PiMarillion: In the comments to bobince's answer the user styfle suggested for use in a loop to first create a RegExp-object of the escape-string: var e = /[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/g; and then the function is like return s.replace(e, '\\$&');. To avoid manually escaping the escape string I want first to use the original function regExpEscape to escape your escape string (ees), then use e = new RegExp(ees,"g") and then function regExpEscapeFast(literal_string) { return literal_string.replace(e, '\\$&');}. I can't get it working. How to correctly escape the escape string?

@user663031 2015-06-15 18:29:14

There is an ES7 proposal for RegExp.escape at https://github.com/benjamingr/RexExp.escape/, with a polyfill available at https://github.com/ljharb/regexp.escape.

@John 2017-04-29 07:30:28

Looks like this didn't make it into ES7. It also looks like it was rejected in favor of looking for a template tag.

@Dan Dascalescu 2014-08-02 01:06:46

The functions in the other answers are overkill for escaping entire regular expressions (they may be useful for escaping parts of regular expressions that will later be concatenated into bigger regexps).

If you escape an entire regexp and are done with it, quoting the metacharacters that are either standalone (., ?, +, *, ^, $, |, \) or start something ((, [, {) is all you need:

String.prototype.regexEscape = function regexEscape() {
  return this.replace(/[.?+*^$|({[\\]/g, '\\$&');
};

And yes, it's disappointing that JavaScript doesn't have a function like this built-in.

@nhahtdh 2014-11-27 02:58:01

Let's say you escape the user input (text)next and insert it in: (?: + input + ). Your method will give the resulting string (?:\(text)next) which fails to compile. Note that this is quite a reasonable insertion, not some crazy one like re\ + input + re (in this case, the programmer can be blamed for doing something stupid)
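
The failure described in this comment can be reproduced with a short sketch. It uses the answer's character list as a standalone function (escapeOpenersOnly is a made-up name) rather than patching String.prototype.

```javascript
// Escapes only standalone metacharacters and "openers", as in the answer.
function escapeOpenersOnly(s) {
  return s.replace(/[.?+*^$|({[\\]/g, '\\$&');
}

// Hypothetical helper: does the pattern compile?
function compiles(pattern) {
  try {
    new RegExp(pattern);
    return true;
  } catch (e) {
    return false;
  }
}

var input = '(text)next';
var pattern = '(?:' + escapeOpenersOnly(input) + ')';

console.log(pattern);           // (?:\(text)next)
console.log(compiles(pattern)); // false: the unescaped ) closes the group early
console.log(compiles('(?:' + input.replace(/[.?+*^$|(){}[\]\\]/g, '\\$&') + ')')); // true
```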

@Dan Dascalescu 2014-11-27 21:08:55

@nhahtdh: my answer specifically mentioned escaping entire regular expressions and "being done" with them, not parts (or future parts) of regexps. Kindly undo the downvote?

@nhahtdh 2014-11-28 01:24:05

It's rarely the case that you would escape the entire expression - there are string operations, which are much faster than regex if you want to work with a literal string.

@nhahtdh 2014-11-28 01:30:27

Not to mention that it is incorrect - \ should be escaped, since your regex will leave \w intact. Also, JavaScript doesn't seem to allow a trailing ); at least, Firefox throws an error for it.

@Dan Dascalescu 2014-11-28 01:33:05

I have escaped \ in the answer. Thanks!

@nhahtdh 2014-11-28 04:22:59

Please address the part about closing )

@Qwertiy 2017-09-22 20:34:36

It would be right to escape closing braces too, even if they are allowed by some dialect. As I remember, that's an extension, not a rule.

@bobince 2010-08-24 23:09:07

The function linked above is insufficient. It fails to escape ^ or $ (start and end of string), or -, which in a character group is used for ranges.

Use this function:

RegExp.escape= function(s) {
    return s.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
};

While it may seem unnecessary at first glance, escaping - (as well as ^) makes the function suitable for escaping characters to be inserted into a character class as well as the body of the regex.

Escaping / makes the function suitable for escaping characters to be used in a JS regex literal for later eval.

As there is no downside to escaping either of them it makes sense to escape to cover wider use cases.

And yes, it is a disappointing failing that this is not part of standard JavaScript.

@spinningarrow 2012-08-07 09:04:21

What does the $& do?

@bobince 2012-08-08 08:11:31

@spinningarrow: It represents the whole matched string, like 'group 0' in many other regex systems. doc

@goodeye 2013-02-13 01:31:35

I believe the original answer was correct, before the edit. I'm pretty sure escaping the forward slash inside the character class is not necessary. It seems to do no harm, but isn't required.

@thorn̈ 2013-02-14 20:53:00

actually, we don't need to escape / at all

@Radu Maris 2013-07-23 11:25:42

@bobince Is this the expected behaviour: RegExp.escape('a\.b') === 'a\.b', I was expecting 'a\\\.b' (escape "\" and escape ".") ?

@bobince 2013-07-23 11:42:09

@Radu: you have a string literal problem, 'a\.b'==='a.b' :-)

@bobince 2013-07-23 11:48:23

BTW beware of debugger consoles: IE, Firefox and Chrome all display the string a\.b in a pseudo-literal form "a\.b", which is misleading as it is not a valid string literal for that value (should be "a\\.b"). Thanks for the unnecessary extra confusion, browsers.

@Paul Draper 2013-10-03 03:26:15

"it is a disappointing failing that this is not part of standard JavaScript". What languages have something like this?

@bobince 2013-10-03 10:24:12

@Paul: Perl quotemeta (\Q), Python re.escape, PHP preg_quote, Ruby Regexp.quote...

@styfle 2013-10-17 21:14:39

If you are going to use this function in a loop, it's probably best to make the RegExp object its own variable, var e = /[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/g; and then your function is return s.replace(e, '\\$&');. This way you only instantiate the RegExp once.
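
A sketch of that suggestion, with invented names (SPECIAL_CHARS, escapeRegExp). Sharing a global regex across calls is safe with replace, since replace resets lastIndex before it starts. Whether hoisting gives a measurable speedup depends on the engine, so treat the benefit as an assumption to verify.

```javascript
// Created once and reused by every call.
var SPECIAL_CHARS = /[-\/\\^$*+?.()|[\]{}]/g;

function escapeRegExp(s) {
  return s.replace(SPECIAL_CHARS, '\\$&');
}

var words = ['a.b', 'c+d', 'e(f)'];
var escaped = words.map(escapeRegExp);
console.log(escaped.join(' ')); // a\.b c\+d e\(f\)
```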

@lrn 2014-02-20 12:01:58

You don't need to escape '-'. When you escape '[', the '-' is not inside a character class, and it has no special meaning. The '/' isn't necessary either.

@Shaggydog 2014-06-12 08:33:41

Hi, can this be extended to also escape the double quote character (")?

@bobince 2014-06-12 10:09:18

@Shaggydog: it certainly could be, but I can't think of a place where " is special in regex syntax so I'm not sure what the benefit would be.

@bobince 2014-06-12 16:12:19

@Shaggydog: you're talking about JavaScript string literal escaping. That's a different thing to regex escaping. They both use backslashes but otherwise the rules are quite different. (If you have a string in a regex inside a string literal then you would have to use both types of escaping, one after another.)

@Mark Amery 2015-02-23 17:16:35

Standard arguments against augmenting built-in objects apply here, no? What happens if a future version of ECMAScript provides a RegExp.escape whose implementation differs from yours? Wouldn't it be better for this function not to be attached to anything?
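
One way to sidestep that concern, sketched under the assumption that the running engine may or may not provide a native RegExp.escape (newer engines do, and its output differs from the function above, for instance in how it encodes a leading letter or digit). The names escapeRegExpFallback and escapeForRegExp are invented for this example.

```javascript
// Standalone fallback: nothing is attached to the built-in RegExp object.
function escapeRegExpFallback(s) {
  return String(s).replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

// Prefer a native implementation when present (assumption: it accepts strings only).
var escapeForRegExp = typeof RegExp.escape === 'function'
  ? function (s) { return RegExp.escape(String(s)); }
  : escapeRegExpFallback;

var re = new RegExp('^' + escapeForRegExp('1.3 (approx)') + '$');
console.log(re.test('1.3 (approx)')); // true
console.log(re.test('1x3 (approx)')); // false
```

Because the two implementations can produce different escaped text, it is probably safer to compare matching behaviour than the escaped strings themselves.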

@Gras Double 2015-09-06 04:34:01

Actually you are not required to escape the / in character classes, though it's better to escape it to accommodate some editors. See this question.

@Redu 2016-03-24 12:15:11

This is not working for decimal points. RegExp("1.3") returning /1.3/ which is totally unacceptable. Pi Marillion's answer below is working fine when fed with numbers that contain decimal points.

@bobince 2016-03-26 16:34:58

@Redu: ??? you appear to have called the RegExp constructor instead of RegExp.escape...

@Paul Cuddihy 2016-08-01 19:55:30

In a 'hostile' generic function, you probably want to consider protecting yourself from javascript typing by doing String(s).replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&'); This helps when your string is a number, for example.

@ChrisJJ 2016-12-09 16:12:31

"As there is no downside to escaping either of them... " Reduced readability.

@dude 2017-09-15 08:08:13

ESLint throws an error with this RegExp by default (no-useless-escape): Unnecessary escape character: \/

@bobince 2017-09-15 22:57:33

bobince cares not for eslint's opinion

@JP de la Torre 2017-10-06 19:06:13

The expression can be reduced to this: /[$(-+.\/?[-^{|}]/ (saves 5 characters). You don't need to escape - because you already escaped [ and ] and that means there won't be character groups. Also, there are two character sequences that can be written as ranges. One between ( and + (40 through 43) and another between [ and ^ (91 through 94).

@bobince 2017-10-12 20:54:22

But maybe you want to escape characters to put them inside a character range. IMO better to harmlessly overescape than to underescape and cause problems in niche cases. FWIW personally I'd rather see the characters explicitly here; we're not playing code golf.

@João Pimentel Ferreira 2018-01-21 21:48:45

Why aren't the (double) quotes themselves included, I mean converting " to \" and ' to \'? str.replace(/[\-\[\]\/\{\}\(\)\"\'\*\+\?\.\\\^\$\|]/g, "\\$&");

@bobince 2018-01-29 22:19:10

@JoãoPimentelFerreira well, quotes have no special meaning to regex (see Shaggydog's comment above).

@João Pimentel Ferreira 2018-01-29 23:09:26

@bobince for me escaping quotes is important if you use this function for example as a handlebarsjs helper or any other rendering engine, for example if I use handlebarsjs to render a JS file where I have var JSstr = '{{{myStr}}}'; where myStr = "I'm here". If I don't escape the quotes I get var JSstr = 'I'm here'. But I am aware that this is a very particular and specific situation.

@bobince 2018-02-01 21:18:17

@JoãoPimentelFerreira that's not regex escaping, that's JavaScript string literal escaping. The rules for these syntaxes are different and not compatible; applying a regex escaper to myStr would not make the result correct even if quotes were escaped. If you are writing a string into a regex inside a string literal, you would need to regex-escape it first, and then string-literal-escape the results (so eg a backslash ends up as a quadruple-backslash).
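
The two layers can be seen in a small sketch. JSON.stringify stands in here for string-literal escaping, which is an assumption made for the example; it does not make the result safe to embed in an HTML script block.

```javascript
function escapeRegExp(s) {
  return s.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

var raw = '\\';                           // one backslash character
var forRegex = escapeRegExp(raw);         // regex layer: two backslashes
var asLiteral = JSON.stringify(forRegex); // string-literal layer: four backslashes plus quotes

console.log(raw.length);       // 1
console.log(forRegex.length);  // 2
console.log(asLiteral.length); // 6
console.log(asLiteral);        // "\\\\"
```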

@bobince 2018-02-01 21:20:51

Because getting nested escaping right is tricky, and the outcome of getting it wrong is so severe (cross-site-scripting vulns), it's generally a bad idea to inject data into JavaScript code. You are generally better off writing content into a data attribute (eg <html data-jsstr="{{myStr}}">, using handlebars's normal HTML-escaping), and then reading the content of that attribute from static JS.

@João Pimentel Ferreira 2018-02-01 21:42:11

@bobince just a short question: how can it be a bad idea to inject data into JavaScript with handlebars if everything is done server-side, for example inline with the <script> tag inside the HTML?

@bobince 2018-02-01 21:49:29

@JoãoPimentelFerreira if any of the data you're injecting comes from outside the application, then whoever is supplying the data can cause code of their own choosing to run on the browsers of anyone else using the application's output, allowing them to do anything that user can do on your site. This is Cross-Site Scripting and it's one of the worst, most widespread security issues on the web today.

@T.J. Crowder 2018-08-28 17:21:27

And the disappointment continues, eight years and lots of other improvements later...

@REJH 2019-08-02 19:22:01

I have a string.replaceAll(haystack, needle, replace) function. It calls haystack.replace( new RegExp( escape(needle), 'g' ), replace); but I just found an edge case where it breaks: 'string as a parameter', so if the replace param contains, for example, $' my function returns weird results. Any idea how to fix this? See developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/…
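
One possible fix, as a sketch: pass the replacement through a function, since the return value of a replacer function is inserted verbatim and patterns such as $' or $& are not interpreted. The name replaceAllLiteral is hypothetical.

```javascript
function escapeRegExp(s) {
  return s.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

// Treats both the needle and the replacement as plain text.
function replaceAllLiteral(haystack, needle, replacement) {
  var re = new RegExp(escapeRegExp(needle), 'g');
  return haystack.replace(re, function () {
    return replacement;
  });
}

console.log('a.b.c'.replace(/\./g, "$'"));          // ab.cbcc  ($' expands to the text after each match)
console.log(replaceAllLiteral('a.b.c', '.', "$'")); // a$'b$'c
```

An alternative with the same effect is to double every $ in the replacement string before calling replace.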

@kzh 2013-09-04 17:23:49

This is a shorter version.

RegExp.escape = function(s) {
    return s.replace(/[$-\/?[-^{|}]/g, '\\$&');
}

This includes the non-meta characters of %, &, ', and ,, but the JavaScript RegExp specification allows this.

@nhahtdh 2014-11-27 03:03:52

I wouldn't use this "shorter" version, since the character ranges hide the list of characters, which makes it harder to verify the correctness at first glance.

@kzh 2014-11-27 12:15:02

@nhahtdh I probably wouldn't either, but it is posted here for information.

@Dan Dascalescu 2014-11-27 21:14:09

@kzh: posting "for information" helps less than posting for understanding. Would you not agree that my answer is clearer?

@Qwertiy 2017-09-22 20:35:42

At least, . is missed. And (). Or not? [-^ is strange. I don't remember what is there.

@kzh 2017-09-22 21:15:33

Those are in the specified range.
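
To see exactly what the ranges cover, one can enumerate the printable ASCII characters that the class matches (a quick verification sketch):

```javascript
var shortClass = /[$-\/?[-^{|}]/; // no g flag, so test() keeps no state between calls
var covered = '';

for (var code = 32; code < 127; code++) {
  var ch = String.fromCharCode(code);
  if (shortClass.test(ch)) {
    covered += ch;
  }
}

console.log(covered); // $%&'()*+,-./?[\]^{|}
```

So ., ( and ) are included via the $-/ range, and \ and ] via the [-^ range.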

@Pierluc SS 2012-10-31 12:30:01

In jQueryUI's autocomplete widget (version 1.9.1) they use a slightly different regex (Line 6753), here's the regular expression combined with @bobince approach.

RegExp.escape = function( value ) {
     return value.replace(/[\-\[\]{}()*+?.,\\\^$|#\s]/g, "\\$&");
}

@Martin Ender 2013-07-08 10:22:12

The only difference is that they escape , (which is not a metacharacter), and # and whitespace, which only matter in free-spacing mode (which is not supported by JavaScript). However, they do get it right not to escape the forward slash.

@Scott Stafford 2013-08-19 18:37:27

If you want to reuse jquery UI's implementation rather than paste the code locally, go with $.ui.autocomplete.escapeRegex(myString).

@Ted Pennings 2015-11-01 07:35:12

lodash has this too, _.escapeRegExp and npmjs.com/package/lodash.escaperegexp

@Peter Krauss 2017-03-07 03:27:06

v1.12 the same, ok!
