By nickf


2009-01-11 07:21:20 8 Comments

I want to match a portion of a string using a regular expression and then access that parenthesized substring:

var myString = "something format_abc"; // I want "abc"

var arr = /(?:^|\s)format_(.*?)(?:\s|$)/.exec(myString);

console.log(arr);     // Prints: [" format_abc", "abc"] .. so far so good.
console.log(arr[1]);  // Prints: undefined  (???)
console.log(arr[0]);  // Prints: format_undefined (!!!)

What am I doing wrong?


I've discovered that there was nothing wrong with the regular expression code above: the actual string which I was testing against was this:

"date format_%A"

Reporting that "%A" is undefined seems a very strange behaviour, but it is not directly related to this question, so I've opened a new one, Why is a matched substring returning "undefined" in JavaScript?.


The issue was that console.log takes its parameters like a printf statement, and since the string I was logging ("%A") had a special value, it was trying to find the value of the next parameter.
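A minimal sketch of the pitfall, assuming a console that applies printf-style substitution to its first argument (how an unknown specifier such as %A is rendered varies by environment). The capture itself is intact; passing the value as data to an explicit "%s" placeholder keeps it from being read as a format string:

```javascript
var myString = "date format_%A";
var arr = /(?:^|\s)format_(.*?)(?:\s|$)/.exec(myString);

// The captured group really is "%A"; only the logging step can mangle it.
var captured = arr[1];

// Risky: the first argument may be interpreted as a format string.
// console.log(captured);

// Safer: use an explicit placeholder and pass the value as an argument.
console.log("%s", captured);
```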

16 comments

@Wiktor Stribiżew 2019-03-08 08:39:39

String#matchAll (see the Stage 3 Draft / December 7, 2018 proposal) simplifies access to all groups in the match object (mind that Group 0 is the whole match, while further groups correspond to the capturing groups in the pattern):

With matchAll available, you can avoid the while loop and exec with /g... Instead, by using matchAll, you get back an iterator which you can use with the more convenient for...of, array spread, or Array.from() constructs.

This method yields a similar output to Regex.Matches in C#, re.finditer in Python, preg_match_all in PHP.

See a JS demo (tested in Google Chrome 73.0.3683.67 (official build), beta (64-bit)):

var myString = "key1:value1, key2-value2!!@key3=value3";
var matches = myString.matchAll(/(\w+)[:=-](\w+)/g);
console.log([...matches]); // All match with capturing group values

The console.log([...matches]) output is an array with one entry per match, each holding the full match followed by its capturing groups.

You may also get match value or specific group values using

let matchData = "key1:value1, key2-value2!!@key3=value3".matchAll(/(\w+)[:=-](\w+)/g);
var matches = [...matchData]; // Note matchAll result is not re-iterable

console.log(Array.from(matches, m => m[0])); // All match (Group 0) values
// => [ "key1:value1", "key2-value2", "key3=value3" ]
console.log(Array.from(matches, m => m[1])); // All match (Group 1) values
// => [ "key1", "key2", "key3" ]
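The for...of form mentioned above is not shown in the demo, so here is a small sketch of it. The input string and the pairs object are made up for illustration; destructuring pulls the groups straight out of each match:

```javascript
const input = "key1:value1, key2-value2, key3=value3";
const pairs = {};

// Each iteration yields one match array: [fullMatch, group1, group2]
for (const [, key, value] of input.matchAll(/(\w+)[:=-](\w+)/g)) {
  pairs[key] = value;
}

console.log(pairs); // { key1: 'value1', key2: 'value2', key3: 'value3' }
```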

NOTE: See the browser compatibility details.

@Jarrod McGuire 2019-03-29 14:54:43

Perfect example for key value pairs. Concise and easy to read, very simple to use. Also, better error handling, the spread will return an empty array rather than null, so no more 'error, no property "length" of null'
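To illustrate the point about error handling, a quick sketch with a made-up string that contains no matches:

```javascript
const noHits = "no pairs here";

// String#match with /g returns null when nothing matches...
const viaMatch = noHits.match(/(\w+)=(\w+)/g);

// ...while spreading the matchAll iterator simply gives an empty array.
const viaMatchAll = [...noHits.matchAll(/(\w+)=(\w+)/g)];

console.log(viaMatch);           // null
console.log(viaMatchAll.length); // 0
```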

@ccpizza 2019-02-16 06:25:08

You don't really need an explicit loop to parse multiple matches — pass a replacement function as the second argument as described in: String.prototype.replace(regex, func):

var str = "Our chief weapon is {1}, {0} and {2}!"; 
var params= ['surprise', 'fear', 'ruthless efficiency'];
var patt = /{([^}]+)}/g;

str = str.replace(patt, function (m0, m1, position) { return params[parseInt(m1, 10)]; });

document.write(str);

The m0 argument is the full matched substring ({1}, {0}, and so on). m1 is the first capturing group, i.e. the part enclosed in parentheses in the regex, which is "1" for the first match here. position is the index in the string at which the match starts; it is unused in this case.
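To see what the callback actually receives, here is a small sketch that records the arguments for each match. The template string and the calls array are made up for illustration:

```javascript
const calls = [];

"{1} and {0}".replace(/\{([^}]+)\}/g, (m0, m1, position) => {
  calls.push({ m0, m1, position });
  return m0; // leave the text unchanged, we only want to observe the arguments
});

console.log(calls);
// [ { m0: '{1}', m1: '1', position: 0 }, { m0: '{0}', m1: '0', position: 8 } ]
```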

@David Cheung 2019-02-11 16:47:47

With ES2018 you can use String.prototype.match() with named capture groups, which makes it more explicit what each part of your regex is meant to capture.

const url =
  'https://stackoverflow.com/questions/432493/how-do-you-access-the-matched-groups-in-a-javascript-regular-expression?some=parameter';
const regex = /(?<protocol>https?):\/\/(?<hostname>[\w-\.]*)\/(?<pathname>[\w-\./]+)\??(?<querystring>.*?)?$/;
const { groups: segments } = url.match(regex);
console.log(segments);

and you'll get something like

{protocol: "https", hostname: "stackoverflow.com", pathname: "questions/432493/how-do-you-access-the-matched-groups-in-a-javascript-regular-expression", querystring: "some=parameter"}
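One caveat worth sketching: match() and exec() return null when nothing matches, so destructuring groups directly would throw. A smaller, made-up example with a guard (the date pattern and variable names are only for illustration):

```javascript
const dateMatch = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/.exec("released 2019-02-11");

// Fall back to an empty object so destructuring does not throw on null.
const { year, month, day } = dateMatch ? dateMatch.groups : {};

console.log(year, month, day); // 2019 02 11

// Named groups are still available by number as well.
console.log(dateMatch[1]);     // 2019
```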

@Sebastien H. 2017-01-03 14:40:56

Last but not least, I found one line of code that worked fine for me (JS ES6):

let reg = /#([\S]+)/igm; // Get hashtags.
let string = 'mi alegría es total! ✌🙌\n#fiestasdefindeaño #PadreHijo #buenosmomentos #france #paris';

let matches = (string.match(reg) || []).map(e => e.replace(reg, '$1'));
console.log(matches);

This will return:

['fiestasdefindeaño', 'PadreHijo', 'buenosmomentos', 'france', 'paris']

@Daniel Hallgren 2017-08-23 22:36:09

Terminology used in this answer:

  • Match indicates the result of running your RegEx pattern against your string like so: someString.match(regexPattern).
  • Matched patterns indicate all matched portions of the input string, which all reside inside the match array. These are all instances of your pattern inside the input string.
  • Matched groups indicate all groups to catch, defined in the RegEx pattern. (The patterns inside parentheses, like so: /format_(.*?)/g, where (.*?) would be a matched group.) These reside within matched patterns.

Description

To get access to the matched groups, in each of the matched patterns, you need a function or something similar to iterate over the match. There are a number of ways you can do this, as many of the other answers show. Most other answers use a while loop to iterate over all matched patterns, but I think we all know the potential dangers with that approach. It is necessary to match against a new RegExp() instead of just the pattern itself, which only got mentioned in a comment. This is because the .exec() method behaves similarly to a generator function: it stops every time there is a match, but keeps its .lastIndex to continue from there on the next .exec() call.
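The .lastIndex behaviour described above can be observed directly. A minimal sketch with a made-up pattern and string; note that the state lives on the regex object and only advances when the g (or y) flag is set:

```javascript
const re = /format_(\w+)/g;
const text = "format_abc format_def";

const first = re.exec(text);
const indexAfterFirst = re.lastIndex;   // 10, just past "format_abc"

const second = re.exec(text);
const indexAfterSecond = re.lastIndex;  // 21, the end of the string

const third = re.exec(text);            // null, no more matches
const indexAfterThird = re.lastIndex;   // reset to 0 after a failed match

console.log(first[1], second[1], third); // abc def null
```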

Code examples

Below is an example of a function searchString which returns an Array of all matched patterns, where each match is an Array with all the containing matched groups. Instead of using a while loop, I have provided examples using both the Array.prototype.map() function as well as a more performant way – using a plain for-loop.

Concise versions (less code, more syntactic sugar)

These are less performant since they basically implement a forEach-loop instead of the faster for-loop.

// Concise ES6/ES2015 syntax
const searchString = 
    (string, pattern) => 
        string
        .match(new RegExp(pattern.source, pattern.flags))
        .map(match => 
            new RegExp(pattern.source, pattern.flags)
            .exec(match));

// Or if you will, with ES5 syntax (no arrow functions)
function searchString(string, pattern) {
    return string
        .match(new RegExp(pattern.source, pattern.flags))
        .map(function (match) {
            return new RegExp(pattern.source, pattern.flags)
                .exec(match);
        });
}

let string = "something format_abc",
    pattern = /(?:^|\s)format_(.*?)(?:\s|$)/;

let result = searchString(string, pattern);
// [[" format_abc", "abc"], null]
// The trailing `null` disappears if you add the `global` flag

Performant versions (more code, less syntactic sugar)

// Performant ES6/ES2015 syntax
const searchString = (string, pattern) => {
    let result = [];

    const matches = string.match(new RegExp(pattern.source, pattern.flags));

    for (let i = 0; i < matches.length; i++) {
        result.push(new RegExp(pattern.source, pattern.flags).exec(matches[i]));
    }

    return result;
};

// Same thing, but with ES5 syntax
function searchString(string, pattern) {
    var result = [];

    var matches = string.match(new RegExp(pattern.source, pattern.flags));

    for (var i = 0; i < matches.length; i++) {
        result.push(new RegExp(pattern.source, pattern.flags).exec(matches[i]));
    }

    return result;
}

let string = "something format_abc",
    pattern = /(?:^|\s)format_(.*?)(?:\s|$)/;

let result = searchString(string, pattern);
// [[" format_abc", "abc"], null]
// The trailing `null` disappears if you add the `global` flag

I have yet to compare these alternatives to the ones previously mentioned in the other answers, but I doubt this approach is less performant and less fail-safe than the others.

@Andre Carneiro 2017-06-19 19:47:44

There is no need to invoke the exec method! You can use "match" method directly on the string. Just don't forget the parentheses.

var str = "This is cool";
var matches = str.match(/(This is)( cool)$/);
console.log( JSON.stringify(matches) ); // will print ["This is cool","This is"," cool"] or something like that...

Position 0 holds the entire matched string. Position 1 holds the first group captured by your parentheses, and position 2 holds the second. Nested parentheses are tricky, so beware!
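On the nested parentheses: groups are numbered by the position of their opening parenthesis, left to right. A small sketch with a made-up date string:

```javascript
var nested = "2009-01-11".match(/((\d{4})-(\d{2}))-(\d{2})/);

console.log(nested[0]); // "2009-01-11"  (whole match)
console.log(nested[1]); // "2009-01"     (outer group, opens first)
console.log(nested[2]); // "2009"        (first inner group)
console.log(nested[3]); // "01"          (second inner group)
console.log(nested[4]); // "11"          (last group)
```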

@Vidar 2018-05-29 11:54:01

This works and feels more natural.

@Shadymilkman01 2018-09-13 22:00:43

Without the global flag this returns the full match followed by all the capture groups; with it, you'll only get the full matches (here, one big one), so watch out for that.
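A short sketch of that difference, using a made-up string with two occurrences:

```javascript
var s = "format_abc format_def";

// No g flag: first match only, but with its capture groups.
var single = s.match(/format_(\w+)/);

// With g flag: every full match, but the capture groups are dropped.
var all = s.match(/format_(\w+)/g);

console.log(single[0], single[1]); // format_abc abc
console.log(all);                  // [ 'format_abc', 'format_def' ]
```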

@Jack 2016-11-25 09:46:59

function getMatches(string, regex, index) {
  index || (index = 1); // default to the first capturing group
  var matches = [];
  var match;
  while (match = regex.exec(string)) {
    matches.push(match[index]);
  }
  return matches;
}


// Example :
var myString = 'Rs.200 is Debited to A/c ...2031 on 02-12-14 20:05:49 (Clear Bal Rs.66248.77) AT ATM. TollFree 1800223344 18001024455 (6am-10pm)';
var myRegEx = /clear bal.+?(\d+\.?\d{2})/gi;

// Get an array containing the first capturing group for every match
var matches = getMatches(myString, myRegEx, 1);

// Log results
document.write(matches.length + ' matches found: ' + JSON.stringify(matches))
console.log(matches);

function getMatches(string, regex, index) {
  index || (index = 1); // default to the first capturing group
  var matches = [];
  var match;
  while (match = regex.exec(string)) {
    matches.push(match[index]);
  }
  return matches;
}


// Example :
var myString = 'something format_abc something format_def something format_ghi';
var myRegEx = /(?:^|\s)format_(.*?)(?:\s|$)/g;

// Get an array containing the first capturing group for every match
var matches = getMatches(myString, myRegEx, 1);

// Log results
document.write(matches.length + ' matches found: ' + JSON.stringify(matches))
console.log(matches);

@PhiLho 2009-01-11 09:10:34

var myString = "something format_abc";
var arr = myString.match(/\bformat_(.*?)\b/);
console.log(arr[0] + " " + arr[1]);

The \b isn't exactly the same thing. (It works on --format_foo/, but doesn't work on format_a_b) But I wanted to show an alternative to your expression, which is fine. Of course, the match call is the important thing.

@B.F. 2015-04-22 21:09:18

It's exactly reverse. '\b' delimits words. word= '\w' = [a-zA-Z0-9_] . "format_a_b" is a word.

@PhiLho 2015-04-23 07:41:39

@B.F. Honestly, I added "doesn't work on format_a_b" as an afterthought 6 years ago, and I don't recall what I meant there... :-) I suppose it meant "doesn't work to capture a only", i.e. the first alphabetical part after format_.

@B.F. 2015-04-23 10:43:44

I wanted to say that \b(--format_foo/)\b does not return "--format_foo/" because "-" and "/" are not \w word characters. But \b(format_a_b)\b does return "format_a_b". Right? I refer to your text statement in round brackets. (I did not downvote!)
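For anyone following this exchange, a quick sketch of what the \b version captures on the three strings discussed. The underscore counts as a word character, so there is no boundary inside a_b:

```javascript
var wordBoundary = /\bformat_(.*?)\b/;

var a = "something format_abc".match(wordBoundary);
var b = "--format_foo/".match(wordBoundary);
var c = "format_a_b".match(wordBoundary);

console.log(a[1]); // "abc"
console.log(b[1]); // "foo"  ("-" and "/" are not word characters)
console.log(c[1]); // "a_b"  (not just "a")
```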

@CMS 2009-01-11 07:26:02

You can access capturing groups like this:

var myString = "something format_abc";
var myRegexp = /(?:^|\s)format_(.*?)(?:\s|$)/g;
var match = myRegexp.exec(myString);
console.log(match[1]); // abc

And if there are multiple matches you can iterate over them:

var myString = "something format_abc";
var myRegexp = /(?:^|\s)format_(.*?)(?:\s|$)/g;
match = myRegexp.exec(myString);
while (match != null) {
  // matched text: match[0]
  // match start: match.index
  // capturing group n: match[n]
  console.log(match[0])
  match = myRegexp.exec(myString);
}

@ianaz 2012-08-28 12:06:19

+1 Please note that in the second example you should use the RegExp object (not only "/myregexp/"), because it keeps the lastIndex value in the object. Without using the Regexp object it will iterate infinitely

@spinningarrow 2012-10-16 07:26:38

@ianaz: I don't believe 'tis true? http://jsfiddle.net/weEg9/ seems to work on Chrome, at least.

@JohnAllen 2013-12-30 17:39:04

Why do the above instead of: var match = myString.match(myRegexp); // alert(match[1])?

@George Chen 2014-06-06 18:33:21

No need for explicit "new RegExp", however the infinite loop will occur unless /g is specified
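One way to make that failure loud instead of endless is to check the flag up front. A minimal sketch; eachMatch is a hypothetical helper name, not something from the answer:

```javascript
// Hypothetical helper: iterate over all matches, refusing non-global regexes.
function eachMatch(regex, text, callback) {
  if (!regex.global) {
    throw new TypeError("eachMatch needs a regex with the g flag");
  }
  regex.lastIndex = 0; // start from the beginning regardless of earlier use
  var match;
  while ((match = regex.exec(text)) !== null) {
    // An empty match does not advance lastIndex, so nudge it forward.
    if (match[0] === "") regex.lastIndex++;
    callback(match);
  }
}

var seen = [];
eachMatch(/format_(\w+)/g, "format_abc format_def", function (m) {
  seen.push(m[1]);
});
console.log(seen); // [ 'abc', 'def' ]
```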

@Olga 2016-02-11 11:28:07

Another way not to run into an infinite loop is to explicitly update the string, e.g. string = string.substring(match.index + match[0].length)
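A sketch of that substring approach with a non-global regex. The variable names are made up, and it assumes the pattern never matches an empty string (otherwise the remainder would stop shrinking):

```javascript
var pattern = /format_(\w+)/; // deliberately no g flag
var rest = "format_abc format_def";
var found = [];
var m;

while ((m = pattern.exec(rest)) !== null) {
  found.push(m[1]);
  // Drop everything up to and including the current match.
  rest = rest.substring(m.index + m[0].length);
}

console.log(found); // [ 'abc', 'def' ]
```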

@Anis 2017-05-09 17:41:35

@JohnAllen, one valid case of using RegExp.exec instead of String.match is when you need to access the sub-groups.

@wolfdawn 2017-05-25 13:31:56

I don't understand why you don't simply write something like while (match = myRegexp.exec(myString)) { console.log(match[0]) }

@Andrew 2017-12-28 19:06:19

point of this answer is: use .exec() and wrap your groups in brackets () in the pattern

@1nfiniti 2018-09-11 16:49:42

match[0] is not a regexp subgroup, it's the first match in the string. If you need to access capture groups from a more detailed regular expression, that's a reason why you would use exec.

@Gustavo6046 2018-11-08 02:23:20

The iteration example can be shortened (by removing the preceding assignment) using do while, like this: let match; do { match = myRegexp.exec(myString); if (match) console.log(match[0]); } while (match != null);

@Ben Philipp 2018-12-24 07:44:17

What I don't understand is that match[i] for String.prototype.match() SHOULD get you the matching group at index i as per the documentation at developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/… -- but somehow it doesn't (at least not for me on Firefox 56)

@Pawel Kwiecien 2015-06-27 18:47:53

/*Regex function for extracting object from "window.location.search" string.
 */

var search = "?a=3&b=4&c=7"; // Example search string

var getSearchObj = function (searchString) {

    var match, obj = {};
    var pattern = /(\w+)=(\w+)/g;
    var search = searchString.substr(1); // Remove '?'

    while (match = pattern.exec(search)) {
        // match[1] is the key and match[2] is the value captured by the two groups
        obj[match[1]] = match[2];
    }

    return obj;

};

console.log(getSearchObj(search));

@Mathias Bynens 2013-01-08 08:26:09

Here’s a method you can use to get the nth capturing group for each match:

function getMatches(string, regex, index) {
  index || (index = 1); // default to the first capturing group
  var matches = [];
  var match;
  while (match = regex.exec(string)) {
    matches.push(match[index]);
  }
  return matches;
}


// Example :
var myString = 'something format_abc something format_def something format_ghi';
var myRegEx = /(?:^|\s)format_(.*?)(?:\s|$)/g;

// Get an array containing the first capturing group for every match
var matches = getMatches(myString, myRegEx, 1);

// Log results
document.write(matches.length + ' matches found: ' + JSON.stringify(matches))
console.log(matches);

@Rob Evans 2013-05-11 12:08:45

This a far superior answer to the others because it correctly shows iteration over all matches instead of only getting one.

@Druska 2013-09-04 18:45:51

mnn is right. This will produce an infinite loop if the 'g' flag is not present. Be very careful with this function.

@ravishi 2013-11-21 20:00:57

I improved this to make it similar to python's re.findall(). It groups up all matches into an array of arrays. It also fixes the global modifier infinite loop issue. jsfiddle.net/ravishi/MbwpV
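Since the fiddle may not stay available, here is a rough sketch of the same idea, not the code from the link. findAll is a hypothetical name; it works on a private copy of the regex with the g flag forced on, so it terminates regardless of how the caller wrote the pattern:

```javascript
// Hypothetical helper, loosely modelled on Python's re.findall():
// returns one array of captured groups per match.
function findAll(regex, text) {
  var flags = regex.global ? regex.flags : regex.flags + "g";
  var re = new RegExp(regex.source, flags); // copy, so the caller's lastIndex is untouched
  var results = [];
  var match;
  while ((match = re.exec(text)) !== null) {
    if (match[0] === "") re.lastIndex++; // step past empty matches
    results.push(match.slice(1));        // keep only the capturing groups
  }
  return results;
}

console.log(findAll(/(\w+)=(\w+)/, "a=3&b=4&c=7"));
// [ [ 'a', '3' ], [ 'b', '4' ], [ 'c', '7' ] ]
```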

@Michael Mikowski 2014-07-03 20:36:15

There is a good way to avoid endless while loops: Don't use them :) Here is a good example where we limit the matches to a maximum of 10,000: _OUT_:for(i=0; i<10000; i++){ list=regex.exec(string);if ( list===null ){ break _OUT_; }; match_list.push( list[index] );}

@wallacer 2014-10-29 18:34:06

@MichaelMikowski now you've just hidden your infinite loop, but your code will run slow. I'd argue that it's better to have code break in a bad way so you catch it in development. Putting some bs maximum iterations break in is sloppy. Hiding issues instead of fixing their root cause is not the answer.

@Michael Mikowski 2014-11-11 23:09:33

@wallacer: I missed this in my last point in my last post: 5. The code can and should throw a warning if the loop exits without a break. This highlights the problem without the code breaking. And it's not "sloppy" at all.

@wallacer 2014-11-12 23:44:22

@MichaelMikowski that isn't meaningfully slower when you're not hitting the execution limit. When you are, it's clearly much slower. I'm not saying your code doesn't work, I'm saying that in practice I think it will cause more harm than good. People working in a dev environment will see the code working fine under no load despite doing 10,000 needless executions of some chunk of code. Then they'll push it out to a production environment and wonder why their app goes down under load. In my experience it's better if things break in an obvious way, and earlier in the development cycle.

@Michael Mikowski 2014-11-13 19:11:05

@wallacer Sure the code is slower when you hit the execution limit, but that's why you throw a warning (or even an exception) when that case occurs during development. But when you go into production, the code is much more durable. This is similar to using setInterval vs. setTimeout. My advice is to never use setInterval, because it is an inherently dangerous construct. Instead use setTimeout and have the routine call itself if and when required.

@Michael Mikowski 2014-11-13 19:17:14

@wallacer here is another example: I patched uglify-js for some internal use. The mangler used to have a while loop to select unique keys, but it would become an endless loop when no unique key could be found. I refactored it to limit the search to 1000 loops, and if it still failed to find a unique key, it has this code: throw 'Cannot find unique key after 1000 iterations';. Now it fails obviously instead of silently looping forever.

@wallacer 2015-07-07 00:11:43

@MichaelMikowski was just poking around my old comments and saw I never responded here. Sure, in a worst case, using a max iterations break to prevent a loop, and having it fail in an obvious way is better than nothing. First though, how about taking a look at the actual algorithm and figuring out why it's not completing, and if it can be changed to always complete. Infinite loops are a symptom of poor algorithm design and in almost all cases there's a better fix than "break if we tried too many times". It's a lazy hack fix. Has its places sure, but not the recommended solution generally.

@wallacer 2015-07-07 00:17:34

@MichaelMikowski and the main issue I had with your initial comment is just the lack of performance under load, and the fact that it was silent about it. Throwing an exception instead of breaking is good, so you catch the issue. An example of patching uglify ignores the main issue altogether which is load. If the code is never going to be run under load, who cares. If it's running in nodeJS on a heavily loaded system, then it matters. A lot of people on SO will just blindly copy code they find, so I thought it was worth a word of caution...

@Michael Mikowski 2015-07-31 22:02:30

@wallacer I think we agree on all best practice. I just think one should avoid endless loop conditions if you can in the first place. Instead of using 'setInterval' use 'setTimeout', for example. Or use a for loop instead of a while loop. And yes, the Uglify change is an ugly - and very inexpensive - solution. I don't have the time to make the code more correct, and it isn't worth the expense. Now if I were serving billions of requests with the code, then of course that would change :)
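For reference, a sketch of where this thread seems to land: a bounded loop that throws when the limit is hit rather than breaking silently. getMatchesBounded and its maxIterations parameter are hypothetical names, and the default limit is just the figure quoted above:

```javascript
// Hypothetical variant of getMatches with an iteration cap that fails loudly.
function getMatchesBounded(text, regex, index = 1, maxIterations = 10000) {
  const matches = [];
  for (let i = 0; i < maxIterations; i++) {
    const match = regex.exec(text);
    if (match === null) {
      return matches; // normal exit: no more matches
    }
    matches.push(match[index]);
  }
  // Reaching this point usually means the g flag is missing.
  throw new Error("Gave up after " + maxIterations + " iterations; is the g flag set?");
}

console.log(getMatchesBounded("format_abc format_def", /format_(\w+)/g));
// [ 'abc', 'def' ]
```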

@Alexz 2014-07-17 04:53:32

In regards to the multi-match parentheses examples above, I was looking for an answer here after not getting what I wanted from:

var matches = mystring.match(/(?:neededToMatchButNotWantedInResult)(matchWanted)/igm);

After looking at the slightly convoluted function calls with while and .push() above, it dawned on me that the problem can be solved very elegantly with mystring.replace() instead (the replacing is NOT the point, and isn't even done; the CLEAN, built-in callback function option for the second parameter is!):

var yourstring = 'something format_abc something format_def something format_ghi';

var matches = [];
yourstring.replace(/format_([^\s]+)/igm, function(m, p1){ matches.push(p1); } );

After this, I don't think I'm ever going to use .match() for hardly anything ever again.

@Nabil Kadimi 2014-07-12 15:41:47

A one-liner that is practical only if you have a single pair of parentheses:

while ( ( match = myRegex.exec( myStr ) ) && matches.push( match[1] ) ) {};

@willlma 2017-04-06 18:44:06

Why not while (match = myRegex.exec(myStr)) matches.push(match[1])

@Nabil Kadimi 2017-05-23 20:00:41

@willlma Yep!..

@Jonathan Lonowski 2009-01-11 12:55:15

Your syntax probably isn't the best to keep. FF/Gecko defines RegExp as an extension of Function.
(FF2 went as far as typeof(/pattern/) == 'function')

It seems this is specific to FF -- IE, Opera, and Chrome all throw exceptions for it.

Instead, use either method previously mentioned by others: RegExp#exec or String#match.
They offer the same results:

var regex = /(?:^|\s)format_(.*?)(?:\s|$)/;
var input = "something format_abc";

regex(input);        //=> [" format_abc", "abc"]
regex.exec(input);   //=> [" format_abc", "abc"]
input.match(regex);  //=> [" format_abc", "abc"]

@PEZ 2009-01-11 10:39:47

Your code works for me (FF3 on Mac) even if I agree with PhiLho that the regex should probably be:

/\bformat_(.*?)\b/

(But, of course, I'm not sure because I don't know the context of the regex.)

@nickf 2009-01-11 12:04:12

it's a space-separated list so I figured \s would be fine. strange that that code wasn't working for me (FF3 Vista)

@PEZ 2009-01-11 12:21:33

Yes, truly strange. Have you tried it on its own in the Firebug console? From an otherwise empty page I mean.

@eyelidlessness 2009-01-11 07:27:02

Using your code:

console.log(arr[1]);  // prints: abc
console.log(arr[0]);  // prints:  format_abc

Edit: Safari 3, if it matters.
