By andyuk


2008-10-01 18:48:22 8 Comments

For example, this regex

(.*)<FooBar>

will match:

abcde<FooBar>

But how do I get it to match across multiple lines?

abcde
fghij<FooBar>

21 comments

@Paul Jones 2019-02-27 12:50:43

In JavaScript you can use [^]* to match zero or more characters of any kind, including line breaks.

$("#find_and_replace").click(function() {
  var text = $("#textarea").val();
  var search_term = new RegExp("[^]*<Foobar>", "gi");
  var replace_term = "Replacement term";
  var new_text = text.replace(search_term, replace_term);
  $("#textarea").val(new_text);
});
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<button id="find_and_replace">Find and replace</button>
<br>
<textarea ID="textarea">abcde
fghij&lt;Foobar&gt;</textarea>

@Wiktor Stribiżew 2017-08-31 12:47:20

The question is: can the . pattern match any character? The answer varies from engine to engine. The main difference is whether the pattern is used by a POSIX or a non-POSIX regex library.

Special note about : they are not considered regular expressions, but . matches any char there, same as POSIX based engines.

Another note on MATLAB and Octave: the . matches any char by default (demo): str = "abcde\n fghij<Foobar>"; expression = '(.*)<Foobar>*'; [tokens,matches] = regexp(str,expression,'tokens','match'); (tokens contains an abcde\n fghij item).

Also, in all of Boost's regex grammars the dot matches line breaks by default. Boost's ECMAScript grammar allows you to turn this off with regex_constants::no_mod_s (source).

As for Oracle (it is POSIX based), use the n option (demo): select regexp_substr('abcde' || chr(10) ||' fghij<Foobar>', '(.*)<Foobar>', 1, 1, 'n', 1) as results from dual

POSIX-based engines:

A mere . already matches line breaks, so no modifiers are needed (demo).

The (demo), (demo), (TRE, base R default engine with no perl=TRUE, for base R with perl=TRUE or for stringr/stringi patterns, use the (?s) inline modifier) (demo) also treat . the same way.

However, most POSIX-based tools process input line by line. Hence, . does not match line breaks simply because they are never in scope. Here are some examples of how to override this:

  • sed - There are multiple workarounds; the most precise but not very safe is sed 'H;1h;$!d;x; s/\(.*\)><Foobar>/\1/' (H;1h;$!d;x; slurps the file into memory). If whole lines must be included, sed '/start_pattern/,/end_pattern/d' file (deletes from the start pattern through the end pattern, matched lines included) or sed '/start_pattern/,/end_pattern/{{//!d;};}' file (with matching lines excluded) can be considered.
  • perl - perl -0pe 's/(.*)<FooBar>/$1/gs' <<< "$str" (-0 slurps the whole file into memory, -p prints the file after applying the script given by -e). Note that using -000pe will slurp the file and activate 'paragraph mode', where Perl uses consecutive newlines (\n\n) as the record separator.
  • GNU grep - grep -Poz '(?si)abc\K.*?(?=<Foobar>)' file. Here, z enables file slurping, (?s) enables the DOTALL mode for the . pattern, (?i) enables case insensitive mode, \K omits the text matched so far, *? is a lazy quantifier, (?=<Foobar>) matches the location before <Foobar>.
  • pcregrep - pcregrep -Mi "(?si)abc\K.*?(?=<Foobar>)" file (M enables file slurping here). Note pcregrep is a good solution for Mac OS grep users.

See demos.
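
The line-by-line point above can be sketched in Python. This is only an illustration under stated assumptions: the in-memory source object is a hypothetical stand-in for a file, and the pattern is the one from the question with (?s) added. Even with a dot that can match line breaks, per-line processing never sees the previous line, while slurping the whole input does.

```python
import io
import re

# Hypothetical input standing in for a file on disk.
source = io.StringIO("abcde\nfghij<FooBar>\n")

pattern = re.compile(r"(?s)(.*)<FooBar>")

# Line-by-line processing: each record ends at its newline, so even a
# dot that can match "\n" never sees the previous line.
per_line = [m.group(1) for line in source if (m := pattern.search(line))]

# Slurping: read everything first, then match once across the whole text.
source.seek(0)
slurped = pattern.search(source.read()).group(1)

print(per_line)
print(repr(slurped))
```

The sed, perl, grep and pcregrep options listed above play the role of the read() call here: they put more than one line in scope before the regex runs.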

Non-POSIX-based engines:

  • PHP - Use the s modifier (PCRE_DOTALL): preg_match('~(.*)<Foobar>~s', $s, $m) (demo)
  • C# - Use the RegexOptions.Singleline flag (demo):
    - var result = Regex.Match(s, @"(.*)<Foobar>", RegexOptions.Singleline).Groups[1].Value;
    - var result = Regex.Match(s, @"(?s)(.*)<Foobar>").Groups[1].Value;
  • PowerShell - Use the (?s) inline option: $s = "abcde`nfghij<FooBar>"; $s -match "(?s)(.*)<Foobar>"; $matches[1]
  • - Use s modifier (or (?s) inline version at the start) (demo): /(.*)<FooBar>/s
  • Python - Use the re.DOTALL (or re.S) flag or the (?s) inline modifier (demo): m = re.search(r"(.*)<FooBar>", s, flags=re.S) (and then if m:, print(m.group(1)))
  • Java - Use the Pattern.DOTALL modifier (or the inline (?s) flag) (demo): Pattern.compile("(.*)<FooBar>", Pattern.DOTALL)
  • - Use (?s) in-pattern modifier (demo): regex = /(?s)(.*)<FooBar>/
  • Scala - Use the (?s) modifier (demo): "(?s)(.*)<Foobar>".r.findAllIn("abcde\n fghij<Foobar>").matchData foreach { m => println(m.group(1)) }
  • JavaScript - Use [^] or the workarounds [\d\D] / [\w\W] / [\s\S] (demo): s.match(/([\s\S]*)<FooBar>/)[1]
  • C++ (std::regex) - Use [\s\S] or the JS workarounds (demo): regex rex(R"(([\s\S]*)<FooBar>)");
  • - Use the same approach as in JavaScript, ([\s\S]*)<Foobar>.
  • Ruby - Use the /m MULTILINE modifier (demo): s[/(.*)<Foobar>/m, 1]
  • Go - Use the inline modifier (?s) at the start (demo): re := regexp.MustCompile(`(?s)(.*)<FooBar>`)
  • Swift - Use dotMatchesLineSeparators or (easier) pass the (?s) inline modifier to the pattern: let rx = "(?s)(.*)<Foobar>"
  • Objective-C - Same as Swift, (?s) works the easiest, but here is how the option can be used: NSRegularExpression* regex = [NSRegularExpression regularExpressionWithPattern:pattern options:NSRegularExpressionDotMatchesLineSeparators error:&regexError];
  • , - Use (?s) modifier (demo): "(?s)(.*)<Foobar>" (in Google Spreadsheets, =REGEXEXTRACT(A2,"(?s)(.*)<Foobar>"))
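
As a minimal sketch of the Python entry in the list above, the following shows the three spellings side by side: the re.DOTALL flag, its re.S alias, and the inline (?s) modifier. The sample string is the one from the question; the variable names are only illustrative.

```python
import re

s = "abcde\nfghij<FooBar>"

# Three equivalent ways to let "." match line breaks in Python's re module.
by_flag = re.search(r"(.*)<FooBar>", s, flags=re.DOTALL)
by_alias = re.search(r"(.*)<FooBar>", s, flags=re.S)
by_inline = re.search(r"(?s)(.*)<FooBar>", s)

captures = [m.group(1) for m in (by_flag, by_alias, by_inline)]
print(captures)
```

All three calls should capture the same two-line text; which spelling to prefer is mostly a matter of whether the flag belongs with the pattern or with the call site.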

NOTES ON (?s):

In most non-POSIX engines, the (?s) inline modifier (or embedded flag option) can be used to force . to match line breaks.

If placed at the start of the pattern, (?s) changes the behavior of every . in the pattern. If (?s) is placed somewhere after the beginning, only the . to its right are affected, unless the pattern is passed to Python re: there, (?s) affects every . in the pattern regardless of its position. The (?s) effect is switched off with (?-s). A modifier group can be used to affect only a specified part of a regex pattern (e.g. Delim1(?s:.*?)\nDelim2.* makes the first .*? match across newlines, while the second .* only matches the rest of the line).
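
The modifier-group example above can be tried out in Python, which accepts the (?s:...) form from version 3.6 on. This is a sketch only; the sample text is made up for illustration.

```python
import re

text = "Delim1 a\nb\nDelim2 rest\nnext line"

# (?s:...) turns DOTALL on only inside the group (Python 3.6+).
scoped = re.search(r"Delim1(?s:.*?)\nDelim2.*", text)
print(repr(scoped.group(0)))

# With a global (?s) the trailing .* also runs across line breaks.
whole = re.search(r"(?s)Delim1.*?\nDelim2.*", text)
print(repr(whole.group(0)))
```

Note that from Python 3.11 on, a global inline flag such as (?s) is only accepted at the very start of the pattern, so the scoped group is the way to limit its reach there.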

POSIX note:

In non-POSIX regex engines, to match any char, the [\s\S] / [\d\D] / [\w\W] constructs can be used.

In POSIX, [\s\S] does not match any char (as it does in JavaScript or any non-POSIX engine) because regex escape sequences are not supported inside bracket expressions. [\s\S] is parsed as a bracket expression that matches a single char: \, s, or S.
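
A small Python sketch of the difference described above. Python's re is a non-POSIX engine, so the first pattern shows the shorthand classes working inside brackets. The second pattern is only an emulation of how a POSIX engine would read [\s\S], written out explicitly as a class holding a backslash, s and S; it is not a call into a real POSIX library.

```python
import re

s = "abcde\nfghij<FooBar>"

# Non-POSIX behaviour: \s and \S keep their shorthand meaning inside
# brackets, so [\s\S] matches any character, line breaks included.
any_char = re.search(r"([\s\S]*)<FooBar>", s)
print(repr(any_char.group(1)))

# Emulation of the POSIX reading: a class holding only "\", "s" and "S".
posix_like = re.compile(r"[\\sS]")
matched = [ch for ch in "a\\sS\n" if posix_like.fullmatch(ch)]
print(matched)
```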

@Jan 2017-10-15 20:15:24

You should link to this excellent overview from your profile page or something (+1).

@sln 2018-04-26 21:30:09

You may want to add this to the Boost item: in the regex_constants namespace, the flag_type values are perl = ECMAScript = JavaScript = JScript = ::boost::regbase::normal = 0, so the default is Perl. Programmers set a base flag definition such as #define MOD regex_constants::perl | boost::regex::no_mod_s | boost::regex::no_mod_m for their regex flags to reflect that. The arbiter is always the inline modifiers, where (?-sm)(?s).* resets.

@Pasupathi Rajamanickam 2018-12-19 02:12:47

Can you also add for bash please?

@Wiktor Stribiżew 2018-12-19 07:33:26

@PasupathiRajamanickam Bash uses a POSIX regex engine, the . matches any char there (including line breaks). See this online Bash demo.

@RAN_0915 2018-08-06 07:48:29

We can also use

(.*?\n)*?

to match everything, including newlines, non-greedily.

This variant makes the newline optional:

(.*?|\n)*?

@Kamahire 2013-06-03 06:22:19

In Java-based regular expressions you can use [\s\S].

@Paul Draper 2013-10-19 06:48:54

Shouldn't those be backslashes?

@RandomInsano 2013-12-21 20:12:24

They go at the end of the regular expression, not within it. Example: /blah/s

@3limin4t0r 2018-09-25 17:47:28

I guess you mean JavaScript, not Java? Since you can just add the s flag to the pattern in Java and JavaScript doesn't have the s flag.

@vibaiher 2012-08-03 07:52:16

In Ruby you can use the 'm' option (multiline):

/YOUR_REGEXP/m

See the Regexp documentation on ruby-doc.org for more information.

@Abbas Shahzadeh 2011-07-30 13:03:56

In JavaScript, use /[\S\s]*<Foobar>/. Source

@Allen 2013-05-09 15:34:57

From that link: "JavaScript and VBScript do not have an option to make the dot match line break characters. In those languages, you can use a character class such as [\s\S] to match any character." So instead of the ., use [\s\S] (match spaces and non-spaces).

@Gordon 2013-01-03 11:32:13

For Eclipse, the following expression worked:

Foo

jadajada Bar"

Regular-Expression:

Foo[\S\s]{1,10}.*Bar*

@user1348737 2012-04-21 20:05:32

Often we have to modify a substring with a few keywords spread across lines preceding the substring. Consider an xml element:

<TASK>
  <UID>21</UID>
  <Name>Architectural design</Name>
  <PercentComplete>81</PercentComplete>
</TASK>

Suppose we want to change the 81 to some other value, say 40. First identify <UID>21</UID>, then skip all characters, including \n, up to <PercentComplete>. The regular expression pattern and the replace specification are:

String hw = "<TASK>\n  <UID>21</UID>\n  <Name>Architectural design</Name>\n  <PercentComplete>81</PercentComplete>\n</TASK>";
String pattern = "(<UID>21</UID>)((.|\n)*?)(<PercentComplete>)(\\d+)(</PercentComplete>)";
String replaceSpec = "$1$2$440$6";
// Note that the group (<PercentComplete>) is $4 and the group ((.|\n)*?) is $2.

String  iw = hw.replaceFirst(pattern, replaceSpec);
System.out.println(iw);

<TASK>
  <UID>21</UID>
  <Name>Architectural design</Name>
  <PercentComplete>40</PercentComplete>
</TASK>

The subgroup (.|\n) is probably the missing group $3. If we make it non-capturing by (?:.|\n) then the $3 is (<PercentComplete>). So the pattern and replaceSpec can also be:

pattern = "(<UID>21</UID>)((?:.|\n)*?)(<PercentComplete>)(\\d+)(</PercentComplete>)";
replaceSpec = "$1$2$340$5";

and the replacement works correctly as before.
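
For comparison, here is a rough Python sketch of the same replacement. It is not the author's code: it assumes the non-capturing variant of the pattern and uses Python's \g<n> replacement syntax, since a bare \340 would not be read as group 3 followed by the literal 40.

```python
import re

hw = (
    "<TASK>\n"
    "  <UID>21</UID>\n"
    "  <Name>Architectural design</Name>\n"
    "  <PercentComplete>81</PercentComplete>\n"
    "</TASK>"
)

# Same structure as the Java pattern, with the alternation made non-capturing.
pattern = r"(<UID>21</UID>)((?:.|\n)*?)(<PercentComplete>)(\d+)(</PercentComplete>)"

# \g<3> keeps the group reference separate from the literal "40".
iw = re.sub(pattern, r"\g<1>\g<2>\g<3>40\g<5>", hw, count=1)
print(iw)
```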

@samwize 2012-07-19 17:59:45

([\s\S]*)<FooBar>

The dot matches all except newlines (\r\n). So use \s\S, which will match ALL characters.

@J. Costa 2012-08-24 22:29:41

This solves the problem if you are using Objective-C's [text rangeOfString:regEx options:NSRegularExpressionSearch]. Thanks!

@barclay 2015-09-16 22:14:46

This works in intelliJ's find&replace regex, thanks.

@Ozkan 2017-09-26 14:16:23

This works. But it needs to be the first occurrence of <FooBar>

@Sian Lerk Lau 2012-04-04 11:00:26

Solution:

Using the pattern modifiers sU gives the desired match in PHP.

example:

preg_match('/(.*)/sU',$content,$match);

Source:

http://dreamluverz.com/developers-tools/regex-match-all-including-new-line
http://php.net/manual/en/reference.pcre.pattern.modifiers.php

@Paulo Merson 2011-11-25 13:16:55

If you're using Eclipse search, you can enable the "DOTALL" option to make '.' match any character including line delimiters: just add "(?s)" at the beginning of your search string. Example:

(?s).*<FooBar>

@Steven Soroka 2013-10-08 16:50:23

This is not eclipse-specific, should work anywhere.

@Wiktor Stribiżew 2016-07-18 11:06:56

Not anywhere, only in regex flavors supporting inline modifiers, and certainly not in Ruby where (?s) => (?m)

@Pasupathi Rajamanickam 2018-12-19 02:12:05

Anything for bash?

@Spangen 2011-01-18 09:31:21

I wanted to match a particular if block in java

   ...
   ...
   if(isTrue){
       doAction();

   }
...
...
}

If I use the regExp

if \(isTrue(.|\n)*}

it included the closing brace for the method block so I used

if \(!isTrue([^}.]|\n)*}

to exclude the closing brace from the wildcard match.

@shmall 2010-04-13 00:42:03

Use RegexOptions.Singleline, it changes the meaning of . to include newlines

Regex.Replace(content, searchText, replaceText, RegexOptions.Singleline);

@Markus Jarderot 2008-10-01 18:52:28

"." normally doesn't match line-breaks. Most regex engines allows you to add the S-flag (also called DOTALL and SINGLELINE) to make "." also match newlines. If that fails, you could do something like [\S\s].

@Slee 2009-03-26 14:57:08

I had the same problem and solved it in probably not the best way but it works. I replaced all line breaks before I did my real match:

mystring= Regex.Replace(mystring, "\r\n", "")

I am manipulating HTML so line breaks don't really matter to me in this case.

I tried all of the suggestions above with no luck, I am using .Net 3.5 FYI

@Vamshi Krishna 2018-05-18 07:26:25

I am using .NET too and (\s|\S) seems to do the trick for me!

@Wiktor Stribiżew 2018-09-14 20:35:54

@VamshiKrishna In .NET, use (?s) to make . match any char. Do not use (\s|\S); it will slow down performance.

@tye 2008-10-02 03:31:26

Note that (.|\n)* can be less efficient than (for example) [\s\S]* (if your language's regexes support such escapes) and than finding how to specify the modifier that makes . also match newlines. Or you can go with POSIXy alternatives like [[:space:][:^space:]]*.
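
To make the comparison above concrete, here is a small Python sketch that runs the three spellings on the question's sample string. It only checks that they capture the same text; it does not measure speed, and the dictionary keys are just illustrative labels.

```python
import re

s = "abcde\nfghij<FooBar>"

candidates = {
    "alternation": r"((?:.|\n)*)<FooBar>",
    "char class": r"([\s\S]*)<FooBar>",
    "dotall flag": r"(?s)(.*)<FooBar>",
}

# All three capture the same text; they differ in how much work the
# engine does per character, which this sketch does not measure.
results = {name: re.search(rx, s).group(1) for name, rx in candidates.items()}
print(results)
```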

@nsayer 2008-10-01 18:49:42

In the context of use within languages, regular expressions act on strings, not lines. So you should be able to use the regex normally, assuming that the input string has multiple lines.

In this case, the given regex will match the entire string, since "<FooBar>" is present. Depending on the specifics of the regex implementation, the $1 value (obtained from the "(.*)") will either be "fghij" or "abcde\nfghij". As others have said, some implementations allow you to control whether the "." will match the newline, giving you the choice.
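
The two possible $1 values mentioned above can be reproduced in Python, used here only as one example implementation: by default its dot stops at \n, and the re.DOTALL flag lifts that restriction.

```python
import re

s = "abcde\nfghij<FooBar>"

# Default: "." stops at "\n", so the capture starts after the line break.
default = re.search(r"(.*)<FooBar>", s).group(1)

# DOTALL: "." also matches "\n", so the capture spans both lines.
dotall = re.search(r"(.*)<FooBar>", s, flags=re.DOTALL).group(1)

print(repr(default), repr(dotall))
```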

Line-based regular expression use is usually for command line things like egrep.

@Bill 2008-10-01 18:54:07

/(.*)<FooBar>/s

the s causes Dot (.) to match carriage returns

@Allen 2013-05-09 15:31:11

Seems like this is invalid (Chrome): text.match(/a/s) SyntaxError: Invalid flags supplied to RegExp constructor 's'

@Morgan Touverey Quilling 2016-04-20 18:51:07

Because it is unsupported in JavaScript RegEx engines. The s flag exists in PCRE, the most complete engine (available in Perl and PHP). PCRE has 10 flags (and a lot of other features) while JavaScript has only 3 flags (gmi).

@tloach 2008-10-01 18:52:56

generally . doesn't match newlines, so try ((.|\n)*)<foobar>

@Alan Moore 2009-04-26 03:17:04

No, don't do that. If you need to match anything including line separators, use the DOTALL (a.k.a. /s or SingleLine) modifier. Not only does the (.|\n) hack make the regex less efficient, it's not even correct. At the very least, it should match \r (carriage return) as well as \n (linefeed). There are other line separator characters, too, albeit rarely used. But if you use the DOTALL flag, you don't have to worry about them.

@opyate 2009-11-30 11:13:50

\R is the platform-independent match for newlines in Eclipse.

@jeckhart 2012-10-15 21:29:05

@opyate You should post this as an answer as this little gem is incredibly useful.

@ssc-hrep3 2016-11-29 09:52:21

You could try this instead. It won't capture the inner group and also handles the optional \r: ((?:.|\r?\n)*)<foobar>

@levik 2008-10-01 18:52:27

Try this:

((.|\n)*)<FooBar>

It basically says "any character or a newline" repeated zero or more times.

@Ben Doom 2008-10-01 18:57:49

This is dependent on the language and/or tool you are using. Please let us know what you are using, eg Perl, PHP, CF, C#, sed, awk, etc.

@Potherca 2012-03-09 17:27:53

Depending on your line endings you might need ((.|\n|\r)*)<FooBar>

@Danubian Sailor 2012-04-18 08:14:45

He said he is using Eclipse. This is the correct solution in my opinion. I had the same problem and this solved it.

@acme 2012-06-13 12:04:41

Right - the question is about eclipse and so are the tags. But the accepted solution is a PHP solution. Yours should be the accepted solution...

@fr13d 2015-10-05 20:20:29

\R matches line endings in a platform-independent manner. In eclipse, at least, and some other tools.

@Manolis Agkopian 2015-10-13 01:06:56

Very funny, I tried this on gedit and I got a segmentation fault. Murphy's law at its finest.

@Wiktor Stribiżew 2016-07-18 11:05:55

This is the worst regex for matching multiple line input. Please never use it unless you are using ElasticSearch. Use [\s\S]* or (?s).*.

@Snow 2019-04-25 02:24:06

Such needless alternation can result in catastrophic backtracking in some situations. This isn't a good general pattern.

@Jeremy Ruten 2008-10-01 18:52:18

It depends on the language, but there should be a modifier that you can add to the regex pattern. In PHP it is:

/(.*)<FooBar>/s

The s at the end causes the dot to match all characters including newlines.

@Grace 2011-04-11 12:02:23

and what if i wanted just a new line and not all characters ?

@Jeremy Ruten 2011-04-11 21:05:54

@Grace: use \n to match a newline

@Grace 2011-04-12 05:45:02

I know... I'm trying, but it's not working. I don't know why.

@Grace 2011-04-12 08:08:26

\r\n works perfectly

@Josef Sábl 2013-04-30 09:01:36

@Allen 2013-05-09 15:37:53

The s flag is (now?) invalid, at least in Chrome/V8. Instead, use the /([\s\S]*)<FooBar>/ character class (match space and non-space) in place of the period matcher. See other answers for more info.

@Derek 朕會功夫 2015-07-12 22:26:26

@Allen - JavaScript doesn't support the s modifier. Instead, do [^]* for the same effect.

@Ryan Buckley 2015-07-15 22:57:17

In Ruby, use the m modifier

@Mohamad Hamouday 2018-04-21 03:44:22

If there are multiple values of <FooBar>, it will ignore all the values in the middle and only match the last <FooBar>
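
The greedy behaviour described in this comment, and the usual lazy-quantifier remedy, can be sketched in Python as follows. The sample string with three <FooBar> markers is made up for illustration.

```python
import re

s = "one<FooBar>\ntwo<FooBar>\nthree<FooBar>"

# Greedy: .* runs to the end and backtracks to the last <FooBar>.
greedy = re.search(r"(?s)(.*)<FooBar>", s).group(1)

# Lazy: .*? stops at the first <FooBar> it can reach.
lazy = re.search(r"(?s)(.*?)<FooBar>", s).group(1)

# findall with the lazy form yields one capture per <FooBar>.
each = re.findall(r"(?s)(.*?)<FooBar>", s)

print(repr(greedy), repr(lazy), each)
```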

@NealWalters 2018-08-10 20:25:21

What to use for Powershell?

@Haddock-san 2018-10-10 20:00:18

I love you, sir.
