By ajsie


2010-02-20 06:17:34 8 Comments

Could someone explain these two terms in an understandable way?

12 comments

@Eugene 2016-11-09 16:39:06

As far as I know, most regex engines are greedy by default. Adding a question mark after a quantifier enables lazy matching.

As @Andrew S mentioned in a comment:

  • Greedy: Keep searching until condition is not satisfied.
  • Lazy: Stop searching once condition is satisfied.

Refer to the example below for what is greedy and what is lazy.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(String args[]){
        String money = "100000000999";
        String greedyRegex = "100(0*)";
        Pattern pattern = Pattern.compile(greedyRegex);
        Matcher matcher = pattern.matcher(money);
        while(matcher.find()){
            System.out.println("I'm greeedy and I want " + matcher.group() + " dollars. This is the most I can get.");
        }

        String lazyRegex = "100(0*?)";
        pattern = Pattern.compile(lazyRegex);
        matcher = pattern.matcher(money);
        while(matcher.find()){
            System.out.println("I'm too lazy to get so much money, only " + matcher.group() + " dollars is enough for me");
        }
    }
}


The result is:

I'm greeedy and I want 100000000 dollars. This is the most I can get.

I'm too lazy to get so much money, only 100 dollars is enough for me

@Xatenev 2017-03-13 11:46:22

I really like your example.

@User_coder 2019-08-03 23:53:15

If anyone gets here looking for what is faster when parsing:

A common misconception about regular expression performance is that lazy quantifiers (also called non-greedy, reluctant, minimal, or ungreedy) are faster than their greedy equivalents. That's generally not true, but with an important qualifier: in practice, lazy quantifiers often are faster.

Excerpt from Flagrant Badassery

@Jason Alcock 2018-03-12 10:54:44

Best shown by example. Take the string 192.168.1.1 and the greedy regex \b.+\b. You might think this would give you the 1st octet, but it actually matches the whole string. Why? Because the .+ is greedy, and a greedy match consumes every character in '192.168.1.1' until it reaches the end of the string. This is the important bit: only then does the engine try the 3rd token (\b), backtracking one character at a time until \b matches. Here the end of the string is already a word boundary, so the match is the whole string.

If the string were a 4GB text file and 192.168.1.1 was at the start, you could easily see how this backtracking would cause an issue.

To make a quantifier non-greedy (lazy), put a question mark after it, e.g. *?, ?? or +?. What happens now is that token 2 (.+?) matches one character, and then the engine tries the next token (\b) before giving token 2 another character. So it creeps along gingerly.
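
A minimal Python sketch of the two patterns above, assuming the standard re module (the variable names are only illustrative):

```python
import re

address = "192.168.1.1"

# Greedy: .+ runs to the end of the string, where \b already holds.
greedy = re.search(r"\b.+\b", address).group()

# Lazy: .+? grows one character at a time and stops at the first \b,
# which is the position between "192" and the first dot.
lazy = re.search(r"\b.+?\b", address).group()

print(greedy)  # 192.168.1.1
print(lazy)    # 192
```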

@stackFan 2018-02-06 15:41:32

Greedy means the quantifier will keep consuming characters that fit your pattern until there are none left and it can look no further.

Lazy means it will stop as soon as the rest of the pattern can match.

One common example that I often encounter is the \s*-\s*? part of the regex ([0-9]{2}\s*-\s*?[0-9]{7})

The first \s* is greedy because of the *: after the two digits it matches as many whitespace characters as possible and then looks for the dash character "-". The second \s*? is lazy because of the *?: it matches as few whitespace characters as possible, starting from zero, and takes one more only when the following [0-9]{7} cannot match yet.
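
A small Python sketch of that regex, assuming the standard re module and a made-up input string. It suggests that the lazy \s*? only shows its laziness when nothing after it forces it to grow:

```python
import re

# Hypothetical input: two digits, two spaces, a dash, three spaces, seven digits.
text = "12" + " " * 2 + "-" + " " * 3 + "3456789"

# Full pattern from the post: the lazy \s*? still ends up taking all three
# spaces, because [0-9]{7} cannot match until the whitespace is consumed.
full = re.search(r"([0-9]{2}\s*-\s*?[0-9]{7})", text).group(1)

# With nothing after the quantifier, the difference becomes visible.
greedy_spaces = re.search(r"-(\s*)", text).group(1)
lazy_spaces = re.search(r"-(\s*?)", text).group(1)

print(repr(full))
print(repr(greedy_spaces))  # three spaces
print(repr(lazy_spaces))    # empty string
```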

@Selva 2018-01-21 05:35:59

Greedy matching. The default behavior of regular expressions is to be greedy. That means it tries to extract as much as possible until it conforms to a pattern even when a smaller part would have been syntactically sufficient.

Example:

import re
text = "<body>Regex Greedy Matching Example </body>"
re.findall('<.*>', text)
#> ['<body>Regex Greedy Matching Example </body>']

Instead of matching till the first occurrence of ‘>’, it extracted the whole string. This is the default greedy or ‘take it all’ behavior of regex.

Lazy matching, on the other hand, ‘takes as little as possible’. This can be effected by adding a ? after the quantifier.

Example:

re.findall('<.*?>', text)
#> ['<body>', '</body>']

If you want only the first match to be retrieved, use the search method instead.

re.search('<.*?>', text).group()
#> '<body>'

Source: Python Regex Examples

@slebetman 2010-02-20 06:19:41

'Greedy' means match longest possible string.

'Lazy' means match shortest possible string.

For example, the greedy h.+l matches 'hell' in 'hello' but the lazy h.+?l matches 'hel'.
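
The same example as a quick Python check, assuming the standard re module; first_match is just a hypothetical helper for this sketch:

```python
import re

def first_match(pattern, text):
    """Return the text of the first match of pattern in text."""
    return re.search(pattern, text).group()

print(first_match(r"h.+l", "hello"))   # hell
print(first_match(r"h.+?l", "hello"))  # hel
```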

@Andrew S 2014-02-23 21:27:00

Brilliant, so lazy will stop as soon as the condition l is satisfied, but greedy means it will stop only once the condition l is not satisfied any more?

@Wiktor Stribiżew 2016-10-15 21:29:59

For all people reading the post: greedy or lazy quantifiers by themselves won't match the longest/shortest possible substring. You would have to use either a tempered greedy token, or use non-regex approaches.
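
A minimal Python sketch of that point, assuming the standard re module and a made-up test string. The engine commits to the leftmost starting position first, so a lazy quantifier does not necessarily give the shortest substring overall. The tempered pattern below is only one possible spelling of a tempered greedy token and is not a general shortest-match guarantee:

```python
import re

text = "aabc"

# Lazy, but the match starts at the first 'a', so it is not the shortest
# substring of the form a...c.
lazy = re.search(r"a.+?c", text).group()

# Tempered token: each repeated character must not be an 'a', so a match
# cannot run across a later 'a'.
tempered = re.search(r"a(?:(?!a).)+c", text).group()

print(lazy)      # aabc
print(tempered)  # abc
```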

@v.shashenko 2017-03-21 16:38:58

@AndrewS Don't be confused by the double ll in the example. It's rather lazy will match the shortest possible substring while greedy will match the longest possible. Greedy h.+l matches 'helol' in 'helolo' but the lazy h.+?l matches 'hel'.

@FloatingRock 2017-04-14 10:42:12

Doesn't the ? make the .+ optional in h.+?l. Isn't that what the ? is for? Also, how would you differentiate between the two functions of ?

@slebetman 2017-04-14 12:56:34

@FloatingRock: No. x? means x is optional but +? is a different syntax. It means stop looking after you find something that matches - lazy matching.

@slebetman 2017-04-14 12:57:45

@FloatingRock: As for how you differentiate the different syntax, simple: ? means optional and +? means lazy. Therefore \+? means + is optional.

@Sampson 2010-02-20 06:22:32

Greedy will consume as much as possible. From http://www.regular-expressions.info/repeat.html we see the example of trying to match HTML tags with <.+>. Suppose you have the following:

<em>Hello World</em>

You may think that <.+> (. means any non newline character and + means one or more) would only match the <em> and the </em>, when in reality it will be very greedy, and go from the first < to the last >. This means it will match <em>Hello World</em> instead of what you wanted.

Making it lazy (<.+?>) will prevent this. By adding the ? after the +, we tell it to repeat as few times as possible, so the first > it comes across, is where we want to stop the matching.

I'd encourage you to download RegExr, a great tool that will help you explore Regular Expressions - I use it all the time.

@ajsie 2010-02-20 06:27:12

So if you use greedy, will you have 3 matches (1 element + 2 tags) or just 1 match (1 element)?

@Sampson 2010-02-20 06:28:15

It would match only 1 time, starting from the first < and ending with the last >.

@Sampson 2010-02-20 06:29:13

But making it lazy would match twice, giving us both the opening and closing tag, ignoring the text in between (since it doesn't fit the expression).

@Ron van der Heijden 2014-05-27 11:21:34

Another great tool I always use: debuggex.com. It also has an "Embed on StackOverflow" function.

@alanbuchanan 2015-06-15 12:57:18

Just to add that there is a greedy way to go about it, too: <[^>]+> regex101.com/r/lW0cY6/1
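
A short Python sketch comparing the three patterns mentioned in this thread on the sample string, assuming the standard re module:

```python
import re

html = "<em>Hello World</em>"

greedy_dot = re.findall(r"<.+>", html)    # one match, first < to last >
lazy_dot = re.findall(r"<.+?>", html)     # stops at the first > each time
negated = re.findall(r"<[^>]+>", html)    # greedy, but cannot cross a >

print(greedy_dot)  # ['<em>Hello World</em>']
print(lazy_dot)    # ['<em>', '</em>']
print(negated)     # ['<em>', '</em>']
```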

@Wiktor Stribiżew 2016-04-22 07:35:01

In case one wants to delve a bit deeper into how lazy and greedy quantifiers work with optional subpatterns in-between them, check Perl regex matching optional phrase in longer sentence.

@Premraj 2016-01-15 07:26:36

+-------------------+-----------------+------------------------------+
| Greedy quantifier | Lazy quantifier |        Description           |
+-------------------+-----------------+------------------------------+
| *                 | *?              | Star Quantifier: 0 or more   |
| +                 | +?              | Plus Quantifier: 1 or more   |
| ?                 | ??              | Optional Quantifier: 0 or 1  |
| {n}               | {n}?            | Quantifier: exactly n        |
| {n,}              | {n,}?           | Quantifier: n or more        |
| {n,m}             | {n,m}?          | Quantifier: between n and m  |
+-------------------+-----------------+------------------------------+

Add a ? to a quantifier to make it ungreedy, i.e. lazy.

Example:
test string : stackoverflow
greedy reg expression : s.*o output: stackoverflo
lazy reg expression : s.*?o output: stacko
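
A rough Python sketch that runs each greedy/lazy pair from the table against a made-up sample string, assuming the standard re module. Note that {n} and {n}? give the same result, since there is only one possible repeat count:

```python
import re

sample = "aaaa"
pairs = [
    ("a*", "a*?"),
    ("a+", "a+?"),
    ("a?", "a??"),
    ("a{2}", "a{2}?"),
    ("a{2,}", "a{2,}?"),
    ("a{1,3}", "a{1,3}?"),
]

# Map each pattern to the text it matches at the start of the sample.
results = {}
for greedy, lazy in pairs:
    for pattern in (greedy, lazy):
        results[pattern] = re.match(pattern, sample).group()
        print(f"{pattern:8} -> {results[pattern]!r}")
```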

@Breaking Benjamin 2016-09-02 08:07:04

Is not ?? equivalent to ? . Similarly, isn't {n}? equivalent to {n}

@smci 2017-11-16 00:42:24

@BreakingBenjamin: no ?? is not equivalent to ?, when it has a choice to either return 0 or 1 occurrence, it will pick the 0 (lazy) alternative. To see the difference, compare re.match('(f)?(.*)', 'food').groups() to re.match('(f)??(.*)', 'food').groups(). In the latter, (f)?? will not match the leading 'f' even though it could. Hence the 'f' will get matched by the second '.*' capture group. I'm sure you can construct an example with '{n}?' too. Admittedly these two are very-rarely-used.
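
The comparison from this comment, written out as a runnable Python sketch with the standard re module. The fo{2} lines are an added illustration of the {n}? case, not something from the comment itself:

```python
import re

# Greedy ?: the optional group takes the 'f' when it can.
greedy_groups = re.match(r"(f)?(.*)", "food").groups()

# Lazy ??: the optional group prefers zero occurrences, so .* gets the 'f'.
lazy_groups = re.match(r"(f)??(.*)", "food").groups()

# {n} and {n}? behave the same, as there is only one repeat count to pick.
exact = re.match(r"fo{2}", "food").group()
exact_lazy = re.match(r"fo{2}?", "food").group()

print(greedy_groups)  # ('f', 'ood')
print(lazy_groups)    # (None, 'food')
print(exact, exact_lazy)  # foo foo
```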

@FrankyHollywood 2016-10-30 06:31:14

try to understand the following behavior:

var input = "0014.2";

Regex r1 = new Regex("\\d+.{0,1}\\d+");
Regex r2 = new Regex("\\d*.{0,1}\\d*");

Console.WriteLine(r1.Match(input).Value); // "0014.2"
Console.WriteLine(r2.Match(input).Value); // "0014.2"

input = " 0014.2";

Console.WriteLine(r1.Match(input).Value); // "0014.2"
Console.WriteLine(r2.Match(input).Value); // " 0014"

input = "  0014.2";

Console.WriteLine(r1.Match(input).Value); // "0014.2"
Console.WriteLine(r2.Match(input).Value); // " "
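
A rough Python port of the same experiment, assuming Python's re module behaves like .NET for these simple patterns (the helper name first is made up for this sketch). With \d*, every piece of the pattern may match nothing, so the match can begin at a leading space, where the greedy .{0,1} takes that one character:

```python
import re

r1 = re.compile(r"\d+.{0,1}\d+")
r2 = re.compile(r"\d*.{0,1}\d*")

def first(regex, text):
    """Return the text of the first match of regex in text."""
    return regex.search(text).group()

for text in ("0014.2", " 0014.2", "  0014.2"):
    print(repr(text), repr(first(r1, text)), repr(first(r2, text)))
```

In Python the last r2 result is a single space rather than an empty string, because .{0,1} is greedy and consumes the first space.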

@Suganthan Madhavan Pillai 2014-10-19 08:34:34

Taken From www.regular-expressions.info

Greediness: A greedy quantifier first tries to repeat the token as many times as possible, and gradually gives up matches as the engine backtracks to find an overall match.

Laziness: A lazy quantifier first repeats the token as few times as required, and gradually expands the match as the engine backtracks through the regex to find an overall match.
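
A minimal Python sketch that makes this give-up/expand behavior visible through capture groups, assuming the standard re module and a made-up input:

```python
import re

text = "abc123"

# Greedy .* first takes everything, then gives back one character at a
# time until \d+ can match, so \d+ ends up with only the last digit.
greedy = re.match(r"(.*)(\d+)", text).groups()

# Lazy .*? starts empty and expands one character at a time until \d+
# can match, so \d+ gets all of the digits.
lazy = re.match(r"(.*?)(\d+)", text).groups()

print(greedy)  # ('abc12', '3')
print(lazy)    # ('abc', '123')
```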

@Carl Norum 2010-02-20 06:19:18

Greedy means your expression will match as large a group as possible, lazy means it will match the smallest group possible. For this string:

abcdefghijklmc

and this expression:

a.*c

A greedy match (a.*c) will match the whole string, and a lazy match (a.*?c) will match just the first abc.
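
As a quick Python check of this example with the standard re module, where the lazy form is written a.*?c:

```python
import re

text = "abcdefghijklmc"

greedy = re.search(r"a.*c", text).group()
lazy = re.search(r"a.*?c", text).group()

print(greedy)  # abcdefghijklmc
print(lazy)    # abc
```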

@Adriaan Stander 2010-02-20 06:21:22

From Regular expression

The standard quantifiers in regular expressions are greedy, meaning they match as much as they can, only giving back as necessary to match the remainder of the regex.

By using a lazy quantifier, the expression tries the minimal match first.
