By Daryl Spitzer

2008-10-08 00:51:36 8 Comments

What is the difference between the search() and match() functions in the Python re module?

I've read the documentation (current documentation), but I never seem to remember it. I keep having to look it up and re-learn it. I'm hoping that someone will answer it clearly with examples so that (perhaps) it will stick in my head. Or at least I'll have a better place to return with my question and it will take less time to re-learn it.


@ldR 2015-07-30 05:27:27

You can refer to the example below to understand how re.match and re.search work:

import re
a = "123abc"
t = re.match("[a-z]+", a)
t = re.search("[a-z]+", a)

re.match will return None, but re.search will return abc.

@SanD 2017-03-01 15:09:40

Would just like to add that search will return an _sre.SRE_Match object (or None if not found). To get 'abc', you need to call t.group().
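
A small sketch of the point above, using the same sample string: both functions return a match object or None, so the matched text has to be extracted with group(). The helper name matched_text is made up for illustration and is not part of the re module.

```python
import re

def matched_text(match):
    """Return the matched substring, or None when there was no match."""
    return match.group() if match is not None else None

a = "123abc"
print(matched_text(re.match("[a-z]+", a)))   # match only tries position 0
print(matched_text(re.search("[a-z]+", a)))  # search also tries later positions
```

Checking for None first avoids the AttributeError that calling group() on a failed match would raise.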

@Jeyekomon 2018-04-07 19:03:18

match is much faster than search, so instead of doing regex.search("word") you can do regex.match((.*?)word(.*?)) and gain tons of performance if you are working with millions of samples.

This comment from @ivan_bilan under the accepted answer above got me wondering whether such a hack actually speeds anything up, so let's find out how many tons of performance you will really gain.

I prepared the following test suite:

import random
import re
import string
import time

LENGTH = 10  # assumed word length (not defined in the listing)
LIST_SIZE = 1000000

def generate_word():
    word = [random.choice(string.ascii_lowercase) for _ in range(LENGTH)]
    word = ''.join(word)
    return word

wordlist = [generate_word() for _ in range(LIST_SIZE)]

start = time.time()
[re.search('python', word) for word in wordlist]
print('search:', time.time() - start)

start = time.time()
[re.match('(.*?)python(.*?)', word) for word in wordlist]
print('match:', time.time() - start)

I made 10 measurements (1M, 2M, ..., 10M words) which gave me the following plot:

[Figure: match vs. search regex speedtest line plot]

The resulting lines are surprisingly (actually not that surprisingly) straight. And the search function is (slightly) faster given this specific pattern combination. The moral of this test: Avoid overoptimizing your code.

@Robert Dodier 2018-10-30 16:37:01

+1 for actually investigating the assumptions behind a statement meant to be taken at face value -- thanks.

@baptx 2019-01-21 18:36:31

Indeed the comment of @ivan_bilan looks wrong, but the match function is still faster than the search function if you compare the same regular expression. You can check in your script by comparing re.search('^python', word) to re.match('python', word) (or re.match('^python', word), which is the same but easier to understand if you don't read the documentation and seems not to affect the performance).

@Jeyekomon 2019-01-22 10:57:10

@baptx I disagree with the statement that the match function is generally faster. The match is faster when you want to search at the beginning of the string; the search is faster when you want to search throughout the string. That corresponds with common sense. That's why @ivan_bilan was wrong: he used match to search throughout the string. That's why you are right: you used match to search at the beginning of the string. If you disagree with me, try to find a regex for match that is faster than re.search('python', word) and does the same job.

@Jeyekomon 2019-01-22 11:26:03

@baptx Also, as a footnote, the re.match('python') is marginally faster than re.match('^python'). It has to be.

@baptx 2019-01-23 20:23:10

@Jeyekomon yes, that's what I meant: the match function is a bit faster if you want to search at the beginning of a string (compared to using the search function to find a word at the beginning of a string with re.search('^python', word), for example). But I find this weird; if you tell the search function to search at the beginning of a string, it should be as fast as the match function.

@Jeyekomon 2019-01-24 19:58:13

@baptx My guess is that the search function has to parse and process the ^ information while match has it already hardcoded down in the C binary. The speed difference is only about 10% on my PC anyway.

@baptx 2019-01-26 09:20:38

@Jeyekomon it could have come from here but I don't think it is the case since if we give the unnecessary ^ character to the match function, it does not take more time to read it (sometimes it was even a bit faster).

@Denis de Bernardy 2019-07-21 11:59:33

That re.search() is faster for this specific regex is not surprising at all. The re.match() pattern takes longer to process, if only because it's capturing the beginning of the string.
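
For anyone who wants to repeat the like-for-like comparison discussed in these comments, here is a minimal timeit sketch. The sample words, list size and repeat count are arbitrary assumptions, and the numbers will differ between machines and Python versions, so no expected timings are given.

```python
import re
import timeit

# Assumed sample data: a small list instead of the 1M words used above.
words = ["python rocks", "monty python", "no snake here"] * 1000

def run_search_anchored():
    # search with an explicit '^' anchor
    return [re.search("^python", w) is not None for w in words]

def run_match():
    # match is implicitly anchored at the start of the string
    return [re.match("python", w) is not None for w in words]

# Both approaches must agree before timing them makes sense.
assert run_search_anchored() == run_match()

t_search = timeit.timeit(run_search_anchored, number=20)
t_match = timeit.timeit(run_match, number=20)
print("search('^python'):", t_search)
print("match('python'):  ", t_match)
```

Comparing patterns that do the same job is the point here; timing match('(.*?)python(.*?)') against search('python') measures two different regular expressions.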

@U10-Forward - Reinstate Monica 2018-10-31 00:22:53

Much shorter:

  • search scans through the whole string.

  • match checks only the beginning of the string.

The following example shows it:

>>> a = "123abc"
>>> re.match("[a-z]+", a)  # returns None, so nothing is echoed
>>> re.search("[a-z]+", a).group()
'abc'

@nosklo 2008-10-08 00:53:12

re.match is anchored at the beginning of the string. That has nothing to do with newlines, so it is not the same as using ^ in the pattern.

As the re.match documentation says:

If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding MatchObject instance. Return None if the string does not match the pattern; note that this is different from a zero-length match.

Note: If you want to locate a match anywhere in string, use search() instead.

re.search searches the entire string, as the documentation says:

Scan through string looking for a location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.

So if you need to match at the beginning of the string, or to match the entire string use match. It is faster. Otherwise use search.

The documentation has a specific section for match vs. search that also covers multiline strings:

Python offers two different primitive operations based on regular expressions: match checks for a match only at the beginning of the string, while search checks for a match anywhere in the string (this is what Perl does by default).

Note that match may differ from search even when using a regular expression beginning with '^': '^' matches only at the start of the string, or in MULTILINE mode also immediately following a newline. The “match” operation succeeds only if the pattern matches at the start of the string regardless of mode, or at the starting position given by the optional pos argument regardless of whether a newline precedes it.

Now, enough talk. Time to see some example code:

# example code:
string_with_newlines = """something
someotherthing"""

import re

print(re.match('some', string_with_newlines))  # matches
print(re.match('someother',
               string_with_newlines))  # won't match
print(re.match('^someother', string_with_newlines,
               re.MULTILINE))  # also won't match
print(re.search('someother',
                string_with_newlines))  # finds something
print(re.search('^someother', string_with_newlines,
                re.MULTILINE))  # also finds something

m = re.compile('thing$', re.MULTILINE)

print(m.match(string_with_newlines))  # no match
print(m.match(string_with_newlines, pos=4))  # matches
print(m.search(string_with_newlines))  # also matches (flags were set in re.compile)

@Daryl Spitzer 2008-10-08 01:01:08

What about strings containing newlines?

@nosklo 2008-10-08 01:05:57

even with strings containing newlines, match() matches only at the BEGINNING of the string.

@Daryl Spitzer 2008-10-08 01:19:46

That's the answer I was hoping for! (Especially now that you provided an example.)

@Alby 2014-07-23 02:55:10

Why would anyone use limited match rather than more general search then? is it for speed?

@ivan_bilan 2016-05-24 09:34:45

@Alby match is much faster than search, so instead of doing regex.search("word") you can do regex.match((.*?)word(.*?)) and gain tons of performance if you are working with millions of samples.

@Sammaron 2016-09-16 15:14:28

Well, that's goofy. Why call it match? Is it a clever maneuver to seed the API's with unintuitive names to force me to read the documentation? I still won't do it! Rebel!

@baptx 2019-01-21 18:56:03

@ivan_bilan match looks a bit faster than search when using the same regular expression but your example seems wrong according to a performance test:…

@jxpython 2019-02-11 15:17:16

@nosklo hey can i contact you personally for a job regarding regex python?

@Zitao Wang 2019-08-19 14:23:02

When using a regular expression beginning with '^', and with MULTILINE unspecified, is match the same as search (produce the same result)?

@CODE-REaD 2016-05-21 13:28:47

The difference is, re.match() misleads anyone accustomed to Perl, grep, or sed regular expression matching, and re.search() does not. :-)

More soberly, as John D. Cook remarks, re.match() "behaves as if every pattern has ^ prepended." In other words, re.match('pattern') equals re.search('^pattern'). So it anchors a pattern's left side. But it also doesn't anchor a pattern's right side: that still requires a terminating $.

Frankly given the above, I think re.match() should be deprecated. I would be interested to know reasons it should be retained.
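
The quoted rule of thumb can be checked directly. This is a minimal sketch with a made-up two-line sample string; it also shows where '^' and '\A' part ways once re.MULTILINE is involved.

```python
import re

text = "first line\nsecond line"

# Without MULTILINE, '^' only matches at the very start, so nothing is found.
print(re.search(r"^second", text))
# With MULTILINE, '^' also matches right after the newline, so search succeeds...
print(re.search(r"^second", text, re.MULTILINE))
# ...but match still only tries position 0 and fails.
print(re.match(r"second", text, re.MULTILINE))
# '\A' matches only at the start of the string, regardless of MULTILINE.
print(re.search(r"\Asecond", text, re.MULTILINE))
```

So search with a leading '\A' behaves like match in both modes, while a leading '^' does so only when re.MULTILINE is not set.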

@JoelFan 2017-06-27 23:38:37

"behaves as if every pattern has ^ prepended." is only true if you don't use the multiline option. The correct statement is "... has \A prepended"

@Dhanasekaran Anbalagan 2011-12-31 12:05:43

search ⇒ find something anywhere in the string and return a match object.

match ⇒ find something at the beginning of the string and return a match object.

@xilun 2008-10-08 01:07:26

re.search searches for the pattern throughout the string, whereas re.match does not search for the pattern; if it does not, it has no other choice than to match it at the start of the string.

@Smit Johnth 2015-07-14 19:21:11

Why match at start, but not till end of string (fullmatch in Python 3.4)?
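
For reference, a short sketch of how re.fullmatch (available since Python 3.4) differs from re.match; the sample string is made up for illustration.

```python
import re

s = "abc123"

# match is anchored only at the start, so the leading letters are enough.
print(re.match(r"[a-z]+", s))
# fullmatch requires the pattern to cover the whole string; the digits are left over.
print(re.fullmatch(r"[a-z]+", s))
# Extending the pattern to cover the digits makes fullmatch succeed.
print(re.fullmatch(r"[a-z]+\d+", s))
```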

@cschol 2008-10-08 00:54:57

re.match attempts to match a pattern at the beginning of the string. re.search attempts to match the pattern throughout the string until it finds a match.
