By Blankman


2010-08-09 02:52:50 8 Comments

I'm looking for a string.contains or string.indexof method in Python.

I want to do:

if not somestring.contains("blah"):
   continue

10 comments

@Jeffrey04 2018-03-28 09:59:15

If you are happy with "blah" in somestring but want it to be a function/method call, you can probably do this

import operator

if not operator.contains(somestring, "blah"):
    continue

All operators in Python can be more or less found in the operator module including in.
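The function form is mostly useful when the test has to be passed around as a callable, for example to filter. A minimal sketch, assuming a hypothetical list named lines (the sample data and the helper name contains_blah are not from the question):

```python
import operator

# Hypothetical sample data, for illustration only.
lines = ["blah blah", "nothing here", "some blah"]

def contains_blah(s):
    # operator.contains(a, b) is the function form of `b in a`.
    return operator.contains(s, "blah")

# filter() needs a callable, which is where the function form helps.
matching = list(filter(contains_blah, lines))
print(matching)
```

For a plain if statement, "blah" in somestring remains the simpler choice.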

@Muskovets 2018-11-23 07:15:14

You can use regular expressions to get the occurrences:

>>> import re
>>> print(re.findall(r'( |t)', to_search_in)) # searches for t or space
['t', ' ', 't', ' ', ' ']

@Brandon Bailey 2019-02-06 11:06:27

You can use str.count().

It returns the number of times a substring appears in a string, as an integer.

E.g:

string.count("bah") >> 0
string.count("Hello") >> 1

@Jean-François Fabre 2019-05-16 05:53:53

counting a string is costly when you just want to check if it's there...

@Brandon Bailey 2019-05-16 09:24:23

That's why I provided multiple methods.

@Jean-François Fabre 2019-05-16 11:38:44

methods that exist in the original post from 2010 so I ended up editing them out, with consensus from community (see meta post meta.stackoverflow.com/questions/385063/…)

@Brandon Bailey 2019-05-16 11:46:00

Well there's only a finite number of ways to achieve what OP asked. Are you planning on developing the Python language and introducing a NEW method for sub-string querying?

@Jean-François Fabre 2019-05-16 11:48:28

No. My point is: why answer the exact same thing as others did 9 years ago?

@Brandon Bailey 2019-05-16 11:54:21

I've posted 3 valid methods to achieve the goal OP and other viewers intend(ed) to achieve. Given the nature of the information provided, it's unreasonable to expect all answers to be 100% unique. The accepted answer didn't contain the 2 other methods I provided, so I expanded the list of valid answers to further help anyone coming across this issue. Why are you breathing when there are already people that breathe?

@Jean-François Fabre 2019-05-16 11:55:30

because I'm moderating the site... I've asked the question on meta meta.stackoverflow.com/questions/385063/…

@Brandon Bailey 2019-05-16 12:06:10

Then if you have the authority to remove it, remove it; otherwise do what you must and move on. IMO this answer adds value, which is reflected by upvotes from users.

@cs95 2019-06-05 03:03:08

DO NOT USE THIS. something in string is a lot cleaner than string.count(something) > 0, especially if "something" occurs more than once. in short circuits. count does not. This is a poor answer simply because you have presented a "new" method without informing users about its pitfalls. Upvotes mean nothing more than confirm that the button works.

@Brandon Bailey 2019-06-05 09:34:17

I agree, I had an in depth answer that proposed 3 possible solutions. but this was changed by Jean-Francois Fabre to be what it currently is. Not sure why he would change it so.
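To illustrate the point raised in the comments above: both checks give the same yes/no answer, but count has to tally every occurrence while in can stop at the first one. A small sketch, with a made-up sample string:

```python
haystack = "Hello world, Hello again"  # hypothetical sample text

# Both expressions answer "is it there?". count() scans the whole string
# to tally every occurrence, while `in` can stop at the first match.
by_in = "Hello" in haystack
by_count = haystack.count("Hello") > 0

print(by_in, by_count, haystack.count("Hello"))
```

Reach for count only when the number of occurrences is actually needed.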

@eldarerathis 2010-08-09 02:55:04

If it's just a substring search you can use string.find("substring").

You do have to be a little careful with find, index, and in though, as they are substring searches. In other words, this:

s = "This be a string"
if s.find("is") == -1:
    print("No 'is' here!")
else:
    print("Found 'is' in the string.")

It would print Found 'is' in the string. Similarly, if "is" in s: would evaluate to True. This may or may not be what you want.

@aaronasterling 2010-08-09 03:22:50

+1 for highlighting the gotchas involved in substring searches. the obvious solution is if ' is ' in s: which will return False as is (probably) expected.

@Bob 2012-11-08 00:07:37

@aaronasterling Obvious it may be, but not entirely correct. What if you have punctuation or it's at the start or end? What about capitalisation? Better would be a case insensitive regex search for \bis\b (word boundaries).

@Jamie Bull 2018-01-13 15:26:40

@bob Or if 'is' in (w.lower() for w in s.split())

@Bob 2018-01-13 15:46:41

@JamieBull Once again, you must consider if you want to include punctuation as a delimiter for a word. Splitting would have largely the same effect as the naive solution of checking for ' is '; notably, it won't catch 'This is, a comma' or 'It is.'.

@Jamie Bull 2018-01-13 15:50:14

Good point. It starts to get a bit unwieldy now, but 'is' not in (w.lower() for w in s.split(string.punctuation + string.whitespace)) is better

@ShadowRanger 2018-02-01 01:39:31

@JamieBull: I highly doubt any real input split with s.split(string.punctuation + string.whitespace) would split even once; split isn't like the strip/rstrip/lstrip family of functions, it only splits when it sees all of the delimiter characters, contiguously, in that exact order. If you want to split on character classes, you're back to regular expressions (at which point, searching for r'\bis\b' without splitting is the simpler, faster way to go).

@Jamie Bull 2018-02-01 11:52:02

'is' not in (w.lower() for w in s.translate(string.maketrans(' ' * len(string.punctuation + string.whitespace), string.punctuation + string.whitespace)).split() - ok, point taken. This is now ridiculous...

@Jamie Bull 2018-08-01 20:45:51

Although if anyone needs it, there's a working version here... stackoverflow.com/a/48241340/1706564
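The word-boundary regex suggested in the comments above could look roughly like this. It is only a sketch: the helper name contains_word is made up for illustration, and it assumes the search term is an ordinary word made of letters or digits.

```python
import re

def contains_word(text, word):
    # re.escape guards against regex metacharacters in `word`;
    # \b matches at a word boundary, so "This" does not count as "is".
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

print(contains_word("This be a string", "is"))
print(contains_word("This is, a comma", "is"))
print(contains_word("It is.", "IS"))
```

Unlike the split-based attempts, this handles punctuation, string boundaries, and capitalisation without any preprocessing.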

@ytpillai 2015-05-25 22:50:59

Here is your answer:

if "insert_char_or_string_here" in "insert_string_to_search_here":
    pass  # do stuff here

For checking if it is false:

if not "insert_char_or_string_here" in "insert_string_to_search_here":
    pass  # do stuff here

OR:

if "insert_char_or_string_here" not in "insert_string_to_search_here":
    pass  # do stuff here

@Aaron Hall 2014-11-25 22:33:48

Does Python have a string contains substring method?

Yes, but Python has a comparison operator that you should use instead, because the language intends its usage, and other programmers will expect you to use it. That keyword is in, which is used as a comparison operator:

>>> 'foo' in '**foo**'
True

The opposite (complement), which the original question asks for, is not in:

>>> 'foo' not in '**foo**' # returns False
False

This is semantically the same as not 'foo' in '**foo**' but it's much more readable and explicitly provided for in the language as a readability improvement.

Avoid using __contains__, find, and index

As promised, here's the contains method:

str.__contains__('**foo**', 'foo')

returns True. You could also call this function from the instance of the superstring:

'**foo**'.__contains__('foo')

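But don't. Methods that start with underscores are considered semantically non-public, and special ("dunder") methods like this one are meant to be invoked through the operators that delegate to them. The only reason to use this is when implementing or extending the in and not in functionality (e.g. if subclassing str):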

class NoisyString(str):
    def __contains__(self, other):
        print('testing if "{0}" in "{1}"'.format(other, self))
        return super(NoisyString, self).__contains__(other)

ns = NoisyString('a string with a substring inside')

and now:

>>> 'substring' in ns
testing if "substring" in "a string with a substring inside"
True

Also, avoid the following string methods:

>>> '**foo**'.index('foo')
2
>>> '**foo**'.find('foo')
2

>>> '**oo**'.find('foo')
-1
>>> '**oo**'.index('foo')

Traceback (most recent call last):
  File "<pyshell#40>", line 1, in <module>
    '**oo**'.index('foo')
ValueError: substring not found

Other languages may have no methods to directly test for substrings, and so you would have to use these types of methods, but with Python, it is much more efficient to use the in comparison operator.

Performance comparisons

We can compare various ways of accomplishing the same goal.

import timeit

def in_(s, other):
    return other in s

def contains(s, other):
    return s.__contains__(other)

def find(s, other):
    return s.find(other) != -1

def index(s, other):
    try:
        s.index(other)
    except ValueError:
        return False
    else:
        return True



perf_dict = {
    'in:True': min(timeit.repeat(lambda: in_('superstring', 'str'))),
    'in:False': min(timeit.repeat(lambda: in_('superstring', 'not'))),
    '__contains__:True': min(timeit.repeat(lambda: contains('superstring', 'str'))),
    '__contains__:False': min(timeit.repeat(lambda: contains('superstring', 'not'))),
    'find:True': min(timeit.repeat(lambda: find('superstring', 'str'))),
    'find:False': min(timeit.repeat(lambda: find('superstring', 'not'))),
    'index:True': min(timeit.repeat(lambda: index('superstring', 'str'))),
    'index:False': min(timeit.repeat(lambda: index('superstring', 'not'))),
}

And now we see that using in is much faster than the others. Less time to do an equivalent operation is better:

>>> perf_dict
{'in:True': 0.16450627865128808,
 'in:False': 0.1609668098178645,
 '__contains__:True': 0.24355481654697542,
 '__contains__:False': 0.24382793854783813,
 'find:True': 0.3067379407923454,
 'find:False': 0.29860888058124146,
 'index:True': 0.29647137792585454,
 'index:False': 0.5502287584545229}

@coderforlife 2015-06-10 03:35:42

Why should one avoid str.index and str.find? How else would you suggest someone find the index of a substring instead of just whether it exists or not? (or did you mean avoid using them in place of contains - so don't use s.find(ss) != -1 instead of ss in s?)

@Aaron Hall 2015-06-10 03:39:03

Precisely so, although the intent behind the use of those methods may be better addressed by elegant use of the re module. I have not yet found a use for str.index or str.find myself in any code I have written yet.

@cs95 2019-06-05 03:05:31

Please extend your answer to advise against using str.count as well (string.count(something) != 0). shudder
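As the exchange above suggests, the advice is about membership tests only. When the position of the substring is needed, find (or index) is still the direct tool. A short sketch reusing the '**foo**' example from the answer:

```python
s = "**foo**"

# Membership only: use `in`.
if "foo" in s:
    # Position needed: find() returns the index of the first match, or -1.
    start = s.find("foo")
    print(start, s[start:start + len("foo")])
```

The thing to avoid is writing s.find(ss) != -1 where ss in s would do.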

@firelynx 2017-04-28 18:52:16

in Python strings and lists

Here are a few useful examples that speak for themselves concerning the in operator:

"foo" in "foobar"
True

"foo" in "Foobar"
False

"foo" in "Foobar".lower()
True

"foo".capitalize() in "Foobar"
True

"foo" in ["bar", "foo", "foobar"]
True

"foo" in ["fo", "o", "foobar"]
False

Caveat. Lists are iterables, and the in operator acts on iterables, not just strings. On a list it tests for an equal element, not for a substring.

@CaffeinatedCoder 2017-06-09 18:41:24

Could the list iterable be switched around to look for any of the list in a single string? Ex: ["bar", "foo", "foobar"] in "foof"?

@firelynx 2017-06-09 19:36:09

@CaffeinatedCoder, no, this requires nested iteration. Best done by joining the list with pipes "|".join(["bar","foo", "foobar"]) and compiling a regex out of it, then matching on "foof"

@CaffeinatedCoder 2017-06-09 21:25:04

I figured out early that it could also be done with a generator, which allowed me to avoid regex. Thanks for an alternative though!

@Izaak Weiss 2017-08-28 22:00:22

any([x in "foof" for x in ["bar", "foo", "foobar"]])

@firelynx 2017-08-30 11:29:52

@IzaakWeiss Your one-liner works, but it's not very readable, and it does nested iteration. I would advise against doing this.
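For comparison, here is a sketch of the two approaches mentioned in these comments, side by side. The variable names are made up, and re.escape is added on the assumption that the search terms should be treated as literal text:

```python
import re

needles = ["bar", "foo", "foobar"]
haystack = "foof"

# Generator version: any() stops at the first needle that is found.
found_any = any(n in haystack for n in needles)

# Regex version: escape each needle, join with "|", compile once.
pattern = re.compile("|".join(re.escape(n) for n in needles))
found_re = pattern.search(haystack) is not None

print(found_any, found_re)
```

Passing a generator to any (rather than a list in square brackets) avoids building the whole list of results before checking them.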

@Piyush S. Wanare 2017-10-13 16:54:22

Can you compare complexity of using regex with in operator?

@firelynx 2017-10-13 17:49:03

@PiyushS.Wanare what do you mean by complexity? The "WTF/min" is a lot higher with regex.

@Piyush S. Wanare 2017-10-13 17:51:35

@firelynx, I was asking the same thing. Can you please explain why the complexity of regex is higher than that of in?

@firelynx 2017-10-13 17:57:46

@PiyushS.Wanare The regex syntax is unclear, so it produces more WTF/min. Even if you know it, it is not as clean as using in. Your code should be readable to others, not just to you. Also, using regex may mean you have to sanitise your inputs, which would be an extra step. There are a lot of good reasons not to use regex.

@Ufos 2015-07-17 13:19:36

So apparently there is nothing similar for vector-wise comparison. An obvious Python way to do so would be:

names = ['bob', 'john', 'mike']
any(st in 'bob and john' for st in names) 
>> True

any(st in 'mary and jane' for st in names) 
>> False

@Niriel 2015-08-10 09:50:07

That's because there is a bajillion ways of creating a Product from atomic variables. You can stuff them in a tuple, a list (which are forms of Cartesian Products and come with an implied order), or they can be named properties of a class (no a priori order) or dictionary values, or they can be files in a directory, or whatever. Whenever you can uniquely identify (iter or getitem) something in a 'container' or 'context', you can see that 'container' as a sort of vector and define binary ops on it. en.wikipedia.org/wiki/…

@cs95 2019-06-05 03:06:44

Worth noting that in should not be used with lists because it does a linear scan of the elements and is slow by comparison. Use a set instead, especially if membership tests are to be done repeatedly.
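A small sketch of that suggestion, reusing the names list from the answer. Note that this covers exact element membership only; a set does not help with substring searches inside a string.

```python
names_list = ["bob", "john", "mike"]
names_set = set(names_list)  # built once, reused for many lookups

# List membership compares element by element; set membership uses hashing.
print("john" in names_list, "john" in names_set)
print("mary" in names_list, "mary" in names_set)
```

Both containers give the same answers; the difference only shows up in speed as the collection grows and the lookups repeat.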

@Michael Mrozek 2010-08-09 02:56:21

You can use the in operator:

if "blah" not in somestring: 
    continue

@BallpointBen 2018-08-17 07:02:56

Under the hood, Python will use __contains__(self, item), __iter__(self), and __getitem__(self, key) in that order to determine whether an item lies in a given container. Implement at least one of those methods to make in available to your custom type.
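A minimal sketch of that lookup order, using two made-up classes (Playlist and Countdown are hypothetical and exist only for this illustration):

```python
class Playlist:
    # Hypothetical container type, for illustration only.
    def __init__(self, tracks):
        self._tracks = list(tracks)

    def __contains__(self, item):
        # Called first by `item in playlist`.
        return item in self._tracks


class Countdown:
    # No __contains__ here, so `in` falls back to iterating via __iter__.
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return iter(range(self.start, 0, -1))


print("song" in Playlist(["song", "tune"]))
print(2 in Countdown(3), 5 in Countdown(3))
```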

@Nan Zhong 2018-10-10 22:44:36

Just make sure that somestring won't be None. Otherwise you get a TypeError: argument of type 'NoneType' is not iterable
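One way to guard against that, as a sketch. The helper name lacks_blah is made up, and treating None as "does not contain" is an assumption that may not suit every program:

```python
def lacks_blah(somestring):
    # Treat None as "does not contain"; this policy is an assumption.
    return somestring is None or "blah" not in somestring

print(lacks_blah(None), lacks_blah("some blah"), lacks_blah("nothing"))

# Without the guard, the membership test on None raises TypeError.
try:
    "blah" in None
except TypeError as exc:
    print("TypeError:", exc)
```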

@Trenton 2018-11-13 21:41:47

FWIW, this is the idiomatic way to accomplish said goal.

@Sam Chats 2018-12-18 20:23:24

For strings, does the Python in operator use the Rabin-Karp algorithm?

@Kaz 2019-02-12 20:24:22

This is inconsistent and ugly in code like ".so." in filename or filename.endswith(".blah").

@Kaz 2019-02-14 17:54:07

^ I meant filename.endswith(".so").

@Christoph Burschka 2019-02-28 15:34:20

@SamChats see stackoverflow.com/questions/18139660/… for the implementation details (in CPython; afaik the language specification does not mandate any particular algorithm here).

@Alex Martelli 2010-08-09 03:19:09

if needle in haystack: is the normal use, as @Michael says -- it relies on the in operator, more readable and faster than a method call.

If you truly need a method instead of an operator (e.g. to do some weird key= for a very peculiar sort...?), that would be 'haystack'.__contains__. But since your example is for use in an if, I guess you don't really mean what you say;-). It's not good form (nor readable, nor efficient) to use special methods directly -- they're meant to be used, instead, through the operators and builtins that delegate to them.
