By Agos


2010-06-10 10:14:00 8 Comments

I happened to find myself having a basic filtering need: I have a list and I have to filter it by an attribute of the items.

My code looked like this:

my_list = [x for x in my_list if x.attribute == value]

But then I thought, wouldn't it be better to write it like this?

my_list = filter(lambda x: x.attribute == value, my_list)

It's more readable, and if needed for performance the lambda could be taken out to gain something.

Question is: are there any caveats in using the second way? Any performance difference? Am I missing the Pythonic Way™ entirely and should do it in yet another way (such as using itemgetter instead of the lambda)?

14 comments

@Duncan 2010-06-10 10:52:49

It is strange how much beauty varies for different people. I find the list comprehension much clearer than filter+lambda, but use whichever you find easier.

There are two things that may slow down your use of filter.

The first is the function call overhead: as soon as you use a Python function (whether created by def or lambda) it is likely that filter will be slower than the list comprehension. It almost certainly is not enough to matter, and you shouldn't think much about performance until you've timed your code and found it to be a bottleneck, but the difference will be there.

The other overhead that might apply is that the lambda is being forced to access a scoped variable (value). That is slower than accessing a local variable and in Python 2.x the list comprehension only accesses local variables. If you are using Python 3.x the list comprehension runs in a separate function so it will also be accessing value through a closure and this difference won't apply.
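A minimal timing sketch of the two forms from the question, assuming a hypothetical `Item` class with an `attribute` field (neither the class nor the data comes from the original post). Absolute numbers depend on the interpreter version and the machine, so no results are shown.

```python
from timeit import timeit


class Item:
    """Hypothetical stand-in for the objects being filtered."""

    def __init__(self, attribute):
        self.attribute = attribute


items = [Item(i % 10) for i in range(1000)]
value = 3


def with_comprehension():
    return [x for x in items if x.attribute == value]


def with_filter():
    # list() is needed on Python 3, where filter() returns a lazy iterator.
    return list(filter(lambda x: x.attribute == value, items))


# Both approaches select the same elements, in the same order.
assert with_comprehension() == with_filter()

print("comprehension:", timeit(with_comprehension, number=1000))
print("filter+lambda:", timeit(with_filter, number=1000))
```

Measuring on your own data is the only reliable way to see whether the difference matters.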

The other option to consider is to use a generator instead of a list comprehension:

def filterbyvalue(seq, value):
    for el in seq:
        if el.attribute == value:
            yield el

Then in your main code (which is where readability really matters) you've replaced both list comprehension and filter with a hopefully meaningful function name.
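As a rough illustration of how the calling code might read, here is a usage sketch. The `Item` namedtuple and the sample data are assumptions made up for this example.

```python
from collections import namedtuple

# Hypothetical record type; only the `attribute` field matters here.
Item = namedtuple("Item", ["name", "attribute"])


def filterbyvalue(seq, value):
    for el in seq:
        if el.attribute == value:
            yield el


items = [Item("a", 1), Item("b", 2), Item("c", 1)]

# The generator is lazy: nothing is scanned until it is consumed.
matches = filterbyvalue(items, 1)
names = [item.name for item in matches]
print(names)  # ['a', 'c']
```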

@Wayne Werner 2010-06-10 13:03:39

+1 for the generator. I have a link at home to a presentation that shows how amazing generators can be. You can also replace the list comprehension with a generator expression just by changing [] to (). Also, I agree that the list comp is more beautiful.
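A small sketch of the bracket swap mentioned above, using made-up data. The main behavioural difference is that the generator expression produces items on demand and can be consumed only once.

```python
numbers = range(10)

evens_list = [n for n in numbers if n % 2 == 0]  # built immediately
evens_gen = (n for n in numbers if n % 2 == 0)   # built on demand

print(evens_list)       # [0, 2, 4, 6, 8]
print(sum(evens_gen))   # 20
# A generator can be consumed only once; a second pass yields nothing.
print(list(evens_gen))  # []
```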

@skqr 2015-06-15 17:47:36

Actually, no - filter is faster. Just run a couple of quick benchmarks using something like stackoverflow.com/questions/5998245/…

@Duncan 2015-06-17 10:32:30

@skqr better to just use timeit for benchmarks, but please give an example where you find filter to be faster using a Python callback function.

@Alf47 2015-08-06 20:53:35

I have a question about this generator - according to this link python.net/~goodger/projects/pycon/2007/idiomatic/handout.html the comprehension mentioned in the question performs much better and is "idiomatic". So shouldn't the generator not be the preferred method? (I'm new at Python)

@Duncan 2015-08-07 08:10:31

@Alf47 nothing when programming (Python or any other language) is absolute. The link you reference points out the list comprehension is best when kept simple and if too complex you should use an explicit for loop. A full-blown generator allows you to abstract some complex looping conditions and keep the loop construct separate from the body that processes whatever it yields. At the level of the example given the list comprehension is just fine, but the generator is a useful tool to know about for when the list comprehension would be over complicated.

@tnq177 2016-06-08 12:46:40

@WayneWerner do you mind sharing the presentation, please?

@Wayne Werner 2016-06-08 13:03:26

@tnq177 It's David Beazley's presentation on generators - dabeaz.com/generators

@Victor Schröder 2019-01-17 08:45:03

"...which is where readability really matters...". Sorry, but readability always matters, even in the (rare) cases when you -- crying -- have to give it up.

@Duncan 2019-01-17 09:15:04

@VictorSchröder yes, perhaps I was unclear. What I was trying to say was that in the main code you need to be able to see the bigger picture. In the little helper function you only need to care about that one function, what else is going on outside can be ignored.

@Rod Senra 2018-10-03 19:13:25

Curiously on Python 3, I see filter performing faster than list comprehensions.

I always thought that the list comprehensions would be more performant. Something like: [name for name in brand_names_db if name is not None] The bytecode generated is a bit better.

>>> from dis import dis as disassemble  # assumed alias; the post does not show this import
>>> def f1(seq):
...     return list(filter(None, seq))
>>> def f2(seq):
...     return [i for i in seq if i is not None]
>>> disassemble(f1.__code__)
2         0 LOAD_GLOBAL              0 (list)
          2 LOAD_GLOBAL              1 (filter)
          4 LOAD_CONST               0 (None)
          6 LOAD_FAST                0 (seq)
          8 CALL_FUNCTION            2
         10 CALL_FUNCTION            1
         12 RETURN_VALUE
>>> disassemble(f2.__code__)
2           0 LOAD_CONST               1 (<code object <listcomp> at 0x10cfcaa50, file "<stdin>", line 2>)
          2 LOAD_CONST               2 ('f2.<locals>.<listcomp>')
          4 MAKE_FUNCTION            0
          6 LOAD_FAST                0 (seq)
          8 GET_ITER
         10 CALL_FUNCTION            1
         12 RETURN_VALUE

But they are actually slower:

   >>> from timeit import timeit
   >>> timeit(stmt="f1(range(1000))", setup="from __main__ import f1,f2")
   21.177661532000116
   >>> timeit(stmt="f2(range(1000))", setup="from __main__ import f1,f2")
   42.233950221000214

@Victor Schröder 2019-01-17 09:27:42

Invalid comparison. First, you are not passing a lambda function to the filter version, which makes it default to the identity function. When defining if not None in the list comprehension you are defining a lambda function (notice the MAKE_FUNCTION statement). Second, the results are different, as the list comprehension version will remove only None values, whereas the filter version will remove all "falsy" values. That said, this kind of microbenchmarking is pointless. That is one million iterations, each over 1k items! The difference is negligible.
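The second point can be checked directly. The sample sequence below is made up for illustration; it contains several falsy values besides None.

```python
seq = [0, 1, None, "", "a", [], 2.5]

# filter(None, ...) keeps only truthy items, so 0, "" and [] are dropped too.
truthy = list(filter(None, seq))

# This comprehension drops only None.
not_none = [i for i in seq if i is not None]

print(truthy)    # [1, 'a', 2.5]
print(not_none)  # [0, 1, '', 'a', [], 2.5]
```

On range(1000), as in the benchmark above, the two functions therefore differ only in whether 0 is kept.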

@C.W.praen 2018-02-28 21:16:58

In addition to the accepted answer, there is a corner case in which you should use filter instead of a list comprehension. If the list is unhashable, you cannot process it directly with a list comprehension. A real-world example is using pyodbc to read results from a database: the fetchall() results from a cursor are an unhashable list. In this situation, to manipulate the returned results directly, filter should be used:

cursor.execute("SELECT * FROM TABLE1;")
data_from_db = cursor.fetchall()
processed_data = filter(lambda s: 'abc' in s.field1 or s.StartTime >= start_date_time, data_from_db) 

If you use list comprehension here you will get the error:

TypeError: unhashable type: 'list'

@user1767754 2017-11-28 00:27:01

It took me some time to get familiar with the higher-order functions filter and map. I got used to them and actually liked filter, as it was explicit that it filters by keeping whatever is truthy, and I felt cool that I knew some functional programming terms.

Then I read this passage (Fluent Python Book):

The map and filter functions are still builtins in Python 3, but since the introduction of list comprehensions and generator expressions, they are not as important. A listcomp or a genexp does the job of map and filter combined, but is more readable.

And now I think: why bother with the concept of filter / map if you can achieve the same thing with an already widespread idiom like list comprehensions? Furthermore, map and filter are themselves kind of functions. In that case I prefer using anonymous functions (lambdas).

Finally, just for the sake of having it tested, I've timed both methods (map and listComp) and I didn't see any relevant speed difference that would justify making arguments about it.

from timeit import Timer

timeMap = Timer(lambda: list(map(lambda x: x*x, range(10**7))))
print(timeMap.timeit(number=100))

timeListComp = Timer(lambda:[(lambda x: x*x) for x in range(10**7)])
print(timeListComp.timeit(number=100))

#Map:                 166.95695265199174
#List Comprehension   177.97208347299602
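Note that the comprehension in the snippet above builds a list of lambda objects without ever calling them, so the two timed statements do different work. A like-for-like sketch might look as follows. The size `N` is an assumption, chosen far smaller than 10**7 so the example finishes quickly, and no timings are quoted because they vary by machine.

```python
from timeit import timeit

N = 10_000  # much smaller than 10**7 so the sketch finishes quickly


def with_map():
    return list(map(lambda x: x * x, range(N)))


def with_comprehension():
    # Squares x directly instead of creating a lambda per element.
    return [x * x for x in range(N)]


# Both forms now compute the same list of squares.
assert with_map() == with_comprehension()

print("map + lambda:      ", timeit(with_map, number=100))
print("list comprehension:", timeit(with_comprehension, number=100))
```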

@Jim50 2016-09-06 06:26:35

I thought I'd just add that in python 3, filter() is actually an iterator object, so you'd have to pass your filter method call to list() in order to build the filtered list. So in python 2:

lst_a = range(25) #arbitrary list
lst_b = [num for num in lst_a if num % 2 == 0]
lst_c = filter(lambda num: num % 2 == 0, lst_a)

lists b and c have the same values, and were completed in about the same time, as filter() was equivalent to [x for x in y if z]. However, in 3, this same code would leave list c containing a filter object, not a filtered list. To produce the same values in 3:

lst_a = range(25) #arbitrary list
lst_b = [num for num in lst_a if num % 2 == 0]
lst_c = list(filter(lambda num: num %2 == 0, lst_a))

The problem is that list() takes an iterable as its argument, and creates a new list from that argument. The result is that using filter in this way in python 3 takes up to twice as long as the [x for x in y if z] method because you have to iterate over the output from filter() as well as the original list.
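A short Python 3 sketch of the laziness described above: the filter object does no work until it is consumed, and once consumed it is empty.

```python
lst_a = range(25)

evens = filter(lambda num: num % 2 == 0, lst_a)
print(type(evens).__name__)  # filter

first_pass = list(evens)   # consumes the iterator
second_pass = list(evens)  # already exhausted

print(first_pass[:3])  # [0, 2, 4]
print(second_pass)     # []
```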

@I. J. Kennedy 2014-11-13 20:00:35

Since any speed difference is bound to be minuscule, whether to use filters or list comprehensions comes down to a matter of taste. In general I'm inclined to use comprehensions (which seems to agree with most other answers here), but there is one case where I prefer filter.

A very frequent use case is pulling out the values of some iterable X subject to a predicate P(x):

[x for x in X if P(x)]

but sometimes you want to apply some function to the values first:

[f(x) for x in X if P(f(x))]


As a specific example, consider

primes_cubed = [x*x*x for x in range(1000) if prime(x)]

I think this looks slightly better than using filter. But now consider

prime_cubes = [x*x*x for x in range(1000) if prime(x*x*x)]

In this case we want to filter against the post-computed value. Besides the issue of computing the cube twice (imagine a more expensive calculation), there is the issue of writing the expression twice, violating the DRY aesthetic. In this case I'd be apt to use

prime_cubes = filter(prime, [x*x*x for x in range(1000)])

@viki.omega9 2015-03-12 02:22:55

Would you not consider using the prime via another list comprehension? Such as [prime(i) for i in [x**3 for x in range(1000)]]

@Anton 2015-05-19 11:14:15

How would one go about getting [f(x) for x in X if f(x)] with one call to f(x) per element? If f(x) can return None, I would like to filter out those values in the first pass.

@Zelphir Kaltstahl 2015-09-16 12:06:18

x*x*x cannot be a prime number, as it has x^2 and x as factors. The example doesn't really make sense mathematically, but maybe it's still helpful. (Maybe we could find something better though?)

@Mateen Ulhaq 2016-08-27 08:13:05

Note that we may use a generator expression instead for the last example if we don't want to eat up memory: prime_cubes = filter(prime, (x*x*x for x in range(1000)))

@Dennis Krupenik 2018-03-12 10:21:37

@MateenUlhaq this can be optimized to prime_cubes = [1] to save both memory and cpu cycles ;-)

@Mateen Ulhaq 2018-03-12 15:11:19

@DennisKrupenik Or rather, []

@Dennis Krupenik 2018-03-13 16:09:20

@MateenUlhaq indeed

@François Leblanc 2018-06-14 15:06:00

To look at it from another angle, this can also be written as [x for x in map(f, X) if P(x)] and we only apply f() once. Changing the square brackets to parentheses makes it a generator comprehension.
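A sketch of the map-based form, with a call log to confirm that the transformation runs once per element. The functions `f` and `P` here are hypothetical placeholders, not anything from the thread.

```python
seen = []  # records every argument f is called with


def f(x):
    """Hypothetical transformation; logs each call."""
    seen.append(x)
    return x * x


def P(y):
    """Hypothetical predicate on the transformed value."""
    return y % 2 == 0


X = range(10)

result = [y for y in map(f, X) if P(y)]
print(result)     # [0, 4, 16, 36, 64]
print(len(seen))  # 10
```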

@GeeTransit 2019-10-04 22:14:41

Now that Python 3.8 is almost out, you can store the result of prime(x*x*x) in an assignment expression (walrus boi). Here: prime_cubes = [x_cubed for x in range(1000) if prime(x_cubed := x*x*x)]
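A runnable sketch of the assignment-expression form, which needs Python 3.8 or later. The `is_prime` helper is a naive stand-in written for this example, and `x*x + 1` replaces `x*x*x` only so that some results are actually prime.

```python
def is_prime(n):
    """Naive trial-division primality test, for illustration only."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))


# The walrus operator binds y once per x; y is then reused as the output.
primes = [y for x in range(10) if is_prime(y := x * x + 1)]
print(primes)  # [2, 5, 17, 37]
```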

@rharder 2015-08-28 19:31:41

Here's a short piece I use when I need to filter on something after the list comprehension. Just a combination of filter, lambda, and lists (otherwise known as the loyalty of a cat and the cleanliness of a dog).

In this case I'm reading a file, stripping out blank lines, commented out lines, and anything after a comment on a line:

# Throw out blank lines and comments
with open('file.txt', 'r') as lines:        
    # From the inside out:
    #    [s.partition('#')[0].strip() for s in lines]... Throws out comments
    #   filter(lambda x: x!= '', [s.part... Filters out blank lines
    #  y for y in filter... Converts filter object to list
    file_contents = [y for y in filter(lambda x: x != '', [s.partition('#')[0].strip() for s in lines])]

@Zelphir Kaltstahl 2015-09-16 11:50:08

This achieves a lot in very little code indeed. I think it might be a bit too much logic in one line to easily understand and readability is what counts though.

@Steve Jessop 2016-04-26 10:59:45

You could write this as file_contents = list(filter(None, (s.partition('#')[0].strip() for s in lines)))
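For readers who find the one-liner dense, the same steps can be split in two. This is only a sketch: `io.StringIO` stands in for the opened file, and its contents are invented for the example.

```python
import io

# Stand-in for open('file.txt'); the contents are made up for this example.
lines = io.StringIO(
    "# full-line comment\n"
    "alpha\n"
    "\n"
    "beta  # trailing comment\n"
    "   \n"
)

# Strip comments and whitespace lazily, then drop the empty strings.
stripped = (s.partition("#")[0].strip() for s in lines)
file_contents = [s for s in stripped if s]

print(file_contents)  # ['alpha', 'beta']
```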

@Adeynack 2014-10-15 23:50:25

An important difference is that a list comprehension returns a list, while filter returns a filter object, which you cannot manipulate like a list (i.e., you cannot call len on it).

My own self-learning brought me to some similar issue.

That being said, if there is a way to have the resulting list from a filter, a bit like you would do in .NET when you do lst.Where(i => i.something()).ToList(), I am curious to know it.

EDIT: This is the case for Python 3, not 2 (see discussion in comments).

@thiruvenkadam 2015-01-29 07:33:54

filter returns a list and we can use len on it. At least in my Python 2.7.6.

@Adeynack 2015-01-29 17:33:39

It is not the case in Python 3:

>>> a = [1, 2, 3, 4, 5, 6, 7, 8]
>>> f = filter(lambda x: x % 2 == 0, a)
>>> lc = [i for i in a if i % 2 == 0]
>>> type(f)
<class 'filter'>
>>> type(lc)
<class 'list'>
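In Python 3 the filter object can be turned into a concrete collection by passing it to a constructor that accepts an iterable. A minimal sketch, where `is_even` is a helper named just for this example:

```python
a = [1, 2, 3, 4, 5, 6, 7, 8]


def is_even(x):
    return x % 2 == 0


as_list = list(filter(is_even, a))
as_tuple = tuple(filter(is_even, a))
as_set = set(filter(is_even, a))

print(as_list, len(as_list))   # [2, 4, 6, 8] 4
print(as_tuple)                # (2, 4, 6, 8)
print(as_set == {2, 4, 6, 8})  # True
```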

@Steve Jessop 2016-04-26 10:54:29

"if there is a way to have the resulting list ... I am curious to know it". Just call list() on the result: list(filter(my_func, my_iterable)). And of course you could replace list with set, or tuple, or anything else that takes an iterable. But to anyone other than functional programmers, the case is even stronger to use a list comprehension rather than filter plus explicit conversion to list.

@thiruvenkadam 2015-01-29 07:32:12

Filter is just that: it filters out elements of a list. The definition in the official docs says the same. A list comprehension, by contrast, produces a new list after acting on the elements of the previous list. (Both filter and a list comprehension create a new list rather than operating in place on the old one. A "new list" here means something like a list of an entirely different data type, for example integers converted to strings.)

In your example, it is better to use filter than a list comprehension, as per the definition. However, if you want to retrieve, say, other_attribute from the list elements as a new list, then you can use a list comprehension.

return [item.other_attribute for item in my_list if item.attribute==value]

This is how I remember the difference between filter and list comprehensions. To remove a few things from a list and keep the other elements intact, use filter. To apply some logic of your own to the elements and create a pared-down list suitable for some purpose, use a list comprehension.

@thiruvenkadam 2015-01-29 07:41:05

I will be happy to know the reason for down voting so that I will not repeat it again anywhere in the future.

@Agos 2015-02-02 11:14:53

the definition of filter and list comprehension were not necessary, as their meaning was not being debated. That a list comprehension should be used only for “new” lists is presented but not argued for.

@thiruvenkadam 2015-02-02 14:02:17

I used the definition to say that filter gives you list with same elements which are true for a case but with list comprehension we can modify the elements themselves, like converting int to str. But point taken :-)

@tim 2013-08-21 22:54:53

My take

def filter_list(items, key, value, limit=None):
    # `items` rather than `list`, to avoid shadowing the builtin
    return [i for i in items if i[key] == value][:limit]

@user707650 2014-01-09 11:16:39

i was never said to be a dict, and there isn't a need for limit. Other than that, how is this different than what the OP suggested, and how does it answer the question?

@Tendayi Mawushe 2010-06-10 10:58:17

This is a somewhat religious issue in Python. Even though Guido considered removing map, filter and reduce from Python 3, there was enough of a backlash that in the end only reduce was moved from built-ins to functools.reduce.

Personally I find list comprehensions easier to read. It is more explicit what is happening from the expression [i for i in list if i.attribute == value] as all the behaviour is on the surface not inside the filter function.

I would not worry too much about the performance difference between the two approaches as it is marginal. I would really only optimise this if it proved to be the bottleneck in your application which is unlikely.

Also since the BDFL wanted filter gone from the language then surely that automatically makes list comprehensions more Pythonic ;-)

@dashesy 2013-06-12 01:17:46

Thanks for the links to Guido's input, if nothing else for me it means I will try not to use them any more, so that I won't get the habit, and I won't become supportive of that religion :)

@njzk2 2014-05-30 20:22:58

but reduce is the most complex to do with simple tools! map and filter are trivial to replace with comprehensions!

@Tagar 2015-06-28 16:10:15

didn't know reduce was demoted in Python3. thanks for the insight! reduce() is still quite helpful in distributed computing, like PySpark. I think that was a mistake..

@icc97 2017-10-11 11:58:05

@Tagar you can still use reduce you just have to import it from functools
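A minimal sketch of the import in Python 3, with made-up numbers:

```python
from functools import reduce
import operator

numbers = [1, 2, 3, 4, 5]

total = reduce(operator.add, numbers)       # ((((1 + 2) + 3) + 4) + 5)
product = reduce(operator.mul, numbers, 1)  # third argument is the initial value

print(total)    # 15
print(product)  # 120
```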

@Umang 2010-06-10 10:22:36

Although filter may be the "faster way", the "Pythonic way" would be not to care about such things unless performance is absolutely critical (in which case you wouldn't be using Python!).

@bli 2017-01-23 16:44:02

Late comment to an often-seen argument: Sometimes it makes a difference to have an analysis run in 5 hours instead of 10, and if that can be achieved by taking one hour optimizing python code, it can be worth it (especially if one is comfortable with python and not with faster languages).

@unbeli 2010-06-10 10:19:27

I find the second way more readable. It tells you exactly what the intention is: filter the list.
PS: do not use 'list' as a variable name

@John La Rooy 2010-06-10 10:17:47

generally filter is slightly faster if using a builtin function.

I would expect the list comprehension to be slightly faster in your case

@giaosudau 2014-11-27 16:27:36

python -m timeit 'filter(lambda x: x in [1,2,3,4,5], range(10000000))'
10 loops, best of 3: 1.44 sec per loop
python -m timeit '[x for x in range(10000000) if x in [1,2,3,4,5]]'
10 loops, best of 3: 860 msec per loop

Not really?!

@John La Rooy 2014-11-27 21:08:36

@sepdau, lambda functions are not builtins. List comprehensions have improved over the past 4 years - now the difference is negligible anyway even with builtin functions
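To try the builtin-function case yourself, a sketch along these lines could be used. The word list is invented, and `str.isdigit` is just one example of a predicate implemented in C. No timings are quoted, since they depend on the Python version and the machine.

```python
from timeit import timeit

words = ["42", "spam", "7", "eggs", "2010"] * 200


def with_filter():
    # str.isdigit is implemented in C, so filter() creates no Python-level
    # function frame per item.
    return list(filter(str.isdigit, words))


def with_comprehension():
    return [w for w in words if w.isdigit()]


# Both forms keep the same strings, in the same order.
assert with_filter() == with_comprehension()

print("filter + builtin:  ", timeit(with_filter, number=1000))
print("list comprehension:", timeit(with_comprehension, number=1000))
```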
