By Kingle


2019-03-14 20:03:29 8 Comments

I have a list A of about 62,000 numbers, and another list B of about 370,000. I would like to filter B so that it contains only elements from A. I tried something like this:

A=[0,3,5,73,88,43,2,1]
B=[0,5,10,42,43,56,83,88,892,1089,3165]
C=[item for item in A if item in set(B)] 

Which works, but is obviously very slow for such large lists because (I think?) the search continues through the entire B, even when the element has already been found in B. So the script is going through a list of 370,000 elements 62,000 times.

The elements in A and B are unique (B contains a list of unique values between 0 and 700,000 and A contains a unique subset of those) so once A[i] is found in B, the search can stop. The values are also in ascending order, if that means anything.

Is there some way to do this more quickly?

2 comments

@Fukiyel 2019-03-14 20:11:30

To be really sure what I've done is the fastest possible, I would have done that :

A=[0,3,5,73,88,43,2,1]
B=[0,5,10,42,43,56,83,88,892,1089,3165]

B_filter = B.copy()
C = []
for item in A:
    if filter in B_filter:
        C.append(item)
        B_filter.pop(0) # B_filter is a list, and it's in ascending order so always the first

If you don't care about losing your B list, you can just use B instead of B_filter and not declare B_filter, so you don't have to copy a 370k large list.

@Patrick Haugh 2019-03-14 20:07:49

This is creating a new set(B) for every item in A. Instead, use the built-in set.intersection:

C = set(A).intersection(B)

@Christian Dean 2019-03-14 20:10:37

Nice! +1. What about set(a) & set(B)? Just seems a bit cleaner.

@Patrick Haugh 2019-03-14 20:11:55

@ChristianDean That requires a third set to be created, set(B). In the above, only set(A) and C are created as new objects, B is just iterated over.

@Christian Dean 2019-03-14 20:12:43

Ah, I missed that. Good point.

@Kingle 2019-03-14 20:17:20

This is extremely fast (and bonus points for simplicity). Exactly what I was looking for. Thank you.

Related Questions

Sponsored Content

29 Answered Questions

[SOLVED] Finding the index of an item given a list containing it in Python

  • 2008-10-07 01:39:38
  • Eugene M
  • 3138888 View
  • 2588 Score
  • 29 Answer
  • Tags:   python list

27 Answered Questions

[SOLVED] Difference between append vs. extend list methods in Python

30 Answered Questions

[SOLVED] How do I concatenate two lists in Python?

6 Answered Questions

[SOLVED] How do I get the number of elements in a list in Python?

  • 2009-11-11 00:30:54
  • y2k
  • 3002992 View
  • 1745 Score
  • 6 Answer
  • Tags:   python list

20 Answered Questions

[SOLVED] How to clone or copy a list?

35 Answered Questions

[SOLVED] How do I check if a list is empty?

  • 2008-09-10 06:20:11
  • Ray Vega
  • 2100254 View
  • 3140 Score
  • 35 Answer
  • Tags:   python list

25 Answered Questions

[SOLVED] How do I list all files of a directory?

  • 2010-07-08 19:31:22
  • duhhunjonn
  • 3027782 View
  • 3319 Score
  • 25 Answer
  • Tags:   python directory

39 Answered Questions

[SOLVED] How to make a flat list out of list of lists?

19 Answered Questions

[SOLVED] How do I remove an element from a list by index in Python?

  • 2009-03-09 18:16:11
  • Joan Venge
  • 2123003 View
  • 1195 Score
  • 19 Answer
  • Tags:   python list

12 Answered Questions

[SOLVED] Getting the last element of a list in Python

  • 2009-05-30 19:28:53
  • Janusz
  • 1539923 View
  • 1618 Score
  • 12 Answer
  • Tags:   python list indexing

Sponsored Content