By Josh Glover


2009-01-26 15:43:58 8 Comments

Is there a built-in that removes duplicates from a list in Python, whilst preserving order? I know that I can use a set to remove duplicates, but that destroys the original order. I also know that I can roll my own like this:

def uniq(input):
  output = []
  for x in input:
    if x not in output:
      output.append(x)
  return output

(Thanks to unwind for that code sample.)

But I'd like to avail myself of a built-in or a more Pythonic idiom if possible.

Related question: In Python, what is the fastest algorithm for removing duplicates from a list so that all elements are unique while preserving order?

24 comments

@Markus Jarderot 2009-01-26 15:47:01

Here you have some alternatives: http://www.peterbe.com/plog/uniqifiers-benchmark

Fastest one:

def f7(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]

Why assign seen.add to seen_add instead of just calling seen.add? Python is a dynamic language, and resolving seen.add each iteration is more costly than resolving a local variable. seen.add could have changed between iterations, and the runtime isn't smart enough to rule that out. To play it safe, it has to check the object each time.
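To see the effect yourself, here is a minimal timing sketch (mine, not from the answer; the statement strings and data are arbitrary, and the gap is usually small and varies by machine and Python version):

import timeit

setup = "data = list(range(1000)) * 10"
attr  = "seen = set(); [x for x in data if not (x in seen or seen.add(x))]"
local = ("seen = set(); seen_add = seen.add; "
         "[x for x in data if not (x in seen or seen_add(x))]")

# compare plain attribute lookup vs. the pre-bound local
print(timeit.timeit(attr,  setup=setup, number=1000))
print(timeit.timeit(local, setup=setup, number=1000))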

If you plan on using this function a lot on the same dataset, perhaps you would be better off with an ordered set: http://code.activestate.com/recipes/528878/

O(1) insertion, deletion and member-check per operation.

(Small additional note: seen.add() always returns None, so the or above is there only as a way to attempt a set update, and not as an integral part of the logical test.)

@Jesse Dhillon 2013-03-22 17:01:56

Is seen.add really resolved each iteration? Wouldn't it be resolved once when the list comprehension is parsed and transformed?

@Markus Jarderot 2013-03-22 17:24:40

@JesseDhillon seen.add could have changed between iterations, and the runtime isn't smart enough to rule that out. To play safe, it has to check the object each time. -- If you look at the bytecode with dis.dis(f), you can see that it executes LOAD_ATTR for the add member on each iteration. ideone.com/tz1Tll
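A hedged illustration of that bytecode difference (the function names with_attr and with_local are mine; exact opcodes vary by Python version):

import dis

def with_attr(seq):
    seen = set()
    # performs an attribute lookup for `add` on every element
    return [x for x in seq if not (x in seen or seen.add(x))]

def with_local(seq):
    seen = set()
    seen_add = seen.add
    # loads a plain local/closure variable instead
    return [x for x in seq if not (x in seen or seen_add(x))]

dis.dis(with_attr)
dis.dis(with_local)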

@osa 2013-12-17 22:08:58

Technically, O(1) for insertions and lookups is impossible, if you are careful with definitions. If most elements are distinct, hashing is best analyzed as O(log N), not O(1), much like counting the digits in a 64-bit number takes 64 operations.

@Markus Jarderot 2013-12-18 00:52:52

@SergeyOrshanskiy Almost O(1).

@Jens Timmerman 2014-03-11 14:28:39

When I try this on a list of lists I get: TypeError: unhashable type: 'list'

@Markus Jarderot 2014-03-11 16:22:30

@JensTimmerman Yes. Lists in Python are not hashable, and thus can't be used for fast look-up. You can either convert the sub-lists to a hashable type (tuple?), or swap set() and seen.add for list() and seen.append.

@Jens Timmerman 2014-03-12 17:07:07

But then you would lose all the speed of this implementation...
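The tuple-key variant keeps the set-based speed, though. A minimal sketch (the unique_rows helper name is mine, not from the thread, and it assumes the sub-lists themselves contain hashable items):

def unique_rows(seq):
    """Order-preserving de-duplication for a list of (sub)lists."""
    seen = set()
    out = []
    for x in seq:
        key = tuple(x)        # tuples are hashable, lists are not
        if key not in seen:
            seen.add(key)
            out.append(x)
    return out

print(unique_rows([[1, 2], [3], [1, 2], [3], [4]]))  # [[1, 2], [3], [4]]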

@user136036 2014-10-24 16:39:17

Your solution is not the fastest one. In Python 3 (I did not test 2) this is faster (for a 300k-entry list: 0.045s yours vs 0.035s this one): seen = set(); return [x for x in lines if x not in seen and not seen.add(x)]. I could not measure any speed effect from the seen_add line you use.

@Willem Van Onsem 2015-05-19 17:42:15

A small comment: the insertion/deletion/membership methods on sets in Python are O(n) worst case and O(1) average case: they are backed by a hash table, as explained here.

@jamylak 2015-05-20 06:22:36

@user136036 Please link to your tests. How many times did you run them? seen_add is an improvement, but timings can be affected by system resources at the time. Would be interested to see full timings

@stevenjackson121 2016-10-01 02:11:05

Did you test this solution across Python implementations, or only on CPython? I would be surprised if the seen_add line helped much given PyPy's JIT for example.

@Dave 2016-10-04 15:47:16

Even on standard CPython (3.5) I see no timing difference. Timeit gives 3.048, 2.898, 2.852 without seen_add, and 2.960, 3.008, 2.959 with. (All tests run 1000 times on the same list of 50,000 random integers 0-9). Something seems fishy about the explanation - seen_add should execute the same code as seen.add; both are shown as <built-in method add of set object at 0x01234567>.

@Markus Jarderot 2016-10-04 16:43:03

It is a micro-optimization. The disassembled code changes from LOAD_DEREF 0 (seen) + LOAD_ATTR 0 (add) to just LOAD_DEREF 1 (seen_add). Optimizing compilers might do something similar.

@prof.phython 2017-12-11 09:17:02

I am not able to understand seen_add(x). Wouldn't it return None every time? Please explain.

@Markus Jarderot 2017-12-11 13:30:55

@prof.phython Yes. It is just there because the not (...) test is a convenient place to execute the function inline. x in seen or seen_add(x) evaluates to either True (x was already seen) or None (x was new and has just been added); not then turns those into False and True respectively, so only first occurrences pass the filter.

@prof.phython 2017-12-11 15:23:23

@MarkusJarderot Okay. But why are we doing (x in seen or seen_add(x))? Wouldn't (x in seen) alone work? What does seen_add(x) do in this statement? Thanks

@Markus Jarderot 2017-12-11 17:00:04

@prof.phython seen_add is a reference to seen.add, which adds the item to the set while the list is being processed. The next time an identical item is found, it will be skipped.

@prof.phython 2017-12-11 17:38:58

@MarkusJarderot Yes, I can understand that. My only confusion is why return [x for x in seq if not (x in seen)] cannot work. Why do we need seen.add() to check?

@Markus Jarderot 2017-12-12 11:48:23

@prof.phython It is not there to affect the result. It is there to update the set if the check fails (x in seen).

@prof.phython 2017-12-12 13:02:53

Got it!! Thanks
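To make the trick concrete, a small illustration (mine, not from the thread) of what x in seen or seen_add(x) evaluates to:

seen = set()
seen_add = seen.add

print(seen_add(1))                 # None -- set.add always returns None
print(1 in seen or seen_add(1))    # True: 1 already present, add() not called
print(2 in seen or seen_add(2))    # None: 2 was new, add() ran as a side effect
print(seen)                        # {1, 2}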

@Alexander 2017-08-18 00:35:08

Not to kick a dead horse (this question is very old and already has lots of good answers), but here is a solution using pandas that is quite fast in many circumstances and is dead simple to use.

>>> import pandas as pd
>>> my_list = [0, 1, 2, 3, 4, 1, 2, 3, 5]
>>> pd.Series(my_list).drop_duplicates().tolist()
[0, 1, 2, 3, 4, 5]

@Pedro Lobito 2019-01-16 23:06:41

Neat solution +1

@timgeb 2018-03-02 08:23:53

In Python 3.7 and above, dictionaries are guaranteed to remember their key insertion order. The answer to this question summarizes the current state of affairs.

The OrderedDict solution thus becomes obsolete and without any import statements we can simply issue:

>>> lst = [1, 2, 1, 3, 3, 2, 4]
>>> list(dict.fromkeys(lst))
[1, 2, 3, 4]

@MSeifert 2018-03-13 17:41:38

Note that Raymond's answer already mentions that approach.

@gboffi 2018-11-04 17:30:53

An in-place method

This method is quadratic, because for every element of the list we do a linear lookup into the list (to that we have to add the cost of rearranging the list caused by the dels).

That said, it is possible to operate in place if we start from the end of the list and proceed toward the origin, removing each term that is present in the sub-list to its left.

This idea in code is simply

for i in range(len(l)-1,0,-1): 
    if l[i] in l[:i]: del l[i] 

A simple test of the implementation

In [91]: from random import randint, seed                                                                                            
In [92]: seed('20080808') ; l = [randint(1,6) for _ in range(12)] # Beijing Olympics                                                                 
In [93]: for i in range(len(l)-1,0,-1): 
    ...:     print(l) 
    ...:     print(i, l[i], l[:i], end='') 
    ...:     if l[i] in l[:i]: 
    ...:          print( ': remove', l[i]) 
    ...:          del l[i] 
    ...:     else: 
    ...:          print() 
    ...: print(l)
[6, 5, 1, 4, 6, 1, 6, 2, 2, 4, 5, 2]
11 2 [6, 5, 1, 4, 6, 1, 6, 2, 2, 4, 5]: remove 2
[6, 5, 1, 4, 6, 1, 6, 2, 2, 4, 5]
10 5 [6, 5, 1, 4, 6, 1, 6, 2, 2, 4]: remove 5
[6, 5, 1, 4, 6, 1, 6, 2, 2, 4]
9 4 [6, 5, 1, 4, 6, 1, 6, 2, 2]: remove 4
[6, 5, 1, 4, 6, 1, 6, 2, 2]
8 2 [6, 5, 1, 4, 6, 1, 6, 2]: remove 2
[6, 5, 1, 4, 6, 1, 6, 2]
7 2 [6, 5, 1, 4, 6, 1, 6]
[6, 5, 1, 4, 6, 1, 6, 2]
6 6 [6, 5, 1, 4, 6, 1]: remove 6
[6, 5, 1, 4, 6, 1, 2]
5 1 [6, 5, 1, 4, 6]: remove 1
[6, 5, 1, 4, 6, 2]
4 6 [6, 5, 1, 4]: remove 6
[6, 5, 1, 4, 2]
3 4 [6, 5, 1]
[6, 5, 1, 4, 2]
2 1 [6, 5]
[6, 5, 1, 4, 2]
1 5 [6]
[6, 5, 1, 4, 2]


@gboffi 2018-11-04 17:44:01

Before posting I have searched the body of the answers for 'place' to no avail. If others have solved the problem in a similar way please alert me and I'll remove my answer asap.

@jamylak 2013-06-10 02:47:13

Edit 2016

As Raymond pointed out, in Python 3.5+, where OrderedDict is implemented in C, the list comprehension approach will be slower than OrderedDict (unless you actually need the list at the end, and even then only if the input is very short). So the best solution for 3.5+ is OrderedDict.

Important Edit 2015

As @abarnert notes, the more_itertools library (pip install more_itertools) contains a unique_everseen function that is built to solve this problem without any unreadable (not seen.add) mutations in list comprehensions. It is also the fastest solution:

>>> from  more_itertools import unique_everseen
>>> items = [1, 2, 0, 1, 3, 2]
>>> list(unique_everseen(items))
[1, 2, 0, 3]

Just one simple library import and no hacks. This comes from an implementation of the itertools recipe unique_everseen which looks like:

from itertools import filterfalse  # ifilterfalse on Python 2
def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element

In Python 2.7+, the accepted common idiom for this uses collections.OrderedDict (it works but isn't optimized for speed; I would now use unique_everseen):

Runtime: O(N)

>>> from collections import OrderedDict
>>> items = [1, 2, 0, 1, 3, 2]
>>> list(OrderedDict.fromkeys(items))
[1, 2, 0, 3]

This looks much nicer than:

seen = set()
[x for x in seq if x not in seen and not seen.add(x)]

and doesn't utilize the ugly hack:

not seen.add(x)

which relies on the fact that set.add is an in-place method that always returns None so not None evaluates to True.

Note however that the hack solution is faster in raw speed though it has the same runtime complexity O(N).

@Nakilon 2013-06-14 13:40:19

Converting to some custom kind of dict just to take keys? Just another crutch.

@jamylak 2013-06-14 13:49:24

@Nakilon Are you saying it's not clear? Or are you expecting Python to be more pure?

@Imran 2013-06-18 06:58:41

@Nakilon I don't really see how it's a crutch. It doesn't expose any mutable state, so it's very clean in that sense. Internally, Python sets are implemented with dict() (stackoverflow.com/questions/3949310/…), so basically you're just doing what the interpreter would've done anyway.

@ely 2013-09-04 19:42:36

Just use side effects and do [seen.add(x) for x in seq if x not in seen], or if you don't like comprehension side effects just use a for loop: for x in seq: seen.add(x) if x not in seen else None (still a one-liner, although in this case I think one-liner-ness is a silly property to try to have in a solution).

@flornquake 2013-09-10 00:59:09

@EMS That doesn't preserve order. You could just as well do seen = set(seq).

@ely 2013-09-10 13:57:02

Sure, when seen is a set. Just use seen=[] and [seen.append(x) for x in seq if x not in seen] does preserve order.

@wim 2013-10-18 00:58:20

+1 This is a good and pythonic way, and is vetted by a python core developer here.

@Santi 2014-08-13 07:55:49

Hi, how do I remove duplicates from a table, i.e. if one of the elements in the first column is duplicate, then remove the whole row?

@user136036 2014-10-24 16:23:30

This solution is extremely slow compared to the mentioned "hack": for my list of 300k entries it was over 50x slower.

@jamylak 2014-12-08 09:37:30

I agree it's slower; for speed I would definitely use the hack

@Willem Van Onsem 2015-05-19 20:00:43

The run-time complexity is strictly speaking O(n^2) worst-case and O(n) average case...

@jamylak 2015-05-20 05:22:53

@CommuSoft I agree, although practically it's almost always O(n) due to the super highly unlikely worst case

@MichaelGofron 2016-01-13 17:55:22

I'm getting no module named 'more_itertools'. Is the module deprecated?

@jamylak 2016-01-14 01:12:52

@MichaelGofron It's an external library so you have to install it. eg. pip install more_itertools

@MSeifert 2017-01-10 19:55:48

Just to add another (very performant) implementation of such functionality from an external module [1]: iteration_utilities.unique_everseen:

>>> from iteration_utilities import unique_everseen
>>> lst = [1,1,1,2,3,2,2,2,1,3,4]

>>> list(unique_everseen(lst))
[1, 2, 3, 4]

Timings

I did some timings (Python 3.6) and these show that it's faster than all other alternatives I tested, including OrderedDict.fromkeys, f7 and more_itertools.unique_everseen:

%matplotlib notebook

from iteration_utilities import unique_everseen
from collections import OrderedDict
from more_itertools import unique_everseen as mi_unique_everseen

def f7(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]

def iteration_utilities_unique_everseen(seq):
    return list(unique_everseen(seq))

def more_itertools_unique_everseen(seq):
    return list(mi_unique_everseen(seq))

def odict(seq):
    return list(OrderedDict.fromkeys(seq))

from simple_benchmark import benchmark

b = benchmark([f7, iteration_utilities_unique_everseen, more_itertools_unique_everseen, odict],
              {2**i: list(range(2**i)) for i in range(1, 20)},
              'list size (no duplicates)')
b.plot()

[benchmark plot: runtime vs. list size (no duplicates)]

And just to make sure I also did a test with more duplicates just to check if it makes a difference:

import random

b = benchmark([f7, iteration_utilities_unique_everseen, more_itertools_unique_everseen, odict],
              {2**i: [random.randint(0, 2**(i-1)) for _ in range(2**i)] for i in range(1, 20)},
              'list size (lots of duplicates)')
b.plot()

[benchmark plot: runtime vs. list size (lots of duplicates)]

And one containing only one value:

b = benchmark([f7, iteration_utilities_unique_everseen, more_itertools_unique_everseen, odict],
              {2**i: [1]*(2**i) for i in range(1, 20)},
              'list size (only duplicates)')
b.plot()

[benchmark plot: runtime vs. list size (only duplicates)]

In all of these cases the iteration_utilities.unique_everseen function is the fastest (on my computer).


This iteration_utilities.unique_everseen function can also handle unhashable values in the input (however with an O(n*n) performance instead of the O(n) performance when the values are hashable).

>>> lst = [{1}, {1}, {2}, {1}, {3}]

>>> list(unique_everseen(lst))
[{1}, {2}, {3}]

[1] Disclaimer: I'm the author of that package.

@Raymond Hettinger 2016-10-03 15:47:33

In Python 2.7, the new way of removing duplicates from an iterable while keeping it in the original order is:

>>> from collections import OrderedDict
>>> list(OrderedDict.fromkeys('abracadabra'))
['a', 'b', 'r', 'c', 'd']

In Python 3.5, the OrderedDict has a C implementation. My timings show that this is now both the fastest and shortest of the various approaches for Python 3.5.

In Python 3.6, the regular dict became both ordered and compact. (This holds for CPython and PyPy but may not be present in other implementations.) That gives us a new fastest way of deduping while retaining order:

>>> list(dict.fromkeys('abracadabra'))
['a', 'b', 'r', 'c', 'd']

In Python 3.7, the regular dict is guaranteed to be ordered across all implementations. So, the shortest and fastest solution is:

>>> list(dict.fromkeys('abracadabra'))
['a', 'b', 'r', 'c', 'd']

Response to @max: Once you move to 3.6 or 3.7 and use the regular dict instead of OrderedDict, you can't really beat the performance in any other way. The dictionary is dense and readily converts to a list with almost no overhead. The target list is pre-sized to len(d), which saves all the resizes that occur in a list comprehension. Also, since the internal key list is dense, copying the pointers is almost as fast as a list copy.
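A rough way to check this on your own machine (my own sketch; the data sample and loop counts are arbitrary, and absolute numbers depend on the Python version):

from timeit import timeit
from collections import OrderedDict

data = list(range(1000)) * 10   # 10,000 items, 1,000 unique

def f7(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]

print("dict.fromkeys       ", timeit(lambda: list(dict.fromkeys(data)), number=1000))
print("OrderedDict.fromkeys", timeit(lambda: list(OrderedDict.fromkeys(data)), number=1000))
print("set-based listcomp  ", timeit(lambda: f7(data), number=1000))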

@max 2016-12-14 20:20:50

It is faster than any other approach on my machine (python 3.5) as long as I don't convert OrderedDict to a list in the end. If I do need to convert it to a list, for small inputs the list comprehension approach is still faster by up to 1.5 times. That said, this solution is much cleaner.

@Mr_and_Mrs_D 2018-05-31 12:37:12

The only gotcha is that the iterable "elements" must be hashable - would be nice to have the equivalent for iterables with arbitrary elements (such as a list of lists)

@Arthur 2019-01-09 23:01:06

Insertion order iteration over a dict provides functionality that services more use-cases than removing duplicates. For example, scientific analyses rely on reproducible computations which non-deterministic dict iteration doesn't support. Reproducibility is a major current objective in computational scientific modeling, so we welcome this new feature. Although I know it's trivial to build with a deterministic dict, a high-performance, deterministic set() would help more naive users develop reproducible codes.

@Zhifeng Hu 2012-11-07 03:21:34

You can reference a list comprehension as it is being built by the symbol '_[1]'.
For example, the following function unique-ifies a list of elements without changing their order by referencing its list comprehension.

def unique(my_list): 
    return [x for x in my_list if x not in locals()['_[1]']]

Demo:

l1 = [1, 2, 3, 4, 1, 2, 3, 4, 5]
l2 = [x for x in l1 if x not in locals()['_[1]']]
print l2

Output:

[1, 2, 3, 4, 5]

@d33tah 2013-03-17 11:29:24

Please note it won't work in Python 2.7+.

@ely 2013-09-04 19:45:38

Also note that it would make it an O(n^2) operation, where as creating a set/dict (which has constant look up time) and adding only previously unseen elements will be linear.

@jamylak 2015-06-07 15:15:36

This is Python 2.6 only I believe. And yes it is O(N^2)

@Rob Murray 2016-01-27 13:08:08

A solution without using imported modules or sets:

text = "ask not what your country can do for you ask what you can do for your country"
sentence = text.split(" ")
noduplicates = [sentence[i] for i in range(len(sentence)) if sentence[i] not in sentence[:i]]
print(noduplicates)

Gives output:

['ask', 'not', 'what', 'your', 'country', 'can', 'do', 'for', 'you']

@Jean-François Fabre 2018-11-20 21:17:03

this is O(N**2) complexity + list slicing each time.

@Soham Joshi 2016-01-12 17:16:19

This will preserve order and run in O(n) time. Basically, the idea is to create a hole wherever a duplicate is found and sink it down to the bottom. It makes use of a read and a write pointer: whenever a duplicate is found, only the read pointer advances while the write pointer stays on the duplicate entry to overwrite it.

def deduplicate(l):
    count = {}
    (read,write) = (0,0)
    while read < len(l):
        if l[read] in count:
            read += 1
            continue
        count[l[read]] = True
        l[write] = l[read]
        read += 1
        write += 1
    return l[0:write]

@Lei 2015-07-18 00:10:51

If you routinely use pandas, and aesthetics is preferred over performance, then consider the built-in function pandas.Series.drop_duplicates:

    import pandas as pd
    import numpy as np

    uniquifier = lambda alist: pd.Series(alist).drop_duplicates().tolist()

    # from the chosen answer 
    def f7(seq):
        seen = set()
        seen_add = seen.add
        return [ x for x in seq if not (x in seen or seen_add(x))]

    alist = np.random.randint(low=0, high=1000, size=10000).tolist()

    print uniquifier(alist) == f7(alist)  # True

Timing:

    In [104]: %timeit f7(alist)
    1000 loops, best of 3: 1.3 ms per loop
    In [110]: %timeit uniquifier(alist)
    100 loops, best of 3: 4.39 ms per loop

@Ilya Prokin 2015-05-16 23:05:08

A simple recursive solution:

def uniquefy_list(a):
    return uniquefy_list(a[1:]) if a[0] in a[1:] else [a[0]]+uniquefy_list(a[1:]) if len(a)>1 else [a[0]]
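Example usage (a demo of my own, not from the answer). Note that this variant keeps the position of the last occurrence of each value rather than the first, and it assumes a non-empty list:

>>> uniquefy_list([5, 6, 6, 1, 1, 2, 2, 3, 4])
[5, 6, 1, 2, 3, 4]
>>> uniquefy_list([1, 2, 1])   # the surviving 1 sits at its last position
[2, 1]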

@Sergey M Nikitin 2015-04-27 14:47:21

A reduce variant that is 5x faster, but more sophisticated:

>>> l = [5, 6, 6, 1, 1, 2, 2, 3, 4]
>>> reduce(lambda r, v: v in r[1] and r or (r[0].append(v) or r[1].add(v)) or r, l, ([], set()))[0]
[5, 6, 1, 2, 3, 4]

Explanation:

default = (list(), set())
# use list to keep order
# use set to make lookup faster

def reducer(result, item):
    if item not in result[1]:
        result[0].append(item)
        result[1].add(item)
    return result

>>> reduce(reducer, l, default)[0]
[5, 6, 1, 2, 3, 4]
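On Python 3 the same sketch needs reduce imported from functools (a hedged adaptation of the snippet above, not part of the original answer):

from functools import reduce  # Python 3: reduce is no longer a builtin

l = [5, 6, 6, 1, 1, 2, 2, 3, 4]
result = reduce(lambda r, v: v in r[1] and r or (r[0].append(v) or r[1].add(v)) or r,
                l, ([], set()))[0]
print(result)  # [5, 6, 1, 2, 3, 4]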

@kylie.a 2014-11-07 05:02:54

l = [1,2,2,3,3,...]
n = []
n.extend(ele for ele in l if ele not in set(n))

A generator expression that uses the O(1) look up of a set to determine whether or not to include an element in the new list.

@John Coleman 2017-01-03 20:14:37

Clever use of extend with a generator expression which depends on the thing being extended (so +1), but set(n) is recomputed at each stage (which is linear) and this bumps the overall approach to being quadratic. In fact, this is almost certainly worse than simply using ele in n. Making a set for a single membership test isn't worth the expense of the set creation. Still -- it is an interesting approach.

@dominecf 2013-10-02 11:23:31

A relatively effective approach for sorted numpy arrays:

import numpy as np

b = np.array([1, 3, 3, 8, 12, 12, 12])
np.hstack([b[0], [x[0] for x in zip(b[1:], b[:-1]) if x[0] != x[1]]])

Outputs:

array([ 1,  3,  8, 12])
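For sorted arrays the same idea can also stay fully vectorized; a hedged sketch of mine using a boolean mask instead of the per-element Python loop above:

import numpy as np

b = np.array([1, 3, 3, 8, 12, 12, 12])
# keep the first element, then every element that differs from its predecessor
mask = np.concatenate(([True], b[1:] != b[:-1]))
print(b[mask])  # [ 1  3  8 12]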

@user1969453 2014-04-25 01:28:31

You could do a sort of ugly list comprehension hack.

[l[i] for i in range(len(l)) if l.index(l[i]) == i]

@Evpok 2015-02-27 17:56:02

Prefer for i, e in enumerate(l) to l[i] for i in range(len(l)).

@dansalmo 2013-04-13 17:32:19

sequence = ['1', '2', '3', '3', '6', '4', '5', '6']
unique = []
[unique.append(item) for item in sequence if item not in unique]

unique → ['1', '2', '3', '6', '4', '5']

@Mayur Patel 2013-05-27 18:30:24

This may not be the most concise, but I like it for its clarity. Thanks

@goncalopp 2014-03-19 17:13:52

It's worth noting that this runs in n^2

@Martijn Pieters 2015-03-03 14:32:11

Ick. 2 strikes: Using a list for membership testing (slow, O(N)) and using a list comprehension for the side effects (building another list of None references in the process!)

@jamylak 2018-02-17 07:43:03

I agree with @MartijnPieters there's absolutely no reason for the list comprehension with side effects. Just use a for loop instead

@abarnert 2013-10-09 18:27:09

For another very late answer to another very old question:

The itertools recipes have a function that does this, using the seen set technique, but:

  • Handles a standard key function.
  • Uses no unseemly hacks.
  • Optimizes the loop by pre-binding seen.add instead of looking it up N times. (f7 also does this, but some versions don't.)
  • Optimizes the loop by using ifilterfalse, so you only have to loop over the unique elements in Python, instead of all of them. (You still iterate over all of them inside ifilterfalse, of course, but that's in C, and much faster.)

Is it actually faster than f7? It depends on your data, so you'll have to test it and see. If you want a list in the end, f7 uses a listcomp, and there's no way to do that here. (You can directly append instead of yielding, or you can feed the generator into the list function, but neither one can be as fast as the LIST_APPEND inside a listcomp.) At any rate, usually, squeezing out a few microseconds is not going to be as important as having an easily-understandable, reusable, already-written function that doesn't require DSU when you want to decorate.

As with all of the recipes, it's also available in more-itertools.

If you just want the no-key case, you can simplify it as:

import itertools
def unique(iterable):
    seen = set()
    seen_add = seen.add
    for element in itertools.ifilterfalse(seen.__contains__, iterable):
        seen_add(element)
        yield element
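For reference, a Python 3 version of that simplification (ifilterfalse was renamed filterfalse; otherwise unchanged):

from itertools import filterfalse

def unique(iterable):
    seen = set()
    seen_add = seen.add
    for element in filterfalse(seen.__contains__, iterable):
        seen_add(element)
        yield element

print(list(unique('AAAABBBCCDAABBB')))  # ['A', 'B', 'C', 'D']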

@jamylak 2015-06-07 15:24:06

I completely overlooked more-itertools; this is clearly the best answer. A simple from more_itertools import unique_everseen then list(unique_everseen(items)). A much faster approach than mine and much better than the accepted answer, I think the library download is worth it. I am going to community wiki my answer and add this in.

@ely 2013-09-10 21:40:58

Borrowing the recursive idea used in defining Haskell's nub function for lists, this would be a recursive approach:

def unique(lst):
    return [] if lst==[] else [lst[0]] + unique(filter(lambda x: x!= lst[0], lst[1:]))

e.g.:

In [118]: unique([1,5,1,1,4,3,4])
Out[118]: [1, 5, 4, 3]

I tried it for growing data sizes and saw sub-linear time-complexity (not definitive, but suggests this should be fine for normal data).

In [122]: %timeit unique(np.random.randint(5, size=(1)))
10000 loops, best of 3: 25.3 us per loop

In [123]: %timeit unique(np.random.randint(5, size=(10)))
10000 loops, best of 3: 42.9 us per loop

In [124]: %timeit unique(np.random.randint(5, size=(100)))
10000 loops, best of 3: 132 us per loop

In [125]: %timeit unique(np.random.randint(5, size=(1000)))
1000 loops, best of 3: 1.05 ms per loop

In [126]: %timeit unique(np.random.randint(5, size=(10000)))
100 loops, best of 3: 11 ms per loop

I also think it's interesting that this could be readily generalized to uniqueness by other operations. Like this:

import operator
def unique(lst, cmp_op=operator.ne):
    return [] if lst==[] else [lst[0]] + unique(filter(lambda x: cmp_op(x, lst[0]), lst[1:]), cmp_op)

For example, you could pass in a function that uses the notion of rounding to the same integer as if it was "equality" for uniqueness purposes, like this:

def test_round(x,y):
    return round(x) != round(y)

then unique(some_list, test_round) would provide the unique elements of the list where uniqueness no longer meant traditional equality (which is implied by using any sort of set-based or dict-key-based approach to this problem) but instead meant to take only the first element that rounds to K for each possible integer K that the elements might round to, e.g.:

In [6]: unique([1.2, 5, 1.9, 1.1, 4.2, 3, 4.8], test_round)
Out[6]: [1.2, 5, 1.9, 4.2, 3]

@ely 2013-09-11 14:05:21

Note that performance will get bad when the number of unique elements is very large relative to the total number of elements, since each successive recursive call's use of filter will barely benefit from the previous call at all. But if the number of unique elements is small relative to the array size, this should perform pretty well.

@shamrock 2013-05-27 21:37:23

If you want to maintain the order, you can try this:

list1 = ['b','c','d','b','c','a','a']    
list2 = list(set(list1))    
list2.sort(key=list1.index)    
print list2

OR similarly you can do this:

list1 = ['b','c','d','b','c','a','a']  
list2 = sorted(set(list1),key=list1.index)  
print list2 

You can also do this:

list1 = ['b','c','d','b','c','a','a']    
list2 = []    
for i in list1:    
    if not i in list2:  
        list2.append(i)
print list2

It can also be written as this:

list1 = ['b','c','d','b','c','a','a']    
list2 = []    
[list2.append(i) for i in list1 if not i in list2]    
print list2 

@Richard 2013-09-03 19:11:08

Your first two answers assume that the order of the list can be rebuilt using a sorting function, but this may not be so.

@cod3monk3y 2014-01-14 16:59:32

It's also O(n^2), worst case on an already sorted list

@Derek Veit 2016-01-08 20:53:47

Most answers are focused on performance. For lists that aren't large enough to worry about performance, the sorted(set(list1),key=list1.index) is the best thing I've seen. No extra import, no extra function, no extra variable, and it's fairly simple and readable.

@Saurabh Hirani 2011-10-09 14:16:00

MizardX's answer gives a good collection of multiple approaches.

This is what I came up with while thinking aloud:

mylist = [x for i,x in enumerate(mylist) if x not in mylist[i+1:]]

@Rivka 2012-09-02 12:05:45

Your solution is nice, but it takes the last appearance of each element. To take the first appearance use: [x for i,x in enumerate(mylist) if x not in mylist[:i]]

@Nikita Volkov 2012-09-05 15:06:58

Since searching in a list is an O(n) operation and you perform it on each item, the resulting complexity of your solution would be O(n^2). This is just unacceptable for such a trivial problem.

@zmk 2011-08-21 20:04:12

For non-hashable types (e.g. a list of lists), based on MizardX's:

def f7_noHash(seq):
    seen = set()
    return [ x for x in seq if str( x ) not in seen and not seen.add( str( x ) )]

@code22 2011-08-05 17:06:25

If you need one liner then maybe this would help:

reduce(lambda x, y: x + y if y[0] not in x else x, map(lambda x: [x],lst))

... should work, but correct me if I'm wrong

@samuel 2013-12-17 05:39:12

Can lambda work with if? I think it's not correct.

@code22 2013-12-17 15:28:08

it's a conditional expression so it's good

@samuel 2013-12-18 06:30:16

yes, it's correct. I'm sorry for my mistake. I tried it just now.

@Rafał Dowgird 2009-01-26 15:47:14

from itertools import groupby
[ key for key,_ in groupby(sortedList)]

The list doesn't even have to be sorted, the sufficient condition is that equal values are grouped together.

Edit: I assumed that "preserving order" implies that the list is actually ordered. If this is not the case, then the solution from MizardX is the right one.

Community edit: This is however the most elegant way to "compress duplicate consecutive elements into a single element".
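A small example of my own to make that concrete: groupby only collapses consecutive duplicates, so unsorted input keeps non-adjacent repeats, while sorting first loses the original order:

from itertools import groupby

print([k for k, _ in groupby([1, 1, 2, 2, 1, 3, 3])])   # [1, 2, 1, 3]
print([k for k, _ in groupby(sorted([2, 1, 3, 1]))])    # [1, 2, 3] -- original order lost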

@unbeknown 2009-01-26 15:51:26

But this doesn't preserve order!

@Josh Glover 2009-01-26 15:54:58

Hrm, this is problematic, since I cannot guarantee that equal values are grouped together without looping once over the list, by which time I could have pruned the duplicates.

@Rafał Dowgird 2009-01-26 15:56:52

I assumed that "preserving order" implied that the list is actually ordered.

@unbeknown 2009-01-26 15:56:54

Ah, yes, I read sortedList as sorted(List) :-( But the input list can be in a different order.

@Rafał Dowgird 2009-01-26 15:59:47

Added clarification.

@unbeknown 2009-01-26 16:00:48

Maybe the specification of the input list is a little bit unclear. The values don't even need to be grouped together: [2, 1, 3, 1]. So which values to keep and which to delete?

@Rafał Dowgird 2009-01-26 16:03:16

I think that this question should be posted as a comment under the original question.
