By Neemaximo


2011-11-01 00:45:24 8 Comments

Pretty much, I need to write a program to check if a list has any duplicates, and if it does, remove them and return a new list with the items that weren't duplicated/removed. This is what I have, but to be honest I do not know what to do.

def remove_duplicates():
    t = ['a', 'b', 'c', 'd']
    t2 = ['a', 'c', 'd']
    for t in t2:
        t.append(t.remove())
    return t

30 comments

@HEEL_caT666 2019-02-23 20:44:39

If you want to preserve the order, and not use any external modules here is an easy way to do this:

>>> t = [1, 9, 2, 3, 4, 5, 3, 6, 7, 5, 8, 9]
>>> list(dict.fromkeys(t))
[1, 9, 2, 3, 4, 5, 6, 7, 8]

Note: This method preserves the order of first appearance, so, as seen above, nine comes right after one because that is where it first appeared. This is the same result as you would get with

from collections import OrderedDict
ulist = list(OrderedDict.fromkeys(t))

but it is much shorter, and runs faster.

This works because each time fromkeys encounters a key it has already created, it simply overwrites the existing entry. That doesn't affect the dictionary at all, since fromkeys gives every key the value None, so all duplicates are eliminated this way.
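
You can see the intermediate dictionary directly (output shown for Python 3.7+, where plain dicts preserve insertion order):

>>> dict.fromkeys([1, 9, 1, 2])
{1: None, 9: None, 2: None}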

@where23 2018-12-19 06:17:26

Sometimes you need to remove the duplicate items in place, without creating a new list: for example, when the list is big, or when other references to the same list need to see the change.

from collections import Counter

t = [1, 2, 1, 3, 2]  # example list with duplicates
cntDict = Counter(t)  # count the occurrences of each item
for item, cnt in cntDict.items():
    for _ in range(cnt - 1):  # remove all but one occurrence
        t.remove(item)
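
For the example list above, the loop leaves:

>>> t
[1, 3, 2]

Because list.remove deletes the first occurrence each time, it is the last copy of each duplicated item that survives.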

@Cybernetic 2018-10-23 18:57:27

You can use the following function:

def rem_dupes(dup_list): 
    yooneeks = [] 
    for elem in dup_list: 
        if elem not in yooneeks: 
            yooneeks.append(elem) 
    return yooneeks

Example:

my_list = ['this', 'is', 'a', 'list', 'with', 'duplicates', 'in', 'the', 'list']

Usage:

rem_dupes(my_list)

['this', 'is', 'a', 'list', 'with', 'duplicates', 'in', 'the']

@Anoop Kumar 2018-10-19 08:09:09

Python has many built-in functions. You can use set() to remove the duplicates inside a list. Per your example, there are the two lists t and t2 below:

t = ['a', 'b', 'c', 'd']
t2 = ['a', 'c', 'd']
result = list(set(t) - set(t2))
result

Answer: ['b']

@Ruturaj 2019-03-06 03:15:54

Someone else has a similar answer already.

@Akarsh Jain 2018-10-06 05:46:49

Another approach could be:

import pandas as pd

myList = [1, 2, 3, 1, 2, 5, 6, 7, 8]
cleanList = pd.Series(myList).drop_duplicates().tolist()
print(cleanList)

#> [1, 2, 3, 5, 6, 7, 8]

and the order remains preserved.

@Glutexo 2019-03-20 12:29:07

Though this might work well, using a heavy library like pandas for this purpose seems like overkill.

@Flavio Wuensche 2018-09-18 12:56:16

You can use set to remove duplicates:

mylist = list(set(mylist))

But note the results will be unordered. If that's an issue:

mylist.sort()

@Erik Campobadal 2019-01-03 13:39:48

You can just do: mylist = sorted(list(set(mylist)))

@yunus 2018-09-13 07:44:52

This is just a readable, easily understandable function. I used the dict data structure and some built-in functions, and it has a better complexity of O(n):

def undup(dup_list):
    b = {}
    for i in dup_list:
        b.update({i: 1})  # duplicate keys simply overwrite each other
    return list(b.keys())

a = ["a", "b", "a"]
print(undup(a))

Disclaimer: if you copy and paste, you may get an indentation error; re-indent the code properly before running it.

@ste_kwr 2018-08-28 23:08:50

Unfortunately, most answers here either do not preserve the order or are too long. Here is a simple, order-preserving answer:

s = [1,2,3,4,5,2,5,6,7,1,3,9,3,5]
x=[]

[x.append(i) for i in s if i not in x]
print(x)

This will give you x with duplicates removed but preserving the order.

@Gajendra D Ambi 2018-08-19 11:30:18

list_with_unique_items = list(set(list_with_duplicates))

@N. Wouda 2018-08-19 14:05:32

There appear to be quite a few other answers here. What does this answer offer over the other solutions posted? Furthermore, while this code may answer the question, it lacks explanation. Please consider adding text to explain what it does, and why it answers the question posed.

@Gajendra D Ambi 2018-08-19 16:15:05

It is a one-liner which needs no explanation. Some like/want/understand answers which are like an essay, a few others like answers which use Python's built-in library, some others like answers which don't use Python's library, but this is for those who like one-liners that need no explanation.

@Wariored 2018-06-20 12:45:17

Very simple way in Python 3:

>>> n = [1, 2, 3, 4, 1, 1]
>>> n
[1, 2, 3, 4, 1, 1]
>>> m = sorted(list(set(n)))
>>> m
[1, 2, 3, 4]

@ShadowRanger 2018-06-20 12:57:54

sorted(list(...)) is redundant (sorted already implicitly converts its argument to a new list, sorts it, then returns the new list, so using both means making an unnecessary temporary list). Use only list if the result need not be sorted, use only sorted if the result needs to be sorted.
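
For example:

>>> n = [1, 2, 3, 4, 1, 1]
>>> sorted(set(n))
[1, 2, 3, 4]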

@Wariored 2018-06-20 13:02:01

Yeah, you are right!

@Dennis Peterson 2018-03-24 16:39:24

def remove_duplicates(input_list):
    if input_list == []:
        return []
    # sort the list from smallest to largest
    input_list = sorted(input_list)
    # initialize the output list with the first element of the sorted input list
    output_list = [input_list[0]]
    for item in input_list:
        if item > output_list[-1]:
            output_list.append(item)
    return output_list

@DaFois 2018-03-24 17:02:34

Rather than just posting the raw code this way, can you explain what your code does?

@Dennis Peterson 2018-03-24 17:06:02

What don't you understand in that code? Can't you see it's for removing duplicates?

@SuperNova 2018-02-22 17:10:31

Another solution might be the following. Create a dictionary out of the list with item as key and index as value, and then print the dictionary keys.

>>> lst = [1, 3, 4, 2, 1, 21, 1, 32, 21, 1, 6, 5, 7, 8, 2]
>>>
>>> dict_enum = {item:index for index, item in enumerate(lst)}
>>> print dict_enum.keys()
[32, 1, 2, 3, 4, 5, 6, 7, 8, 21]

@ShadowRanger 2018-06-20 13:00:04

Why compute/store the index if you never use it? This looks like a solution intended to preserve order (by storing last seen index of each value) that forgot to do so. list(set(lst)) would achieve the same logical result.

@Eli Korvigo 2016-01-13 19:12:59

All the order-preserving approaches I've seen here so far either use naive comparison (with O(n^2) time complexity at best) or heavyweight OrderedDict/set+list combinations that are limited to hashable inputs. Here is a hash-independent O(n log n) solution:

Update added the key argument, documentation and Python 3 compatibility.

# from functools import reduce <-- add this import on Python 3

def uniq(iterable, key=lambda x: x):
    """
    Remove duplicates from an iterable. Preserves order. 
    :type iterable: Iterable[Ord => A]
    :param iterable: an iterable of objects of any orderable type
    :type key: Callable[A] -> (Ord => B)
    :param key: optional argument; by default an item (A) is discarded 
    if another item (B), such that A == B, has already been encountered and taken. 
    If you provide a key, this condition changes to key(A) == key(B); the callable 
    must return orderable objects.
    """
    # Enumerate the list to restore the order later; reduce the sorted list; restore the order
    def append_unique(acc, item):
        return acc if key(acc[-1][1]) == key(item[1]) else acc.append(item) or acc 
    srt_enum = sorted(enumerate(iterable), key=lambda item: key(item[1]))
    return [item[1] for item in sorted(reduce(append_unique, srt_enum, [srt_enum[0]]))] 

@loxaxs 2016-05-18 20:40:58

Yet, this solution requires orderable elements. I will use it to uniquify my list of lists: it is a pain to tuple() lists and hash them. Generally speaking, the hash process takes time proportional to the size of the whole data, while this solution takes time O(n log(n)), depending only on the length of the list.

@9000 2017-06-05 16:29:26

I think that the set-based approach is equally cheap (O(n log n)), or cheaper, than sorting + detection of uniques. (This approach would parallelize much better, though.) It also does not exactly preserve the initial order, but it gives a predictable order.

@Eli Korvigo 2017-06-06 17:34:05

@9000 That is true. I've never mentioned time-complexity of a hash-table-based approach, which is obviously O(n). Here you can find many answers incorporating hash-tables. They are not universal, though, because they require objects to be hashable. Moreover, they are a lot more memory-intensive.

@Cochise Ruhulessin 2018-07-18 08:56:24

This should be the accepted answer since the question is how to remove duplicates from a list.

@Eli Korvigo 2019-03-11 06:40:29

Why the down-vote?

@poke 2011-11-01 00:49:04

The common approach to get a unique collection of items is to use a set. Sets are unordered collections of distinct objects. To create a set from any iterable, you can simply pass it to the built-in set() function. If you later need a real list again, you can similarly pass the set to the list() function.

The following example should cover whatever you are trying to do:

>>> t = [1, 2, 3, 1, 2, 5, 6, 7, 8]
>>> t
[1, 2, 3, 1, 2, 5, 6, 7, 8]
>>> list(set(t))
[1, 2, 3, 5, 6, 7, 8]
>>> s = [1, 2, 3]
>>> list(set(t) - set(s))
[8, 5, 6, 7]

As you can see from the example result, the original order is not maintained. As mentioned above, sets themselves are unordered collections, so the order is lost. When converting a set back to a list, an arbitrary order is created.

If order is important to you, then you will have to use a different mechanism. A very common solution for this is to rely on OrderedDict to keep the order of keys during insertion:

>>> from collections import OrderedDict
>>> list(OrderedDict.fromkeys(t))
[1, 2, 3, 5, 6, 7, 8]

Note that this has the overhead of creating a dictionary first, and then creating a list from it. So if you don’t actually need to preserve the order, you’re better off using a set. Check out this question for more details and alternative ways to preserve the order when removing duplicates.


Finally note that both the set as well as the OrderedDict solution require your items to be hashable. This usually means that they have to be immutable. If you have to deal with items that are not hashable (e.g. list objects), then you will have to use a slow approach in which you will basically have to compare every item with every other item in a nested loop.
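
For illustration, here is a minimal sketch of that quadratic fallback (the function name is mine, not from the standard library):

def unique_unhashable(items):
    """Remove duplicates using only equality comparison; O(n^2), but works
    for unhashable items such as lists."""
    result = []
    for item in items:
        if item not in result:  # 'in' compares with ==, no hashing needed
            result.append(item)
    return result

>>> unique_unhashable([['a'], ['b'], ['a']])
[['a'], ['b']]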

@Kos 2013-01-10 09:00:31

It should be noted that this kills the original order.

@fiatjaf 2013-02-23 06:15:30

It should also be noted that it doesn't work if you have dicts in the list.

@Jay Taylor 2013-05-21 15:31:21

This isn't helpful if list order is important to you.

@Davide 2017-02-15 20:26:45

And most importantly, the content of the original list must be hashable.

@Reza Dodge 2017-05-11 16:56:54

Actually the fourth comment is inclusive and more general than the second comment. It does not simply re-state the second comment as dicts are only one example of unhashable objects. Lists and sets are other examples of unhashable objects.

@noobninja 2018-05-25 16:38:39

sorted(list(set(l1 + l2))) combines 2 lists, then eliminates duplicates, then sorts them in ascending order.
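
For example:

>>> l1, l2 = [3, 1, 2], [2, 4, 1]
>>> sorted(list(set(l1 + l2)))
[1, 2, 3, 4]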

@MSeifert 2016-11-09 01:56:22

It requires installing a 3rd-party module, but the package iteration_utilities contains a unique_everseen¹ function that can remove all duplicates while preserving the order:

>>> from iteration_utilities import unique_everseen

>>> list(unique_everseen(['a', 'b', 'c', 'd'] + ['a', 'c', 'd']))
['a', 'b', 'c', 'd']

In case you want to avoid the overhead of the list addition operation you can use itertools.chain instead:

>>> from itertools import chain
>>> list(unique_everseen(chain(['a', 'b', 'c', 'd'], ['a', 'c', 'd'])))
['a', 'b', 'c', 'd']

The unique_everseen also works if you have unhashable items (for example lists) in the lists:

>>> from iteration_utilities import unique_everseen
>>> list(unique_everseen([['a'], ['b'], 'c', 'd'] + ['a', 'c', 'd']))
[['a'], ['b'], 'c', 'd', 'a']

However that will be (much) slower than if the items are hashable.


¹ Disclosure: I'm the author of the iteration_utilities library.

@Raymond Hettinger 2011-11-01 00:53:55

In Python 2.7, the new way of removing duplicates from an iterable while keeping the original order is:

>>> from collections import OrderedDict
>>> list(OrderedDict.fromkeys('abracadabra'))
['a', 'b', 'r', 'c', 'd']

In Python 3.5, the OrderedDict has a C implementation. My timings show that this is now both the fastest and shortest of the various approaches for Python 3.5.

In Python 3.6, the regular dict became both ordered and compact. (This holds for CPython and PyPy but may not be present in other implementations.) That gives us a new fastest way of deduping while retaining order:

>>> list(dict.fromkeys('abracadabra'))
['a', 'b', 'r', 'c', 'd']

In Python 3.7, the regular dict is guaranteed to be ordered across all implementations. So, the shortest and fastest solution is:

>>> list(dict.fromkeys('abracadabra'))
['a', 'b', 'r', 'c', 'd']

@Herberth Amaral 2012-10-22 20:23:14

I think this is the only way to keep the items in order.

@Martijn Pieters 2013-08-15 14:24:25

@HerberthAmaral: That is very far from true, see How do you remove duplicates from a list in Python whilst preserving order?

@Herberth Amaral 2013-08-15 21:34:22

@MartijnPieters Correcting: I think this is the only simple way to keep items in order.

@Davide 2017-02-15 20:28:53

For this too, the content of the original list must be hashable

@CraZ 2018-05-16 17:27:58

As @Davide mentioned, the original list must be hashable. This means it does not work for a list of dictionaries: TypeError: unhashable type: 'dict'
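
For example:

>>> list(dict.fromkeys([{'a': 1}, {'a': 1}]))
Traceback (most recent call last):
  ...
TypeError: unhashable type: 'dict'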

@Willem Van Onsem 2017-12-19 11:10:34

There are a lot of answers here that use a set(..) (which is fast, given that the elements are hashable) or a list (which has the downside that it results in an O(n^2) algorithm).

The function I propose is a hybrid one: we use a set(..) for items that are hashable, and a list(..) for the ones that are not. Furthermore it is implemented as a generator such that we can for instance limit the number of items, or do some additional filtering.

Finally we also can use a key argument to specify in what way the elements should be unique. For instance we can use this if we want to filter a list of strings such that every string in the output has a different length.

def uniq(iterable, key=lambda x: x):
    seens = set()
    seenl = []
    for item in iterable:
        k = key(item)
        try:
            seen = k in seens
        except TypeError:
            seen = k in seenl
        if not seen:
            yield item
            try:
                seens.add(k)
            except TypeError:
                seenl.append(k)

We can now for instance use this like:

>>> list(uniq(["apple", "pear", "banana", "lemon"], len))
['apple', 'pear', 'banana']
>>> list(uniq(["apple", "pear", "lemon", "banana"], len))
['apple', 'pear', 'banana']
>>> list(uniq(["apple", "pear", {}, "lemon", [], "banana"], len))
['apple', 'pear', {}, 'banana']
>>> list(uniq(["apple", "pear", {}, "lemon", [], "banana"]))
['apple', 'pear', {}, 'lemon', [], 'banana']
>>> list(uniq(["apple", "pear", {}, "lemon", {}, "banana"]))
['apple', 'pear', {}, 'lemon', 'banana']

It is thus a uniqueness filter that can work on any iterable and filter out duplicates, regardless of whether the items are hashable or not.

It makes one assumption: that if one object is hashable, and another one is not, the two objects are never equal. This can strictly speaking happen, although it would be very uncommon.

@ShadowRanger 2018-06-20 13:02:07

A note: There are built-ins that will break the assumption stated in the final paragraph; frozenset is hashable, set is not, and if they have the same values, they're equal, but you'll treat them as non-equal in this code.
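
For example, both objects below survive the filter even though they compare equal:

>>> frozenset({1, 2}) == {1, 2}
True
>>> list(uniq([frozenset({1, 2}), {1, 2}]))
[frozenset({1, 2}), {1, 2}]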

@Willem Van Onsem 2018-06-20 13:04:24

@ShadowRanger: yes, I agree with that; as said, it does not solve all the problems. Nevertheless, using only a set(..) would simply not work at all here, and using only a list would result in linear lookup time. So it is meant as a "better" set, but with some pitfalls.

@Willem Van Onsem 2018-06-20 13:05:09

Furthermore, a set(..) in rare cases also keeps objects that are not equal to themselves. For example, math.nan is not equal to math.nan, but a set will deduplicate it anyway, since it checks for reference identity before equality.
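
A quick demonstration of that corner case:

>>> import math
>>> math.nan == math.nan
False
>>> {math.nan, math.nan}
{nan}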

@G M 2014-07-03 12:45:51

There are also solutions using Pandas and NumPy. They both return a NumPy array, so you have to call .tolist() if you want a list.

t=['a','a','b','b','b','c','c','c']
t2= ['c','c','b','b','b','a','a','a']

Pandas solution

Using Pandas function unique():

import pandas as pd
pd.unique(t).tolist()
# ['a', 'b', 'c']
pd.unique(t2).tolist()
# ['c', 'b', 'a']

Numpy solution

Using the NumPy function unique():

import numpy as np
np.unique(t).tolist()
# ['a', 'b', 'c']
np.unique(t2).tolist()
# ['a', 'b', 'c']

Note that numpy.unique() also sorts the values, so the list t2 is returned sorted. If you want the order preserved, do as in this answer:

_, idx = np.unique(t2, return_index=True)
np.array(t2)[np.sort(idx)].tolist()  # convert to an array so it can be fancy-indexed
# ['c', 'b', 'a']

This solution is not as elegant as the others; however, compared to pandas.unique(), numpy.unique() also allows you to check whether nested arrays are unique along one selected axis.
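
For instance, a minimal sketch of such an axis-based check (requires NumPy 1.13+):

a = np.array([[1, 2], [1, 2], [3, 4]])
np.unique(a, axis=0)  # deduplicate whole rows
# array([[1, 2],
#        [3, 4]])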

@user227666 2014-07-03 12:48:10

This will convert the list to a NumPy array, which is a mess and won't work for strings.

@G M 2014-07-03 16:45:41

@user227666 thanks for your review, but that's not true: it works even with strings, and you can add .tolist() if you want to get a list...

@Debosmit Ray 2016-10-09 09:11:32

I think this is kinda like trying to kill a bee with a sledgehammer. Works, sure! But, importing a library for just this purpose might be a little overkill, no?

@G M 2016-10-10 07:17:44

@DebosmitRay it could be useful if you work in data science, where you usually work with NumPy and many times need to work with NumPy arrays.

@Suresh Gupta 2017-10-12 10:28:17

Without using set

data=[1, 2, 3, 1, 2, 5, 6, 7, 8]
uni_data=[]
for dat in data:
    if dat not in uni_data:
        uni_data.append(dat)

print(uni_data) 

@Santosh Pillai 2017-09-18 06:19:58

If you don't care about order and want something different from the Pythonic ways suggested above (that is, something that can be used in interviews), then:

def remove_dup(arr):
    size = len(arr)
    j = 0    # To store index of next unique element
    for i in range(0, size-1):
        # If current element is not equal
        # to next element then store that
        # current element
        if(arr[i] != arr[i+1]):
            arr[j] = arr[i]
            j+=1

    arr[j] = arr[size-1]  # store the last element; whether unique or repeated, it hasn't been stored yet

    return arr[0:j+1]

if __name__ == '__main__':
    arr = [10, 10, 1, 1, 1, 3, 3, 4, 5, 6, 7, 8, 8, 9]
    print(remove_dup(sorted(arr)))

Time complexity: O(n) for the scan itself (the sorted() call in main adds O(n log n))

Auxiliary space: O(n)

Reference: http://www.geeksforgeeks.org/remove-duplicates-sorted-array/

@whackamadoodle3000 2017-08-26 23:23:42

def remove_duplicates(A):
    # Build a separate list of elements seen so far; popping from A while
    # enumerating it would shift indices and skip elements.
    seen = []
    [seen.append(elem) for elem in A if elem not in seen]
    return seen

A list comprehension (used here for its side effect) to remove duplicates.

@Anurag Misra 2017-08-18 11:11:54

You can do this simply by using sets.

Step 1: Get the elements that differ between the two lists.
Step 2: Get the elements common to both lists.
Step 3: Combine them.

In [1]: a = ["apples", "bananas", "cucumbers"]

In [2]: b = ["pears", "apples", "watermelons"]

In [3]: set(a).symmetric_difference(b).union(set(a).intersection(b))
Out[3]: {'apples', 'bananas', 'cucumbers', 'pears', 'watermelons'}

@Anurag Misra 2017-08-17 07:39:25

The best approach to removing duplicates from a list is to use the set() function, available in Python, and then convert that set back into a list:

In [2]: some_list = ['a','a','v','v','v','c','c','d']
In [3]: list(set(some_list))
Out[3]: ['a', 'c', 'd', 'v']

@Meet Zaveri 2018-05-01 07:52:28

It works flawlessly!

@Anurag Misra 2018-05-02 05:53:55

@MeetZaveri glad.!

@Nurul Akter Towhid 2017-07-29 00:39:14

Using set:

a = [0, 1, 2, 3, 4, 3, 3, 4]
a = list(set(a))
print(a)

Using unique:

import numpy as np
a = [0, 1, 2, 3, 4, 3, 3, 4]
a = np.unique(a).tolist()
print(a)

@user8383782 2017-07-29 00:33:01

I think converting to a set is the easiest way to remove duplicates:

list1 = [1, 2, 1]
list1 = list(set(list1))
print(list1)

@Atonal 2017-06-06 09:12:26

You could also do this:

>>> t = [1, 2, 3, 3, 2, 4, 5, 6]
>>> s = [x for i, x in enumerate(t) if i == t.index(x)]
>>> s
[1, 2, 3, 4, 5, 6]

The reason the above works is that the index method returns only the first index of an element; duplicate elements have higher indices. Refer to here:

list.index(x[, start[, end]])
Return zero-based index in the list of the first item whose value is x. Raises a ValueError if there is no such item.

@Eli Korvigo 2018-04-13 20:42:34

This is horribly inefficient. list.index is a linear-time operation, making your solution quadratic.

@Atonal 2018-10-13 00:08:25

You're right. But also I believe it's fairly obvious the solution is intended to be a one liner that preserves the order. Everything else is already in here.

@9000 2011-11-01 00:49:33

It's a one-liner: list(set(source_list)) will do the trick.

A set is something that can't possibly have duplicates.

Update: an order-preserving approach is two lines:

from collections import OrderedDict
OrderedDict((x, True) for x in source_list).keys()

Here we use the fact that OrderedDict remembers the insertion order of keys, and does not change it when a value at a particular key is updated. We insert True as values, but we could insert anything, values are just not used. (set works a lot like a dict with ignored values, too.)
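
In Python 3, keys() returns a view rather than a list, so wrap the expression in list() if you need an actual list:

>>> from collections import OrderedDict
>>> source_list = [3, 1, 3, 2, 1]
>>> list(OrderedDict((x, True) for x in source_list).keys())
[3, 1, 2]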

@thodnev 2017-04-01 19:46:31

Here's the fastest Pythonic solution compared to the others listed in the replies.

Using an implementation detail of short-circuit evaluation lets us use a list comprehension, which is fast enough. visited.add(item) always returns None, which evaluates as False, so the right-hand side of or is always the result of the expression.

Time it yourself

def deduplicate(sequence):
    visited = set()
    adder = visited.add  # get rid of qualification overhead
    out = [adder(item) or item for item in sequence if item not in visited]
    return out
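
For example:

>>> deduplicate([1, 2, 1, 3, 2])
[1, 2, 3]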

@Björn Pollex 2017-03-09 11:50:35

For completeness, and since this is a very popular question, the toolz library offers a unique function:

>>> from toolz import unique
>>> tuple(unique((1, 2, 3)))
(1, 2, 3)
>>> tuple(unique((1, 2, 1, 3)))
(1, 2, 3)

@James Sapam 2014-01-01 15:39:09

Another way of doing:

>>> seq = [1,2,3,'a', 'a', 1,2]
>>> dict.fromkeys(seq).keys()
['a', 1, 2, 3]

@Dustin Wyatt 2017-12-22 15:24:59

Note that in Python 3, keys() returns a dictionary view object, not a list.
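
So on Python 3 you can wrap the call in list(); on 3.7+ the insertion order is also preserved:

>>> list(dict.fromkeys(seq))
[1, 2, 3, 'a']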
