#### [SOLVED] Removing duplicates in lists

Pretty much I need to write a program to check if a list has any duplicates and if it does it removes them and returns a new list with the items that weren't duplicated/removed. This is what I have but to be honest I do not know what to do.

``````
def remove_duplicates():
    t = ['a', 'b', 'c', 'd']
    t2 = ['a', 'c', 'd']
    for t in t2:
        t.append(t.remove())
    return t
``````

#### @poke 2011-11-01 00:49:04

The common approach to get a unique collection of items is to use a `set`. Sets are unordered collections of distinct objects. To create a set from any iterable, you can simply pass it to the built-in `set()` function. If you later need a real list again, you can similarly pass the set to the `list()` function.

The following example should cover whatever you are trying to do:

``````>>> t = [1, 2, 3, 1, 2, 5, 6, 7, 8]
>>> t
[1, 2, 3, 1, 2, 5, 6, 7, 8]
>>> list(set(t))
[1, 2, 3, 5, 6, 7, 8]
>>> s = [1, 2, 3]
>>> list(set(t) - set(s))
[8, 5, 6, 7]
``````

As you can see from the example result, the original order is not maintained. As mentioned above, sets themselves are unordered collections, so the order is lost. When converting a set back to a list, an arbitrary order is created.

### Maintaining order

If order is important to you, then you will have to use a different mechanism. A very common solution for this is to rely on `OrderedDict` to keep the order of keys during insertion:

``````>>> from collections import OrderedDict
>>> list(OrderedDict.fromkeys(t))
[1, 2, 3, 5, 6, 7, 8]
``````

Starting with Python 3.7, the built-in dictionary is guaranteed to maintain the insertion order as well, so you can also use that directly if you are on Python 3.7 or later (or CPython 3.6):

``````>>> list(dict.fromkeys(t))
[1, 2, 3, 5, 6, 7, 8]
``````

Note that this has the overhead of creating a dictionary first, and then creating a list from it. If you don’t actually need to preserve the order, you’re better off using a set. Check out this question for more details and alternative ways to preserve the order when removing duplicates.
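If in doubt, you can measure the difference yourself. A rough sketch (the exact numbers will vary by machine and Python version):

```python
import timeit

# a list with many duplicates
t = list(range(1000)) * 2

# compare the unordered set approach with the order-preserving dict approach
set_time = timeit.timeit(lambda: list(set(t)), number=1000)
dict_time = timeit.timeit(lambda: list(dict.fromkeys(t)), number=1000)
print(f"set:  {set_time:.3f}s")
print(f"dict: {dict_time:.3f}s")
```

Both produce the same elements; only the ordering guarantee and the constant factors differ.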

Finally, note that both the `set` and the `OrderedDict`/`dict` solutions require your items to be hashable. This usually means that they have to be immutable. If you have to deal with items that are not hashable (e.g. list objects), then you will have to use a slow approach in which you basically have to compare every item with every other item in a nested loop.
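That slow fallback for unhashable items could be sketched like this (a minimal illustration, not from the original answer):

```python
def dedupe_unhashable(items):
    """Remove duplicates from a list of unhashable items, preserving order."""
    result = []
    for item in items:
        # `in` performs a linear scan using ==, so the whole loop is O(n^2)
        if item not in result:
            result.append(item)
    return result

print(dedupe_unhashable([[1, 2], [3], [1, 2]]))  # [[1, 2], [3]]
```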

#### @Israel Teixeira 2019-05-22 15:57:02

If your list is sorted, you can use the following approach to iterate over it while skipping the repeated values. This is especially useful for handling big lists with low memory consumption, avoiding the cost of building a `dict` or a `set`:

``````
def uniq(iterator):
    prev = None
    for item in iterator:
        if item != prev:
            prev = item
            yield item
``````

Then:

``````
for item in uniq([1, 1, 3, 5, 5, 6]):
    print(item, end=' ')
``````

The output is going to be: `1 3 5 6`

#### @Israel Teixeira 2019-05-23 19:54:56

My answer details a more efficient way of skipping repeated values when the constraint of having a pre-sorted list can be relied on.

#### @HEEL_caT666 2019-02-23 20:44:39

If you want to preserve the order, and not use any external modules here is an easy way to do this:

``````>>> t = [1, 9, 2, 3, 4, 5, 3, 6, 7, 5, 8, 9]
>>> list(dict.fromkeys(t))
[1, 9, 2, 3, 4, 5, 6, 7, 8]
``````

Note: This method preserves the order of appearance, so, as seen above, nine comes after one because that is where it first appeared. This, however, is the same result as you would get with

``````
from collections import OrderedDict
ulist = list(OrderedDict.fromkeys(t))
``````

but it is much shorter, and runs faster.

This works because each time the `fromkeys` method encounters a key that already exists, it simply overwrites the value. This doesn't affect the dictionary at all, however, as `fromkeys` creates a dictionary where all keys have the value `None`, so it effectively eliminates all duplicates this way.
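To see why this works, you can inspect the intermediate dictionary (a quick illustration):

```python
t = [1, 9, 2, 3, 4, 5, 3, 6, 7, 5, 8, 9]
d = dict.fromkeys(t)
print(d)        # keys in first-seen order, every value is None
print(list(d))  # [1, 9, 2, 3, 4, 5, 6, 7, 8]
```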

#### @dtypist 2019-05-02 10:30:54

Also try it out here

#### @anatolyg 2019-05-30 14:44:37

This doesn't work in Python 2

#### @where23 2018-12-19 06:17:26

Sometimes you need to remove the duplicate items in-place, without creating a new list: for example, when the list is big, or when you need to keep it as a shadow copy.

``````
from collections import Counter

cntDict = Counter(t)
for item, cnt in cntDict.items():
    for _ in range(cnt - 1):
        t.remove(item)
``````

#### @Cybernetic 2018-10-23 18:57:27

You can use the following function:

``````
def rem_dupes(dup_list):
    yooneeks = []
    for elem in dup_list:
        if elem not in yooneeks:
            yooneeks.append(elem)
    return yooneeks
``````

Example:

``````my_list = ['this','is','a','list','with','dupicates','in', 'the', 'list']
``````

Usage:

``````rem_dupes(my_list)
``````

``````
['this', 'is', 'a', 'list', 'with', 'dupicates', 'in', 'the']
``````

#### @Anoop Kumar 2018-10-19 08:09:09

Python has many built-in functions. You can use `set()` to remove the duplicates inside a list. As per your example, there are the two lists `t` and `t2` below:

``````t = ['a', 'b', 'c', 'd']
t2 = ['a', 'c', 'd']
result = list(set(t) - set(t2))
result
``````

#### @Ruturaj 2019-03-06 03:15:54

Someone else has a similar answer.

#### @Akarsh Jain 2018-10-06 05:46:49

Another approach could be to use pandas:

``````import pandas as pd

myList = [1, 2, 3, 1, 2, 5, 6, 7, 8]
cleanList = pd.Series(myList).drop_duplicates().tolist()
print(cleanList)

#> [1, 2, 3, 5, 6, 7, 8]
``````

and the order remains preserved.

#### @Glutexo 2019-03-20 12:29:07

Though this might work well, using a heavy library like pandas for this purpose seems like overkill.

#### @Flavio Wuensche 2018-09-18 12:56:16

You can use `set` to remove duplicates:

``````mylist = list(set(mylist))
``````

But note the results will be unordered. If that's an issue:

``````mylist.sort()
``````

You can just do: mylist = sorted(list(set(mylist)))

#### @yunus 2018-09-13 07:44:52

This is just a readable, easily understandable function. I have used the dict data structure and some built-in functions, and it has a better complexity of O(n):

``````
def undup(dup_list):
    b = {}
    for i in dup_list:
        b.update({i: 1})
    return b.keys()

a = ["a", 'b', 'a']
print undup(a)
``````

Disclaimer: you may get an indentation error if you copy and paste, so check that the code above keeps its indentation before running it.

#### @ste_kwr 2018-08-28 23:08:50

Unfortunately, most answers here either do not preserve order or are too long. Here is a simple, order-preserving answer.

``````s = [1,2,3,4,5,2,5,6,7,1,3,9,3,5]
x=[]

[x.append(i) for i in s if i not in x]
print(x)
``````

This will give you x with duplicates removed but preserving the order.
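The comprehension above is used only for its side effect of appending; an equivalent explicit loop, if you prefer not to discard the comprehension's result, would be:

```python
s = [1, 2, 3, 4, 5, 2, 5, 6, 7, 1, 3, 9, 3, 5]
x = []
for i in s:
    if i not in x:  # linear scan of x, so this is O(n^2) overall
        x.append(i)
print(x)  # [1, 2, 3, 4, 5, 6, 7, 9]
```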

#### @Gajendra D Ambi 2018-08-19 11:30:18

``````list_with_unique_items = list(set(list_with_duplicates))
``````

#### @N. Wouda 2018-08-19 14:05:32

There appear to be quite a few other answers here. What does this answer offer over the other solutions posted? Furthermore, while this code may answer the question, it lacks explanation. Please consider adding text to explain what it does, and why it answers the question posed.

#### @Gajendra D Ambi 2018-08-19 16:15:05

It is a one-liner which needs no explanation. Some like/want/understand answers which are like an essay, a few others like answers which use Python's built-in library, some others like answers which don't use Python's library, but this one is for those who like one-liners which need no explanation.

#### @Wariored 2018-06-20 12:45:17

Very simple way in Python 3:

``````>>> n = [1, 2, 3, 4, 1, 1]
>>> n
[1, 2, 3, 4, 1, 1]
>>> m = sorted(list(set(n)))
>>> m
[1, 2, 3, 4]
``````

`sorted(list(...))` is redundant (`sorted` already implicitly converts its argument to a new `list`, sorts it, then returns the new `list`, so using both means making an unnecessary temporary `list`). Use only `list` if the result need not be sorted, use only `sorted` if the result needs to be sorted.

#### @Wariored 2018-06-20 13:02:01

Yeah, you are right!

#### @Dennis Peterson 2018-03-24 16:39:24

``````
def remove_duplicates(input_list):
    if input_list == []:
        return []
    # sort list from smallest to largest
    input_list = sorted(input_list)
    # initialize output list with first element of the sorted input list
    output_list = [input_list[0]]
    for item in input_list:
        if item > output_list[-1]:
            output_list.append(item)
    return output_list
``````

#### @DaFois 2018-03-24 17:02:34

Rather than putting the row code in this way, can you explain what your code does?

#### @Dennis Peterson 2018-03-24 17:06:02

What don't you understand in that Code , can't you see it's for removing duplicates

#### @SuperNova 2018-02-22 17:10:31

Another solution might be the following. Create a dictionary out of the list with item as key and index as value, and then print the dictionary keys.

``````>>> lst = [1, 3, 4, 2, 1, 21, 1, 32, 21, 1, 6, 5, 7, 8, 2]
>>>
>>> dict_enum = {item:index for index, item in enumerate(lst)}
>>> print dict_enum.keys()
[32, 1, 2, 3, 4, 5, 6, 7, 8, 21]
``````

Why compute/store the index if you never use it? This looks like a solution intended to preserve order (by storing last seen index of each value) that forgot to do so. `list(set(lst))` would achieve the same logical result.

#### @Eli Korvigo 2016-01-13 19:12:59

All the order-preserving approaches I've seen here so far either use naive comparison (with O(n²) time complexity at best) or heavyweight `OrderedDict`/`set`+`list` combinations that are limited to hashable inputs. Here is a hash-independent O(n log n) solution:

Update added the `key` argument, documentation and Python 3 compatibility.

``````
# from functools import reduce  # <-- add this import on Python 3

def uniq(iterable, key=lambda x: x):
    """
    Remove duplicates from an iterable. Preserves order.
    :type iterable: Iterable[Ord => A]
    :param iterable: an iterable of objects of any orderable type
    :type key: Callable[A] -> (Ord => B)
    :param key: optional argument; by default an item (A) is discarded
       if another item (B), such that A == B, has already been encountered and taken.
       If you provide a key, this condition changes to key(A) == key(B); the callable
       must return orderable objects.
    """
    # Enumerate the list to restore order later; reduce the sorted list; restore order
    def append_unique(acc, item):
        return acc if key(acc[-1][1]) == key(item[1]) else acc.append(item) or acc
    srt_enum = sorted(enumerate(iterable), key=lambda item: key(item[1]))
    return [item[1] for item in sorted(reduce(append_unique, srt_enum, [srt_enum[0]]))]
``````

#### @loxaxs 2016-05-18 20:40:58

Yet, this solution requires orderable elements. I will use it to uniquify my list of lists: it is a pain to `tuple()` lists and to hash them. Generally speaking, the hashing process takes time proportional to the size of the whole data, while this solution takes time O(n log(n)), depending only on the length of the list.

#### @9000 2017-06-05 16:29:26

I think that the set-based approach is equally cheap (O(n log n)), or cheaper, than sorting + detection of uniques. (This approach would parallelize much better, though.) It also does not exactly preserve the initial order, but it gives a predictable order.

#### @Eli Korvigo 2017-06-06 17:34:05

@9000 That is true. I've never mentioned time-complexity of a hash-table-based approach, which is obviously O(n). Here you can find many answers incorporating hash-tables. They are not universal, though, because they require objects to be hashable. Moreover, they are a lot more memory-intensive.

#### @Cochise Ruhulessin 2018-07-18 08:56:24

This should be the accepted answer since the question is how to remove duplicates from a list.

#### @Eli Korvigo 2019-03-11 06:40:29

Why the down-vote?

#### @MSeifert 2016-11-09 01:56:22

It requires installing a 3rd-party module, but the package `iteration_utilities` contains a `unique_everseen`¹ function that can remove all duplicates while preserving the order:

``````>>> from iteration_utilities import unique_everseen

>>> list(unique_everseen(['a', 'b', 'c', 'd'] + ['a', 'c', 'd']))
['a', 'b', 'c', 'd']
``````

In case you want to avoid the overhead of the list addition operation you can use `itertools.chain` instead:

``````>>> from itertools import chain
>>> list(unique_everseen(chain(['a', 'b', 'c', 'd'], ['a', 'c', 'd'])))
['a', 'b', 'c', 'd']
``````

The `unique_everseen` also works if you have unhashable items (for example lists) in the lists:

``````>>> from iteration_utilities import unique_everseen
>>> list(unique_everseen([['a'], ['b'], 'c', 'd'] + ['a', 'c', 'd']))
[['a'], ['b'], 'c', 'd', 'a']
``````

However that will be (much) slower than if the items are hashable.

¹ Disclosure: I'm the author of the `iteration_utilities` library.

#### @Raymond Hettinger 2011-11-01 00:53:55

In Python 2.7, the new way of removing duplicates from an iterable while keeping it in the original order is:

``````
>>> from collections import OrderedDict
>>> list(OrderedDict.fromkeys('abracadabra'))
['a', 'b', 'r', 'c', 'd']
``````

In Python 3.5, the OrderedDict has a C implementation. My timings show that this is now both the fastest and shortest of the various approaches for Python 3.5.

In Python 3.6, the regular dict became both ordered and compact. (This feature holds for CPython and PyPy but may not be present in other implementations.) That gives us a new fastest way of deduping while retaining order:

``````>>> list(dict.fromkeys('abracadabra'))
['a', 'b', 'r', 'c', 'd']
``````

In Python 3.7, the regular dict is guaranteed to be ordered across all implementations. So, the shortest and fastest solution is:

``````>>> list(dict.fromkeys('abracadabra'))
['a', 'b', 'r', 'c', 'd']
``````

#### @Herberth Amaral 2012-10-22 20:23:14

I think this is the only way to keep the items in order.

#### @Martijn Pieters 2013-08-15 14:24:25

@HerberthAmaral: That is very far from true, see How do you remove duplicates from a list in Python whilst preserving order?

#### @Herberth Amaral 2013-08-15 21:34:22

@MartijnPieters Correcting: I think this is the only simple way to keep items in order.

#### @Davide 2017-02-15 20:28:53

For this too, the content of the original list must be hashable

#### @CraZ 2018-05-16 17:27:58

As @Davide mentioned, the items in the original list must be hashable. This means that this does not work for a list of dictionaries: `TypeError: unhashable type: 'dict'`

#### @Willem Van Onsem 2017-12-19 11:10:34

There are a lot of answers here that use a `set(..)` (which is fast, given the elements are hashable) or a list (which has the downside that it results in an O(n²) algorithm).

The function I propose is a hybrid one: we use a `set(..)` for items that are hashable, and a `list(..)` for the ones that are not. Furthermore it is implemented as a generator such that we can for instance limit the number of items, or do some additional filtering.

Finally we also can use a `key` argument to specify in what way the elements should be unique. For instance we can use this if we want to filter a list of strings such that every string in the output has a different length.

``````
def uniq(iterable, key=lambda x: x):
    seens = set()
    seenl = []
    for item in iterable:
        k = key(item)
        try:
            seen = k in seens
        except TypeError:
            seen = k in seenl
        if not seen:
            yield item
            try:
                seens.add(k)
            except TypeError:
                seenl.append(k)
``````

We can now for instance use this like:

``````>>> list(uniq(["apple", "pear", "banana", "lemon"], len))
['apple', 'pear', 'banana']
>>> list(uniq(["apple", "pear", "lemon", "banana"], len))
['apple', 'pear', 'banana']
>>> list(uniq(["apple", "pear", {}, "lemon", [], "banana"], len))
['apple', 'pear', {}, 'banana']
>>> list(uniq(["apple", "pear", {}, "lemon", [], "banana"]))
['apple', 'pear', {}, 'lemon', [], 'banana']
>>> list(uniq(["apple", "pear", {}, "lemon", {}, "banana"]))
['apple', 'pear', {}, 'lemon', 'banana']
``````

It is thus a uniqueness filter that can work on any iterable and filter out duplicates, regardless of whether these are hashable or not.

It makes one assumption: that if one object is hashable, and another one is not, the two objects are never equal. This can strictly speaking happen, although it would be very uncommon.

A note: There are built-ins that will break the assumption stated in the final paragraph; `frozenset` is hashable, `set` is not, and if they have the same values, they're equal, but you'll treat them as non-equal in this code.
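That caveat can be demonstrated directly (a small illustration of the note above):

```python
fs = frozenset({1, 2})
s = {1, 2}

print(fs == s)   # True: a frozenset and a set with the same values are equal
print(hash(fs))  # frozenset is hashable
try:
    hash(s)      # set is not hashable...
except TypeError:
    print("set is unhashable")  # ...so the generator above takes the list path for it
```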

#### @Willem Van Onsem 2018-06-20 13:04:24

@ShadowRanger: yes, I agree with that; as I said, it does not solve all the problems. Nevertheless, using a `set(..)` alone would simply not work at all, and using a `list` would result in linear lookup time. So it is meant as a "better" set, but with some pitfalls.

#### @Willem Van Onsem 2018-06-20 13:05:09

Furthermore, a `set(..)` can in rare cases contain objects that are not equal to each other. For example, `math.nan` is not equal to `math.nan`, but a set will still find it, since the containment check tests reference equality first.
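The NaN behaviour mentioned above can be checked directly (a small illustration; containment tests check identity before equality):

```python
import math

print(math.nan == math.nan)        # False: NaN compares unequal to itself
print(math.nan in {math.nan})      # True: containment finds the identical object
print(float("nan") in {math.nan})  # False: a *different* NaN object is not found
```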

#### @G M 2014-07-03 12:45:51

There are also solutions using Pandas and NumPy. They both return a NumPy array, so you have to use the `.tolist()` method if you want a list.

``````t=['a','a','b','b','b','c','c','c']
t2= ['c','c','b','b','b','a','a','a']
``````

## Pandas solution

Using Pandas function `unique()`:

``````import pandas as pd
pd.unique(t).tolist()
>>>['a','b','c']
pd.unique(t2).tolist()
>>>['c','b','a']
``````

## Numpy solution

Using numpy function `unique()`.

``````import numpy as np
np.unique(t).tolist()
>>>['a','b','c']
np.unique(t2).tolist()
>>>['a','b','c']
``````

Note that `numpy.unique()` also sorts the values, so the list `t2` is returned sorted. If you want the order preserved, do as in this answer:

``````
_, idx = np.unique(t2, return_index=True)
np.array(t2)[np.sort(idx)].tolist()
>>>['c','b','a']
``````

The solution is not so elegant compared to the others; however, compared to `pandas.unique()`, `numpy.unique()` also allows you to check whether nested arrays are unique along one selected axis.

#### @user227666 2014-07-03 12:48:10

This will convert the list to numpy array which is a mess and won't work for strings.

#### @G M 2014-07-03 16:45:41

@user227666 thanks for your review, but that's not true: it works even with strings, and you can add `.tolist()` if you want to get a list...

#### @Debosmit Ray 2016-10-09 09:11:32

I think this is kinda like trying to kill a bee with a sledgehammer. Works, sure! But, importing a library for just this purpose might be a little overkill, no?

#### @G M 2016-10-10 07:17:44

@DebosmitRay it could be useful if you work in Data Science where usually you work with numpy and many times you need to work with numpy array.

#### @Suresh Gupta 2017-10-12 10:28:17

Without using set

``````
data = [1, 2, 3, 1, 2, 5, 6, 7, 8]
uni_data = []
for dat in data:
    if dat not in uni_data:
        uni_data.append(dat)

print(uni_data)
``````

#### @Santosh Pillai 2017-09-18 06:19:58

If you don't care about order and want something different from the Pythonic ways suggested above (that is, something you can use in interviews), then:

``````
def remove_dup(arr):
    size = len(arr)
    j = 0  # To store index of next unique element
    for i in range(0, size - 1):
        # If the current element is not equal to the next
        # element, then store the current element
        if arr[i] != arr[i + 1]:
            arr[j] = arr[i]
            j += 1

    # Store the last element: whether it is unique or repeated,
    # it hasn't been stored previously
    arr[j] = arr[size - 1]

    return arr[0:j + 1]

if __name__ == '__main__':
    arr = [10, 10, 1, 1, 1, 3, 3, 4, 5, 6, 7, 8, 8, 9]
    print(remove_dup(sorted(arr)))
``````

Time Complexity : O(n)

Auxiliary Space : O(n)

``````
def remove_duplicates(A):
    seen = set()
    # `seen.add` returns None, so the `or` keeps only first occurrences
    return [elem for elem in A if not (elem in seen or seen.add(elem))]
``````

A list comprehension to remove duplicates while preserving order.

#### @Anurag Misra 2017-08-18 11:11:54

You can do this simply by using sets.

Step 1: Get the differing elements of the lists
Step 2: Get the common elements of the lists
Step 3: Combine them

``````In [1]: a = ["apples", "bananas", "cucumbers"]

In [2]: b = ["pears", "apples", "watermelons"]

In [3]: set(a).symmetric_difference(b).union(set(a).intersection(b))
Out[3]: {'apples', 'bananas', 'cucumbers', 'pears', 'watermelons'}
``````
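Incidentally, the symmetric difference combined with the intersection is just the union, so the same result can be obtained more directly (an observation, not part of the original answer):

```python
a = ["apples", "bananas", "cucumbers"]
b = ["pears", "apples", "watermelons"]

combined = set(a).symmetric_difference(b).union(set(a).intersection(b))
print(combined == set(a) | set(b))  # True: the three steps reduce to a union
```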

#### @Anurag Misra 2017-08-17 07:39:25

The best approach to removing duplicates from a list is to use the `set()` function, available in Python, and then convert that set back into a list:

``````In [2]: some_list = ['a','a','v','v','v','c','c','d']
In [3]: list(set(some_list))
Out[3]: ['a', 'c', 'd', 'v']
``````

#### @Meet Zaveri 2018-05-01 07:52:28

It works flawlessly!

#### @Nurul Akter Towhid 2017-07-29 00:39:14

Using set :

``````a = [0,1,2,3,4,3,3,4]
a = list(set(a))
print a
``````

Using unique :

``````import numpy as np
a = [0,1,2,3,4,3,3,4]
a = np.unique(a).tolist()
print a
``````

#### @user8383782 2017-07-29 00:33:01

I think converting to a set is the easiest way to remove duplicates:

``````list1 = [1,2,1]
list1 = list(set(list1))
print list1
``````

#### @Atonal 2017-06-06 09:12:26

You could also do this:

``````>>> t = [1, 2, 3, 3, 2, 4, 5, 6]
>>> s = [x for i, x in enumerate(t) if i == t.index(x)]
>>> s
[1, 2, 3, 4, 5, 6]
``````

The reason the above works is that the `index` method returns only the first index of an element. Duplicate elements have higher indices. Refer to here:

``````
list.index(x[, start[, end]])
    Return zero-based index in the list of the first item whose value is x.
    Raises a ValueError if there is no such item.
``````
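The first-index behaviour is easy to verify (a quick illustration of the quoted documentation):

```python
t = [1, 2, 3, 3, 2, 4]
print(t.index(2))     # 1: the first occurrence
print(t.index(2, 2))  # 4: searching only from position 2 onwards
```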

#### @Eli Korvigo 2018-04-13 20:42:34

This is horribly inefficient. `list.index` is a linear-time operation, making your solution quadratic.

#### @Atonal 2018-10-13 00:08:25

You're right. But also I believe it's fairly obvious the solution is intended to be a one liner that preserves the order. Everything else is already in here.

#### @9000 2011-11-01 00:49:33

It's a one-liner: `list(set(source_list))` will do the trick.

A `set` is something that can't possibly have duplicates.

Update: an order-preserving approach is two lines:

``````from collections import OrderedDict
OrderedDict((x, True) for x in source_list).keys()
``````

Here we use the fact that `OrderedDict` remembers the insertion order of keys, and does not change it when a value at a particular key is updated. We insert `True` as values, but we could insert anything, values are just not used. (`set` works a lot like a `dict` with ignored values, too.)

#### @thodnev 2017-04-01 19:46:31

Here's the fastest Pythonic solution compared to the others listed in the replies.

Using implementation details of short-circuit evaluation allows us to use a list comprehension, which is fast enough. `visited.add(item)` always returns `None`, which is evaluated as `False`, so the right side of the `or` is always the result of such an expression.

Time it yourself

``````
def deduplicate(sequence):
    visited = set()
    adder = visited.add  # bind the method once to avoid attribute-lookup overhead
    out = [adder(item) or item for item in sequence if item not in visited]
    return out
``````

#### @Björn Pollex 2017-03-09 11:50:35

For completeness, and since this is a very popular question, the toolz library offers a `unique` function:

``````
>>> from toolz import unique
>>> tuple(unique((1, 2, 3)))
(1, 2, 3)
>>> tuple(unique((1, 2, 1, 3)))
(1, 2, 3)
``````
