By Casebash


2009-10-31 10:12:07 8 Comments

Python has an ordered dictionary. What about an ordered set?

12 comments

@jrc 2018-12-06 18:21:53

The answer is no, but you can use collections.OrderedDict from the Python standard library with just keys (and values as None) for the same purpose.

Update: As of Python 3.7 (and CPython 3.6), standard dict is guaranteed to preserve order and is more performant than OrderedDict. (For backward compatibility and especially readability, however, you may wish to continue using OrderedDict.)

Here's an example of how to use dict as an ordered set to filter out duplicate items while preserving order, thereby emulating an ordered set. Use the dict class method fromkeys() to create a dict, then simply ask for the keys() back.

>>> keywords = ['foo', 'bar', 'bar', 'foo', 'baz', 'foo']

>>> list(dict.fromkeys(keywords).keys())
['foo', 'bar', 'baz']

@jez 2018-12-21 21:29:28

Maybe worth mentioning that this also works (faster) with vanilla dict.fromkeys(). But in that case, key order is only preserved in CPython 3.6+ implementations, so OrderedDict is a more portable solution when order matters.

@Anwar Hossain 2019-02-06 19:14:57

won't work if the values are not string

@raratiru 2019-04-09 16:33:37

@AnwarHossain keys = (1,2,3,1,2,1) list(OrderedDict.fromkeys(keys).keys()) -> [1, 2, 3], python-3.7. It works.

@user474491 2019-09-07 06:14:03

Can we infer that Set in Python 3.7+ preserve order too ?

@shrewmouse 2019-09-11 14:16:52

This answers the actual question instead of jumping right into a work-around.

@Mahmoud Hashemi 2016-02-07 20:41:45

I can do you one better than an OrderedSet: boltons has a pure-Python, 2/3-compatible IndexedSet type that is not only an ordered set, but also supports indexing (as with lists).

Simply pip install boltons (or copy setutils.py into your codebase), import the IndexedSet and:

>>> from boltons.setutils import IndexedSet
>>> x = IndexedSet(list(range(4)) + list(range(8)))
>>> x
IndexedSet([0, 1, 2, 3, 4, 5, 6, 7])
>>> x - set(range(2))
IndexedSet([2, 3, 4, 5, 6, 7])
>>> x[-1]
7
>>> fcr = IndexedSet('freecreditreport.com')
>>> ''.join(fcr[:fcr.index('.')])
'frecditpo'

Everything is unique and retained in order. Full disclosure: I wrote the IndexedSet, but that also means you can bug me if there are any issues. :)

@Loïc N. 2018-07-16 02:40:13

So i also had a small list where i clearly had the possibility of introducing non-unique values.

I searched for the existence of a unique list of some sort, but then realized that testing the existence of the element before adding it works just fine.

if(not new_element in my_list):
    my_list.append(new_element)

I don't know if there are caveats to this simple approach, but it solves my problem.

@Draconis 2018-11-09 04:19:03

The main issue with this approach is that adding runs in O(n). Meaning it gets slower with big lists. Python's built-in sets are very good at making adding elements faster. But for simple use-cases, it certainly does work!

@Algebra 2017-12-06 10:50:34

There's no OrderedSet in official library. I make an exhaustive cheatsheet of all the data structure for your reference.

DataStructure = {
    'Collections': {
        'Map': [
            ('dict', 'OrderDict', 'defaultdict'),
            ('chainmap', 'types.MappingProxyType')
        ],
        'Set': [('set', 'frozenset'), {'multiset': 'collection.Counter'}]
    },
    'Sequence': {
        'Basic': ['list', 'tuple', 'iterator']
    },
    'Algorithm': {
        'Priority': ['heapq', 'queue.PriorityQueue'],
        'Queue': ['queue.Queue', 'multiprocessing.Queue'],
        'Stack': ['collection.deque', 'queue.LifeQueue']
        },
    'text_sequence': ['str', 'byte', 'bytearray']
}

@Casebash 2009-10-31 10:15:06

There is an ordered set (possible new link) recipe for this which is referred to from the Python 2 Documentation. This runs on Py2.6 or later and 3.0 or later without any modifications. The interface is almost exactly the same as a normal set, except that initialisation should be done with a list.

OrderedSet([1, 2, 3])

This is a MutableSet, so the signature for .union doesn't match that of set, but since it includes __or__ something similar can easily be added:

@staticmethod
def union(*sets):
    union = OrderedSet()
    union.union(*sets)
    return union

def union(self, *sets):
    for set in sets:
        self |= set

@Casebash 2010-12-10 00:59:33

I selected my own answer because the reference from the documentation makes this close to an official answer

@xApple 2012-12-14 12:48:33

The interface is NOT exactly the same as the normal set object, many essential methods are missing such as update, union, intersection.

@Geoffrey Hing 2014-02-24 16:35:03

FYI, I noticed that a slightly modified version of the recipe cited in this answer has been added to PyPi as "ordered-set"

@Kevin 2014-12-05 17:38:35

I'm pretty sure you're not allowed to have two methods both called union in the same class. The last one will "win" and the first one will fail to exist at runtime. This is because OrderedSet.union (no parens) has to refer to a single object.

@mbdevpl 2016-08-31 11:17:51

There is also "orderedset" package which is based on the same recipe but implemented in Cython -- pypi.python.org/pypi/orderedset .

@RichardB 2017-01-21 22:45:01

The ParallelRegression package provides a setList( ) ordered set class that is more method-complete than the options based on the ActiveState recipe. It supports all methods available for lists and most if not all methods available for sets.

@Berislav Lopac 2015-09-25 14:13:08

In case you're already using pandas in your code, its Index object behaves pretty like an ordered set, as shown in this article.

@Daniel K 2014-04-22 16:22:53

Implementations on PyPI

While others have pointed out that there is no built-in implementation of an insertion-order preserving set in Python (yet), I am feeling that this question is missing an answer which states what there is to be found on PyPI.

To the best of my knowledge there currently is:

Both implementations are based on the recipe posted by Raymond Hettinger to ActiveState which is also mentioned in other answers here. I have checked out both and identified the following

critical differences:

  • ordered-set (version 1.1)
    • advantage: O(1) for lookups by index (e.g. my_set[5])
    • disadvantage: remove(item) not implemented
  • oset (version 0.1.3)
    • advantage: O(1) for remove(item)
    • disadvantage: apparently O(n) for lookups by index

Both implementations have O(1) for add(item) and __contains__(item) (item in my_set).

Unfortunately neither implementation has method-based set operations like set1.union(set2) -> You have to use the operator-based form like set1 | set2 instead. See the Python documentation on Set Objects for a full list of set operation methods and their operator-based equivalents.

I first went with ordered-set until I used remove(item) for the first time which crashed my script with a NotImplementedError. As I have never used lookup by index so far, I meanwhile switched to oset.

If you know about other implementations on PyPI, let me know in the comments.

@timdiels 2016-03-16 23:20:18

A new contender is collections_extended.setlist. Functions like set.union don't work on it though, even though it inherits collections.abc.Set.

@warvariuc 2016-03-19 07:22:59

OrderedSet now supports remove

@Michael Lenzen 2015-01-20 18:46:11

A little late to the game, but I've written a class setlist as part of collections-extended that fully implements both Sequence and Set

>>> from collections_extended import setlist
>>> sl = setlist('abracadabra')
>>> sl
setlist(('a', 'b', 'r', 'c', 'd'))
>>> sl[3]
'c'
>>> sl[-1]
'd'
>>> 'r' in sl  # testing for inclusion is fast
True
>>> sl.index('d')  # so is finding the index of an element
4
>>> sl.insert(1, 'd')  # inserting an element already in raises a ValueError
ValueError
>>> sl.index('d')
4

GitHub: https://github.com/mlenzen/collections-extended

Documentation: http://collections-extended.lenzm.net/en/latest/

PyPI: https://pypi.python.org/pypi/collections-extended

@GrantJ 2014-09-23 06:52:26

If you're using the ordered set to maintain a sorted order, consider using a sorted set implementation from PyPI. The sortedcontainers module provides a SortedSet for just this purpose. Some benefits: pure-Python, fast-as-C implementations, 100% unit test coverage, hours of stress testing.

Installing from PyPI is easy with pip:

pip install sortedcontainers

Note that if you can't pip install, simply pull down the sortedlist.py and sortedset.py files from the open-source repository.

Once installed you can simply:

from sortedcontainers import SortedSet
help(SortedSet)

The sortedcontainers module also maintains a performance comparison with several alternative implementations.

For the comment that asked about Python's bag data type, there's alternatively a SortedList data type which can be used to efficiently implement a bag.

@gsnedders 2014-11-24 19:28:12

Note that the SortedSet class there requires members to be comparable and hashable.

@gotgenes 2015-01-29 19:23:04

@gsnedders The builtins set and frozenset also require elements to be hashable. The comparable constraint is the addition for SortedSet, but it's also an obvious constraint.

@ldmtwo 2018-11-06 00:32:15

As the name suggests, this does not maintain order. It is nothing but sorted(set([sequence])) which makes better?

@GrantJ 2018-11-06 17:50:23

@ldmtwo I'm not sure which you're referring to but just to be clear, SortedSet as part of Sorted Containers does maintain sorted order.

@Reinstate Monica 2019-04-10 14:00:45

@GrantJ - It is the difference between whether it maintains insertion order or sort order. Most of the other answers are regarding insertion order. I think you are already aware of this based on your first sentence, but it's probably what ldmtwo is saying.

@hwrd 2013-02-20 22:52:44

For many purposes simply calling sorted will suffice. For example

>>> s = set([0, 1, 2, 99, 4, 40, 3, 20, 24, 100, 60])
>>> sorted(s)
[0, 1, 2, 3, 4, 20, 24, 40, 60, 99, 100]

If you are going to use this repeatedly, there will be overhead incurred by calling the sorted function so you might want to save the resulting list, as long as you're done changing the set. If you need to maintain unique elements and sorted, I agree with the suggestion of using OrderedDict from collections with an arbitrary value such as None.

@Periodic Maintenance 2013-02-21 14:01:14

The purpose for OrderedSet is to be able to get the items in the order which they where added to the set. You example could maybe called SortedSet...

@Stephan202 2009-10-31 10:17:39

An ordered set is functionally a special case of an ordered dictionary.

The keys of a dictionary are unique. Thus, if one disregards the values in an ordered dictionary (e.g. by assigning them None), then one has essentially an ordered set.

As of Python 3.1 there is collections.OrderedDict. The following is an example implementation of an OrderedSet. (Note that only few methods need to be defined or overridden: collections.OrderedDict and collections.MutableSet do the heavy lifting.)

import collections

class OrderedSet(collections.OrderedDict, collections.MutableSet):

    def update(self, *args, **kwargs):
        if kwargs:
            raise TypeError("update() takes no keyword arguments")

        for s in args:
            for e in s:
                 self.add(e)

    def add(self, elem):
        self[elem] = None

    def discard(self, elem):
        self.pop(elem, None)

    def __le__(self, other):
        return all(e in other for e in self)

    def __lt__(self, other):
        return self <= other and self != other

    def __ge__(self, other):
        return all(e in self for e in other)

    def __gt__(self, other):
        return self >= other and self != other

    def __repr__(self):
        return 'OrderedSet([%s])' % (', '.join(map(repr, self.keys())))

    def __str__(self):
        return '{%s}' % (', '.join(map(repr, self.keys())))

    difference = property(lambda self: self.__sub__)
    difference_update = property(lambda self: self.__isub__)
    intersection = property(lambda self: self.__and__)
    intersection_update = property(lambda self: self.__iand__)
    issubset = property(lambda self: self.__le__)
    issuperset = property(lambda self: self.__ge__)
    symmetric_difference = property(lambda self: self.__xor__)
    symmetric_difference_update = property(lambda self: self.__ixor__)
    union = property(lambda self: self.__or__)

@Casebash 2009-10-31 11:01:41

A useful approximation for several purposes, but it doesn't have the nice set operations.

@Stephan202 2009-10-31 11:12:12

@Casebash: yes, one may want to define a class OrderedSet which subclasses OrderedDict and abc.Set and then define __len__, __iter__ and __contains__.

@u0b34a0f6ae 2009-10-31 14:58:50

@Stephan202: Regrettably, the collection ABCs live in collections, but otherwise a good suggestion

@Stephan202 2009-11-01 15:57:10

@kaizer.se: right you are. I now posted an example implementation. Turns out my previous comment was not completely correct, but the posted code should speak for itself.

@Daniel Kats 2012-10-03 15:11:53

This is true, but you do have a lot of wasted space as a result, which leads to suboptimal performance.

@Nurbldoff 2013-09-18 12:11:17

An addition; collections.OrderedDict is also available in python 2.7.

@xApple 2017-04-28 13:06:35

Doing OrderedSet([1,2,3]) raises a TypeError. How does the constructor even work? Missing usage example.

Related Questions

Sponsored Content

10 Answered Questions

[SOLVED] Does Python have a string 'contains' substring method?

16 Answered Questions

[SOLVED] What are metaclasses in Python?

62 Answered Questions

[SOLVED] Calling an external command from Python

23 Answered Questions

[SOLVED] Does Python have a ternary conditional operator?

41 Answered Questions

[SOLVED] How do I merge two dictionaries in a single expression?

29 Answered Questions

[SOLVED] What does if __name__ == "__main__": do?

38 Answered Questions

[SOLVED] What does the "yield" keyword do?

14 Answered Questions

[SOLVED] Add new keys to a dictionary?

10 Answered Questions

[SOLVED] Iterating over dictionaries using 'for' loops

34 Answered Questions

[SOLVED] How do I sort a dictionary by value?

Sponsored Content