By Chris_Rands

2016-10-11 14:59:23 8 Comments

Dictionaries are ordered in Python 3.6 (under the CPython implementation at least) unlike in previous incarnations. This seems like a substantial change, but it's only a short paragraph in the documentation. It is described as a CPython implementation detail rather than a language feature, but also implies this may become standard in the future.

How does the new dictionary implementation perform better than the older one while preserving element order?

Here is the text from the documentation:

dict() now uses a “compact” representation pioneered by PyPy. The memory usage of the new dict() is between 20% and 25% smaller compared to Python 3.5. PEP 468 (Preserving the order of **kwargs in a function.) is implemented by this. The order-preserving aspect of this new implementation is considered an implementation detail and should not be relied upon (this may change in the future, but it is desired to have this new dict implementation in the language for a few releases before changing the language spec to mandate order-preserving semantics for all current and future Python implementations; this also helps preserve backwards-compatibility with older versions of the language where random iteration order is still in effect, e.g. Python 3.5). (Contributed by INADA Naoki in issue 27350. Idea originally suggested by Raymond Hettinger.)

Update December 2017: dicts retaining insertion order is guaranteed for Python 3.7


@Dimitris Fasarakis Hilliard 2016-10-11 15:17:53

Are dictionaries ordered in Python 3.6+?

They are insertion ordered[1]. As of Python 3.6, for the CPython implementation of Python, dictionaries remember the order of items inserted. This is considered an implementation detail in Python 3.6; you need to use OrderedDict if you want insertion ordering that's guaranteed across other implementations of Python (and other ordered behavior[1]).

As of Python 3.7, this is no longer an implementation detail and instead becomes a language feature. From a python-dev message by GvR:

Make it so. "Dict keeps insertion order" is the ruling. Thanks!

This simply means that you can depend on it. Other implementations of Python must also offer an insertion ordered dictionary if they wish to be a conforming implementation of Python 3.7.

How does the Python 3.6 dictionary implementation perform better[2] than the older one while preserving element order?

Essentially, by keeping two arrays.

  • The first array, dk_entries, holds the entries (of type PyDictKeyEntry) for the dictionary in the order that they were inserted. Preserving order is achieved by this being an append only array where new items are always inserted at the end (insertion order).

  • The second, dk_indices, holds the indices for the dk_entries array (that is, values that indicate the position of the corresponding entry in dk_entries). This array acts as the hash table. When a key is hashed it leads to one of the indices stored in dk_indices and the corresponding entry is fetched by indexing dk_entries. Since only indices are kept, the type of this array depends on the overall size of the dictionary (ranging from type int8_t(1 byte) to int32_t/int64_t (4/8 bytes) on 32/64 bit builds)

In the previous implementation, a sparse array of type PyDictKeyEntry and size dk_size had to be allocated; unfortunately, it also resulted in a lot of empty space since that array was not allowed to be more than 2/3 * dk_size full for performance reasons. (and the empty space still had PyDictKeyEntry size!).

This is not the case now since only the required entries are stored (those that have been inserted) and a sparse array of type intX_t (X depending on dict size) 2/3 * dk_sizes full is kept. The empty space changed from type PyDictKeyEntry to intX_t.

So, obviously, creating a sparse array of type PyDictKeyEntry is much more memory demanding than a sparse array for storing ints.

You can see the full conversation on Python-Dev regarding this feature if interested, it is a good read.

In the original proposal made by Raymond Hettinger, a visualization of the data structures used can be seen which captures the gist of the idea.

For example, the dictionary:

d = {'timmy': 'red', 'barry': 'green', 'guido': 'blue'}

is currently stored as [keyhash, key, value]:

entries = [['--', '--', '--'],
           [-8522787127447073495, 'barry', 'green'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           [-9092791511155847987, 'timmy', 'red'],
           ['--', '--', '--'],
           [-6480567542315338377, 'guido', 'blue']]

Instead, the data should be organized as follows:

indices =  [None, 1, None, None, None, 0, None, 2]
entries =  [[-9092791511155847987, 'timmy', 'red'],
            [-8522787127447073495, 'barry', 'green'],
            [-6480567542315338377, 'guido', 'blue']]

As you can visually now see, in the original proposal, a lot of space is essentially empty to reduce collisions and make look-ups faster. With the new approach, you reduce the memory required by moving the sparseness where it's really required, in the indices.

[1]: I say "insertion ordered" and not "ordered" since, with the existence of OrderedDict, "ordered" suggests further behavior that the dict object doesn't provide. OrderedDicts are reversible, provide order sensitive methods and, mainly, provide an order-sensive equality tests (==, !=). dicts currently don't offer any of those behaviors/methods.

[2]: The new dictionary implementations performs better memory wise by being designed more compactly; that's the main benefit here. Speed wise, the difference isn't so drastic, there's places where the new dict might introduce slight regressions (key-lookups, for example) while in others (iteration and resizing come to mind) a performance boost should be present.

Overall, the performance of the dictionary, especially in real-life situations, improves due to the compactness introduced.

@njzk2 2016-10-11 19:19:15

So, what happens when an item is removed? is the entries list resized? or is a blank space kept? or is it compressed from time to time?

@Dimitris Fasarakis Hilliard 2016-10-11 20:03:10

@njzk2 When an item is removed, the corresponding index is replaced by DKIX_DUMMY with a value of -2 and the entry in the entry array replaced by NULL, when inserting is performed the new values are appended to the entries array, Haven't been able to discern yet, but pretty sure when the indices fills up beyond the 2/3 threshold resizing is performed. This can lead to shrinking instead of growing if many DUMMY entries exist.

@Chris_Rands 2017-03-13 22:29:52

Have you noticed any difference speed wise with the new dict implementation?

@Dimitris Fasarakis Hilliard 2017-03-14 13:26:29

@Chris_Rands Nope, the only actual regression I've seen is on the tracker in a message by Victor. Other than that microbenchmark, I've seen no other issue/message indicating a serious speed difference in real-life work loads. There's places where the new dict might introduce slight regressions (key-lookups, for example) while in others (iteration and resizing come to mind) a performance boost would be present.

@Dimitris Fasarakis Hilliard 2017-08-02 19:47:06

Correction on the resizing part: Dictionaries don't resize when you delete items, they re-calculate when you re-insert. So, if a dict is created with d = {i:i for i in range(100)} and you .pop all items w/o inserting, the size won't change. When you add to it again, d[1] = 1, the appropriate size is calculated and the dict resizes.

@Chen A. 2017-09-03 18:13:37

@JimFasarakisHilliard what are the values in the indices list? how 0, 1, 2 is translated to the actual object? is it just for clarity, or that is the actual value inside that list? I thought it would hold the hash value of the key

@Chris_Rands 2018-04-09 20:55:57

Any thoughts on what will happen to OrderedDict in the future, I guess it will be maintained for backward compatibility? Currently OrderedDict supports reversed() iteration and the OrderedDict.move_to_end() method, but maybe these will be added to plain dict too?

@Dimitris Fasarakis Hilliard 2018-04-10 16:57:21

@Chris_Rands I'm pretty sure it is staying. The thing is, and the reason why I changed my answer to remove blanket statements about 'dict being ordered', dicts aren't ordered in the sense OrderedDicts are. The notable issue is equality. dicts have order insensitive ==, OrderedDicts have order sensitive ones. Dumping OrderedDicts and changing dicts to now have order sensitive comparisons could lead to a lot of breakage in old code. I'm guessing the only thing that might change about OrderedDicts is its implementation.

@Dimitris Fasarakis Hilliard 2018-04-10 16:59:41

A related S.O discussion can be found here.

@sinwoobang 2018-08-29 06:01:02

@JimFasarakisHilliard Thank you for your detail answer. I translated it to Korean and distributed to Facebook groups Python Korea. Many of Pythonista in Korea got help from your post. Thank you again.

@ShadowRanger 2019-06-19 01:34:09

@Chris_Rands: FYI, reversed support is coming in Python 3.8. I don't expect to see move_to_end though; the compact ordered dict design doesn't support that as well (it would have to leave a dummy entry behind, and find and update the associated index entry each time) eventually either needlessly expanding the entries array, or forcing a complete rehash to clear dummies. By comparison, OrderedDict just tweaks a couple pointers. Any algorithm that relies on move_to_end should use OrderedDict; adding support to dict would encourage bad code.

@tonix 2019-10-13 09:33:42

What do -9092791511155847987, -8522787127447073495 and -6480567542315338377 mean?

@naught101 2019-11-19 00:04:54

Is creation order still not guaranteed in 3.7? e.g. a = {'one': 1, 'two': 2}?

@Dimitris Fasarakis Hilliard 2019-11-21 07:48:05

@tonix those are the hash values for the dictionary keys.

@ShadowRanger 2020-02-20 21:00:27

@naught101: Creation order is guaranteed in 3.7; in your example, 'one' will always iterate before 'two' (where before 3.6 it would change from run to run, thanks to the per-run hash seed used to perturb string hashes to defend against denial-of-service attacks using keys that could otherwise be crafted to have identical hashes).

@rkengler 2019-07-26 14:38:59

I wanted to add to the discussion above but don't have the reputation to comment.

Python 3.8 is not quite released yet, but it will even include the reversed() function on dictionaries (removing another difference from OrderedDict.

Dict and dictviews are now iterable in reversed insertion order using reversed(). (Contributed by Rémi Lapeyre in bpo-33462.) See what's new in python 3.8

I don't see any mention of the equality operator or other features of OrderedDict so they are still not entirely the same.

@fjsj 2017-12-15 17:24:53

Update: Guido van Rossum announced on the mailing list that as of Python 3.7 dicts in all Python implementations must preserve insertion order.

@Jonny Waffles 2018-06-14 18:54:31

Now that key ordering is the official standard, what is the purpose of the OrderedDict? Or, is it now redundant?

@fjsj 2018-06-15 20:05:13

I guess OrderedDict will not be redundant because it has the move_to_end method and its equality is order sensitive:…. See the note on Jim Fasarakis Hilliard's answer.

@Chris_Rands 2018-06-25 21:15:16

@JonnyWaffles see Jim’s answer and this Q&A…

@boatcoder 2018-06-28 17:51:36

If you want your code to run the same on 2.7 and 3.6/3.7+, you need to use OrderedDict

@ZF007 2019-06-03 00:18:45

Likely there will be an "UnorderedDict" soon for folks that like to hassle their dicts for security reasons ;p

@Maresh 2016-10-11 15:09:00

Below is answering the original first question:

Should I use dict or OrderedDict in Python 3.6?

I think this sentence from the documentation is actually enough to answer your question

The order-preserving aspect of this new implementation is considered an implementation detail and should not be relied upon

dict is not explicitly meant to be an ordered collection, so if you want to stay consistent and not rely on a side effect of the new implementation you should stick with OrderedDict.

Make your code future proof :)

There's a debate about that here.

EDIT: Python 3.7 will keep this as a feature see

@xji 2016-10-20 21:55:41

Seems that if they didn't mean it to be a real feature but only an implementation detail then they shouldn't even put it into the documentation then.

@Chris_Rands 2017-12-20 09:23:01

I'm not sure about your edit caveat; since the guarantee only applies for Python 3.7, I assume the advice for Python 3.6 is unchanged, i.e. dicts are ordered in CPython but don't count on it

Related Questions

Sponsored Content

45 Answered Questions

[SOLVED] How do I merge two dictionaries in a single expression in Python?

16 Answered Questions

[SOLVED] How can I add new keys to a dictionary?

26 Answered Questions

[SOLVED] Does Python have a ternary conditional operator?

29 Answered Questions

[SOLVED] What is the best way to iterate over a dictionary?

  • 2008-09-26 18:20:06
  • Jake Stewart
  • 1606475 View
  • 2625 Score
  • 29 Answer
  • Tags:   c# dictionary loops

62 Answered Questions

[SOLVED] Calling an external command from Python

11 Answered Questions

[SOLVED] Iterating over dictionaries using 'for' loops

10 Answered Questions

[SOLVED] Why is "1000000000000000 in range(1000000000000001)" so fast in Python 3?

20 Answered Questions

[SOLVED] What are metaclasses in Python?

34 Answered Questions

[SOLVED] How do I sort a dictionary by value?

16 Answered Questions

[SOLVED] Check if a given key already exists in a dictionary

  • 2009-10-21 19:05:09
  • Mohan Gulati
  • 3584807 View
  • 2683 Score
  • 16 Answer
  • Tags:   python dictionary

Sponsored Content