By YGA


2009-03-11 17:09:21 8 Comments

I have a data structure which essentially amounts to a nested dictionary. Let's say it looks like this:

{'new jersey': {'mercer county': {'plumbers': 3,
                                  'programmers': 81},
                'middlesex county': {'programmers': 81,
                                     'salesmen': 62}},
 'new york': {'queens county': {'plumbers': 9,
                                'salesmen': 36}}}

Now, maintaining and creating this is pretty painful; every time I have a new state/county/profession I have to create the lower layer dictionaries via obnoxious try/catch blocks. Moreover, I have to create annoying nested iterators if I want to go over all the values.

I could also use tuples as keys, like such:

{('new jersey', 'mercer county', 'plumbers'): 3,
 ('new jersey', 'mercer county', 'programmers'): 81,
 ('new jersey', 'middlesex county', 'programmers'): 81,
 ('new jersey', 'middlesex county', 'salesmen'): 62,
 ('new york', 'queens county', 'plumbers'): 9,
 ('new york', 'queens county', 'salesmen'): 36}

This makes iterating over the values very simple and natural, but it is more syntactically painful to do things like aggregations and looking at subsets of the dictionary (e.g. if I just want to go state-by-state).

Basically, sometimes I want to think of a nested dictionary as a flat dictionary, and sometimes I want to think of it indeed as a complex hierarchy. I could wrap this all in a class, but it seems like someone might have done this already. Alternatively, it seems like there might be some really elegant syntactical constructions to do this.

How could I do this better?

Addendum: I'm aware of setdefault() but it doesn't really make for clean syntax. Also, each sub-dictionary you create still needs to have setdefault() manually set.

20 comments

@Aaron Hall 2013-11-07 06:53:24

What is the best way to implement nested dictionaries in Python?

Implement __missing__ on a dict subclass to set and return a new instance.

This approach has been available (and documented) since Python 2.5, and (particularly valuable to me) it pretty prints just like a normal dict, instead of the ugly printing of an autovivified defaultdict:

class Vividict(dict):
    def __missing__(self, key):
        value = self[key] = type(self)() # retain local pointer to value
        return value                     # faster to return than dict lookup

(Note self[key] is on the left-hand side of assignment, so there's no recursion here.)

and say you have some data:

data = {('new jersey', 'mercer county', 'plumbers'): 3,
        ('new jersey', 'mercer county', 'programmers'): 81,
        ('new jersey', 'middlesex county', 'programmers'): 81,
        ('new jersey', 'middlesex county', 'salesmen'): 62,
        ('new york', 'queens county', 'plumbers'): 9,
        ('new york', 'queens county', 'salesmen'): 36}

Here's our usage code:

vividict = Vividict()
for (state, county, occupation), number in data.items():
    vividict[state][county][occupation] = number

And now:

>>> import pprint
>>> pprint.pprint(vividict, width=40)
{'new jersey': {'mercer county': {'plumbers': 3,
                                  'programmers': 81},
                'middlesex county': {'programmers': 81,
                                     'salesmen': 62}},
 'new york': {'queens county': {'plumbers': 9,
                                'salesmen': 36}}}

Criticism

A criticism of this type of container is that if the user misspells a key, our code could fail silently:

>>> vividict['new york']['queens counyt']
{}

And additionally now we'd have a misspelled county in our data:

>>> pprint.pprint(vividict, width=40)
{'new jersey': {'mercer county': {'plumbers': 3,
                                  'programmers': 81},
                'middlesex county': {'programmers': 81,
                                     'salesmen': 62}},
 'new york': {'queens county': {'plumbers': 9,
                                'salesmen': 36},
              'queens counyt': {}}}

Explanation:

We're just providing another nested instance of our class Vividict whenever a key is accessed but missing. (Returning the value assignment is useful because it avoids us additionally calling the getter on the dict, and unfortunately, we can't return it as it is being set.)

Note, these are the same semantics as the most upvoted answer but in half the lines of code - nosklo's implementation:

class AutoVivification(dict):
    """Implementation of perl's autovivification feature."""
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            value = self[item] = type(self)()
            return value

Demonstration of Usage

Below is just an example of how this dict could be easily used to create a nested dict structure on the fly. This can quickly create a hierarchical tree structure as deeply as you might want to go.

import pprint

class Vividict(dict):
    def __missing__(self, key):
        value = self[key] = type(self)()
        return value

d = Vividict()

d['foo']['bar']
d['foo']['baz']
d['fizz']['buzz']
d['primary']['secondary']['tertiary']['quaternary']
pprint.pprint(d)

Which outputs:

{'fizz': {'buzz': {}},
 'foo': {'bar': {}, 'baz': {}},
 'primary': {'secondary': {'tertiary': {'quaternary': {}}}}}

And as the last line shows, it pretty prints beautifully and in order for manual inspection. But if you want to visually inspect your data, implementing __missing__ to set a new instance of its class to the key and return it is a far better solution.

Other alternatives, for contrast:

dict.setdefault

Although the asker thinks this isn't clean, I find it preferable to the Vividict myself.

d = {} # or dict()
for (state, county, occupation), number in data.items():
    d.setdefault(state, {}).setdefault(county, {})[occupation] = number

and now:

>>> pprint.pprint(d, width=40)
{'new jersey': {'mercer county': {'plumbers': 3,
                                  'programmers': 81},
                'middlesex county': {'programmers': 81,
                                     'salesmen': 62}},
 'new york': {'queens county': {'plumbers': 9,
                                'salesmen': 36}}}

A misspelling would fail noisily, and not clutter our data with bad information:

>>> d['new york']['queens counyt']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'queens counyt'

Additionally, I think setdefault works great when used in loops and you don't know what you're going to get for keys, but repetitive usage becomes quite burdensome, and I don't think anyone would want to keep up the following:

d = dict()

d.setdefault('foo', {}).setdefault('bar', {})
d.setdefault('foo', {}).setdefault('baz', {})
d.setdefault('fizz', {}).setdefault('buzz', {})
d.setdefault('primary', {}).setdefault('secondary', {}).setdefault('tertiary', {}).setdefault('quaternary', {})

Another criticism is that setdefault requires a new instance whether it is used or not. However, Python (or at least CPython) is rather smart about handling unused and unreferenced new instances, for example, it reuses the location in memory:

>>> id({}), id({}), id({})
(523575344, 523575344, 523575344)

An auto-vivified defaultdict

This is a neat looking implementation, and usage in a script that you're not inspecting the data on would be as useful as implementing __missing__:

from collections import defaultdict

def vivdict():
    return defaultdict(vivdict)

But if you need to inspect your data, the results of an auto-vivified defaultdict populated with data in the same way looks like this:

>>> d = vivdict(); d['foo']['bar']; d['foo']['baz']; d['fizz']['buzz']; d['primary']['secondary']['tertiary']['quaternary']; import pprint; 
>>> pprint.pprint(d)
defaultdict(<function vivdict at 0x17B01870>, {'foo': defaultdict(<function vivdict 
at 0x17B01870>, {'baz': defaultdict(<function vivdict at 0x17B01870>, {}), 'bar': 
defaultdict(<function vivdict at 0x17B01870>, {})}), 'primary': defaultdict(<function 
vivdict at 0x17B01870>, {'secondary': defaultdict(<function vivdict at 0x17B01870>, 
{'tertiary': defaultdict(<function vivdict at 0x17B01870>, {'quaternary': defaultdict(
<function vivdict at 0x17B01870>, {})})})}), 'fizz': defaultdict(<function vivdict at 
0x17B01870>, {'buzz': defaultdict(<function vivdict at 0x17B01870>, {})})})

This output is quite inelegant, and the results are quite unreadable. The solution typically given is to recursively convert back to a dict for manual inspection. This non-trivial solution is left as an exercise for the reader.

Performance

Finally, let's look at performance. I'm subtracting the costs of instantiation.

>>> import timeit
>>> min(timeit.repeat(lambda: {}.setdefault('foo', {}))) - min(timeit.repeat(lambda: {}))
0.13612580299377441
>>> min(timeit.repeat(lambda: vivdict()['foo'])) - min(timeit.repeat(lambda: vivdict()))
0.2936999797821045
>>> min(timeit.repeat(lambda: Vividict()['foo'])) - min(timeit.repeat(lambda: Vividict()))
0.5354437828063965
>>> min(timeit.repeat(lambda: AutoVivification()['foo'])) - min(timeit.repeat(lambda: AutoVivification()))
2.138362169265747

Based on performance, dict.setdefault works the best. I'd highly recommend it for production code, in cases where you care about execution speed.

If you need this for interactive use (in an IPython notebook, perhaps) then performance doesn't really matter - in which case, I'd go with Vividict for readability of the output. Compared to the AutoVivification object (which uses __getitem__ instead of __missing__, which was made for this purpose) it is far superior.

Conclusion

Implementing __missing__ on a subclassed dict to set and return a new instance is slightly more difficult than alternatives but has the benefits of

  • easy instantiation
  • easy data population
  • easy data viewing

and because it is less complicated and more performant than modifying __getitem__, it should be preferred to that method.

Nevertheless, it has drawbacks:

  • Bad lookups will fail silently.
  • The bad lookup will remain in the dictionary.

Thus I personally prefer setdefault to the other solutions, and have in every situation where I have needed this sort of behavior.

@Eric Duminil 2017-12-27 17:22:47

Excellent answer! Is there any way to specify a finite depth and a leaf type for a Vividict? E.g. 3 and list for a dict of dict of dict of lists which could be populated with d['primary']['secondary']['tertiary'].append(element). I could define 3 different classes for each depth but I'd love to find a cleaner solution.

@Aaron Hall 2017-12-27 18:57:00

@EricDuminil d['primary']['secondary'].setdefault('tertiary', []).append('element') - ?? Thanks for the compliment, but let me be honest - I never actually use __missing__ - I always use setdefault. I should probably update my conclusion/intro...

@nehemiah 2019-03-12 03:47:21

@AaronHall Seems this case is not covered, imgur.com/a/9LRLyhL

@nehemiah 2019-03-13 01:18:32

@AaronHall The correct behavior is the code should create a dict if needed. In this case by overriding the previous assigned value.

@nehemiah 2019-03-13 02:56:05

@AaronHall Also can you help me to understand what is meant by The bad lookup will remain in the dictionary. as I'm considering using this solution?. Much appreciated. Thx

@nehemiah 2019-03-13 04:56:55

@AaronHall The problem with it would fail setdefault when it nested more than two levels of depth. Looks like no structure in Python can offer true vivification as described. I had to settle for two stating methods one for get_nested & one for set_nested which accept a reference for dict and list of nested attributes.

@Yuda Prawira 2017-10-23 14:27:01

I used to use this function. its safe, quick, easily maintainable.

def deep_get(dictionary, keys, default=None):
    return reduce(lambda d, key: d.get(key, default) if isinstance(d, dict) else default, keys.split("."), dictionary)

Example :

>>> from functools import reduce
>>> def deep_get(dictionary, keys, default=None):
...     return reduce(lambda d, key: d.get(key, default) if isinstance(d, dict) else default, keys.split("."), dictionary)
...
>>> person = {'person':{'name':{'first':'John'}}}
>>> print (deep_get(person, "person.name.first"))
John
>>> print (deep_get(person, "person.name.lastname"))
None
>>> print (deep_get(person, "person.name.lastname", default="No lastname"))
No lastname
>>>

@Paula 2011-03-07 16:48:34

defaultdict() is your friend!

For a two dimensional dictionary you can do:

d = defaultdict(defaultdict)
d[1][2] = 3

For more dimensions you can:

d = defaultdict(lambda :defaultdict(defaultdict))
d[1][2][3] = 4

@A-B-B 2017-06-03 21:59:06

This answer works for only three levels at best. For arbitrary levels, consider this answer.

@topkara 2016-04-09 04:25:21

You can use recursion in lambdas and defaultdict, no need to define names:

a = defaultdict((lambda f: f(f))(lambda g: lambda:defaultdict(g(g))))

Here's an example:

>>> a['new jersey']['mercer county']['plumbers']=3
>>> a['new jersey']['middlesex county']['programmers']=81
>>> a['new jersey']['mercer county']['programmers']=81
>>> a['new jersey']['middlesex county']['salesmen']=62
>>> a
defaultdict(<function __main__.<lambda>>,
        {'new jersey': defaultdict(<function __main__.<lambda>>,
                     {'mercer county': defaultdict(<function __main__.<lambda>>,
                                  {'plumbers': 3, 'programmers': 81}),
                      'middlesex county': defaultdict(<function __main__.<lambda>>,
                                  {'programmers': 81, 'salesmen': 62})})})

@JnBrymn 2016-01-21 18:50:09

You can use Addict: https://github.com/mewwts/addict

>>> from addict import Dict
>>> my_new_shiny_dict = Dict()
>>> my_new_shiny_dict.a.b.c.d.e = 2
>>> my_new_shiny_dict
{'a': {'b': {'c': {'d': {'e': 2}}}}}

@JnBrymn 2014-03-29 22:53:36

This is a function that returns a nested dictionary of arbitrary depth:

from collections import defaultdict
def make_dict():
    return defaultdict(make_dict)

Use it like this:

d=defaultdict(make_dict)
d["food"]["meat"]="beef"
d["food"]["veggie"]="corn"
d["food"]["sweets"]="ice cream"
d["animal"]["pet"]["dog"]="collie"
d["animal"]["pet"]["cat"]="tabby"
d["animal"]["farm animal"]="chicken"

Iterate through everything with something like this:

def iter_all(d,depth=1):
    for k,v in d.iteritems():
        print "-"*depth,k
        if type(v) is defaultdict:
            iter_all(v,depth+1)
        else:
            print "-"*(depth+1),v

iter_all(d)

This prints out:

- food
-- sweets
--- ice cream
-- meat
--- beef
-- veggie
--- corn
- animal
-- pet
--- dog
---- labrador
--- cat
---- tabby
-- farm animal
--- chicken

You might eventually want to make it so that new items can not be added to the dict. It's easy to recursively convert all these defaultdicts to normal dicts.

def dictify(d):
    for k,v in d.iteritems():
        if isinstance(v,defaultdict):
            d[k] = dictify(v)
    return dict(d)

@uzi 2012-10-19 18:47:35

I have a similar thing going. I have a lot of cases where I do:

thedict = {}
for item in ('foo', 'bar', 'baz'):
  mydict = thedict.get(item, {})
  mydict = get_value_for(item)
  thedict[item] = mydict

But going many levels deep. It's the ".get(item, {})" that's the key as it'll make another dictionary if there isn't one already. Meanwhile, I've been thinking of ways to deal with this better. Right now, there's a lot of

value = mydict.get('foo', {}).get('bar', {}).get('baz', 0)

So instead, I made:

def dictgetter(thedict, default, *args):
  totalargs = len(args)
  for i,arg in enumerate(args):
    if i+1 == totalargs:
      thedict = thedict.get(arg, default)
    else:
      thedict = thedict.get(arg, {})
  return thedict

Which has the same effect if you do:

value = dictgetter(mydict, 0, 'foo', 'bar', 'baz')

Better? I think so.

@Aaron Maenpaa 2009-03-11 17:19:27

I like the idea of wrapping this in a class and implementing __getitem__ and __setitem__ such that they implemented a simple query language:

>>> d['new jersey/mercer county/plumbers'] = 3
>>> d['new jersey/mercer county/programmers'] = 81
>>> d['new jersey/mercer county/programmers']
81
>>> d['new jersey/mercer country']
<view which implicitly adds 'new jersey/mercer county' to queries/mutations>

If you wanted to get fancy you could also implement something like:

>>> d['*/*/programmers']
<view which would contain 'programmers' entries>

but mostly I think such a thing would be really fun to implement :D

@YGA 2009-03-11 17:38:58

I think this is a bad idea -- you can never predict the syntax of keys. You would still override getitem and setitem but have them take tuples.

@Aaron Maenpaa 2009-03-11 18:25:15

@YGA You're probably right, but it's fun to think about implementing mini languages like this.

@S.K. Venkat 2018-02-26 12:44:20

like the fun way of implementation....

@andygeers 2009-03-11 17:14:19

I find setdefault quite useful; It checks if a key is present and adds it if not:

d = {}
d.setdefault('new jersey', {}).setdefault('mercer county', {})['plumbers'] = 3

setdefault always returns the relevant key, so you are actually updating the values of 'd' in place.

When it comes to iterating, I'm sure you could write a generator easily enough if one doesn't already exist in Python:

def iterateStates(d):
    # Let's count up the total number of "plumbers" / "dentists" / etc.
    # across all counties and states
    job_totals = {}

    # I guess this is the annoying nested stuff you were talking about?
    for (state, counties) in d.iteritems():
        for (county, jobs) in counties.iteritems():
            for (job, num) in jobs.iteritems():
                # If job isn't already in job_totals, default it to zero
                job_totals[job] = job_totals.get(job, 0) + num

    # Now return an iterator of (job, number) tuples
    return job_totals.iteritems()

# Display all jobs
for (job, num) in iterateStates(d):
    print "There are %d %s in total" % (job, num)

@dfrankow 2011-03-08 19:31:53

I like this solution but when I try: count.setdefault(a, {}).setdefault(b, {}).setdefault(c, 0) += 1 I get "illegal expression for augmented assignment"

@user26294 2009-03-11 20:02:11

If the number of nesting levels is small, I use collections.defaultdict for this:

from collections import defaultdict

def nested_dict_factory(): 
  return defaultdict(int)
def nested_dict_factory2(): 
  return defaultdict(nested_dict_factory)
db = defaultdict(nested_dict_factory2)

db['new jersey']['mercer county']['plumbers'] = 3
db['new jersey']['mercer county']['programmers'] = 81

Using defaultdict like this avoids a lot of messy setdefault(), get(), etc.

@John Fouhy 2009-03-11 23:13:02

+1: defaultdict is one of my all-time favourite additions to python. No more .setdefault()!

@A. Coady 2009-03-12 06:27:52

collections.defaultdict can be sub-classed to make a nested dict. Then add any useful iteration methods to that class.

>>> from collections import defaultdict
>>> class nesteddict(defaultdict):
    def __init__(self):
        defaultdict.__init__(self, nesteddict)
    def walk(self):
        for key, value in self.iteritems():
            if isinstance(value, nesteddict):
                for tup in value.walk():
                    yield (key,) + tup
            else:
                yield key, value


>>> nd = nesteddict()
>>> nd['new jersey']['mercer county']['plumbers'] = 3
>>> nd['new jersey']['mercer county']['programmers'] = 81
>>> nd['new jersey']['middlesex county']['programmers'] = 81
>>> nd['new jersey']['middlesex county']['salesmen'] = 62
>>> nd['new york']['queens county']['plumbers'] = 9
>>> nd['new york']['queens county']['salesmen'] = 36
>>> for tup in nd.walk():
    print tup


('new jersey', 'mercer county', 'programmers', 81)
('new jersey', 'mercer county', 'plumbers', 3)
('new jersey', 'middlesex county', 'programmers', 81)
('new jersey', 'middlesex county', 'salesmen', 62)
('new york', 'queens county', 'salesmen', 36)
('new york', 'queens county', 'plumbers', 9)

@YGA 2009-03-14 05:58:49

This is the answer that comes closest to what I was looking for. But ideally there would be all sorts of helper functions, e.g. walk_keys() or such. I'm surprised there's nothing in the standard libraries to do this.

@Pete 2009-03-11 20:08:17

You could create a YAML file and read it in using PyYaml.

Step 1: Create a YAML file, "employment.yml":

new jersey:
  mercer county:
    pumbers: 3
    programmers: 81
  middlesex county:
    salesmen: 62
    programmers: 81
new york:
  queens county:
    plumbers: 9
    salesmen: 36

Step 2: Read it in Python

import yaml
file_handle = open("employment.yml")
my_shnazzy_dictionary = yaml.safe_load(file_handle)
file_handle.close()

and now my_shnazzy_dictionary has all your values. If you needed to do this on the fly, you can create the YAML as a string and feed that into yaml.safe_load(...).

@kmelvn 2009-03-11 20:49:11

YAML is my definitely my choice for inputting lots of deeply nested data (and configuration files, databaes mockups, etc...). If the OP doesn't want extra files lying around, just use a regular Python string in some file and parse that with YAML.

@Pete 2009-03-11 20:52:02

Good point on creating YAML strings: This would be a much cleaner approach than using the "tempfile" module repeatedly.

@paint can 2011-09-19 19:51:07

Just because I haven't seen one this small, here's a dict that gets as nested as you like, no sweat:

# yo dawg, i heard you liked dicts                                                                      
def yodict():
    return defaultdict(yodict)

@wberry 2012-07-07 20:21:53

+1 A variant of this answer is equivalent to the accepted answer (as far as I can tell) and more concise: Vdict = lambda *args, **kwargs: defaultdict(Vdict, *args, **kwargs)

@martineau 2013-08-14 18:27:27

@wberry: Actually all you need is yodict = lambda: defaultdict(yodict).

@wberry 2013-08-15 04:46:10

The accepted version is a subclass of dict, so to be fully equivalent we would need x = Vdict(a=1, b=2) to work.

@martineau 2014-03-21 17:43:46

@wberry: Irrespective of what's in the accepted answer, being a subclass of dict wasn't a requirement stated by the OP, who only asked for the "best way" to implement them -- and besides, it doesn't/shouldn't matter that much in Python anyway.

@nosklo 2009-03-16 21:53:36

class AutoVivification(dict):
    """Implementation of perl's autovivification feature."""
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            value = self[item] = type(self)()
            return value

Testing:

a = AutoVivification()

a[1][2][3] = 4
a[1][3][3] = 5
a[1][2]['test'] = 6

print a

Output:

{1: {2: {'test': 6, 3: 4}, 3: {3: 5}}}

@jason 2019-02-11 04:38:27

Anybody have this problem when they moved to python 3.x? stackoverflow.com/questions/54622935/…

@nosklo 2019-02-11 12:09:32

@jason pickle is terrible between python versions. Avoid using it to store data you want to keep. Use it only for caches and stuff you can dump and regenerate at will. Not as a long term storage or serialization method.

@jason 2019-02-11 14:39:47

What do you use to store these objects? My autovivification object contains just pandas dataframes and string.

@nosklo 2019-02-12 06:37:20

@jason Depending on the data, I like to use JSON, csv files, or even a sqlite database to store it.

@Roberto Bonvallet 2009-03-13 20:24:47

As others have suggested, a relational database could be more useful to you. You can use a in-memory sqlite3 database as a data structure to create tables and then query them.

import sqlite3

c = sqlite3.Connection(':memory:')
c.execute('CREATE TABLE jobs (state, county, title, count)')

c.executemany('insert into jobs values (?, ?, ?, ?)', [
    ('New Jersey', 'Mercer County',    'Programmers', 81),
    ('New Jersey', 'Mercer County',    'Plumbers',     3),
    ('New Jersey', 'Middlesex County', 'Programmers', 81),
    ('New Jersey', 'Middlesex County', 'Salesmen',    62),
    ('New York',   'Queens County',    'Salesmen',    36),
    ('New York',   'Queens County',    'Plumbers',     9),
])

# some example queries
print list(c.execute('SELECT * FROM jobs WHERE county = "Queens County"'))
print list(c.execute('SELECT SUM(count) FROM jobs WHERE title = "Programmers"'))

This is just a simple example. You could define separate tables for states, counties and job titles.

@Markus Jarderot 2009-03-11 18:52:03

class JobDb(object):
    def __init__(self):
        self.data = []
        self.all = set()
        self.free = []
        self.index1 = {}
        self.index2 = {}
        self.index3 = {}

    def _indices(self,(key1,key2,key3)):
        indices = self.all.copy()
        wild = False
        for index,key in ((self.index1,key1),(self.index2,key2),
                                             (self.index3,key3)):
            if key is not None:
                indices &= index.setdefault(key,set())
            else:
                wild = True
        return indices, wild

    def __getitem__(self,key):
        indices, wild = self._indices(key)
        if wild:
            return dict(self.data[i] for i in indices)
        else:
            values = [self.data[i][-1] for i in indices]
            if values:
                return values[0]

    def __setitem__(self,key,value):
        indices, wild = self._indices(key)
        if indices:
            for i in indices:
                self.data[i] = key,value
        elif wild:
            raise KeyError(k)
        else:
            if self.free:
                index = self.free.pop(0)
                self.data[index] = key,value
            else:
                index = len(self.data)
                self.data.append((key,value))
                self.all.add(index)
            self.index1.setdefault(key[0],set()).add(index)
            self.index2.setdefault(key[1],set()).add(index)
            self.index3.setdefault(key[2],set()).add(index)

    def __delitem__(self,key):
        indices,wild = self._indices(key)
        if not indices:
            raise KeyError
        self.index1[key[0]] -= indices
        self.index2[key[1]] -= indices
        self.index3[key[2]] -= indices
        self.all -= indices
        for i in indices:
            self.data[i] = None
        self.free.extend(indices)

    def __len__(self):
        return len(self.all)

    def __iter__(self):
        for key,value in self.data:
            yield key

Example:

>>> db = JobDb()
>>> db['new jersey', 'mercer county', 'plumbers'] = 3
>>> db['new jersey', 'mercer county', 'programmers'] = 81
>>> db['new jersey', 'middlesex county', 'programmers'] = 81
>>> db['new jersey', 'middlesex county', 'salesmen'] = 62
>>> db['new york', 'queens county', 'plumbers'] = 9
>>> db['new york', 'queens county', 'salesmen'] = 36

>>> db['new york', None, None]
{('new york', 'queens county', 'plumbers'): 9,
 ('new york', 'queens county', 'salesmen'): 36}

>>> db[None, None, 'plumbers']
{('new jersey', 'mercer county', 'plumbers'): 3,
 ('new york', 'queens county', 'plumbers'): 9}

>>> db['new jersey', 'mercer county', None]
{('new jersey', 'mercer county', 'plumbers'): 3,
 ('new jersey', 'mercer county', 'programmers'): 81}

>>> db['new jersey', 'middlesex county', 'programmers']
81

>>>

Edit: Now returning dictionaries when querying with wild cards (None), and single values otherwise.

@Ben Blank 2009-03-11 20:52:50

Why return lists? Seems it should either return a dictionary (so you know what each number represents) or a sum (since that's all you can really do with the list).

@SpoonMeiser 2009-03-11 20:05:32

For easy iterating over your nested dictionary, why not just write a simple generator?

def each_job(my_dict):
    for state, a in my_dict.items():
        for county, b in a.items():
            for job, value in b.items():
                yield {
                    'state'  : state,
                    'county' : county,
                    'job'    : job,
                    'value'  : value
                }

So then, if you have your compilicated nested dictionary, iterating over it becomes simple:

for r in each_job(my_dict):
    print "There are %d %s in %s, %s" % (r['value'], r['job'], r['county'], r['state'])

Obviously your generator can yield whatever format of data is useful to you.

Why are you using try catch blocks to read the tree? It's easy enough (and probably safer) to query whether a key exists in a dict before trying to retrieve it. A function using guard clauses might look like this:

if not my_dict.has_key('new jersey'):
    return False

nj_dict = my_dict['new jersey']
...

Or, a perhaps somewhat verbose method, is to use the get method:

value = my_dict.get('new jersey', {}).get('middlesex county', {}).get('salesmen', 0)

But for a somewhat more succinct way, you might want to look at using a collections.defaultdict, which is part of the standard library since python 2.5.

import collections

def state_struct(): return collections.defaultdict(county_struct)
def county_struct(): return collections.defaultdict(job_struct)
def job_struct(): return 0

my_dict = collections.defaultdict(state_struct)

print my_dict['new jersey']['middlesex county']['salesmen']

I'm making assumptions about the meaning of your data structure here, but it should be easy to adjust for what you actually want to do.

@allyourcode 2009-03-11 20:30:20

Unless your dataset is going to stay pretty small, you might want to consider using a relational database. It will do exactly what you want: make it easy to add counts, selecting subsets of counts, and even aggregate counts by state, county, occupation, or any combination of these.

@vartec 2009-03-11 17:17:40

As for "obnoxious try/catch blocks":

d = {}
d.setdefault('key',{}).setdefault('inner key',{})['inner inner key'] = 'value'
print d

yields

{'key': {'inner key': {'inner inner key': 'value'}}}

You can use this to convert from your flat dictionary format to structured format:

fd = {('new jersey', 'mercer county', 'plumbers'): 3,
 ('new jersey', 'mercer county', 'programmers'): 81,
 ('new jersey', 'middlesex county', 'programmers'): 81,
 ('new jersey', 'middlesex county', 'salesmen'): 62,
 ('new york', 'queens county', 'plumbers'): 9,
 ('new york', 'queens county', 'salesmen'): 36}

for (k1,k2,k3), v in fd.iteritems():
    d.setdefault(k1, {}).setdefault(k2, {})[k3] = v

@S.Lott 2009-03-11 17:29:12

Since you have a star-schema design, you might want to structure it more like a relational table and less like a dictionary.

import collections

class Jobs( object ):
    def __init__( self, state, county, title, count ):
        self.state= state
        self.count= county
        self.title= title
        self.count= count

facts = [
    Jobs( 'new jersey', 'mercer county', 'plumbers', 3 ),
    ...

def groupBy( facts, name ):
    total= collections.defaultdict( int )
    for f in facts:
        key= getattr( f, name )
        total[key] += f.count

That kind of thing can go a long way to creating a data warehouse-like design without the SQL overheads.

Related Questions

Sponsored Content

23 Answered Questions

[SOLVED] Check if a given key already exists in a dictionary

  • 2009-10-21 19:05:09
  • Mohan Gulati
  • 2845160 View
  • 2683 Score
  • 23 Answer
  • Tags:   python dictionary

34 Answered Questions

[SOLVED] How do I sort a dictionary by value?

14 Answered Questions

[SOLVED] Iterating over dictionaries using 'for' loops

15 Answered Questions

[SOLVED] What are metaclasses in Python?

38 Answered Questions

[SOLVED] What does the "yield" keyword do?

15 Answered Questions

[SOLVED] Add new keys to a dictionary?

39 Answered Questions

[SOLVED] How to merge two dictionaries in a single expression?

27 Answered Questions

[SOLVED] What does if __name__ == "__main__": do?

27 Answered Questions

[SOLVED] What is the best way to iterate over a dictionary?

  • 2008-09-26 18:20:06
  • Jake Stewart
  • 1354451 View
  • 2253 Score
  • 27 Answer
  • Tags:   c# dictionary loops

19 Answered Questions

Sponsored Content