By snakile


2011-01-29 11:55:53 8 Comments

Using Python 3.x, I have a list of strings for which I would like to perform a natural alphabetical sort.

Natural sort: The order by which files in Windows are sorted.

For instance, the following list is naturally sorted (what I want):

['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

And here's the "sorted" version of the above list (what I have):

['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']

I'm looking for a sort function which behaves like the first one.

15 comments

@Claudiu 2013-04-18 18:37:05

Here's a much more pythonic version of Mark Byer's answer:

import re

def natural_sort_key(s, _nsre=re.compile('([0-9]+)')):
    return [int(text) if text.isdigit() else text.lower()
            for text in _nsre.split(s)]    

Now this function can be used as a key in any function that uses it, like list.sort, sorted, max, etc.

As a lambda:

lambda s: [int(t) if t.isdigit() else t.lower() for t in re.split('(\d+)', s)]

@Antti Haapala 2013-09-05 03:56:56

You'd probably want to precompile the pattern though...

@wim 2014-01-22 17:17:35

re module compiles and caches regexes automagically, so there is no need to precompile

@Claudiu 2014-01-22 17:51:31

@wim: it caches the last X usages, so it's technically possible to use X+5 regexes and then do a natural sort over and over, at which point this wouldn't be cached. but probably negligible in the long run

@The Unfun Cat 2015-05-08 18:51:38

I did not do it, but perhaps the reason was that it can't handle tuples, like a regular python sort.

@Zitrax 2015-11-23 12:45:52

The X usages mentioned by @Claudiu seem to be 100 on Python 2.7 and 512 on Python 3.4. And also note that when the limit is reached the cache is completely cleared (so it's not only the oldest one that is thrown out).

@Joschua 2016-05-19 14:02:10

@Zitrax Why / How does it make sense to clear the cache completely?

@Zitrax 2016-05-19 14:30:33

@Joschua I just mentioned what python is doing internally - perhaps it might just be the simplest solution. You can read more about this issue here: stackoverflow.com/q/17325281/11722

@Varadaraju G 2018-03-12 12:10:14

a = ['H1', 'H100', 'H10', 'H3', 'H2', 'H6', 'H11', 'H50', 'H5', 'H99', 'H8']
b = ''
c = []

def bubble(bad_list):#bubble sort method
        length = len(bad_list) - 1
        sorted = False

        while not sorted:
                sorted = True
                for i in range(length):
                        if bad_list[i] > bad_list[i+1]:
                                sorted = False
                                bad_list[i], bad_list[i+1] = bad_list[i+1], bad_list[i] #sort the integer list 
                                a[i], a[i+1] = a[i+1], a[i] #sort the main list based on the integer list index value

for a_string in a: #extract the number in the string character by character
        for letter in a_string:
                if letter.isdigit():
                        #print letter
                        b += letter
        c.append(b)
        b = ''

print 'Before sorting....'
print a
c = map(int, c) #converting string list into number list
print c
bubble(c)

print 'After sorting....'
print c
print a

Acknowledgments:

Bubble Sort Homework

How to read a string one letter at a time in python

@piRSquared 2018-02-19 23:45:31

Value Of This Post

My point is to offer a non regex solution that can be applied generally.
I'll create three functions:

  1. find_first_digit which I borrowed from @AnuragUniyal. It will find the position of the first digit or non-digit in a string.
  2. split_digits which is a generator that picks apart a string into digit and non digit chunks. It will also yield integers when it is a digit.
  3. natural_key just wraps split_digits into a tuple. This is what we use as a key for sorted, max, min.

Functions

def find_first_digit(s, non=False):
    for i, x in enumerate(s):
        if x.isdigit() ^ non:
            return i
    return -1

def split_digits(s, case=False):
    non = True
    while s:
        i = find_first_digit(s, non)
        if i == 0:
            non = not non
        elif i == -1:
            yield int(s) if s.isdigit() else s if case else s.lower()
            s = ''
        else:
            x, s = s[:i], s[i:]
            yield int(x) if x.isdigit() else x if case else x.lower()

def natural_key(s, *args, **kwargs):
    return tuple(split_digits(s, *args, **kwargs))

We can see that it is general in that we can have multiple digit chunks:

# Note that the key has lower case letters
natural_key('asl;dkfDFKJ:sdlkfjdf809lkasdjfa_543_hh')

('asl;dkfdfkj:sdlkfjdf', 809, 'lkasdjfa_', 543, '_hh')

Or leave as case sensitive:

natural_key('asl;dkfDFKJ:sdlkfjdf809lkasdjfa_543_hh', True)

('asl;dkfDFKJ:sdlkfjdf', 809, 'lkasdjfa_', 543, '_hh')

We can see that it sorts the OP's list in the appropriate order

sorted(
    ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13'],
    key=natural_key
)

['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

But it can handle more complicated lists as well:

sorted(
    ['f_1', 'e_1', 'a_2', 'g_0', 'd_0_12:2', 'd_0_1_:2'],
    key=natural_key
)

['a_2', 'd_0_1_:2', 'd_0_12:2', 'e_1', 'f_1', 'g_0']

My regex equivalent would be

def int_maybe(x):
    return int(x) if str(x).isdigit() else x

def split_digits_re(s, case=False):
    parts = re.findall('\d+|\D+', s)
    if not case:
        return map(int_maybe, (x.lower() for x in parts))
    else:
        return map(int_maybe, parts)

def natural_key_re(s, *args, **kwargs):
    return tuple(split_digits_re(s, *args, **kwargs))

@Benjamin Drung 2018-02-08 14:56:28

Based on the answers here, I wrote a natural_sorted function that behaves like the built-in function sorted:

# Copyright (C) 2018, Benjamin Drung <[email protected]>
#
# Permission to use, copy, modify, and/or distribute this software for any
# purpose with or without fee is hereby granted, provided that the above
# copyright notice and this permission notice appear in all copies.
#
# THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
# WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
# MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
# ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
# WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
# ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
# OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

import re

def natural_sorted(iterable, key=None, reverse=False):
    """Return a new naturally sorted list from the items in *iterable*.

    The returned list is in natural sort order. The string is ordered
    lexicographically (using the Unicode code point number to order individual
    characters), except that multi-digit numbers are ordered as a single
    character.

    Has two optional arguments which must be specified as keyword arguments.

    *key* specifies a function of one argument that is used to extract a
    comparison key from each list element: ``key=str.lower``.  The default value
    is ``None`` (compare the elements directly).

    *reverse* is a boolean value.  If set to ``True``, then the list elements are
    sorted as if each comparison were reversed.

    The :func:`natural_sorted` function is guaranteed to be stable. A sort is
    stable if it guarantees not to change the relative order of elements that
    compare equal --- this is helpful for sorting in multiple passes (for
    example, sort by department, then by salary grade).
    """
    prog = re.compile(r"(\d+)")

    def alphanum_key(element):
        """Split given key in list of strings and digits"""
        return [int(c) if c.isdigit() else c for c in prog.split(key(element)
                if key else element)]

    return sorted(iterable, key=alphanum_key, reverse=reverse)

The source code is also available in my GitHub snippets repository: https://github.com/bdrung/snippets/blob/master/natural_sorted.py

@SethMMorton 2013-08-24 05:38:42

There is a third party library for this on PyPI called natsort (full disclosure, I am the package's author). For your case, you can do either of the following:

>>> from natsort import natsorted, ns
>>> x = ['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']
>>> natsorted(x, key=lambda y: y.lower())
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
>>> natsorted(x, alg=ns.IGNORECASE)  # or alg=ns.IC
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

You should note that natsort uses a general algorithm so it should work for just about any input that you throw at it. If you want more details on why you might choose a library to do this rather than rolling your own function, check out the natsort documentation's How It Works page, in particular the Special Cases Everywhere! section.


If you need a sorting key instead of a sorting function, use either of the below formulas.

>>> from natsort import natsort_keygen, ns
>>> l1 = ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
>>> l2 = l1[:]
>>> natsort_key1 = natsort_keygen(key=lambda y: y.lower())
>>> l1.sort(key=natsort_key1)
>>> l1
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
>>> natsort_key2 = natsort_keygen(alg=ns.IGNORECASE)
>>> l2.sort(key=natsort_key2)
>>> l2
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

@Martin Thoma 2014-07-17 18:51:57

I also think it's quite interesting that natsort also sorts correct when the number is not at the end: like it's often the case for filenames. Feel free to include the following example: pastebin.com/9cwCLdEK

@Johny Vaknin 2017-08-14 14:49:34

I suggest you simply use the key keyword argument of sorted to achieve your desired list
For example:

to_order= [e2,E1,e5,E4,e3]
ordered= sorted(to_order, key= lambda x: x.lower())
    # ordered should be [E1,e2,e3,E4,e5]

@user19087 2017-01-11 06:48:27

Most likely functools.cmp_to_key() is closely tied to the underlying implementation of python's sort. Besides, the cmp parameter is legacy. The modern way is to transform the input items into objects that support the desired rich comparison operations.

Under CPython 2.x, objects of disparate types can be ordered even if the respective rich comparison operators haven't been implemented. Under CPython 3.x, objects of different types must explicitly support the comparison. See How does Python compare string and int? which links to the official documentation. Most of the answers depend on this implicit ordering. Switching to Python 3.x will require a new type to implement and unify comparisons between numbers and strings.

Python 2.7.12 (default, Sep 29 2016, 13:30:34) 
>>> (0,"foo") < ("foo",0)
True  
Python 3.5.2 (default, Oct 14 2016, 12:54:53) 
>>> (0,"foo") < ("foo",0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  TypeError: unorderable types: int() < str()

There are three different approaches. The first uses nested classes to take advantage of Python's Iterable comparison algorithm. The second unrolls this nesting into a single class. The third foregoes subclassing str to focus on performance. All are timed; the second is twice as fast while the third almost six times faster. Subclassing str isn't required, and was probably a bad idea in the first place, but it does come with certain conveniences.

The sort characters are duplicated to force ordering by case, and case-swapped to force lower case letter to sort first; this is the typical definition of "natural sort". I couldn't decide on the type of grouping; some might prefer the following, which also brings significant performance benefits:

d = lambda s: s.lower()+s.swapcase()

Where utilized, the comparison operators are set to that of object so they won't be ignored by functools.total_ordering.

import functools
import itertools


@functools.total_ordering
class NaturalStringA(str):
    def __repr__(self):
        return "{}({})".format\
            ( type(self).__name__
            , super().__repr__()
            )
    d = lambda c, s: [ c.NaturalStringPart("".join(v))
                        for k,v in
                       itertools.groupby(s, c.isdigit)
                     ]
    d = classmethod(d)
    @functools.total_ordering
    class NaturalStringPart(str):
        d = lambda s: "".join(c.lower()+c.swapcase() for c in s)
        d = staticmethod(d)
        def __lt__(self, other):
            if not isinstance(self, type(other)):
                return NotImplemented
            try:
                return int(self) < int(other)
            except ValueError:
                if self.isdigit():
                    return True
                elif other.isdigit():
                    return False
                else:
                    return self.d(self) < self.d(other)
        def __eq__(self, other):
            if not isinstance(self, type(other)):
                return NotImplemented
            try:
                return int(self) == int(other)
            except ValueError:
                if self.isdigit() or other.isdigit():
                    return False
                else:
                    return self.d(self) == self.d(other)
        __le__ = object.__le__
        __ne__ = object.__ne__
        __gt__ = object.__gt__
        __ge__ = object.__ge__
    def __lt__(self, other):
        return self.d(self) < self.d(other)
    def __eq__(self, other):
        return self.d(self) == self.d(other)
    __le__ = object.__le__
    __ne__ = object.__ne__
    __gt__ = object.__gt__
    __ge__ = object.__ge__
import functools
import itertools


@functools.total_ordering
class NaturalStringB(str):
    def __repr__(self):
        return "{}({})".format\
            ( type(self).__name__
            , super().__repr__()
            )
    d = lambda s: "".join(c.lower()+c.swapcase() for c in s)
    d = staticmethod(d)
    def __lt__(self, other):
        if not isinstance(self, type(other)):
            return NotImplemented
        groups = map(lambda i: itertools.groupby(i, type(self).isdigit), (self, other))
        zipped = itertools.zip_longest(*groups)
        for s,o in zipped:
            if s is None:
                return True
            if o is None:
                return False
            s_k, s_v = s[0], "".join(s[1])
            o_k, o_v = o[0], "".join(o[1])
            if s_k and o_k:
                s_v, o_v = int(s_v), int(o_v)
                if s_v == o_v:
                    continue
                return s_v < o_v
            elif s_k:
                return True
            elif o_k:
                return False
            else:
                s_v, o_v = self.d(s_v), self.d(o_v)
                if s_v == o_v:
                    continue
                return s_v < o_v
        return False
    def __eq__(self, other):
        if not isinstance(self, type(other)):
            return NotImplemented
        groups = map(lambda i: itertools.groupby(i, type(self).isdigit), (self, other))
        zipped = itertools.zip_longest(*groups)
        for s,o in zipped:
            if s is None or o is None:
                return False
            s_k, s_v = s[0], "".join(s[1])
            o_k, o_v = o[0], "".join(o[1])
            if s_k and o_k:
                s_v, o_v = int(s_v), int(o_v)
                if s_v == o_v:
                    continue
                return False
            elif s_k or o_k:
                return False
            else:
                s_v, o_v = self.d(s_v), self.d(o_v)
                if s_v == o_v:
                    continue
                return False
        return True
    __le__ = object.__le__
    __ne__ = object.__ne__
    __gt__ = object.__gt__
    __ge__ = object.__ge__
import functools
import itertools
import enum


class OrderingType(enum.Enum):
    PerWordSwapCase         = lambda s: s.lower()+s.swapcase()
    PerCharacterSwapCase    = lambda s: "".join(c.lower()+c.swapcase() for c in s)


class NaturalOrdering:
    @classmethod
    def by(cls, ordering):
        def wrapper(string):
            return cls(string, ordering)
        return wrapper
    def __init__(self, string, ordering=OrderingType.PerCharacterSwapCase):
        self.string = string
        self.groups = [ (k,int("".join(v)))
                            if k else
                        (k,ordering("".join(v)))
                            for k,v in
                        itertools.groupby(string, str.isdigit)
                      ]
    def __repr__(self):
        return "{}({})".format\
            ( type(self).__name__
            , self.string
            )
    def __lesser(self, other, default):
        if not isinstance(self, type(other)):
            return NotImplemented
        for s,o in itertools.zip_longest(self.groups, other.groups):
            if s is None:
                return True
            if o is None:
                return False
            s_k, s_v = s
            o_k, o_v = o
            if s_k and o_k:
                if s_v == o_v:
                    continue
                return s_v < o_v
            elif s_k:
                return True
            elif o_k:
                return False
            else:
                if s_v == o_v:
                    continue
                return s_v < o_v
        return default
    def __lt__(self, other):
        return self.__lesser(other, default=False)
    def __le__(self, other):
        return self.__lesser(other, default=True)
    def __eq__(self, other):
        if not isinstance(self, type(other)):
            return NotImplemented
        for s,o in itertools.zip_longest(self.groups, other.groups):
            if s is None or o is None:
                return False
            s_k, s_v = s
            o_k, o_v = o
            if s_k and o_k:
                if s_v == o_v:
                    continue
                return False
            elif s_k or o_k:
                return False
            else:
                if s_v == o_v:
                    continue
                return False
        return True
    # functools.total_ordering doesn't create single-call wrappers if both
    # __le__ and __lt__ exist, so do it manually.
    def __gt__(self, other):
        op_result = self.__le__(other)
        if op_result is NotImplemented:
            return op_result
        return not op_result
    def __ge__(self, other):
        op_result = self.__lt__(other)
        if op_result is NotImplemented:
            return op_result
        return not op_result
    # __ne__ is the only implied ordering relationship, it automatically
    # delegates to __eq__
>>> import natsort
>>> import timeit
>>> l1 = ['Apple', 'corn', 'apPlE', 'arbour', 'Corn', 'Banana', 'apple', 'banana']
>>> l2 = list(map(str, range(30)))
>>> l3 = ["{} {}".format(x,y) for x in l1 for y in l2]
>>> print(timeit.timeit('sorted(l3+["0"], key=NaturalStringA)', number=10000, globals=globals()))
362.4729259099986
>>> print(timeit.timeit('sorted(l3+["0"], key=NaturalStringB)', number=10000, globals=globals()))
189.7340817489967
>>> print(timeit.timeit('sorted(l3+["0"], key=NaturalOrdering.by(OrderingType.PerCharacterSwapCase))', number=10000, globals=globals()))
69.34636392899847
>>> print(timeit.timeit('natsort.natsorted(l3+["0"], alg=natsort.ns.GROUPLETTERS | natsort.ns.LOWERCASEFIRST)', number=10000, globals=globals()))
98.2531585780016

Natural sorting is both pretty complicated and vaguely defined as a problem. Don't forget to run unicodedata.normalize(...) beforehand, and consider use str.casefold() rather than str.lower(). There are probably subtle encoding issues I haven't considered. So I tentatively recommend the natsort library. I took a quick glance at the github repository; the code maintenance has been stellar.

All the algorithms I've seen depend on tricks such as duplicating and lowering characters, and swapping case. While this doubles the running time, an alternative would require a total natural ordering on the input character set. I don't think this is part of the unicode specification, and since there are many more unicode digits than [0-9], creating such a sorting would be equally daunting. If you want locale-aware comparisons, prepare your strings with locale.strxfrm per Python's Sorting HOW TO.

@Camilo 2017-03-14 15:16:56

Given:

data=['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']

Similar to SergO's solution, a 1-liner without external libraries would be:

data.sort(key=lambda x : int(x[3:]))

or

sorted_data=sorted(data, key=lambda x : int(x[3:]))

Explanation:

This solution uses the key feature of sort to define a function that will be employed for the sorting. Because we know that every data entry is preceded by 'elm' the sorting function converts to integer the portion of the string after the 3rd character (i.e. int(x[3:])). If the numerical part of the data is in a different location, then this part of the function would have to change.

Cheers

@robert king 2011-06-06 11:08:09

One option is to turn the string into a tuple and replace digits using expanded form http://wiki.answers.com/Q/What_does_expanded_form_mean

that way a90 would become ("a",90,0) and a1 would become ("a",1)

below is some sample code (which isn't very efficient due to the way It removes leading 0's from numbers)

alist=["something1",
    "something12",
    "something17",
    "something2",
    "something25and_then_33",
    "something25and_then_34",
    "something29",
    "beta1.1",
    "beta2.3.0",
    "beta2.33.1",
    "a001",
    "a2",
    "z002",
    "z1"]

def key(k):
    nums=set(list("0123456789"))
        chars=set(list(k))
    chars=chars-nums
    for i in range(len(k)):
        for c in chars:
            k=k.replace(c+"0",c)
    l=list(k)
    base=10
    j=0
    for i in range(len(l)-1,-1,-1):
        try:
            l[i]=int(l[i])*base**j
            j+=1
        except:
            j=0
    l=tuple(l)
    print l
    return l

print sorted(alist,key=key)

output:

('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 1)
('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 10, 2)
('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 10, 7)
('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 2)
('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 20, 5, 'a', 'n', 'd', '_', 't', 'h', 'e', 'n', '_', 30, 3)
('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 20, 5, 'a', 'n', 'd', '_', 't', 'h', 'e', 'n', '_', 30, 4)
('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 20, 9)
('b', 'e', 't', 'a', 1, '.', 1)
('b', 'e', 't', 'a', 2, '.', 3, '.')
('b', 'e', 't', 'a', 2, '.', 30, 3, '.', 1)
('a', 1)
('a', 2)
('z', 2)
('z', 1)
['a001', 'a2', 'beta1.1', 'beta2.3.0', 'beta2.33.1', 'something1', 'something2', 'something12', 'something17', 'something25and_then_33', 'something25and_then_34', 'something29', 'z1', 'z002']

@SethMMorton 2017-01-03 06:35:38

Unfortunately, this solution only works for Python 2.X. For Python 3, ('b', 1) < ('b', 'e', 't', 'a', 1, '.', 1) will return TypeError: unorderable types: int() < str()

@Jerod 2016-05-04 19:30:47

And now for something more* elegant (pythonic) -just a touch

There are many implementations out there, and while some have come close, none quite captured the elegance modern python affords.

  • Tested using python(3.5.1)
  • Included an additional list to demonstrate that it works when the numbers are mid string
  • Didn't test, however, I am assuming that if your list was sizable it would be more efficient to compile the regex beforehand
    • I'm sure someone will correct me if this is an erroneous assumption

Quicky
from re import compile, split    
dre = compile(r'(\d+)')
mylist.sort(key=lambda l: [int(s) if s.isdigit() else s.lower() for s in split(dre, l)])
Full-Code
#!/usr/bin/python3
# coding=utf-8
"""
Natural-Sort Test
"""

from re import compile, split

dre = compile(r'(\d+)')
mylist = ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13', 'elm']
mylist2 = ['e0lm', 'e1lm', 'E2lm', 'e9lm', 'e10lm', 'E12lm', 'e13lm', 'elm', 'e01lm']

mylist.sort(key=lambda l: [int(s) if s.isdigit() else s.lower() for s in split(dre, l)])
mylist2.sort(key=lambda l: [int(s) if s.isdigit() else s.lower() for s in split(dre, l)])

print(mylist)  
  # ['elm', 'elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
print(mylist2)  
  # ['e0lm', 'e1lm', 'e01lm', 'E2lm', 'e9lm', 'e10lm', 'E12lm', 'e13lm', 'elm']

Caution when using

  • from os.path import split
    • you will need to differentiate the imports

Inspiration from

@Jared Goguen 2016-05-04 19:35:55

And to think I forgot about dre.

@SergO 2015-07-15 14:16:56

data = ['elm13', 'elm9', 'elm0', 'elm1', 'Elm11', 'Elm2', 'elm10']

Let's analyse the data. The digit capacity of all elements is 2. And there are 3 letters in common literal part 'elm'.

So, the maximal length of element is 5. We can increase this value to make sure (for example, to 8).

Bearing that in mind, we've got a one-line solution:

data.sort(key=lambda x: '{0:0>8}'.format(x).lower())

without regular expressions and external libraries!

print(data)

>>> ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'elm13']

Explanation:

for elm in data:
    print('{0:0>8}'.format(elm).lower())

>>>
0000elm0
0000elm1
0000elm2
0000elm9
000elm10
000elm11
000elm13

@Bade 2017-01-03 21:05:36

@SergO: Thanks. Some of the solutions here are not explained well.

@Jerod 2017-09-27 13:46:10

This doesn't handle dynamic/unknown length data. It also sorts differently than other solutions for data that has numbers within the data opposed to at the end. *This isn't necessarily undesirable but I think it's good to point out.

@roganartu 2017-11-27 16:42:17

If you need to handle dynamic length data you can use width = max(data, key=len) to calculate what to sub in for the 8 above and then sub it into the format string with '{0:0>{width}}'.format(x, width=width)

@Scott Lawton 2014-12-11 18:42:56

The above answers are good for the specific example that was shown, but miss several useful cases for the more general question of natural sort. I just got bit by one of those cases, so created a more thorough solution:

def natural_sort_key(string_or_number):
    """
    by Scott S. Lawton <[email protected]> 2014-12-11; public domain and/or CC0 license

    handles cases where simple 'int' approach fails, e.g.
        ['0.501', '0.55'] floating point with different number of significant digits
        [0.01, 0.1, 1]    already numeric so regex and other string functions won't work (and aren't required)
        ['elm1', 'Elm2']  ASCII vs. letters (not case sensitive)
    """

    def try_float(astring):
        try:
            return float(astring)
        except:
            return astring

    if isinstance(string_or_number, basestring):
        string_or_number = string_or_number.lower()

        if len(re.findall('[.]\d', string_or_number)) <= 1:
            # assume a floating point value, e.g. to correctly sort ['0.501', '0.55']
            # '.' for decimal is locale-specific, e.g. correct for the Anglosphere and Asia but not continental Europe
            return [try_float(s) for s in re.split(r'([\d.]+)', string_or_number)]
        else:
            # assume distinct fields, e.g. IP address, phone number with '.', etc.
            # caveat: might want to first split by whitespace
            # TBD: for unicode, replace isdigit with isdecimal
            return [int(s) if s.isdigit() else s for s in re.split(r'(\d+)', string_or_number)]
    else:
        # consider: add code to recurse for lists/tuples and perhaps other iterables
        return string_or_number

Test code and several links (on and off of StackOverflow) are here: http://productarchitect.com/code/better-natural-sort.py

Feedback welcome. That's not meant to be a definitive solution; just a step forward.

@SethMMorton 2015-11-17 23:58:05

In your test script to which you link, natsorted and humansorted fail because they were used incorrectly... you tried to pass natsorted as a key but its actually the sorting function itself. You should have tried natsort_keygen().

@beauburrier 2012-01-20 10:53:53

I wrote a function based on http://www.codinghorror.com/blog/2007/12/sorting-for-humans-natural-sort-order.html which adds the ability to still pass in your own 'key' parameter. I need this in order to perform a natural sort of lists that contain more complex objects (not just strings).

import re

def natural_sort(list, key=lambda s:s):
    """
    Sort the list into natural alphanumeric order.
    """
    def get_alphanum_key_func(key):
        convert = lambda text: int(text) if text.isdigit() else text 
        return lambda s: [convert(c) for c in re.split('([0-9]+)', key(s))]
    sort_key = get_alphanum_key_func(key)
    list.sort(key=sort_key)

For example:

my_list = [{'name':'b'}, {'name':'10'}, {'name':'a'}, {'name':'1'}, {'name':'9'}]
natural_sort(my_list, key=lambda x: x['name'])
print my_list
[{'name': '1'}, {'name': '9'}, {'name': '10'}, {'name': 'a'}, {'name': 'b'}]

@Claudiu 2013-04-18 18:49:31

a simpler way to do this would be to define natural_sort_key, and then when sorting a list you could do chain your keys, e.g.: list.sort(key=lambda el: natural_sort_key(el['name']))

@Mark Byers 2011-01-29 12:01:31

Try this:

import re

def natural_sort(l): 
    convert = lambda text: int(text) if text.isdigit() else text.lower() 
    alphanum_key = lambda key: [ convert(c) for c in re.split('([0-9]+)', key) ] 
    return sorted(l, key = alphanum_key)

Output:

['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

See it working online: ideone.

Code adapted from here: Sorting for Humans : Natural Sort Order.

@Apalala 2011-01-30 16:44:17

+1 Why the lambdas assigned to names?

@jperelli 2012-08-15 16:33:58

why do you use return sorted(l, key) instead of l.sort(key)? Is it for any performance gain or just to be more pythonic?

@huggie 2012-08-30 05:00:00

@jperelli I think the ladder would change the original list in the caller. But most likely the caller wants another shallow copy of the list.

@user19087 2017-01-11 00:25:50

Just for the record, this can't handle all inputs: the str/int splits must line up, otherwise you'll create comparisons like ["foo",0] < [0,"foo"] for the input ["foo0","0foo"], which raises a TypeError.

@Florian Kusche 2017-10-30 09:13:46

@user19087: In fact it does work, because re.split('([0-9]+)', '0foo') returns ['', '0', 'foo']. Because of that, strings will always be on even indexes and integers on odd indexes in the array.

@SilentGhost 2011-01-29 12:01:33

>>> import re
>>> sorted(lst, key=lambda x: int(re.findall(r'\d+$', x)[0]))
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

@user225312 2011-01-29 12:09:34

@SilentGhost: Can you explain the re part?

@SilentGhost 2011-01-29 12:11:34

@AA: matches digits at the end of the string.

@Ignacio Vazquez-Abrams 2011-01-29 12:11:36

@snakile 2011-01-29 12:26:07

Your implementation only solves the numbers problem. The implementation fails if the strings don't have numbers in them. Try it on ['silent','ghost'] for instance (list index out of range).

@SilentGhost 2011-01-29 13:17:54

@snaklie: your question fails to provide decent example case. You haven't explained what you're trying to do, and neither you have updated your question with this new information. You haven't posted anything you have tried, so please don't be so dismissive of my telepathy attempt.

@snakile 2011-01-29 13:43:19

@SilentGhost: First, I gave you an upvote because I think your answer is useful (even though it doesn't solve my problem). Second, I cannot cover all the possible cases with examples. I think I've given a pretty clear definition to natural sort. I don't think it's a good idea to give a complex example or a long definition to such a simple concept. You're welcome to edit my question if you can think of a better formulation to the problem.

@SilentGhost 2011-01-29 13:51:50

@snakile: but I don't know how to improve your question, because for me it's not clear at all how you'd want to deal with strings that do not have digits, but have different cases, lengths, etc.

@snakile 2011-01-29 14:06:18

@SilentGhost: I'd want to deal with such strings the same way Windows deals with such file names when it sorts files by name (ignore cases, etc). It seems clear to me, but anything I say seems clear to me, so I'm not to judge whether it's clear or not.

@David Heffernan 2011-01-29 16:02:09

@snakile you have come nowhere near close defining natural search. That would be quite hard to do and would require a lot of detail. If you want the sort order used by windows explorer do you know that there is a simple api call that provides this?

Related Questions

Sponsored Content

40 Answered Questions

[SOLVED] Sort array of objects by string property value

59 Answered Questions

[SOLVED] How do you split a list into evenly sized chunks?

11 Answered Questions

[SOLVED] Calling a function of a module by using its name (a string)

  • 2008-08-06 03:36:08
  • ricree
  • 561599 View
  • 1400 Score
  • 11 Answer
  • Tags:   python object

28 Answered Questions

10 Answered Questions

32 Answered Questions

[SOLVED] "Least Astonishment" and the Mutable Default Argument

9 Answered Questions

17 Answered Questions

[SOLVED] Does Python have a string 'contains' substring method?

19 Answered Questions

[SOLVED] Pythonic way to create a long multi-line string

191 Answered Questions

[SOLVED] Hidden features of Python

Sponsored Content