By jespern


2008-11-23 12:15:52 8 Comments

I have a list of arbitrary length, and I need to split it up into equal-sized chunks and operate on it. There are some obvious ways to do this, like keeping a counter and two lists, and when the second list fills up, adding it to the first list and emptying the second list for the next round of data, but this is potentially extremely expensive.

I was wondering if anyone had a good solution to this for lists of any length, e.g. using generators.

I was looking for something useful in itertools, but I couldn't find anything obviously suitable. Might've missed it, though.

Related question: What is the most “pythonic” way to iterate over a list in chunks?

30 comments

@Realfun 2019-12-03 18:34:33

An old school approach that does not require itertools but still works with arbitrary generators:

def chunks(g, n):
  """divide a generator 'g' into small chunks
  Yields:
    a chunk that has 'n' or fewer items
  """
  n = max(1, n)
  buff = []
  for item in g:
    buff.append(item)
    if len(buff) == n:
      yield buff
      buff = []
  if buff:
    yield buff

@Ned Batchelder 2008-11-23 12:33:53

Here's a generator that yields the chunks you want:

def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

import pprint
pprint.pprint(list(chunks(range(10, 75), 10)))
[[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
 [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
 [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
 [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
 [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
 [70, 71, 72, 73, 74]]

If you're using Python 2, you should use xrange() instead of range():

def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in xrange(0, len(lst), n):
        yield lst[i:i + n]

Also, you can simply use a list comprehension instead of writing a function, though it's a good idea to encapsulate operations like this in named functions so that your code is easier to understand. Python 3:

[lst[i:i + n] for i in range(0, len(lst), n)]

Python 2 version:

[lst[i:i + n] for i in xrange(0, len(lst), n)]

@jespern 2008-11-23 12:51:10

What happens if we can't tell the length of the list? Try this on itertools.repeat([1, 2, 3]), for example.

@Ned Batchelder 2008-11-23 13:53:37

That's an interesting extension to the question, but the original question clearly asked about operating on a list.

@n611x007 2014-04-24 09:19:24

@jespern I guess with an infinite or indefinite-length list you go to the related question that J.F. Sebastian linked: What is the most “pythonic” way to iterate over a list in chunks?

@dgan 2018-02-04 14:19:55

this function needs to be in the damn standard library

@SomethingSomething 2018-04-03 11:33:22

I'd add a generator expression example, in addition to the list comprehension ones. (Simply use () rather than [])
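For illustration, a sketch of what that suggestion would look like, assuming the same lst and n as in the answer above:

```python
lst = list(range(10, 75))
n = 10

# Parentheses instead of brackets turn the comprehension into a lazy
# generator expression: chunks are produced on demand, not all at once.
gen = (lst[i:i + n] for i in range(0, len(lst), n))
first = next(gen)  # only the first chunk has been materialized so far
```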

@Calimo 2018-06-14 11:51:04

-1. The question specifically asks for "evenly sized chunks". This isn't a valid answer to the question as the last chunk will be arbitrarily small.

@Ned Batchelder 2018-06-14 15:29:23

@Calimo: what do you suggest? I hand you a list with 47 elements. How would you like to split it into "evenly sized chunks"? The OP accepted the answer, so they are clearly OK with the last differently sized chunk. Perhaps the English phrase is imprecise?

@Calimo 2018-06-14 15:46:42

@NedBatchelder I agree the question is pretty ill-defined, but you can split a list of 47 elements into 5 chunks of 9, 9, 9, 10 and 10 elements, instead of 7, 10, 10, 10 and 10. It is not exactly even, but that's what I had in mind when I googled the "even sized chunks" keywords. This means you need n to define the number of chunks, not their size. Another answer below suggests a way to do it, actually. Your answer is basically the same as the ones in the linked "related question".
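For illustration, one way to get the 9, 9, 9, 10, 10 split described here - a sketch that takes the number of chunks rather than the chunk size (the function name is made up):

```python
def even_chunks(lst, k):
    """Split lst into k contiguous chunks whose sizes differ by at most one."""
    q, r = divmod(len(lst), k)
    out, start = [], 0
    for i in range(k):
        # the last r chunks each get one extra element
        size = q + 1 if i >= k - r else q
        out.append(lst[start:start + size])
        start += size
    return out
```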

@Alvaro 2019-07-04 12:46:37

Most people will be looking at this for batch processing and rate limiting, so it usually doesn't matter if the last chunk is smaller

@marsipan 2019-10-26 15:41:38

On my system, I don't get the same output, but rather a list of range objects. To get the same output as you I used this: [list(s) for s in chunks(range(10, 75), 10)]

@Yasen 2019-11-27 12:33:35

Please don't name your variables l, it looks exactly like 1 and is confusing. People are copying your code and think this is ok.

@Moj 2013-06-05 08:54:26

I know this is kind of old but nobody yet mentioned numpy.array_split:

import numpy as np

lst = range(50)
np.array_split(lst, 5)
# [array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
#  array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19]),
#  array([20, 21, 22, 23, 24, 25, 26, 27, 28, 29]),
#  array([30, 31, 32, 33, 34, 35, 36, 37, 38, 39]),
#  array([40, 41, 42, 43, 44, 45, 46, 47, 48, 49])]

@FizxMike 2015-09-09 03:03:50

This allows you to set the total number of chunks, not the number of elements per chunk.

@Moj 2015-09-09 07:27:58

You can do the math yourself: if you have 10 elements, you can group them into two 5-element chunks or five 2-element chunks.
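The conversion alluded to here is a one-liner; as a sketch, ceiling division gives the chunk size needed to end up with a given number of chunks (at most n_chunks chunks are produced, with a shorter final one when the length doesn't divide evenly):

```python
import math

lst = list(range(10))
n_chunks = 5
size = math.ceil(len(lst) / n_chunks)  # 2 elements per chunk here
chunks = [lst[i:i + size] for i in range(0, len(lst), size)]
```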

@MiniQuark 2016-06-28 17:26:29

+1 This is my favorite solution, as it splits the array into evenly sized arrays, while other solutions don't (in all other solutions I looked at, the last array may be arbitrarily small).

@Baldrickk 2018-05-18 11:12:16

@MiniQuark but what does this do when the number of blocks isn't a factor of the original array size?

@MiniQuark 2018-05-18 15:31:50

@Baldrickk If you split N elements into K chunks, then the first N%K chunks will have N//K+1 elements, and the rest will have N//K elements. For example, if you split an array containing 108 elements into 5 chunks, then the first 108%5=3 chunks will contain 108//5+1=22 elements, and the rest of the chunks will have 108//5=21 elements.
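That distribution rule can be written down directly; a sketch (the helper name is made up) that computes the chunk sizes numpy.array_split produces, without needing numpy:

```python
def chunk_sizes(N, K):
    # the first N % K chunks get N // K + 1 elements, the rest get N // K
    q, r = divmod(N, K)
    return [q + 1] * r + [q] * (K - r)
```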

@Robert 2019-10-31 18:43:58

cleanest solution. Thank you.

@J-L 2019-07-15 15:27:09

This question reminds me of the Perl 6 .comb(n) method. It breaks up strings into n-sized chunks. (There's more to it than that, but I'll leave out the details.)

It's easy enough to implement a similar function in Python 3 as a lambda expression:

comb = lambda s,n: (s[i:i+n] for i in range(0,len(s),n))

Then you can call it like this:

some_list = list(range(0, 20))  # creates a list of 20 elements
generator = comb(some_list, 4)  # creates a generator that will generate lists of 4 elements
for sublist in generator:
    print(sublist)  # prints a sublist of four elements, as it's generated

Of course, you don't have to assign the generator to a variable; you can just loop over it directly like this:

for sublist in comb(some_list, 4):
    print(sublist)  # prints a sublist of four elements, as it's generated

As a bonus, this comb() function also operates on strings:

list( comb('catdogant', 3) )  # returns ['cat', 'dog', 'ant']

@Ravi Anand 2019-07-09 14:04:12

The Python pydash package could be a good choice.

from pydash.arrays import chunk
ids = ['22', '89', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '1']
chunk_ids = chunk(ids,5)
print(chunk_ids)
# output: [['22', '89', '2', '3', '4'], ['5', '6', '7', '8', '9'], ['10', '11', '1']]

For more, check out the pydash chunk documentation.

@ajaest 2019-05-23 10:35:48

If you don't care about the order:

from itertools import groupby

batch_no = 3
data = 'abcdefgh'

[
    [x[1] for x in x[1]]
    for x in
    groupby(
        sorted(
            (x[0] % batch_no, x[1])
            for x in
            enumerate(data)
        ),
        key=lambda x: x[0]
    )
]

[['a', 'd', 'g'], ['b', 'e', 'h'], ['c', 'f']]

This solution doesn't generate chunks of equal size, but it distributes values so that batches are as even as possible while keeping the number of generated batches fixed.

@luckydonald 2019-04-20 18:28:54

Lazy loading version

import pprint
pprint.pprint(list(chunks(range(10, 75), 10)))
[range(10, 20),
 range(20, 30),
 range(30, 40),
 range(40, 50),
 range(50, 60),
 range(60, 70),
 range(70, 75)]

Compare this implementation's result with the example usage in the accepted answer.

Many of the functions above assume that the length of the whole iterable is known up front, or at least is cheap to calculate.

For some streamed objects that would mean loading the full data into memory first (e.g. downloading the whole file) just to get the length information.

If you don't know the full size yet, you can use this code instead:

def chunks(iterable, size):
    """
    Yield successive chunks from iterable, being `size` long.

    https://stackoverflow.com/a/55776536/3423324
    :param iterable: The object you want to split into pieces.
    :param size: The size each of the resulting pieces should have.
    """
    i = 0
    while True:
        sliced = iterable[i:i + size]
        if len(sliced) == 0:
            # to suppress stuff like `range(max, max)`.
            break
        # end if
        yield sliced
        if len(sliced) < size:
            # our slice is not the full length, so we must have passed the end of the iterator
            break
        # end if
        i += size  # so we start the next chunk at the right place.
    # end while
# end def

This works because the slice operation returns fewer (or no) elements once you have passed the end of a sequence:

"abc"[0:2] == 'ab'
"abc"[2:4] == 'c'
"abc"[4:6] == ''

We now use the result of the slice and calculate the length of that chunk. If it is less than what we expect, we know we have reached the end and can stop iterating.

That way the iterable is not consumed until it is accessed.

@atzz 2008-11-23 12:40:39

If you know list size:

def SplitList(mylist, chunk_size):
    return [mylist[offs:offs+chunk_size] for offs in range(0, len(mylist), chunk_size)]

If you don't (an iterator):

def IterChunks(sequence, chunk_size):
    res = []
    for item in sequence:
        res.append(item)
        if len(res) >= chunk_size:
            yield res
            res = []
    if res:
        yield res  # yield the last, incomplete, portion

In the latter case, it can be rephrased in a more beautiful way if you can be sure that the sequence always contains a whole number of chunks of given size (i.e. there is no incomplete last chunk).
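One guess at the "more beautiful" rephrasing hinted at here - valid only under the stated assumption that the length is an exact multiple of the chunk size - is the shared-iterator zip trick (the function name is made up):

```python
def exact_chunks(sequence, chunk_size):
    # n references to the SAME iterator; zip pulls from them round-robin,
    # and stops cleanly because there is never a partial final chunk
    return zip(*[iter(sequence)] * chunk_size)
```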

@Jason Dunkelberger 2015-08-07 23:31:15

I am sad this is buried so far down. The IterChunks works for everything and is the general solution and has no caveats that I know of.

@Mott The Tuple 2019-04-13 18:06:14

You can use Dask to split a list into evenly sized chunks. Dask has the added benefit of memory conservation which is best for very large data. For best results you should load your list directly into a dask dataframe to conserve memory if your list is very large. Depending on what exactly you want to do with the lists, Dask has an entire API of functions you can use: http://docs.dask.org/en/latest/dataframe-api.html

import pandas as pd
import dask.dataframe as dd 

split = 4
my_list = range(100)
df = dd.from_pandas(pd.DataFrame(my_list), npartitions = split)
my_list = [ df.get_partition(n).compute().iloc[:,0].tolist() for n in range(split) ]

# [[0, 1, 2, ...], [25, 26, 27, ...], [50, 51, 52, ...], [75, 76, 77, ...]]

@oremj 2009-11-17 20:17:16

If you want something super simple:

def chunks(l, n):
    n = max(1, n)
    return (l[i:i+n] for i in xrange(0, len(l), n))

Use range() instead of xrange() in the case of Python 3.x

@J-P 2011-08-20 13:54:34

Or (if we're doing different representations of this particular function) you could define a lambda function via: lambda x,y: [ x[i:i+y] for i in range(0,len(x),y)] . I love this list-comprehension method!

@alwbtc 2017-06-01 06:45:56

after return there must be [, not (

@Mr_and_Mrs_D 2017-11-25 19:42:43

@alwbtc - no, it's correct: it's a generator expression

@Bob Stein 2018-05-15 17:49:13

"Super simple" means not having to debug infinite loops -- kudos for the max().

@mit 2018-10-19 12:36:59

there is nothing simple about this solution

@np8 2019-08-14 08:59:37

Note that the outcome with input list ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'] would be [['A', 'B', 'C'], ['D', 'E', 'F'], ['G', 'H', 'I'], ['J']] and not [['A', 'B', 'C'], ['D', 'E', 'F'], ['G', 'H'], ['I', 'J']].

@senderle 2014-02-26 15:02:00

I'm surprised nobody has thought of using iter's two-argument form:

from itertools import islice

def chunk(it, size):
    it = iter(it)
    return iter(lambda: tuple(islice(it, size)), ())

Demo:

>>> list(chunk(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]

This works with any iterable and produces output lazily. It returns tuples rather than iterators, but I think it has a certain elegance nonetheless. It also doesn't pad; if you want padding, a simple variation on the above will suffice:

from itertools import islice, chain, repeat

def chunk_pad(it, size, padval=None):
    it = chain(iter(it), repeat(padval))
    return iter(lambda: tuple(islice(it, size)), (padval,) * size)

Demo:

>>> list(chunk_pad(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
>>> list(chunk_pad(range(14), 3, 'a'))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 'a')]

Like the izip_longest-based solutions, the above always pads. As far as I know, there's no one- or two-line itertools recipe for a function that optionally pads. By combining the above two approaches, this one comes pretty close:

_no_padding = object()

def chunk(it, size, padval=_no_padding):
    if padval == _no_padding:
        it = iter(it)
        sentinel = ()
    else:
        it = chain(iter(it), repeat(padval))
        sentinel = (padval,) * size
    return iter(lambda: tuple(islice(it, size)), sentinel)

Demo:

>>> list(chunk(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]
>>> list(chunk(range(14), 3, None))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
>>> list(chunk(range(14), 3, 'a'))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 'a')]

I believe this is the shortest chunker proposed that offers optional padding.

As Tomasz Gandor observed, the two padding chunkers will stop unexpectedly if they encounter a long sequence of pad values. Here's a final variation that works around that problem in a reasonable way:

_no_padding = object()
def chunk(it, size, padval=_no_padding):
    it = iter(it)
    chunker = iter(lambda: tuple(islice(it, size)), ())
    if padval == _no_padding:
        yield from chunker
    else:
        for ch in chunker:
            yield ch if len(ch) == size else ch + (padval,) * (size - len(ch))

Demo:

>>> list(chunk([1, 2, (), (), 5], 2))
[(1, 2), ((), ()), (5,)]
>>> list(chunk([1, 2, None, None, 5], 2, None))
[(1, 2), (None, None), (5, None)]

@ThomasH 2016-09-15 19:58:02

Wonderful, your simple version is my favorite. Others too came up with the basic islice(it, size) expression and embedded it (like I had done) in a loop construct. Only you thought of the two-argument version of iter() (I was completely unaware of), which makes it super-elegant (and probably most performance-effective). I had no idea that the first argument to iter changes to a 0-argument function when given the sentinel. You return a (pot. infinite) iterator of chunks, can use a (pot. infinite) iterator as input, have no len() and no array slices. Awesome!

@Kerr 2017-08-16 14:30:02

This is why I read down through the answers rather than scanning just the top couple. Optional padding was a requirement in my case, and I too learned about the two-argument form of iter.

@Tomasz Gandor 2018-11-16 11:34:49

I upvoted this, but still - let's not overhype it! First of all, lambda can be bad (a slow closure over the it iterator). Secondly, and most importantly - you will end prematurely if a chunk of padval actually exists in your iterable and should be processed.

@senderle 2018-11-16 12:38:39

@TomaszGandor, I take your first point! Although my understanding is that lambda isn't any slower than an ordinary function, of course you're right that the function call and closure look-up will slow this down. I don't know what the relative performance hit of this would be vs. the izip_longest approach, for example -- I suspect it might be a complex trade-off. But... isn't the padval issue shared by every answer here that offers a padval parameter?

@Tomasz Gandor 2018-11-16 18:48:40

I don't know if 'every' (just read top answers, and noticed this one). Let me illustrate this: chunk_pad([1, 2, None, None, 5], 2) SHOULD generate: (1, 2), (None, None), (5, None), instead it just generates (1, 2). Same for chunk([1, 2, (), (), 5], 2), but the second generated item should be ((), ()). The problem can't be made to go away by adding more ifs, it's intrinsic to using iter with sentinel.

@senderle 2018-11-16 22:09:31

@TomaszGandor I see, I hadn't understood what you were saying. You're right, that problem is somewhat unique to this answer. I believe there is a way around it though — I'll give it a bit of thought.

@Tomasz Gandor 2018-11-16 22:14:06

You don't need a way around, if you know that all-padding chunks are indeed invalid. This is not always the case, but for many scenarios this will work OK.

@senderle 2018-11-17 01:19:42

@TomaszGandor, fair enough! But it wasn't too hard to create a version that fixes this. (Also, note that the very first version, which uses () as the sentinel, does work correctly. This is because tuple(islice(it, size)) yields () when it is empty.)
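That last point is easy to check: slicing an exhausted iterator with islice simply gives an empty tuple, which is exactly the () sentinel:

```python
from itertools import islice

it = iter([1, 2, 3])
first = tuple(islice(it, 2))   # (1, 2)
second = tuple(islice(it, 2))  # (3,) - the short final chunk
third = tuple(islice(it, 2))   # ()   - exhausted, so the sentinel comparison fires
```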

@Adeojo Emmanuel IMM 2018-04-17 01:32:01

def split(arr, size):
    # note: despite the name, 'size' is the number of chunks, not the chunk length
    L = len(arr)
    assert 0 < size <= L
    s, r = divmod(L, size)
    t = s + 1
    a = ([arr[p:p+t] for p in range(0, r*t, t)] + [arr[p:p+s] for p in range(r*t, L, s)])
    return a

Inspired by http://wordaligned.org/articles/slicing-a-list-evenly-with-python

@pylang 2018-08-26 01:40:11

Here is a list of additional approaches:

Given

import itertools as it
import collections as ct

import more_itertools as mit


iterable = range(11)
n = 3

Code

The Standard Library

list(it.zip_longest(*[iter(iterable)] * n))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, None)]

d = {}
for i, x in enumerate(iterable):
    d.setdefault(i//n, []).append(x)

list(d.values())
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]

dd = ct.defaultdict(list)
for i, x in enumerate(iterable):
    dd[i//n].append(x)

list(dd.values())
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]

more_itertools+

list(mit.chunked(iterable, n))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]

list(mit.sliced(iterable, n))
# [range(0, 3), range(3, 6), range(6, 9), range(9, 11)]

list(mit.grouper(n, iterable))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, None)]

list(mit.windowed(iterable, len(iterable)//n, step=n))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, None)]

References

+ A third-party library that implements itertools recipes and more. > pip install more_itertools

@Arthur Sult 2018-03-23 18:27:35

I dislike the idea of splitting elements by chunk size, e.g. a script can divide 101 into 3 chunks as [50, 50, 1]. For my needs I needed to split proportionally, keeping the order the same. First I wrote my own script, which works fine and is very simple. But later I saw this answer, where the script is better than mine; I recommend it. Here's my script:

def proportional_dividing(N, n):
    """
    N - length of the array (the bigger number)
    n - number of chunks (the smaller number)
    output - arr, a list of n chunk sizes summing to N, divided as evenly as possible
    """
    arr = []
    if N == 0:
        return arr
    elif n == 0:
        arr.append(N)
        return arr
    r = N // n
    for i in range(n-1):
        arr.append(r)
    arr.append(N-r*(n-1))

    last_n = arr[-1]
    # last number always will be r <= last_n < 2*r
    # when last_n == r it's ok, but when last_n > r ...
    if last_n > r:
        # ... and if the difference is too big (bigger than 1), then
        if abs(r-last_n) > 1:
            # [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 7] # N=29, n=12
            # we need to give the excess back to the first elements
            diff = last_n - r
            for k in range(diff):
                arr[k] += 1
            arr[-1] = r
            # and we receive [3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2]
    return arr

def split_items(items, chunks):
    arr = proportional_dividing(len(items), chunks)
    splitted = []
    for chunk_size in arr:
        splitted.append(items[:chunk_size])
        items = items[chunk_size:]
    print(splitted)
    return splitted

items = [1,2,3,4,5,6,7,8,9,10,11]
chunks = 3
split_items(items, chunks)
split_items(['a','b','c','d','e','f','g','h','i','g','k','l', 'm'], 3)
split_items(['a','b','c','d','e','f','g','h','i','g','k','l', 'm', 'n'], 3)
split_items(range(100), 4)
split_items(range(99), 4)
split_items(range(101), 4)

and output:

[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11]]
[['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h'], ['i', 'g', 'k', 'l', 'm']]
[['a', 'b', 'c', 'd', 'e'], ['f', 'g', 'h', 'i', 'g'], ['k', 'l', 'm', 'n']]
[range(0, 25), range(25, 50), range(50, 75), range(75, 100)]
[range(0, 25), range(25, 50), range(50, 75), range(75, 99)]
[range(0, 25), range(25, 50), range(50, 75), range(75, 101)]

@Alex T 2018-01-07 08:58:54

I was curious about the performance of different approaches and here it is:

Tested on Python 3.5.1

import time
batch_size = 7
arr_len = 298937

#---------slice-------------

print("\r\nslice")
start = time.time()
arr = [i for i in range(0, arr_len)]
while True:
    if not arr:
        break

    tmp = arr[0:batch_size]
    arr = arr[batch_size:]
print(time.time() - start)

#-----------index-----------

print("\r\nindex")
arr = [i for i in range(0, arr_len)]
start = time.time()
for i in range(0, round(len(arr) / batch_size + 1)):
    tmp = arr[batch_size * i : batch_size * (i + 1)]
print(time.time() - start)

#----------batches 1------------

def batch(iterable, n=1):
    l = len(iterable)
    for ndx in range(0, l, n):
        yield iterable[ndx:min(ndx + n, l)]

print("\r\nbatches 1")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in batch(arr, batch_size):
    tmp = x
print(time.time() - start)

#----------batches 2------------

from itertools import islice, chain

def batch(iterable, size):
    sourceiter = iter(iterable)
    while True:
        batchiter = islice(sourceiter, size)
        try:
            # PEP 479 (Python 3.7+): turn StopIteration into a clean return
            yield chain([next(batchiter)], batchiter)
        except StopIteration:
            return


print("\r\nbatches 2")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in batch(arr, batch_size):
    tmp = x
print(time.time() - start)

#---------chunks-------------
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]
print("\r\nchunks")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in chunks(arr, batch_size):
    tmp = x
print(time.time() - start)

#-----------grouper-----------

from itertools import zip_longest # for Python 3.x
#from six.moves import zip_longest # for both (uses the six compat library)

def grouper(iterable, n, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)

arr = [i for i in range(0, arr_len)]
print("\r\ngrouper")
start = time.time()
for x in grouper(arr, batch_size):
    tmp = x
print(time.time() - start)

Results:

slice
31.18285083770752

index
0.02184295654296875

batches 1
0.03503894805908203

batches 2
0.22681021690368652

chunks
0.019841909408569336

grouper
0.006506919860839844

@Azat Ibrakov 2018-10-06 09:24:56

Benchmarking with the time library is not a great idea when we have the timeit module.
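For illustration, a sketch of what that suggestion would look like for one of the approaches above (the sizes mirror the benchmark; the exact numbers are arbitrary):

```python
import timeit

setup = "arr = list(range(298937)); batch_size = 7"
stmt = "[arr[i:i + batch_size] for i in range(0, len(arr), batch_size)]"

# timeit disables garbage collection during the run and repeats the
# statement, reducing noise compared with one-shot time.time() measurements.
elapsed = timeit.timeit(stmt, setup=setup, number=10)
```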

@Peter Gerdes 2017-01-11 09:18:53

I have a solution below which does work, but more important than that solution are a few comments on other approaches. First, a good solution shouldn't require that one loop through the sub-iterators in order. If I run

g = paged_iter(list(range(50)), 11)
i0 = next(g)
i1 = next(g)
list(i1)
list(i0)

The appropriate output for the last command is

 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

not

 []

As most of the itertools based solutions here return. This isn't just the usual boring restriction about accessing iterators in order. Imagine a consumer trying to clean up poorly entered data which reversed the appropriate order of blocks of 5, i.e., the data looks like [B5, A5, D5, C5] and should look like [A5, B5, C5, D5] (where A5 is just five elements not a sublist). This consumer would look at the claimed behavior of the grouping function and not hesitate to write a loop like

i = 0
out = []
for it in paged_iter(data, 5):
    if i % 2 == 0:
        swapped = it
    else:
        out += list(it)
        out += list(swapped)
    i += 1

This will produce mysteriously wrong results if you sneakily assume that sub-iterators are always fully used in order. It gets even worse if you want to interleave elements from the chunks.

Second, a decent number of the suggested solutions implicitly rely on the fact that iterators have a deterministic order (some iterables, e.g. set, don't), and while some of the solutions using islice may be OK, it worries me.

Third, the itertools grouper approach works but the recipe relies on internal behavior of the zip_longest (or zip) functions that isn't part of their published behavior. In particular, the grouper function only works because in zip_longest(i0...in) the next function is always called in order next(i0), next(i1), ... next(in) before starting over. As grouper passes n copies of the same iterator object it relies on this behavior.

Finally, while the solution below can be improved if you make the assumption criticized above, that sub-iterators are accessed in order and fully consumed, without this assumption one MUST implicitly (via the call chain) or explicitly (via deques or another data structure) store elements for each sub-iterator somewhere. So don't bother wasting time (as I did) assuming one could get around this with some clever trick.

import collections

def paged_iter(iterat, n):
    itr = iter(iterat)
    deq = None
    try:
        while True:
            deq = collections.deque(maxlen=n)
            for q in range(n):
                deq.append(next(itr))
            yield (i for i in deq)
    except StopIteration:
        if deq:  # skip the trailing empty chunk when len is a multiple of n
            yield (i for i in deq)

@Peter Gerdes 2019-05-28 20:31:20

Wow! That's great! Wish I had figured it out. Thanks.

@Анатолий Панин 2017-04-17 15:38:56

One more solution

def make_chunks(data, chunk_size): 
    while data:
        chunk, data = data[:chunk_size], data[chunk_size:]
        yield chunk

>>> for chunk in make_chunks([1, 2, 3, 4, 5, 6, 7], 2):
...     print chunk
... 
[1, 2]
[3, 4]
[5, 6]
[7]
>>> 

@George B 2017-11-03 12:38:56

I don't think I saw this option, so just to add another one :)) :

def chunks(iterable, chunk_size):
  i = 0;
  while i < len(iterable):
    yield iterable[i:i+chunk_size]
    i += chunk_size

@guyskk 2017-10-30 08:13:46

No magic, but simple and correct:

def chunks(iterable, n):
    """Yield successive n-sized chunks from iterable."""
    values = []
    for i, item in enumerate(iterable, 1):
        values.append(item)
        if i % n == 0:
            yield values
            values = []
    if values:
        yield values

@tzot 2008-11-23 15:48:53

Directly from the (old) Python documentation (recipes for itertools):

from itertools import izip, chain, repeat

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)

The current version, as suggested by J.F.Sebastian:

#from itertools import izip_longest as zip_longest # for Python 2.x
from itertools import zip_longest # for Python 3.x
#from six.moves import zip_longest # for both (uses the six compat library)

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)

I guess Guido's time machine works—worked—will work—will have worked—was working again.

These solutions work because [iter(iterable)]*n (or the equivalent in the earlier version) creates one iterator, repeated n times in the list. izip_longest then effectively performs a round-robin of "each" iterator; because this is the same iterator, it is advanced by each such call, resulting in each such zip-roundrobin generating one tuple of n items.
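The shared-iterator behaviour described here is easy to see in isolation; a small demonstration:

```python
it = iter(range(6))
copies = [it] * 2            # two references to the SAME iterator object
pairs = list(zip(*copies))   # zip advances that single iterator round-robin
# pairs is [(0, 1), (2, 3), (4, 5)]
```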

@tzot 2011-04-19 13:09:44

@ninjagecko: list(grouper(3, range(10))) returns [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, None, None)], and all tuples are of length 3. Please elaborate on your comment because I can't understand it; what do you call a thing and how do you define it being a multiple of 3 in “expecting your thing to be a multiple of 3”? Thank you in advance.

@ninjagecko 2011-04-19 15:51:59

If it is incorrect behavior for the user's code to have a tuple with None, they need to explicitly raise an error if len('0123456789')%3 != 0. This is not a bad thing, but a thing which could be documented. Oh wait my apologies... it is documented implicitly in by the padvalue=None argument. (Also by '3' I meant 'n') Nice code.

@Michael Dillon 2012-01-30 23:47:08

upvoted this because it works on generators (no len) and uses the generally faster itertools module.

@wim 2013-04-12 05:40:11

A classic example of fancy itertools functional approach turning out some unreadable sludge, when compared to a simple and naive pure python implementation

@tzot 2013-04-12 11:36:07

@wim Given that this answer began as a snippet from the Python documentation, I'd suggest you open an issue on bugs.python.org .

@endolith 2017-10-20 04:06:47

@tzot Apparently it's been brought up and rejected many times: grokbase.com/t/python/python-ideas/126tzj5djb/…

@Juan Carlos Ramirez 2019-03-14 16:11:52

For reference, the solution is part of the docs, under itertools recipes: docs.python.org/3/library/itertools.html#itertools-recipes

@pedrosaurio 2019-08-20 06:37:14

Can someone explain or point me to the right concept of why is there a * before [iter(iterable)]*n ?

@tzot 2019-08-21 08:02:56

@pedrosaurio if l==[1, 2, 3] then f(*l) is equivalent to f(1, 2, 3). See that question and the official documentation.

@Andrey Cizov 2017-07-06 22:24:14

This works in v2/v3, is inlineable, generator-based and uses only the standard library:

import itertools
def split_groups(iter_in, group_size):
    return ((x for _, x in item) for _, item in itertools.groupby(enumerate(iter_in), key=lambda x: x[0] // group_size))

@Andrey Cizov 2018-02-24 21:55:48

Just do a (list(x) for x in split_groups('abcdefghij', 4)), then iterate through them: as opposed to many examples here, this works with groups of any size.

@Noich 2015-03-12 12:36:10

I saw the most awesome Python-ish answer in a duplicate of this question:

from itertools import zip_longest

a = range(1, 16)
i = iter(a)
r = list(zip_longest(i, i, i))
>>> print(r)
[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, 15)]

You can create n-tuple for any n. If a = range(1, 15), then the result will be:

[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, None)]

If the list is divided evenly, then you can replace zip_longest with zip, otherwise the triplet (13, 14, None) would be lost. Python 3 is used above. For Python 2, use izip_longest.
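To see why zip_longest matters here, a quick sketch of what plain zip does with the same 14-element input:

```python
a = range(1, 15)              # 14 elements, not a multiple of 3
i = iter(a)
triples = list(zip(i, i, i))  # zip stops at the shortest iterator,
                              # silently dropping the leftover 13 and 14
```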

@Tom Smith 2015-05-18 14:21:47

that is nice if your list and chunks are short; how could you adapt this to split your list into chunks of 1000, though? You're not going to code zip(i,i,i,i,i,i,i,i,i,i.....i=1000)

@Wilson F 2015-06-28 04:52:00

zip(i, i, i, ... i) with "chunk_size" arguments to zip() can be written as zip(*[i]*chunk_size) Whether that's a good idea or not is debatable, of course.

@Aaron Hall 2016-07-08 03:37:09

The downside of this is that if you aren't dividing evenly, you'll drop elements, as zip stops at the shortest iterable - & izip_longest would add default elements.

@Ioannis Filippidis 2017-06-21 13:28:18

zip_longest should be used, as done in: stackoverflow.com/a/434411/1959808

@Ioannis Filippidis 2017-06-21 13:34:47

The answer with range(1, 15) is already missing elements, because there are 14 elements in range(1, 15), not 15.
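For large chunk sizes there is no need to spell out zip(i, i, ..., i) by hand: the grouper recipe from the itertools documentation builds the argument list with [iter(iterable)] * n (Python 3 shown; on Python 2 substitute izip_longest):

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    """Collect data into fixed-length chunks, padding the last with fillvalue."""
    args = [iter(iterable)] * n  # n references to the SAME iterator
    return zip_longest(*args, fillvalue=fillvalue)

print(list(grouper(range(1, 15), 3)))
# [(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, None)]
```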

@itub 2017-03-08 17:03:46

Here's an idea using itertools.groupby:

def chunks(l, n):
    c = itertools.count()
    return (it for _, it in itertools.groupby(l, lambda x: next(c)//n))

This returns a generator of generators. If you want a list of lists, just replace the last line with

    return [list(it) for _, it in itertools.groupby(l, lambda x: next(c)//n)]

Example returning list of lists:

>>> chunks('abcdefghij', 4)
[['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h'], ['i', 'j']]

(So yes, this suffers from the "runt problem", which may or may not be a problem in a given situation.)

@Peter Gerdes 2017-12-19 10:19:49

Again this fails if the sub-iterators are not evaluated in order in the generator case. Let c = chunks('abcdefghij', 4) (as a generator). Then i0 = next(c); i1 = next(c); list(i1) works fine, but list(i0) does not.

@itub 2017-12-19 16:12:31

@PeterGerdes, thank you for noting that omission; I forgot because I always used the groupby generators in order. The documentation does mention this limitation: "Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible."

@Yuri Feldman 2019-05-13 18:18:16

@PeterGerdes I think this can be solved using enumerate instead, like so: [[x for _, x in it] for _, it in itertools.groupby(enumerate(l), lambda x: x[0]//n)] (list(it) is a list of (index, element) pairs due to enumerate)
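A sketch of that comment's variant: the outer list comprehension consumes every group eagerly, so the resulting chunks are safe to use in any order (the eager consumption is what fixes the pitfall, with enumerate supplying the index):

```python
import itertools

def chunks(l, n):
    # enumerate pairs each element with its index; grouping by index // n
    # and materializing eagerly avoids the shared-iterator pitfall
    return [[x for _, x in grp]
            for _, grp in itertools.groupby(enumerate(l), lambda p: p[0] // n)]

print(chunks('abcdefghij', 4))  # [['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h'], ['i', 'j']]
```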

@Moinuddin Quadri 2017-01-27 23:12:07

You may also use get_chunks function of utilspie library as:

>>> from utilspie import iterutils
>>> a = [1, 2, 3, 4, 5, 6, 7, 8, 9]

>>> list(iterutils.get_chunks(a, 5))
[[1, 2, 3, 4, 5], [6, 7, 8, 9]]

You can install utilspie via pip:

sudo pip install utilspie

Disclaimer: I am the creator of utilspie library.

@Endle_Zhenbo 2017-02-25 14:03:03

Looks cool. How about the performance of this lib?

@AlexG 2016-11-20 04:32:29

You could use numpy's array_split function e.g., np.array_split(np.array(data), 20) to split into 20 nearly equal size chunks.

To make sure chunks are exactly equal in size use np.split.
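If pulling in NumPy only for this seems heavy, the sizing rule np.array_split follows (per NumPy's documentation, the first len(seq) % n chunks receive one extra element) can be sketched in pure Python:

```python
def array_split(seq, n):
    # k is the base chunk size; the first m chunks get one extra element
    k, m = divmod(len(seq), n)
    return [seq[i * k + min(i, m):(i + 1) * k + min(i + 1, m)]
            for i in range(n)]

print(array_split(list(range(10)), 3))  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```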

@vishes_shell 2016-11-03 19:10:45

Since everybody here is talking about iterators: boltons has a perfect method for that, called iterutils.chunked_iter.

from boltons import iterutils

list(iterutils.chunked_iter(list(range(50)), 11))

Output:

[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
 [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32],
 [33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43],
 [44, 45, 46, 47, 48, 49]]

But if memory is not a concern, you can do it the old way and store the full list up front with iterutils.chunked.

@Peter Gerdes 2017-12-19 10:32:55

And this one actually works regardless of order one looks at the subiterators!!

@Claudiu 2016-08-06 20:44:07

As per this answer, the top-voted answer leaves a 'runt' at the end. Here's my solution to get chunks as evenly sized as possible, with no runts. It basically picks the exact fractional spot where the list should be split, rounded to the nearest integer:

from __future__ import division  # not needed in Python 3
def n_even_chunks(l, n):
    """Yield n as even chunks as possible from l."""
    last = 0
    for i in range(1, n+1):
        cur = int(round(i * (len(l) / n)))
        yield l[last:cur]
        last = cur

Demonstration:

>>> pprint.pprint(list(n_even_chunks(list(range(100)), 9)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
 [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32],
 [33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43],
 [44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55],
 [56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66],
 [67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77],
 [78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88],
 [89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]]
>>> pprint.pprint(list(n_even_chunks(list(range(100)), 11)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8],
 [9, 10, 11, 12, 13, 14, 15, 16, 17],
 [18, 19, 20, 21, 22, 23, 24, 25, 26],
 [27, 28, 29, 30, 31, 32, 33, 34, 35],
 [36, 37, 38, 39, 40, 41, 42, 43, 44],
 [45, 46, 47, 48, 49, 50, 51, 52, 53, 54],
 [55, 56, 57, 58, 59, 60, 61, 62, 63],
 [64, 65, 66, 67, 68, 69, 70, 71, 72],
 [73, 74, 75, 76, 77, 78, 79, 80, 81],
 [82, 83, 84, 85, 86, 87, 88, 89, 90],
 [91, 92, 93, 94, 95, 96, 97, 98, 99]]

Compare to the top-voted chunks answer:

>>> pprint.pprint(list(chunks(list(range(100)), 100//9)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
 [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32],
 [33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43],
 [44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54],
 [55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65],
 [66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76],
 [77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87],
 [88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98],
 [99]]
>>> pprint.pprint(list(chunks(list(range(100)), 100//11)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8],
 [9, 10, 11, 12, 13, 14, 15, 16, 17],
 [18, 19, 20, 21, 22, 23, 24, 25, 26],
 [27, 28, 29, 30, 31, 32, 33, 34, 35],
 [36, 37, 38, 39, 40, 41, 42, 43, 44],
 [45, 46, 47, 48, 49, 50, 51, 52, 53],
 [54, 55, 56, 57, 58, 59, 60, 61, 62],
 [63, 64, 65, 66, 67, 68, 69, 70, 71],
 [72, 73, 74, 75, 76, 77, 78, 79, 80],
 [81, 82, 83, 84, 85, 86, 87, 88, 89],
 [90, 91, 92, 93, 94, 95, 96, 97, 98],
 [99]]

@DragonTux 2016-09-05 10:45:29

This solution seems to fail in some situations: when n > len(l), and for l = [0, 1, 2, 3, 4] and n = 3 it returns [[0], [1], [2]] instead of [[0, 1], [2, 3], [4]]

@Claudiu 2016-09-05 17:24:57

@DragonTux: Ah I wrote the function for Python 3 - it gives [[0, 1], [2], [3, 4]]. I added the future import so it works in Python 2 as well

@DragonTux 2016-09-09 15:18:31

Thanks a lot. I keep forgetting the subtle differences between Python 2 and 3.

@Art B 2015-07-02 07:32:49

code:

def split_list(the_list, chunk_size):
    result_list = []
    while the_list:
        result_list.append(the_list[:chunk_size])
        the_list = the_list[chunk_size:]
    return result_list

a_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print(split_list(a_list, 3))

result:

[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]

@Mars 2010-02-16 05:49:47

Without calling len(), which is good for large lists:

def splitter(l, n):
    i = 0
    chunk = l[:n]
    while chunk:
        yield chunk
        i += n
        chunk = l[i:i+n]

And this is for iterables:

from itertools import islice

def isplitter(l, n):
    l = iter(l)
    chunk = list(islice(l, n))
    while chunk:
        yield chunk
        chunk = list(islice(l, n))

The functional flavour of the above:

from itertools import islice, repeat, takewhile

def isplitter2(l, n):
    return takewhile(bool,
                     (tuple(islice(start, n))
                            for start in repeat(iter(l))))

OR:

def chunks_gen_sentinel(n, seq):
    # Python 2 only: imap and the .next() method of iterators
    continuous_slices = imap(islice, repeat(iter(seq)), repeat(0), repeat(n))
    return iter(imap(tuple, continuous_slices).next, ())

OR:

def chunks_gen_filter(n, seq):
    continuous_slices = imap(islice, repeat(iter(seq)), repeat(0), repeat(n))
    return takewhile(bool,imap(tuple, continuous_slices))
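A quick check of the iterable version above on input that has no len() at all, e.g. a generator expression (islice comes from itertools):

```python
from itertools import islice

def isplitter(l, n):
    l = iter(l)
    chunk = list(islice(l, n))
    while chunk:
        yield chunk
        chunk = list(islice(l, n))

gen = (x * x for x in range(7))  # a generator: no len() available
print(list(isplitter(gen, 3)))  # [[0, 1, 4], [9, 16, 25], [36]]
```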

@Thomas Wouters 2011-05-30 10:03:12

There is no reason to avoid len() on large lists; it's a constant-time operation.

@Riaz Rizvi 2015-12-16 21:42:56

[AA[i:i+SS] for i in range(len(AA))[::SS]]

Where AA is array, SS is chunk size. For example:

>>> AA=range(10,21);SS=3
>>> [AA[i:i+SS] for i in range(len(AA))[::SS]]
[[10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20]]
# or [range(10, 13), range(13, 16), range(16, 19), range(19, 21)] in py3

@F.Tamy 2019-10-26 10:12:39

This is the best and simplest.
