By Ben Blank


2009-01-12 02:48:22 8 Comments

I have a Python script which takes as input a list of integers, which I need to process four integers at a time. Unfortunately, I don't have control over the input, or I'd have it passed in as a list of four-element tuples. Currently, I'm iterating over it this way:

for i in xrange(0, len(ints), 4):
    # dummy op for example code
    foo += ints[i] * ints[i + 1] + ints[i + 2] * ints[i + 3]

It looks a lot like "C-think", though, which makes me suspect there's a more pythonic way of dealing with this situation. The list is discarded after iterating, so it needn't be preserved. Perhaps something like this would be better?

while ints:
    foo += ints[0] * ints[1] + ints[2] * ints[3]
    ints[0:4] = []

Still doesn't quite "feel" right, though. :-/

Related question: How do you split a list into evenly sized chunks in Python?

30 comments

@nosklo 2009-01-12 03:10:17

def chunker(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))
# (in python 2 use xrange() instead of range() to avoid allocating a list)

Simple. Easy. Fast. Works with any sequence:

text = "I am a very, very helpful text"

for group in chunker(text, 7):
    print(repr(group), end=' ')
# 'I am a ' 'very, v' 'ery hel' 'pful te' 'xt'

print('|'.join(chunker(text, 10)))
# I am a ver|y, very he|lpful text

animals = ['cat', 'dog', 'rabbit', 'duck', 'bird', 'cow', 'gnu', 'fish']

for group in chunker(animals, 3):
    print(group)
# ['cat', 'dog', 'rabbit']
# ['duck', 'bird', 'cow']
# ['gnu', 'fish']

@jfs 2009-01-12 14:39:37

@Carlos Crasborn's version works for any iterable (not just sequences, as the above code requires); it is concise and probably just as fast or even faster. Though it might be a bit obscure (unclear) to people unfamiliar with the itertools module.

@Ben Blank 2009-01-12 22:03:20

@J.F. Sebastian — Now that I've gotten the chance to figure out why his code works, I feel compelled to change my accepted answer (which I hate doing). I love this answer, too, @nosklo, but that izip_longest trick seems tailor-made for my situation.

@Matt Williamson 2010-08-08 04:08:34

Agreed. This is the most generic and pythonic way. Clear and concise. (and works on app engine)

@RoboCop87 2014-07-03 15:41:39

I was having trouble using this, but it started working when I replaced the outside parens with square brackets. Is the syntax in the answer Python 3 only?

@Dror 2015-02-24 08:23:32

With 3.x I only had to replace xrange with range. See: stackoverflow.com/a/15014576/671013

@Dror 2015-02-24 08:59:57

Note that chunker returns a generator. To get a list instead, replace the return statement with a list comprehension (i.e., wrap the generator expression in square brackets).

@Alfe 2016-04-15 10:22:29

Instead of writing a function building and then returning a generator, you could also write a generator directly, using yield: for pos in xrange(0, len(seq), size): yield seq[pos:pos + size]. I'm not sure if internally this would be handled any differently in any relevant aspect, but it might be even a tiny bit clearer.

@apollov 2017-12-22 18:17:27

Note this works only for sequences that support item access by index; it won't work for generic iterators, because they may not support the __getitem__ method.

@MSeifert 2017-09-29 14:29:12

If you don't mind using an external package you could use iteration_utilities.grouper from iteration_utilities 1. It supports all iterables (not just sequences):

from iteration_utilities import grouper
seq = list(range(20))
for group in grouper(seq, 4):
    print(group)

which prints:

(0, 1, 2, 3)
(4, 5, 6, 7)
(8, 9, 10, 11)
(12, 13, 14, 15)
(16, 17, 18, 19)

In case the length isn't a multiple of the group size, it also supports filling the incomplete last group or truncating (discarding) it:

from iteration_utilities import grouper
seq = list(range(17))
for group in grouper(seq, 4):
    print(group)
# (0, 1, 2, 3)
# (4, 5, 6, 7)
# (8, 9, 10, 11)
# (12, 13, 14, 15)
# (16,)

for group in grouper(seq, 4, fillvalue=None):
    print(group)
# (0, 1, 2, 3)
# (4, 5, 6, 7)
# (8, 9, 10, 11)
# (12, 13, 14, 15)
# (16, None, None, None)

for group in grouper(seq, 4, truncate=True):
    print(group)
# (0, 1, 2, 3)
# (4, 5, 6, 7)
# (8, 9, 10, 11)
# (12, 13, 14, 15)

1 Disclaimer: I'm the author of that package.

@frankish 2017-07-20 15:20:06

I never want my chunks padded, so that requirement is essential. I find that the ability to work on any iterable is also a requirement. Given that, I decided to extend on the accepted answer, https://stackoverflow.com/a/434411/1074659.

Performance takes a slight hit in this approach if padding is not wanted due to the need to compare and filter the padded values. However, for large chunk sizes, this utility is very performant.

#!/usr/bin/env python3
from itertools import zip_longest


_UNDEFINED = object()


def chunker(iterable, chunksize, fillvalue=_UNDEFINED):
    """
    Collect data into chunks and optionally pad it.

    Performance worsens as `chunksize` approaches 1.

    Inspired by:
        https://docs.python.org/3/library/itertools.html#itertools-recipes

    """
    args = [iter(iterable)] * chunksize
    chunks = zip_longest(*args, fillvalue=fillvalue)
    yield from (
        filter(lambda val: val is not _UNDEFINED, chunk)
        if chunk[-1] is _UNDEFINED
        else chunk
        for chunk in chunks
    ) if fillvalue is _UNDEFINED else chunks

@serv-inc 2017-07-13 08:27:40

This splits a list of strings into sublists, for example to achieve PEP 8 line-length compliance:

def split(what, target_length=79):
    '''splits a list of strings into sublists, each
    with joined string length at most target_length'''
    out = [[]]
    while what:
        if len("', '".join(out[-1])) + len(what[0]) < target_length:
            out[-1].append(what.pop(0))
        else:
            if not out[-1]: # string longer than target_length
                out[-1] = [what.pop(0)]
            out.append([])
    return out

Use as

>>> split(['deferred_income', 'long_term_incentive', 'restricted_stock_deferred', 'shared_receipt_with_poi', 'loan_advances', 'from_messages', 'other', 'director_fees', 'bonus', 'total_stock_value', 'from_poi_to_this_person', 'from_this_person_to_poi', 'restricted_stock', 'salary', 'total_payments', 'exercised_stock_options'], 75)
[['deferred_income', 'long_term_incentive', 'restricted_stock_deferred'], ['shared_receipt_with_poi', 'loan_advances', 'from_messages', 'other'], ['director_fees', 'bonus', 'total_stock_value', 'from_poi_to_this_person'], ['from_this_person_to_poi', 'restricted_stock', 'salary', 'total_payments'], ['exercised_stock_options']]

@Andrey Cizov 2017-07-06 22:20:50

Quite pythonic (you may also inline the body of the split_groups function):

import itertools
def split_groups(iter_in, group_size):
    return ((x for _, x in item) for _, item in itertools.groupby(enumerate(iter_in), key=lambda x: x[0] // group_size))

for x, y, z, w in split_groups(range(16), 4):
    foo += x * y + z * w

@BallpointBen 2017-06-09 15:50:26

I like this approach. It feels simple and not magical and supports all iterable types and doesn't require imports.

def chunk_iter(iterable, chunk_size):
    it = iter(iterable)
    while True:
        # note: on Python 3.7+ (PEP 479), the StopIteration raised by
        # next(it) inside the generator expression becomes a RuntimeError;
        # use itertools.islice there on modern Python
        chunk = tuple(next(it) for _ in range(chunk_size))
        if not chunk:
            break
        yield chunk

@Craz 2009-01-12 04:07:20

Modified from the recipes section of Python's itertools docs:

from itertools import izip_longest

def grouper(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return izip_longest(*args, fillvalue=fillvalue)

Example
In pseudocode to keep the example terse.

grouper('ABCDEFG', 3, 'x') --> 'ABC' 'DEF' 'Gxx'

Note: izip_longest is new to Python 2.6. In Python 3 use zip_longest.

@jfs 2009-01-12 14:53:18

I know it is taken literally from documentation but I'd change the order of parameters: grouper(iterable, chunksize) and izip_longest(*args, fillvalue=fillvalue)

@Ben Blank 2009-01-12 17:47:51

Very nice! Probably the most compact method here, considering it even combines chunking and padding. Unfortunately, it's pretty opaque. Even having read up on izip_longest, I'm still not sure how this works. :-/

@Craz 2009-01-12 20:18:07

@J.F. Sebastian: Thanks. That does follow common convention.

@Ben Blank 2009-01-12 22:00:54

Finally got a chance to play around with this in a python session. For those who are as confused as I was, this is feeding the same iterator to izip_longest multiple times, causing it to consume successive values of the same sequence rather than striped values from separate sequences. I love it!

@gotgenes 2009-08-26 22:48:44

What's the best way to filter back out the fillvalue? ([item for item in items if item is not fillvalue] for items in grouper(iterable))?

@Utku Zihnioglu 2011-02-15 00:01:18

I am not sure if this is the most pythonic answer, but it possibly is the best use of the [list] * n structure.

@David B. 2011-07-23 05:59:44

This works, but it seems interpreter implementation-dependent. Does the itertools.izip_longest specification actually guarantee a striped access order for the iterators (e.g., with 3 iterators A, B, and C, will the access ordering be A,B,C,A,B,C,A,Fill,C and not something like A,A,B,B,C,C,A,Fill,C or A,B,C,C,B,A,A,Fill,C)? I could see the latter orderings being useful for cache-line performance optimization. If the single-striping access ordering is not guaranteed, this isn't a theoretically safe solution (although speaking practically, most implementations will single-step the iterators).

@ninjagecko 2012-04-28 14:55:13

You can combine this all into a short one-liner: zip(*[iter(yourList)]*n) (or izip_longest with fillvalue)

@anatoly techtonik 2013-04-28 15:07:45

I suspect that the performance of this grouper recipe for 256k sized chunks will be very poor, because izip_longest will be fed 256k arguments.

@davidgoli 2013-11-26 02:31:01

What is an izip_longest object, why doesn't it behave like a list, and why is it returned from this? Why must I call list() on it, why doesn't it just return a new list?

@jfs 2014-04-23 21:21:31

@techtonik: could you provide a benchmark that compares it with a faster method?

@jfs 2014-04-23 21:26:24

@DavidB.: the recipe is given as the example code in the official documentation. Unless it is a bug, the behaviour is guaranteed.

@anatoly techtonik 2014-04-24 18:57:36

@J.F.Sebastian: No, I can't do benchmarks, but slicing of list to 256k chunks should be faster than rebuilding it iteratively.

@jfs 2014-04-24 19:14:36

@techtonik: what data do you have to support your claim?

@anatoly techtonik 2014-04-28 10:59:12

@J.F.Sebastian: It is not a claim, it is an assumption based on logic rather than observation. The logic is that the number of loops with slicing (~ len(seq)/len(chunk)) is smaller than with element-by-element iterating (~ len(seq)) when the chunk size is big.

@Suor 2014-06-04 20:08:21

To be efficient with large n you will need to manage a pool and feed it with islice like here github.com/Suor/funcy/blob/1.0.0/funcy/seqs.py#L293

@LondonRob 2015-08-14 07:00:22

In several places commenters say "when I finally worked out how this worked...." Maybe a bit of explanation is required. Particularly the list of iterators aspect.

@PaulMcG 2015-10-17 14:53:47

I prefer putting the arguments in order of the size first followed by the sequence. This makes it easy to create partials for chunking by a certain amount, and then just passing different sequences to them.

@Chris_Rands 2016-09-15 08:06:31

It needs to be said that izip_longest is zip_longest in python3

@jfs 2016-12-02 23:37:26

@anatolytechtonik performance is judged by measurements on real data on real hardware, not hypothetical "logic". Over the years, I've used the grouper() recipe many times. I've encountered cases when you cannot use slicing because the input is not a sequence. I don't remember a single case when I would replace the grouper()-like code with the chunker()-like code due to performance concerns (I might replace it for readability in the code for beginners). YMMV

@flutefreak7 2018-04-01 07:40:07

@gotgenes, (filter(None, chunk) for chunk in zip_longest(*[iter(yourList)]*n)) will provide a chunk generator. Each chunk is itself a generator (using filter) which will skip the fill values.

@Valentas 2018-10-08 09:48:55

If next() on an exhausted iterator happens to be slow (e.g. a psycopg2 cursor), the last chunk of zip_longest is on average n/2 times slower than it should be.

@ksindi 2016-02-25 13:25:54

def chunker(iterable, n):
    """Yield iterable in chunk sizes.

    >>> chunks = chunker('ABCDEF', n=4)
    >>> chunks.next()
    ['A', 'B', 'C', 'D']
    >>> chunks.next()
    ['E', 'F']
    """
    it = iter(iterable)
    while True:
        chunk = []
        for i in range(n):
            try:
                chunk.append(it.next())
            except StopIteration:
                if chunk:  # avoid yielding a trailing empty chunk
                    yield chunk
                return
        yield chunk

if __name__ == '__main__':
    import doctest

    doctest.testmod()

@topkara 2016-02-25 03:10:02

It is easy to make itertools.groupby work for you to get an iterable of iterables, without creating any temporary lists:

groupby(iterable, (lambda x,y: (lambda z: x.next()/y))(count(),100))

Don't get put off by the nested lambdas; the outer lambda runs just once, to put the count() generator and the constant 100 into the scope of the inner lambda.

I use this to send chunks of rows to mysql.

for k, v in groupby(bigdata, (lambda x,y: (lambda z: x.next()/y))(count(),100)):
    cursor.executemany(sql, v)

@Cuadue 2015-04-10 18:07:52

Here is a chunker without imports that supports generators:

def chunks(seq, size):
    it = iter(seq)
    while True:
        # Python 2 code: it.next() truncates the tuple when the iterator
        # runs out; on Python 3 use next(it), and note that PEP 479 (3.7+)
        # requires itertools.islice instead of this generator expression
        ret = tuple(it.next() for _ in range(size))
        if len(ret) == size:
            yield ret
        else:
            return

Example of use:

>>> def foo():
...     i = 0
...     while True:
...         i += 1
...         yield i
...
>>> c = chunks(foo(), 3)
>>> c.next()
(1, 2, 3)
>>> c.next()
(4, 5, 6)
>>> list(chunks('abcdefg', 2))
[('a', 'b'), ('c', 'd'), ('e', 'f')]

@GingerPlusPlus 2014-12-02 19:32:56

About the solution given by J.F. Sebastian here:

def chunker(iterable, chunksize):
    return zip(*[iter(iterable)]*chunksize)

It's clever, but has one disadvantage: it always returns tuples. How do you get strings instead?
Of course you can write ''.join(chunker(...)), but the temporary tuples are constructed anyway.

You can get rid of the temporary tuple by writing own zip, like this:

class IteratorExhausted(Exception):
    pass

def translate_StopIteration(iterable, to=IteratorExhausted):
    for i in iterable:
        yield i
    raise to # StopIteration would get ignored because this is generator,
             # but custom exception can leave the generator.

def custom_zip(*iterables, reductor=tuple):
    iterators = tuple(map(translate_StopIteration, iterables))
    while True:
        try:
            yield reductor(next(i) for i in iterators)
        except IteratorExhausted: # when any of iterators get exhausted.
            break

Then

def chunker(data, size, reductor=tuple):
    return custom_zip(*[iter(data)]*size, reductor=reductor)

Example usage:

>>> for i in chunker('12345', 2):
...     print(repr(i))
...
('1', '2')
('3', '4')
>>> for i in chunker('12345', 2, ''.join):
...     print(repr(i))
...
'12'
'34'

@Alfe 2016-04-15 10:32:46

Not a critique meant for you to change your answer, but rather a comment: Code is a liability. The more code you write the more space you create for bugs to hide. From this point of view, rewriting zip instead of using the existing one seems not to be the best idea.

@GingerPlusPlus 2014-11-26 08:21:04

At first, I designed it to split strings into substrings, to parse a string containing hex.
Today I turned it into a complex, but still simple, generator.

def chunker(iterable, size, reductor, condition):
    it = iter(iterable)
    def chunk_generator():
        return (next(it) for _ in range(size))
    chunk = reductor(chunk_generator())
    while condition(chunk):
        yield chunk
        chunk = reductor(chunk_generator())

Arguments:

Obvious ones

  • iterable is any iterable / iterator / generator containing / generating / iterating over input data,
  • size is, of course, the size of the chunks you want to get,

More interesting

  • reductor is a callable which receives a generator iterating over the content of a chunk.
    I'd expect it to return a sequence or string, but I don't demand that.

    You can pass as this argument for example list, tuple, set, frozenset,
    or anything fancier. I'd pass this function, returning a string
    (provided that iterable contains / generates / iterates over strings):

    def concatenate(iterable):
        return ''.join(iterable)


    Note that reductor can cause the generator to close by raising an exception.

  • condition is a callable which receives whatever reductor returned.
    It decides to approve and yield it (by returning anything evaluating to True),
    or to decline it and finish the generator's work (by returning anything else or raising an exception).

    When the number of elements in iterable is not divisible by size, reductor will receive, once the iterable is exhausted, a generator producing fewer elements than size.
    Let's call these the last elements.

    Two functions I'd suggest passing as this argument:

    • lambda x: x - the last elements will be yielded.

    • lambda x: len(x) == <size> - the last elements will be rejected;
      replace <size> with a number equal to size.

@endolith 2014-11-19 04:09:38

With NumPy it's simple:

from numpy import array

ints = array([1, 2, 3, 4, 5, 6, 7, 8])
for int1, int2 in ints.reshape(-1, 2):
    print(int1, int2)

output:

1 2
3 4
5 6
7 8

@John Mee 2014-10-18 08:42:28

To avoid all conversions to a list, import itertools and:

>>> for k, g in itertools.groupby(xrange(35), lambda x: x/10):
...     print k, list(g)

Produces:

... 
0 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1 [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
2 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
3 [30, 31, 32, 33, 34]
>>> 

I checked groupby and it doesn't convert to a list or use len, so I think this will delay resolution of each value until it is actually used. Sadly none of the available answers (at this time) seemed to offer this variation.

Obviously, if you need to handle each item in turn, nest a for loop over g:

for k, g in itertools.groupby(xrange(35), lambda x: x/10):
    for i in g:
        pass  # do what you need to do with individual items
    # now do what you need to do with the whole group

My specific interest in this was the need to consume a generator to submit changes in batches of up to 1000 to the gmail API:

    messages = a_generator_which_would_not_be_smart_as_a_list
    for idx, batch in groupby(messages, lambda x: x/1000):
        batch_request = BatchHttpRequest()
        for message in batch:
            batch_request.add(self.service.users().messages().modify(userId='me', id=message['id'], body=msg_labels))
        http = httplib2.Http()
        self.credentials.authorize(http)
        batch_request.execute(http=http)

@PaulMcG 2015-10-17 14:33:31

What if the list you are chunking is something other than a sequence of ascending integers?

@John Mee 2015-10-19 04:55:36

@PaulMcGuire see groupby; given a function to describe order then elements of the iterable can be anything, right?

@PaulMcG 2015-10-19 21:36:20

Yes, I'm familiar with groupby. But if messages were the letters "ABCDEFG", then groupby(messages, lambda x: x/3) would give you a TypeError (for trying to divide a string by an int), not 3-letter groupings. Now if you did groupby(enumerate(messages), lambda x: x[0]/3) you might have something. But you didn't say that in your post.

@Tutul 2014-09-01 12:44:40

A one-liner, ad-hoc solution to iterate over a list x in chunks of size 4:

for a, b, c, d in zip(x[0::4], x[1::4], x[2::4], x[3::4]):
    ... do something with a, b, c and d ...

@Suor 2014-06-04 20:13:23

You can use the partition or chunks function from the funcy library:

from funcy import partition

for a, b, c, d in partition(4, ints):
    foo += a * b * c * d

These functions also have iterator versions ipartition and ichunks, which will be more efficient in this case.

You can also peek at their implementation.

@senderle 2014-02-26 17:52:46

Another approach would be to use the two-argument form of iter:

from itertools import islice

def group(it, size):
    it = iter(it)
    return iter(lambda: tuple(islice(it, size)), ())

This can be adapted easily to use padding (this is similar to Markus Jarderot’s answer):

from itertools import islice, chain, repeat

def group_pad(it, size, pad=None):
    it = chain(iter(it), repeat(pad))
    return iter(lambda: tuple(islice(it, size)), (pad,) * size)

These can even be combined for optional padding:

_no_pad = object()
def group(it, size, pad=_no_pad):
    if pad is _no_pad:  # identity check against the sentinel
        it = iter(it)
        sentinel = ()
    else:
        it = chain(iter(it), repeat(pad))
        sentinel = (pad,) * size
    return iter(lambda: tuple(islice(it, size)), sentinel)

@n611x007 2014-04-24 09:50:30

preferable because you have the option to omit the padding!

@Wilfred Hughes 2014-02-20 11:45:37

def group_by(iterable, size):
    """Group an iterable into lists that don't exceed the size given.

    >>> list(group_by([1,2,3,4,5], 2))
    [[1, 2], [3, 4], [5]]

    """
    sublist = []

    for index, item in enumerate(iterable):
        if index > 0 and index % size == 0:
            yield sublist
            sublist = []

        sublist.append(item)

    if sublist:
        yield sublist

@n611x007 2014-04-24 09:54:59

+1, it omits padding; your answer and bcoughlan's are very similar

@bcoughlan 2013-08-14 23:24:51

I needed a solution that would also work with sets and generators. I couldn't come up with anything very short and pretty, but it's quite readable at least.

def chunker(seq, size):
    res = []
    for el in seq:
        res.append(el)
        if len(res) == size:
            yield res
            res = []
    if res:
        yield res

List:

>>> list(chunker([i for i in range(10)], 3))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

Set:

>>> list(chunker(set([i for i in range(10)]), 3))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

Generator:

>>> list(chunker((i for i in range(10)), 3))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

@n611x007 2014-04-24 09:55:30

+1, it omits padding; very similar to Wilfred Hughes's answer

@Will 2013-02-21 10:40:10

Using little functions and things really doesn't appeal to me; I prefer to just use slices:

data = [...]
chunk_size = 10000 # or whatever
chunks = [data[i:i+chunk_size] for i in xrange(0,len(data),chunk_size)]
for chunk in chunks:
    ...

@n611x007 2014-04-24 09:57:22

Nice, but no good for an indefinite stream, which has no known len(). You can do a test with itertools.repeat or itertools.cycle.

@n611x007 2014-04-24 10:00:41

Also, it eats up memory by using a [... for ...] list comprehension to physically build a list, instead of a (... for ...) generator expression, which would just care about the next element and spare memory.

@kriss 2012-12-06 01:56:30

Similar to other proposals, but not exactly identical: I like doing it this way because it's simple and easy to read:

it = iter([1, 2, 3, 4, 5, 6, 7, 8, 9])
for chunk in zip(it, it, it, it):
    print chunk

>>> (1, 2, 3, 4)
>>> (5, 6, 7, 8)

This way you won't get the last partial chunk. If you want to get (9, None, None, None) as last chunk, just use izip_longest from itertools.

@elhefe 2012-11-11 21:14:55

Yet another answer, the advantages of which are:

1) Easily understandable
2) Works on any iterable, not just sequences (some of the above answers will choke on filehandles)
3) Does not load the chunk into memory all at once
4) Does not make a chunk-long list of references to the same iterator in memory
5) No padding of fill values at the end of the list

That being said, I haven't timed it so it might be slower than some of the more clever methods, and some of the advantages may be irrelevant given the use case.

def chunkiter(iterable, size):
  def inneriter(first, iterator, size):
    yield first
    for _ in xrange(size - 1): 
      yield iterator.next()
  it = iter(iterable)
  while True:
    yield inneriter(it.next(), it, size)

In [2]: i = chunkiter('abcdefgh', 3)
In [3]: for ii in i:
   ...:     for c in ii:
   ...:         print c,
   ...:     print ''
   ...:
a b c
d e f
g h

Update:
A couple of drawbacks due to the fact that the inner and outer loops are pulling values from the same iterator:
1) continue doesn't work as expected in the outer loop - it just continues on to the next item rather than skipping a chunk. However, this doesn't seem like a problem as there's nothing to test in the outer loop.
2) break doesn't work as expected in the inner loop - control will wind up in the inner loop again with the next item in the iterator. To skip whole chunks, either wrap the inner iterator (ii above) in a tuple, e.g. for c in tuple(ii), or set a flag and exhaust the iterator.

@rhettg 2012-05-29 00:50:20

The ideal solution for this problem works with iterators (not just sequences). It should also be fast.

This is the solution provided by the documentation for itertools:

import itertools

def grouper(n, iterable, fillvalue=None):
    #"grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return itertools.izip_longest(fillvalue=fillvalue, *args)

Using IPython's %timeit on my MacBook Air, I get 47.5 us per loop.

However, this really doesn't work for me since the results are padded to be even sized groups. A solution without the padding is slightly more complicated. The most naive solution might be:

def grouper(size, iterable):
    i = iter(iterable)
    while True:
        out = []
        try:
            for _ in range(size):
                out.append(i.next())
        except StopIteration:
            if out:  # don't yield a trailing empty group
                yield out
            break

        yield out

Simple, but pretty slow: 693 us per loop

The best solution I could come up with uses islice for the inner loop:

def grouper(size, iterable):
    it = iter(iterable)
    while True:
        group = tuple(itertools.islice(it, None, size))
        if not group:
            break
        yield group

With the same dataset, I get 305 us per loop.

Unable to get a pure solution any faster than that, I provide the following solution with an important caveat: if your input data has instances of fillvalue in it, you could get a wrong answer.

def grouper(n, iterable, fillvalue=None):
    #"grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    for i in itertools.izip_longest(fillvalue=fillvalue, *args):
        if tuple(i)[-1] == fillvalue:
            yield tuple(v for v in i if v != fillvalue)
        else:
            yield i

I really don't like this answer, but it is significantly faster. 124 us per loop

@ShadowRanger 2016-09-30 01:14:07

You can reduce runtime for recipe #3 by ~10-15% by moving it to the C layer (omitting itertools imports; map must be Py3 map or imap): def grouper(n, it): return takewhile(bool, map(tuple, starmap(islice, repeat((iter(it), n))))). Your final function can be made less brittle by using a sentinel: get rid of the fillvalue argument; add a first line fillvalue = object(), then change the if check to if i[-1] is fillvalue: and the line it controls to yield tuple(v for v in i if v is not fillvalue). Guarantees no value in iterable can be mistaken for the filler value.

@ShadowRanger 2016-09-30 01:26:02

BTW, big thumbs up on #4. I was about to post my optimization of #3 as a better answer (performance-wise) than what had been posted so far, but with the tweak to make it reliable, resilient #4 runs over twice as fast as optimized #3; I did not expect a solution with Python level loops (and no theoretical algorithmic differences AFAICT) to win. I assume #3 loses due to the expense of constructing/iterating islice objects (#3 wins if n is relatively large, e.g. number of groups is small, but that's optimizing for an uncommon case), but I didn't expect it to be quite that extreme.

@Kumba 2017-08-14 20:55:53

For #4, the first branch of the conditional is only ever taken on the last iteration (the final tuple). Instead of reconstituting the final tuple all over again, cache the modulo of the length of the original iterable at the top and use that to slice off the unwanted padding from izip_longest on the final tuple: yield i[:modulo]. Also, for the args variable, tuple it instead of a list: args = (iter(iterable),) * n. Shaves a few more clock cycles off. Last, if we ignore fillvalue and assume None, the conditional can become if None in i for even more clock cycles.

@ShadowRanger 2017-11-13 19:42:59

@Kumba: Your first suggestion assumes the input has known length. If it's an iterator/generator, not a collection with known length, there is nothing to cache. There's no real reason to use such an optimization anyway; you're optimizing the uncommon case (the last yield), while the common case is unaffected.

@catwell 2011-11-29 14:58:00

Posting this as an answer since I cannot comment...

Using map() instead of zip() fixes the padding issue in J.F. Sebastian's answer:

>>> def chunker(iterable, chunksize):
...   return map(None,*[iter(iterable)]*chunksize)

Example:

>>> s = '1234567890'
>>> chunker(s, 3)
[('1', '2', '3'), ('4', '5', '6'), ('7', '8', '9'), ('0', None, None)]
>>> chunker(s, 4)
[('1', '2', '3', '4'), ('5', '6', '7', '8'), ('9', '0', None, None)]
>>> chunker(s, 5)
[('1', '2', '3', '4', '5'), ('6', '7', '8', '9', '0')]

@ShadowRanger 2016-10-01 01:34:46

This is better handled with itertools.izip_longest (Py2)/itertools.zip_longest (Py3); this use of map is doubly-deprecated, and not available in Py3 (you can't pass None as the mapper function, and it stops when the shortest iterable is exhausted, not the longest; it doesn't pad).

@jfs 2009-01-12 15:13:41

Since nobody's mentioned it yet, here's a zip() solution:

>>> def chunker(iterable, chunksize):
...     return zip(*[iter(iterable)]*chunksize)

It works only if your sequence's length is always divisible by the chunk size or you don't care about a trailing chunk if it isn't.

Example:

>>> s = '1234567890'
>>> chunker(s, 3)
[('1', '2', '3'), ('4', '5', '6'), ('7', '8', '9')]
>>> chunker(s, 4)
[('1', '2', '3', '4'), ('5', '6', '7', '8')]
>>> chunker(s, 5)
[('1', '2', '3', '4', '5'), ('6', '7', '8', '9', '0')]

Or using itertools.izip to return an iterator instead of a list:

>>> from itertools import izip
>>> def chunker(iterable, chunksize):
...     return izip(*[iter(iterable)]*chunksize)

Padding can be fixed using @ΤΖΩΤΖΙΟΥ's answer:

>>> from itertools import chain, izip, repeat
>>> def chunker(iterable, chunksize, fillvalue=None):
...     it   = chain(iterable, repeat(fillvalue, chunksize-1))
...     args = [it] * chunksize
...     return izip(*args)

@Pedro Henriques 2009-01-12 03:56:20

from itertools import izip_longest

def chunker(iterable, chunksize, filler):
    return izip_longest(*[iter(iterable)]*chunksize, fillvalue=filler)

@jfs 2009-01-12 14:29:50

A readable way to do it is stackoverflow.com/questions/434287/…

@jfs 2009-01-12 14:33:47

I've removed spaces around '=' in the arguments list (see PEP8).

@Robert Rossney 2009-01-12 03:19:30

If the list is large, the highest-performing way to do this will be to use a generator:

def get_chunk(iterable, chunk_size):
    result = []
    for item in iterable:
        result.append(item)
        if len(result) == chunk_size:
            yield tuple(result)
            result = []
    if len(result) > 0:
        yield tuple(result)

for x in get_chunk([1,2,3,4,5,6,7,8,9,10], 3):
    print x

(1, 2, 3)
(4, 5, 6)
(7, 8, 9)
(10,)

@Robert Rossney 2009-01-12 03:40:30

(I think that MizardX's itertools suggestion is functionally equivalent to this.)

@Robert Rossney 2009-01-12 04:15:56

(Actually, on reflection, no I don't. itertools.islice returns an iterator, but it doesn't use an existing one.)

@Valentas 2018-10-08 08:22:50

It is nice and simple, but for some reason, even without conversion to tuple, it is 4-7 times slower than the accepted grouper method on iterable = range(100000000) with chunksize up to 10000.

@Valentas 2018-10-08 09:51:35

However, in general I would recommend this method, because the accepted one can be extremely slow when checking for the last item is slow: docs.python.org/3/library/itertools.html#itertools.zip_longest

@Brian Clapper 2009-01-12 03:46:12

If the lists are the same size, you can combine them into lists of 4-tuples with zip(). For example:

# Four lists of four elements each.

l1 = range(0, 4)
l2 = range(4, 8)
l3 = range(8, 12)
l4 = range(12, 16)

for i1, i2, i3, i4 in zip(l1, l2, l3, l4):
    ...

Here's what the zip() function produces:

>>> print l1
[0, 1, 2, 3]
>>> print l2
[4, 5, 6, 7]
>>> print l3
[8, 9, 10, 11]
>>> print l4
[12, 13, 14, 15]
>>> print zip(l1, l2, l3, l4)
[(0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15)]

If the lists are large, and you don't want to combine them into a bigger list, use itertools.izip(), which produces an iterator, rather than a list.

from itertools import izip

for i1, i2, i3, i4 in izip(l1, l2, l3, l4):
    ...

@Markus Jarderot 2009-01-12 03:02:59

import itertools
def chunks(iterable,size):
    it = iter(iterable)
    chunk = tuple(itertools.islice(it,size))
    while chunk:
        yield chunk
        chunk = tuple(itertools.islice(it,size))

# though this will throw ValueError if the length of ints
# isn't a multiple of four:
for x1,x2,x3,x4 in chunks(ints,4):
    foo += x1 + x2 + x3 + x4

for chunk in chunks(ints,4):
    foo += sum(chunk)

Another way:

import itertools
def chunks2(iterable,size,filler=None):
    it = itertools.chain(iterable,itertools.repeat(filler,size-1))
    chunk = tuple(itertools.islice(it,size))
    while len(chunk) == size:
        yield chunk
        chunk = tuple(itertools.islice(it,size))

# x2, x3 and x4 could get the value 0 if the length is not
# a multiple of 4.
for x1,x2,x3,x4 in chunks2(ints,4,0):
    foo += x1 + x2 + x3 + x4

@Sergey Golovchenko 2009-01-12 03:23:08

+1 for using generators; seems like the most "pythonic" out of all the suggested solutions

@zenazn 2009-01-12 03:51:16

It's rather long and clumsy for something so easy, which isn't very pythonic at all. I prefer S. Lott's version

@Janus Troelsen 2012-11-25 17:33:14

@zenazn: this will work on generator instances, slicing won't

@dano 2014-08-19 20:27:24

In addition to working properly with generators and other non-sliceable iterators, the first solution also doesn't require a "filler" value if the final chunk is smaller than size, which is sometimes desirable.

@Cuadue 2015-04-10 17:58:07

Also +1 for generators. Other solutions require a len call and so don't work on other generators.

@Tom Myddeltyn 2016-05-05 22:16:15

I would throw a try: block around it and catch a ValueError to handle the not-a-multiple-of-4 issue.

@Yuval 2017-06-02 07:18:43

The first one is a good, simple version that doesn't use a fillvalue but still works on any iterable. Nice!

@S.Lott 2009-01-12 03:06:09

I'm a fan of

chunkSize= 4
for i in xrange(0, len(ints), chunkSize):
    chunk = ints[i:i+chunkSize]
    # process chunk of size <= chunkSize

@Veerendra 2017-03-09 08:33:51

Wow! You really made it simple. I was struggling to do it as simply as possible. Thanks!

@J0ANMM 2017-04-13 11:55:37

Love its simplicity. By the way, in Python3 xrange should be replaced by range.

@dolphin 2017-11-12 04:25:31

@andorov The answer is correct because the end of the range (i+chunkSize) is exclusive, not inclusive.
