By nababs

2018-05-16 15:08:43 8 Comments

I have a list of 42000 numpy arrays (each array is 240x240) that I want to save to a file for use in another python script.

I've tried using pickle and numpy.savez_compressed, but I run into memory errors (I have 16 GB of DDR3 RAM). I read that HDF5, which is commonly used for deep learning, cannot save lists, so I'm kind of stuck.

Does anyone have any idea how I can save my data?

EDIT: I previously saved this data to disk as a single numpy array, which was around 2.3 GB, but my computer couldn't always handle it and would sometimes crash when I tried to process it. I read that lists might be better, so I have moved to using lists of numpy arrays.


@jpp 2018-05-16 15:18:23

Assume we have a list of numpy arrays, A, and wish to save these sequentially to an HDF5 file.

We can use the h5py library to create datasets, with each dataset corresponding to an array in A.

import h5py
import numpy as np

# Stand-in data; replace with your own list of 240x240 numpy arrays
A = [np.random.rand(240, 240) for _ in range(3)]

with h5py.File('file.h5', 'w', libver='latest') as f:  # use 'latest' for performance
    for idx, arr in enumerate(A):
        # One dataset per array, named by its list index; one chunk per array
        dset = f.create_dataset(str(idx), shape=(240, 240), data=arr, chunks=(240, 240),
                                compression='gzip', compression_opts=9)

I use gzip compression here for compatibility reasons, since it ships with every HDF5 installation. You may also wish to consider blosc & lzf filters. I also set chunks equal to shape, under the assumption you intend to read entire arrays rather than partial arrays.

The h5py documentation is an excellent resource to improve your understanding of the HDF5 format, as the h5py API follows the C API closely.
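Reading the arrays back works the other way round. The sketch below assumes the file was written with one dataset per array, named by its list index, as in the code above. The temporary path and the random stand-in arrays are placeholders for illustration, not part of the original data.

```python
import os
import tempfile

import h5py
import numpy as np

# Stand-in data (hypothetical): three random 240x240 arrays
A = [np.random.rand(240, 240) for _ in range(3)]

# Temporary location so the example does not overwrite a real file
path = os.path.join(tempfile.mkdtemp(), 'file.h5')

# Write one dataset per array, named '0', '1', '2', ...
with h5py.File(path, 'w') as f:
    for idx, arr in enumerate(A):
        f.create_dataset(str(idx), data=arr, chunks=(240, 240),
                         compression='gzip', compression_opts=9)

# Read back: only the datasets you index are loaded into memory
with h5py.File(path, 'r') as f:
    n_arrays = len(f)      # number of datasets in the file
    second = f['1'][:]     # [:] copies that dataset into a numpy array
    # Dataset names are strings, so sort them numerically to recover list order
    restored = [f[name][:] for name in sorted(f.keys(), key=int)]
```

Indexing a single dataset, as with f['1'][:], avoids pulling all the arrays into memory at once, which is probably the more useful pattern when the full list does not fit in RAM.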

@nababs 2018-05-16 15:50:42

Thank you, this is great! I do have some follow-up questions for you (if you want, I can post them as separate questions): 1. Is it storing each of the arrays onto the disk directly? 2. Is it possible to append more arrays to this h5 file? 3. How can I read from this? I'm assuming each array has been tagged with a number?

@jpp 2018-05-16 15:53:12

@nababs, Yep, it might be worth asking a separate question about the bits of HDF5 you don't understand. The short answers to your questions are 1) yes, 2) yes.
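As a rough illustration of the "yes" to appending: opening the file in mode 'a' lets you add further datasets later. This is only a sketch. The helper name append_arrays, the temporary path, and the uint8 stand-in arrays are assumptions made for the example, and it assumes the datasets are numbered '0', '1', ... without gaps.

```python
import os
import tempfile

import h5py
import numpy as np

# Temporary location for the example file
path = os.path.join(tempfile.mkdtemp(), 'file.h5')


def append_arrays(path, arrays):
    """Add arrays to the file as new datasets, continuing the numbering.

    Hypothetical helper: assumes existing datasets are named '0', '1', ...
    with no gaps, so the next free name is the current dataset count.
    """
    # Mode 'a' opens for read/write and creates the file if it is missing
    with h5py.File(path, 'a') as f:
        start = len(f)
        for offset, arr in enumerate(arrays):
            f.create_dataset(str(start + offset), data=arr,
                             compression='gzip', compression_opts=9)


# Stand-in batches (hypothetical): each array is filled with its own index
first_batch = [np.full((240, 240), i, dtype=np.uint8) for i in range(2)]
second_batch = [np.full((240, 240), i, dtype=np.uint8) for i in range(2, 5)]

append_arrays(path, first_batch)    # creates the file with '0' and '1'
append_arrays(path, second_batch)   # adds '2', '3' and '4'

with h5py.File(path, 'r') as f:
    names = sorted(f.keys(), key=int)
    last = f['4'][:]
```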

@max9111 2018-05-16 17:06:14

It should be mentioned that setting the chunk shape equal to the dataset shape is normally a bad idea. In this case it works, but it will fail if a chunk gets bigger than 4 GB. Chunks that are too large also often have a negative impact on compression performance. Creating such a large number of very small datasets is also not normally recommended (both for speed and for compression efficiency).
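One possible alternative to thousands of small datasets is a single 3D dataset that grows along its first axis. The sketch below is not from the thread: the dataset name 'arrays', the uint8 dtype, the batch size, and the chunk shape of one array per chunk are all assumptions you would want to tune for your own data and access pattern.

```python
import os
import tempfile

import h5py
import numpy as np

# Temporary location for the example file
path = os.path.join(tempfile.mkdtemp(), 'stack.h5')

# Stand-in data (hypothetical): 10 arrays, each filled with its own index
A = [np.full((240, 240), i, dtype=np.uint8) for i in range(10)]

with h5py.File(path, 'w') as f:
    # One dataset for all arrays; maxshape=None on axis 0 makes it resizable
    dset = f.create_dataset('arrays', shape=(0, 240, 240),
                            maxshape=(None, 240, 240), dtype=np.uint8,
                            chunks=(1, 240, 240), compression='gzip')
    batch_size = 4  # arbitrary; only one batch is held in memory at a time
    for start in range(0, len(A), batch_size):
        batch = np.stack(A[start:start + batch_size])
        n = dset.shape[0]
        dset.resize(n + batch.shape[0], axis=0)   # grow along the first axis
        dset[n:n + batch.shape[0]] = batch        # write the new rows

with h5py.File(path, 'r') as f:
    total = f['arrays'].shape[0]   # number of stored arrays
    sixth = f['arrays'][5]         # reads a single 240x240 array
    block = f['arrays'][2:5]       # reads a contiguous range of arrays
```

Slicing the dataset reads only the requested arrays from disk, so both single-array and range access stay cheap in memory.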

@jpp 2018-05-16 17:07:56

@max9111, Excellent points. I'm not sure what the use case is. For example, extracting one dataset now and then, or extracting a range of datasets multiple times. We need more information from OP on how the data is used to provide further guidance. As we know, a list of numpy arrays is already pretty inefficient :).

@nababs 2018-05-17 16:57:43

@jpp do you think you can help with this?…
