By nababs

2018-05-16 15:08:43 8 Comments

I have a list of 42000 numpy arrays (each array is 240x240) that I want to save to a file for use in another python script.

I've tried using pickle and numpy.savez_compressed, but I run into MemoryErrors (I have 16 GB of DDR3 RAM). I read that HDF5, which is commonly used for deep learning, cannot save lists, so I'm kind of stuck.

Does anyone have any idea how I can save my data?

EDIT: I previously saved this data to disk as a single numpy array, and it was around 2.3 GB, but my computer couldn't always handle it and would sometimes crash when I tried to process it. I read that lists might be better, so I have moved to using lists of numpy arrays.


@jpp 2018-05-16 15:18:23

Assume we have a list of numpy arrays, A, and wish to save these sequentially to an HDF5 file.

We can use the h5py library to create datasets, with each dataset corresponding to an array in A.

import h5py
import numpy as np

# Placeholder data; in practice A is your list of 240x240 numpy arrays
A = [np.zeros((240, 240)) for _ in range(3)]

with h5py.File('file.h5', 'w', libver='latest') as f:  # use 'latest' for performance
    for idx, arr in enumerate(A):
        # One dataset per array, named by its index in the list
        dset = f.create_dataset(str(idx), shape=(240, 240), data=arr, chunks=(240, 240),
                                compression='gzip', compression_opts=9)

I use gzip compression here for compatibility reasons, since it ships with every HDF5 installation. You may also wish to consider blosc & lzf filters. I also set chunks equal to shape, under the assumption you intend to read entire arrays rather than partial arrays.

The h5py documentation is an excellent resource to improve your understanding of the HDF5 format, as the h5py API follows the C API closely.

@nababs 2018-05-16 15:50:42

Thank you, this is great! I do have some follow-up questions for you (if you want, I can post them as separate questions): 1. Is it storing each of the arrays on disk directly? 2. Is it possible to append more arrays to this h5 file? 3. How can I read from this? I'm assuming each array has been tagged with a number?

@jpp 2018-05-16 15:53:12

@nababs, Yep, it might be worth asking separate questions about the bits of HDF5 you don't understand. The short answers to your questions are 1) yes, 2) yes.
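
For the append and read questions, here is a minimal sketch, assuming the one-dataset-per-array layout from the answer above. The temporary path and the uint8 stand-in arrays are illustrative assumptions, not the OP's data.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical path and stand-in data, not taken from the original post.
path = os.path.join(tempfile.mkdtemp(), "file.h5")
A = [np.full((240, 240), i, dtype=np.uint8) for i in range(3)]

# Write one dataset per array, named "0", "1", "2".
with h5py.File(path, "w") as f:
    for idx, arr in enumerate(A):
        f.create_dataset(str(idx), data=arr, chunks=(240, 240), compression="gzip")

# Append: reopen in "a" mode and store a new dataset under the next unused name.
with h5py.File(path, "a") as f:
    next_idx = len(f)  # number of top-level datasets already in the file
    extra = np.full((240, 240), 9, dtype=np.uint8)
    f.create_dataset(str(next_idx), data=extra, chunks=(240, 240), compression="gzip")

# Read: look a dataset up by name; [...] loads that one array into memory.
with h5py.File(path, "r") as f:
    n_datasets = len(f)
    second = f["1"][...]
    last = f[str(n_datasets - 1)][...]
```

Only the arrays you index are read from disk. Iterating over the file's keys returns names in alphabetical order by default ("10" before "2"), so looking arrays up by str(i) is safer than relying on iteration order.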

@max9111 2018-05-16 17:06:14

It should be mentioned that setting the chunk shape equal to the dataset shape is normally a bad idea. In this case it works, but it will fail if a chunk gets bigger than 4GB. Chunks that are too large also often hurt compression performance. Creating such a large number of very small datasets is also not normally recommended (for both speed and compression efficiency).
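
One way to avoid thousands of tiny datasets is a single resizable 3D dataset with one 240x240 array per chunk. This is a sketch under assumptions: the dataset name "images", the uint8 dtype, the temporary path, and the small array count are all illustrative.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "stack.h5")  # hypothetical path
n_arrays = 10  # stand-in for the 42000 arrays in the question

with h5py.File(path, "w") as f:
    dset = f.create_dataset(
        "images",                   # hypothetical dataset name
        shape=(0, 240, 240),        # start empty
        maxshape=(None, 240, 240),  # first axis is allowed to grow
        dtype=np.uint8,             # assumed dtype
        chunks=(1, 240, 240),       # one array per chunk
        compression="gzip",
    )
    for i in range(n_arrays):
        # Produce or load one array at a time so the full list never sits in RAM.
        arr = np.full((240, 240), i, dtype=np.uint8)
        n = dset.shape[0]
        dset.resize(n + 1, axis=0)  # grow the first axis by one slot
        dset[n] = arr               # write the new array into that slot

with h5py.File(path, "r") as f:
    dset = f["images"]
    stored_shape = dset.shape
    batch = dset[2:5]  # reads only arrays 2, 3 and 4 from disk
```

Because the data is written and read one slice at a time, memory use stays around the size of a single array or batch. A chunk spanning several arrays, for example (16, 240, 240), may compress better if you usually read ranges.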

@jpp 2018-05-16 17:07:56

@max9111, Excellent points. I'm not sure what the use case is. For example, extracting one dataset now and then, or extracting a range of datasets multiple times. We need more information from OP on how the data is used to provide further guidance. As we know, a list of numpy arrays is already pretty inefficient :).

@SantoshGupta7 2018-10-16 03:00:58

Could this work for variable length arrays?
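
As far as the one-dataset-per-array layout goes, each dataset carries its own shape, so arrays of different sizes should work if the fixed shape and chunks arguments are dropped and h5py is left to infer them from the data. A minimal sketch, with a hypothetical path and made-up arrays:

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "ragged.h5")  # hypothetical path

# Made-up arrays with different shapes and dimensionality.
arrays = [np.arange(5), np.arange(12).reshape(3, 4), np.zeros((2, 2, 2))]

with h5py.File(path, "w") as f:
    for idx, arr in enumerate(arrays):
        # No shape= or chunks=: both are derived from each array.
        f.create_dataset(str(idx), data=arr, compression="gzip")

with h5py.File(path, "r") as f:
    shapes = [f[str(i)].shape for i in range(len(f))]
```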
