By nababs


2018-05-16 15:08:43

I have a list of 42000 numpy arrays (each array is 240x240) that I want to save to a file for use in another python script.

I've tried using pickle and numpy.savez_compressed, but I run into MemoryErrors (I have 16 GB of DDR3 RAM). I read that HDF5, which is commonly used for deep learning, cannot save lists, so I'm kind of stuck.

Does anyone have any idea how I can save my data?

EDIT: I previously saved this data as a single numpy array on disk using np.save. The file was around 2.3GB, but my computer couldn't always handle it and would sometimes crash when I tried to process it. I read that lists might be better, so I have switched to using lists of numpy arrays.
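For scale, a rough size estimate helps explain the memory pressure. The sketch below assumes every array has the same 240x240 shape and computes the raw, uncompressed payload for a few candidate dtypes; the actual dtype isn't stated in the question, so these are assumptions. A uint8 payload comes to about 2.42 GB (2.25 GiB), which is in line with the ~2.3GB np.save file, whereas float32 would be roughly 9.7 GB and float64 roughly 19.4 GB, the latter more than 16 GB of RAM on its own.

```python
import numpy as np

N_ARRAYS = 42000
SHAPE = (240, 240)

def payload_bytes(n_arrays, shape, dtype):
    """Raw size in bytes of n_arrays arrays of the given shape and dtype.

    Ignores file headers, Python object overhead and compression.
    """
    return n_arrays * int(np.prod(shape)) * np.dtype(dtype).itemsize

for dtype in ("uint8", "float32", "float64"):
    size = payload_bytes(N_ARRAYS, SHAPE, dtype)
    print(f"{dtype:>8}: {size / 1e9:6.2f} GB ({size / 2**30:6.2f} GiB)")
```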


@jpp 2018-05-16 15:18:23

Assume we have a list of numpy arrays, A, and wish to save them sequentially to an HDF5 file.

We can use the h5py library to create datasets, with each dataset corresponding to an array in A.

import h5py
import numpy as np

A = [np.random.rand(240, 240) for _ in range(3)]  # placeholder: your list of 240x240 arrays

with h5py.File('file.h5', 'w', libver='latest') as f:  # use 'latest' for performance

    for idx, arr in enumerate(A):
        # one dataset per array, named '0', '1', '2', ...
        dset = f.create_dataset(str(idx), shape=(240, 240), data=arr, chunks=(240, 240),
                                compression='gzip', compression_opts=9)

I use gzip compression here for compatibility reasons, since it ships with every HDF5 installation. You may also wish to consider blosc & lzf filters. I also set chunks equal to shape, under the assumption you intend to read entire arrays rather than partial arrays.
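As a minimal sketch of the alternative filters: lzf is bundled with h5py itself (no extra plugin needed), while blosc usually requires a third-party plugin such as the hdf5plugin package, so only lzf is shown next to gzip here. The file name `lzf_example.h5` and the random test data are placeholders; random floats compress poorly, so the on-disk sizes printed are only indicative.

```python
import h5py
import numpy as np

arr = np.random.rand(240, 240)  # placeholder data

with h5py.File('lzf_example.h5', 'w') as f:
    # gzip: slower, higher compression; level 0-9 via compression_opts
    f.create_dataset('gzip9', data=arr, chunks=(240, 240),
                     compression='gzip', compression_opts=9)
    # lzf: faster, moderate compression; takes no compression_opts
    f.create_dataset('lzf', data=arr, chunks=(240, 240), compression='lzf')

with h5py.File('lzf_example.h5', 'r') as f:
    for name in ('gzip9', 'lzf'):
        dset = f[name]
        print(name, dset.compression, dset.id.get_storage_size(), 'bytes on disk')
        # both filters are lossless, so the data round-trips exactly
        assert np.array_equal(dset[()], arr)
```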

The h5py documentation is an excellent resource to improve your understanding of the HDF5 format, as the h5py API follows the C API closely.

@nababs 2018-05-16 15:50:42

Thank you, this is great! I do have some follow-up questions for you (if you want, I can post them as separate questions): 1. Is it storing each of the arrays on disk directly? 2. Is it possible to append more arrays to this h5 file? 3. How can I read from it? I'm assuming each array has been tagged with a number?

@jpp 2018-05-16 15:53:12

@nababs, Yep, it might be worth asking a separate question about the parts of HDF5 you don't understand. The short answers to your questions are 1) yes and 2) yes.
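On the append and read questions, a minimal sketch assuming the one-dataset-per-array layout from the answer, where dataset names are the string indices '0', '1', .... Appending reopens the file in 'a' mode (read/write, create if missing) and adds datasets under the next free names; reading indexes the file by name. The helper names `append_arrays` and `load_all` and the file name `appendable.h5` are hypothetical.

```python
import h5py
import numpy as np

def append_arrays(path, arrays):
    """Add each array as a new dataset named by the next free integer index."""
    with h5py.File(path, 'a') as f:  # 'a': read/write, create the file if it doesn't exist
        start = len(f)  # assumes the file only holds datasets named '0'..'n-1'
        for offset, arr in enumerate(arrays):
            f.create_dataset(str(start + offset), data=arr, compression='gzip')

def load_all(path):
    """Read every dataset back, in numeric order, as a list of numpy arrays."""
    with h5py.File(path, 'r') as f:
        names = sorted(f.keys(), key=int)  # string order would put '10' before '2'
        return [f[name][()] for name in names]

with h5py.File('appendable.h5', 'w'):
    pass  # start this demo from an empty file

append_arrays('appendable.h5', [np.zeros((240, 240)), np.ones((240, 240))])
append_arrays('appendable.h5', [np.full((240, 240), 2.0)])  # a later append

with h5py.File('appendable.h5', 'r') as f:
    single = f['1'][()]  # read one array by name without loading the others

arrays = load_all('appendable.h5')
print(len(arrays), single.mean())
```

Reading one dataset at a time like `f['1'][()]` keeps only that array in memory, which matters with 42000 of them.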

@max9111 2018-05-16 17:06:14

It should be mentioned that setting the chunk shape equal to the dataset shape is normally a bad idea. In this case it works, but it will fail if a chunk gets bigger than 4GB. Chunks that are too large also often hurt compression performance. Also, creating a large number of very small datasets is normally not recommended (both for speed and for compression efficiency).
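Following that advice, a common alternative is one dataset of shape (N, 240, 240) that grows as data arrives, with each chunk covering a few arrays. This is a minimal sketch assuming all arrays share the 240x240 shape and a float32 dtype; the batch size, chunk shape, file name and the `batches()` generator are illustrative placeholders, not tuned values. A chunk of (4, 240, 240) float32 values is 921,600 bytes, under h5py's default 1 MiB chunk cache, and since each batch is written and then released, only one batch needs to be in memory at a time.

```python
import h5py
import numpy as np

SHAPE = (240, 240)
BATCH = 64  # arrays per write; placeholder value

def batches(n_batches):
    """Hypothetical data source: yields lists of 240x240 float32 arrays."""
    for b in range(n_batches):
        yield [np.full(SHAPE, b * BATCH + i, dtype=np.float32) for i in range(BATCH)]

with h5py.File('stacked.h5', 'w') as f:
    dset = f.create_dataset(
        'images',
        shape=(0, *SHAPE),
        maxshape=(None, *SHAPE),  # unlimited along axis 0 so the dataset can grow
        dtype='float32',
        chunks=(4, *SHAPE),       # a few arrays per chunk, far below the 4 GB limit
        compression='gzip',
    )
    for batch in batches(3):
        n = dset.shape[0]
        dset.resize(n + len(batch), axis=0)  # make room for the new batch
        dset[n:] = np.stack(batch)           # write it into the new rows

with h5py.File('stacked.h5', 'r') as f:
    images = f['images']
    print(images.shape)      # total arrays stored
    one = images[5]          # a single array; reads only the chunk holding it
    block = images[100:150]  # a contiguous range in one call
```

This layout also covers both access patterns mentioned in the follow-up: occasional single arrays and repeated ranges.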

@jpp 2018-05-16 17:07:56

@max9111, Excellent points. I'm not sure what the use case is. For example, extracting one dataset now and then, or extracting a range of datasets multiple times. We need more information from OP on how the data is used to provide further guidance. As we know, a list of numpy arrays is already pretty inefficient :).

@SantoshGupta7 2018-10-16 03:00:58

Could this work for variable length arrays?
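One hedged observation on that question: because the answer stores each array as its own dataset, arrays of different shapes can be stored the same way, as long as `shape` and `chunks` are not hard-coded to (240, 240). The sketch below lets h5py take each dataset's shape from its array and pick the chunk shape itself; the file name and sample shapes are placeholders.

```python
import h5py
import numpy as np

# placeholder arrays with different shapes
A = [np.random.rand(240, 240), np.random.rand(100, 300), np.random.rand(17, 5)]

with h5py.File('variable.h5', 'w') as f:
    for idx, arr in enumerate(A):
        # shape and dtype come from `data`; chunks=True lets h5py choose a chunk shape
        f.create_dataset(str(idx), data=arr, chunks=True, compression='gzip')

with h5py.File('variable.h5', 'r') as f:
    shapes = [f[str(i)].shape for i in range(len(A))]
print(shapes)
```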
