2009-06-14 23:02:14 8 Comments

What are the advantages of NumPy over regular Python lists?

I have approximately 100 financial markets series, and I am going to create a cube array of 100x100x100 = 1 million cells. I will be regressing (3-variable) each x with each y and z, to fill the array with standard errors.

I have heard that for "large matrices" I should use NumPy as opposed to Python lists, for performance and scalability reasons. Thing is, I know Python lists and they seem to work for me.

What will the benefits be if I move to NumPy?

What if I had 1000 series (that is, 1 billion floating point cells in the cube)?

### Related Questions

#### Sponsored Content

#### 19 Answered Questions

#### 11 Answered Questions

#### 7 Answered Questions

#### 15 Answered Questions

### [SOLVED] Convert two lists into a dictionary in Python

**2008-10-16 19:05:47****Guido****622294**View**970**Score**15**Answer- Tags: python list dictionary

#### 25 Answered Questions

### [SOLVED] How do I concatenate two lists in Python?

**2009-11-12 07:04:09****y2k****1899633**View**2054**Score**25**Answer- Tags: python list concatenation

#### 20 Answered Questions

### [SOLVED] What is the difference between Python's list methods append and extend?

**2008-10-31 05:55:36****Claudiu****2661680**View**3119**Score**20**Answer- Tags: python list data-structures append extend

#### 15 Answered Questions

### [SOLVED] What are metaclasses in Python?

**2008-09-19 06:10:46****e-satis****704726**View**5164**Score**15**Answer- Tags: python oop metaclass python-datamodel

## 6 comments

## @Constantin Höing 2019-06-08 16:02:21

Numpy provides a number of very powerful math tools to work with on numpy arrays or matrices if you will. Since numpy is very well optimized and written in C it runs much faster compared if you would write your on code in python. Not to speak that it would be a pain in the ass to code all of the functions yourself.

## @Parvez Khan 2019-02-05 12:46:45

All have highlighted almost all major differences between numpy array and python list, I will just brief them out here:

Numpy arrays have a fixed size at creation, unlike python lists (which can grow dynamically). Changing the size of ndarray will create a new array and delete the original.

The elements in a Numpy array are all required to be of the same data type (we can have the heterogeneous type as well but that will not gonna permit you mathematical operations) and thus will be the same size in memory

Numpy arrays are facilitated advances mathematical and other types of operations on large numbers of data. Typically such operations are executed more efficiently and with less code than is possible using pythons build in sequences

## @tom10 2009-06-15 04:59:38

Alex mentioned memory efficiency, and Roberto mentions convenience, and these are both good points. For a few more ideas, I'll mention

speedandfunctionality.Functionality: You get a lot built in with NumPy, FFTs, convolutions, fast searching, basic statistics, linear algebra, histograms, etc. And really, who can live without FFTs?

Speed: Here's a test on doing a sum over a list and a NumPy array, showing that the sum on the NumPy array is 10x faster (in this test -- mileage may vary).

which on my systems (while I'm running a backup) gives:

## @Roberto Bonvallet 2009-06-14 23:38:50

NumPy is not just more efficient; it is also more convenient. You get a lot of vector and matrix operations for free, which sometimes allow one to avoid unnecessary work. And they are also efficiently implemented.

For example, you could read your cube directly from a file into an array:

Sum along the second dimension:

Find which cells are above a threshold:

Remove every even-indexed slice along the third dimension:

Also, many useful libraries work with NumPy arrays. For example, statistical analysis and visualization libraries.

Even if you don't have performance problems, learning NumPy is worth the effort.

## @Thomas Browne 2009-06-14 23:54:11

Thanks - you have provided another good reason in your third example, as indeed, I will be searching the matrix for cells above threshold. Moreover, I was loading up from sqlLite. The file approach will be much more efficient.

## @Eliezer 2014-09-11 02:35:09

Here's a nice answer from the FAQ on the scipy.org website:

What advantages do NumPy arrays offer over (nested) Python lists?## @Alex Martelli 2009-06-14 23:16:23

NumPy's arrays are more compact than Python lists -- a list of lists as you describe, in Python, would take at least 20 MB or so, while a NumPy 3D array with single-precision floats in the cells would fit in 4 MB. Access in reading and writing items is also faster with NumPy.

Maybe you don't care that much for just a million cells, but you definitely would for a billion cells -- neither approach would fit in a 32-bit architecture, but with 64-bit builds NumPy would get away with 4 GB or so, Python alone would need at least about 12 GB (lots of pointers which double in size) -- a much costlier piece of hardware!

The difference is mostly due to "indirectness" -- a Python list is an array of pointers to Python objects, at least 4 bytes per pointer plus 16 bytes for even the smallest Python object (4 for type pointer, 4 for reference count, 4 for value -- and the memory allocators rounds up to 16). A NumPy array is an array of uniform values -- single-precision numbers takes 4 bytes each, double-precision ones, 8 bytes. Less flexible, but you pay substantially for the flexibility of standard Python lists!

## @Thomas Browne 2009-06-14 23:23:17

Alex - always the good answer. Thank you - point made. I'll go with Numpy for scalability and indeed for efficiency. I'm thinking I'll also soon be needing to learn parallel programming in Python, and invest in some OpenCL capable hardware ;)

## @Jack Simpson 2016-06-08 12:41:38

I've been trying to use "sys.getsizeof()" to compare the size of Python lists and NumPy arrays with the same number of elements and it doesn't seem to indicate that the NumPy arrays were that much smaller. Is this the case or is sys.getsizeof() having issues figuring out how big a NumPy array is?

## @Bakuriu 2016-08-09 19:40:00

@JackSimpson

`getsizeof`

isn't reliable. The documentation clearly states that:Only the memory consumption directly attributed to the object is accounted for, not the memory consumption of objects it refers to.This means that if you have nested python lists the size of the elements isn't taken into account.## @PM 2Ring 2016-10-10 12:38:38

`getsizeof`

on a list only tells you how much RAM the list object itself consumes and the RAM consumed by the pointers in its data array, it doesn't tell you how much RAM is consumed by the objects that those pointers refer to.## @lmiguelvargasf 2017-05-06 18:49:20

@AlexMartelli, could you please let me know where are you getting these numbers?

## @ShadowRanger 2018-11-08 02:18:41

Just a heads up, your estimate on the size of the equivalent Python list of list of lists is off. The 4 GB numpy array of C

`float`

s (4 bytes) would translate to something closer to 32 GB worth of`list`

s and Python`float`

s (which are actually C`double`

s), not 12 GB; each`float`

on 64 bit Python occupies ~24 bytes (assuming no alignment losses in the allocator), plus another 8 bytes in the`list`

to hold the reference (and that ignores the overallocation and object headers for the`list`

s themselves, which might add another GB depending on exactly how much overallocation occurs).## @ShadowRanger 2018-11-08 02:23:19

You could get the Python list of list of lists down as low as 8 GB if all of the stored

`float`

s were reference to thesame`float`

, but given Python has no`float`

caching, anything other than the same value over and over (not useful) would require you to manually implement interning for your`float`

s to achieve that memory reduction, and it seems rather unlikely you'd have so few unique`float`

s that interning would help out (since the intern cache itself would end up consuming a ton of memory eventually).