2017-10-24 11:42:22 8 Comments

Comparing results of zonal statistics in QGIS and R, I noticed that the results they both yield do somehow differ. Manually checking, I also found that both QGIS and R seem to partly produce erroneous results. I wonder whether there is a robust solution to calculating mean values of raster cells within polygons.

My use case: I have a Raster Layer with cell size = 1 x 1 km² and a Polygon Layer with quadratic polygons of size = 250 x 250 m². Both are projected in EPSG:5677.

I used raster -> zonal stats with the "mean" option in QGIS ("QGIS.Result") and tried

```
R.Result1<-extract(Raster, Polygons, method="simple", fun=mean)
```

as well as

```
R.Result2<-extract(Raster, Polygons, method="simple", fun=mean, weights=TRUE, normalizeWeights=TRUE)
```

in R with the `raster`

package.
The results look like this:

```
ID QGIS.Result R.Result1 R.Result2
1 -63.9505964 -64.0000000 -64.60000000
2 -67.0000000 -67.0000000 -67.00000000
3 -67.0000000 -67.0000000 -67.00000000
4 -62.3765259 -64.0000000 -61.85714286
... ... ... ...
19 -99.9498123 -97.5999908 -99.19999186
20 -92.7999878 -92.7999878 -92.79998779
21 -92.7999878 -92.7999878 -92.79998779
22 -102.5493486 -92.2999954 -99.19999186
23 -97.5452805 -89.3999939 -92.79998779
24 -97.5452805 -89.3999939 -92.79998779
25 -87.4895458 -87.0000000 -87.33333333
26 -86.0000000 -86.0000000 -86.00000000
27 -86.0000000 -86.0000000 -86.00000000
... ... ... ...
```

Results differ whenever a Polygon intersects with more than one raster cell - note in the example below that:

all three methods yield the same result for polygons 20,21,26,27 which all do intersect with only one raster cell and different results for the other polygons which do intersect with more than one raster cell.

the QGIS result gives a smaller value for polygon 22 than for polygon 19 although it should be the other way round. R.Result2 gives the same result for both polygons. Only R.Result1 performs better here.

polygon 25 intersects with Raster values 88 and 86. R.Result1 is the mean of both: 87. However, as polygon 25 contains a larger amount of value 88 than of value 86, the result should be biased towards 88 (like in QGIS.Result and R.Result2).

Thus, all three methods seem not to provide what I need: a robust, coverage-based mean of the raster layer for each polygon. Is there a way to obtain such (preferably in R)?

The used data can be found here: https://www.dropbox.com/s/bctjidc4dz2j396/testdata.7z?dl=0

### Related Questions

#### Sponsored Content

#### 1 Answered Questions

### [SOLVED] Zonal statistics on raster layers that intersect line

**2018-11-29 14:14:53****Maarten****54**View**2**Score**1**Answer- Tags: spatial-analyst arcgis-pro overlay zonal-statistics

#### 1 Answered Questions

### [SOLVED] Zonal stats for complete pixels qgis

**2018-07-27 13:38:40****Fabio Daniel Trinco****188**View**3**Score**1**Answer- Tags: qgis zonal-statistics

#### 2 Answered Questions

### [SOLVED] How does Zonal Statistics work exactly?

**2017-09-11 15:57:37****GeoEki****8014**View**9**Score**2**Answer- Tags: qgis raster layers saga zonal-statistics

#### 0 Answered Questions

### Zonal statistics results in some NULL values in QGIS?

**2017-04-28 03:11:46****Ichiro****185**View**0**Score**0**Answer- Tags: qgis zonal-statistics null

#### 0 Answered Questions

### Zonal statistics behavior

**2018-01-18 07:20:46****Sumiko****93**View**3**Score**0**Answer- Tags: qgis-processing zonal-statistics

#### 0 Answered Questions

### Calculating Mean and Sum in QGIS Zonal Statistics?

**2018-01-11 15:26:16****zepedroramiao****120**View**5**Score**0**Answer- Tags: qgis zonal-statistics

#### 3 Answered Questions

### [SOLVED] Using QGIS Zonal Stats Plugin from Python Console?

**2013-01-05 13:10:12****Thomas****7055**View**9**Score**3**Answer- Tags: pyqgis zonal-statistics

#### 1 Answered Questions

### [SOLVED] Extract results from zonal statistics

**2016-04-22 13:48:39****Nausi****155**View**1**Score**1**Answer- Tags: qgis zonal-statistics

#### 1 Answered Questions

### [SOLVED] Resampling gives higher MEAN when using zonal statistics

**2015-11-30 22:52:26****Cicatrixx****152**View**2**Score**1**Answer- Tags: arcgis-desktop arcgis-10.1 statistics zonal-statistics resampling

#### 2 Answered Questions

### [SOLVED] What to do when raster cell size is larger than zonal vector feature in order to collect statistics?

**2013-05-23 12:02:18****XNSTT****4396**View**1**Score**2**Answer- Tags: arcgis-10.0 raster arcgis-desktop spatial-statistics zonal-statistics

## 2 comments

## @dbaston 2018-07-11 13:44:49

QGIS and R's

`raster`

package use different methods to estimate zonal statistics. Briefly:QGIS compares the centroid of each raster cell to the polygon boundary, initially considering cells to be wholly within or outside of the polygon based on the centroid. However, if fewer than two cell centroids fall within the polygon, an exact vector-based calculation is performed instead. That calculation is essentially the same as xunilk's answer.

R's raster package also uses a centroid test to determine if cells are inside or outside of the polygon. When

`weights=TRUE`

, it disaggregates the raster by a factor of 10 before performing the analysis, which reduces the error incurred by ignoring partially covered cells but reduces performance substantially.If you're working in R, I would recommend the exactextractr package (I am the author). It provides a replacement for

`raster::extract`

, using an exact vector-based calculation that is hundreds of times faster than the original in most cases.Here's an example using the test data provided:

## @yenats 2018-08-28 20:00:00

however I try, I cannot install exactextractr on my windows machine: 'ERROR: compilation failed for package 'exactextractr''

## @dbaston 2018-08-28 20:11:29

@yenats Do you have the R build tools installed? (cran.r-project.org/bin/windows/Rtools) Feel free to post a GitHub issue with the full output you're seeing from

`devtools::install_github`

.## @yenats 2018-08-29 10:08:11

yes, I have the R build tools installed. 'devtools::install_github('isciences/exactextractr')' would also install several packages; however at the end of the process it rolls everything back.

## @dbaston 2018-08-29 19:44:00

@yenats I'd need to see the full command output to be helpful. Feel free to email it to me (contact info in profile) if you don't want to post it on GitHub.

## @xunilk 2017-10-24 15:48:31

A Robust Zonal stats in QGIS can be implemented with

PyQGIS. Following code was run with your layers by using a filter to select onlyID_sub.patbetween 19 and 27 (showed in your image).After running code at Python Console of QGIS it was obtained this result:

where at first column are raster values, at second column weight factor and at third column

idPolygons feature. You can observe that for polygons 20,21,26,27, its factors is always one (as expected). In other cases is proportional to intersection of Polygons feature with grid representing raster.Results were corroborated with

Value ToolandQuickWKTQGIS plugins and they were as expected.