By G-wizard


2015-08-10 21:33:59

Update: The bug has been fixed in the ArcGIS 10.4 release.

I am using ArcGIS 10.2.2 to determine zonal statistics for a number of zones. If there is any NoData in the value raster, I want the zone result to be "NoData", exactly as advertised in the tool's description, which states:

DATA — Within any particular zone, only cells that have a value in the input Value raster will be used in determining the output value for that zone. NoData cells in the Value raster will be ignored in the statistic calculation.

NODATA — Within any particular zone, if any NoData cells exist in the Value raster, it is deemed that there is insufficient information to perform statistical calculations for all the cells in that zone; therefore, the entire zone will receive the NoData value on the output raster.
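The difference between the two options can be illustrated with a small NumPy sketch of the documented behavior. This is not Esri's implementation: the helper `zonal_mean`, the toy arrays, the NoData value of -9999, and the use of NaN to stand in for NoData in the output are all assumptions made for illustration.

```python
import numpy as np

def zonal_mean(zones, values, nodata, ignore_nodata=True):
    """Hypothetical helper: mean of `values` per zone, mimicking the documented
    DATA (ignore_nodata=True) and NODATA (ignore_nodata=False) options.
    A zone without a valid result gets NaN, standing in for NoData."""
    result = {}
    for z in np.unique(zones):
        cells = values[zones == z]
        valid = cells[cells != nodata]
        if not ignore_nodata and valid.size < cells.size:
            # NODATA option: a single NoData cell makes the whole zone NoData
            result[int(z)] = float("nan")
        elif valid.size == 0:
            # DATA option, but the zone has no valid cells at all
            result[int(z)] = float("nan")
        else:
            # DATA option: NoData cells are simply left out of the statistic
            result[int(z)] = float(valid.mean())
    return result

# Toy example: zone 1 contains one NoData cell (-9999), zone 2 does not
zones = np.array([[1, 1, 2],
                  [1, 1, 2]])
values = np.array([[10, 8, 7],
                   [-9999, 9, 5]])

print(zonal_mean(zones, values, nodata=-9999, ignore_nodata=True))   # {1: 9.0, 2: 6.0}
print(zonal_mean(zones, values, nodata=-9999, ignore_nodata=False))  # {1: nan, 2: 6.0}
```

With `ignore_nodata=True` (DATA), zone 1 averages its three valid cells to 9.0; with `ignore_nodata=False` (NODATA), the single NoData cell turns the whole zone into NoData, which is the behavior the question expects for zone 61154.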

Please have a look at my setup in this picture: [image not available]

I am using the NODATA option with a value raster that has one NoData pixel, and therefore expect the resulting zone value (zone 61154) to be 'NoData'. Instead, I get a value of 12.74 (rounded to 13 in the image), which confuses me on two levels: First, I expected 'NoData', and second, the resulting value of 12.74 is mathematically impossible, because the mean cannot be larger than the maximum value in the value raster, which is 10 in this case.

If I am using the DATA option, I get a value of about 9.1, which makes sense. We tested this on different datasets, computers, and ArcGIS versions.

What am I missing here?

Edit / Additional comment: I just noticed that the 'Count' attribute is also wrong for that particular zone. There are indeed 421 cells in that zone, but the tool counted only 297. Oddly enough, 421 minus 297 is 124, which is exactly the "position" of the NoData pixel if one counts the zone's pixels from upper left to lower right. The tool might be getting the cell count wrong (too low), which would explain the inflated average.
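Counting pixels "from upper left to lower right" is row-major order, which is also the order in which NumPy boolean indexing returns the selected cells. A minimal sketch for locating the NoData pixel within a zone, assuming the rasters have already been read into arrays (for example with `arcpy.RasterToNumPyArray`); the helper `nodata_positions`, the toy arrays, and the NoData value of -9999 are hypothetical.

```python
import numpy as np

def nodata_positions(zones, values, zone_id, nodata):
    """Hypothetical helper: return (1-based row-major positions of NoData
    cells within `zone_id`, total number of cells in `zone_id`)."""
    # Boolean indexing flattens the selected cells in row-major (C) order,
    # i.e. from upper left to lower right
    zone_cells = values[zones == zone_id]
    positions = (np.flatnonzero(zone_cells == nodata) + 1).tolist()
    return positions, zone_cells.size

# Toy zone 7 with 6 cells; the NoData cell (-9999) is its 4th cell in row-major order
zones = np.array([[7, 7, 0],
                  [0, 7, 7],
                  [7, 7, 0]])
values = np.array([[3, 4, 1],
                   [1, 5, -9999],
                   [6, 2, 1]])

positions, total = nodata_positions(zones, values, zone_id=7, nodata=-9999)
print(positions, total)  # [4] 6
```

Run on the real data for zone 61154, the observation above would correspond to a position of 124 and a total of 421.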

Edit: Here is a link to the data I am using.

Edit: Dan Patterson and I did some further debugging here at the ESRI forum.


@Mike T 2015-08-19 07:06:53

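Similar to another answer: move the raster data into NumPy masked arrays to calculate your statistics. Assuming the two rasters overlay exactly and have the same shape, this is simple:

import arcpy
import numpy as np

# Both rasters must share the same extent, cell size and alignment
zones = arcpy.RasterToNumPyArray("zones")
# NoData cells come back as the raster's NoData value; mask them out
value = np.ma.masked_equal(arcpy.RasterToNumPyArray("value"),
                           arcpy.Raster("value").noDataValue)
nodata_mask = np.ma.getmaskarray(value)  # full boolean mask, even if nothing is masked
print("Zone\tCount\tNoData\tMean")
for z in np.unique(zones):
    sel = (zones == z)
    # total cells, NoData cells, and mean of the valid cells in this zone
    print("{}\t{}\t{}\t{}".format(z, sel.sum(), nodata_mask[sel].sum(), value[sel].mean()))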

Shows:

Zone    Count  NoData  Mean
61131      53       0  8.92452830189
61154     421       1  9.04523809524
61207       1       0  8.0
61317      35       0  7.2
61644     644       0  7.90838509317
61677      12       0  7.41666666667
61789       7       0  9.0
61871     193       0  7.98445595855
187472    349       0  8.5787965616

@GISGe 2015-08-18 11:42:10

There is a bug that seems to correspond to what you're experiencing. It is registered as BUG-000084883: "The 'Ignore NoData in calculations' option in Zonal Statistics as Table tool {and Zonal Statistics tool} is not honored when checked off, producing incorrect results."

It occurs with 10.3 and 10.2.2, but not with 10.1. Did you try the tool with that version?

@UdderlyAstray 2015-08-18 17:52:47

This sounds like a good approach, although I personally do not know how to run older versions of the tool. Can someone point me to how to attempt this workaround?

@G-wizard 2015-08-18 21:24:58

Thanks @GISGe. Where did you find this? Is there a link where this bug is documented?

@GISGe 2015-08-19 06:03:39

@G-wizard - I've added the link in my answer. As a member of Esri's international staff, I have access to a more detailed description than the public one; that's how I can tell you the bug also applies to the Zonal Statistics tool and is not present in 10.1.

@GISGe 2015-08-19 06:04:48

@UdderlyAstray - if you want to run an older version of the tool, you have to install that older version of ArcGIS.

@G-wizard 2015-08-19 18:02:29

Thanks again, @GISGe. Since this is what I was looking for (the bug is officially confirmed), I'm marking this answer as the accepted one, although others have also confirmed the problem with their own tests.

@GISGe 2015-08-19 19:06:57

@G-wizard - Thanks! Hopefully it will be solved in 10.4, but we need a bit of patience to be sure (the beta program will start at the end of this month).

@FelixIP 2015-08-17 03:34:34

It is a bug. Something is terribly wrong with the cell count.

The correct mean (9.0452380952381) times the correct number of non-empty cells (420), divided by 297 (the cell count reported by the tool), gives 12.7912457912458. That is the wrong average reported by the tool.
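The arithmetic above can be reproduced in a few lines. This only re-does the calculation from the numbers quoted in this answer (sum of the valid values divided by the too-small reported count); it is not a claim about how the tool computes its statistics internally.

```python
# Figures quoted in this answer for zone 61154
correct_mean = 9.0452380952381  # mean of the 420 valid cells
valid_cells = 420               # cells that actually hold a value
reported_count = 297            # cell count reported by the tool

# The sum of the valid values is not affected by the miscount
value_sum = correct_mean * valid_cells
# Dividing that sum by too few cells inflates the mean
inflated_mean = value_sum / reported_count

print(round(value_sum, 6), round(inflated_mean, 4))  # 3799.0 12.7912
```

Because the numerator stays correct while the denominator shrinks, the reported mean can exceed the zone's maximum value, which is exactly the "mathematically impossible" result described in the question.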

Results of my own toy size grids test:

[image of test results not available]

@radouxju 2015-08-17 10:49:22

I can confirm the same problem with 10.3, using NODATA and "MEAN".

@G-wizard 2015-08-17 15:17:41

Thanks to both of you for confirming this. But differences in the mean value aside, am I wrong in assuming that the result should not be any value at all, but 'NoData'? The tool's description leads me to believe that. It says: "NODATA — Within any particular zone, if any NoData cells exist in the Value raster, it is deemed that there is insufficient information to perform statistical calculations for all the cells in that zone; therefore, the entire zone will receive the NoData value on the output raster." Since there is one NoData pixel, the zonal statistic should also be 'NoData'. Correct?

@c0ba1t 2015-08-17 16:01:18

@G-wizard, you are correct, as stated in the tool description. It is somewhat analogous to #DIV/0! in Excel.
