By Doon_Bogan


2013-11-19 19:45:30 8 Comments

I am trying to create zonal statistics using Python and Gdal. I have a polygon shapefile and a raster file, and in order to do so, I am using a piece of code I found in StackExchange.

The raster and shapefile used can be found here

Here is the code I am using:

#!/usr/bin/python
#coding: utf-8

## Code from Stack Exchange: http://gis.stackexchange.com/questions/21888/how-to-overlay-shapefile-and-raster
## User: ustroetz

# Calculates statistics (mean) on values of a raster within the zones of an polygon shapefile

import gdal, ogr, osr, numpy

def zonal_stats(input_value_raster, input_zone_polygon):



 # Open data
    raster = gdal.Open(input_value_raster)
    driver = ogr.GetDriverByName('ESRI Shapefile')
    shp = driver.Open(input_zone_polygon)
    lyr = shp.GetLayer()

    # get raster georeference info
    transform = raster.GetGeoTransform()
    xOrigin = transform[0]
    yOrigin = transform[3]
    pixelWidth = transform[1]
    pixelHeight = transform[5]

    # reproject geometry to same projection as raster
    sourceSR = lyr.GetSpatialRef()
    targetSR = osr.SpatialReference()
    targetSR.ImportFromWkt(raster.GetProjectionRef())
    coordTrans = osr.CoordinateTransformation(sourceSR,targetSR)
    feat = lyr.GetNextFeature()
    geom = feat.GetGeometryRef()
    geom.Transform(coordTrans)

    # Get extent of geometry
    ring = geom.GetGeometryRef(0)
    numpoints = ring.GetPointCount()
    pointsX = []; pointsY = []
    for p in range(numpoints):
            lon, lat, z = ring.GetPoint(p)
            pointsX.append(lon)
            pointsY.append(lat)
    xmin = min(pointsX)
    xmax = max(pointsX)
    ymin = min(pointsY)
    ymax = max(pointsY)

    # Specify offset and rows and columns to read
    xoff = int((xmin - xOrigin)/pixelWidth)
    yoff = int((yOrigin - ymax)/pixelWidth)
    xcount = int((xmax - xmin)/pixelWidth)+1
    ycount = int((ymax - ymin)/pixelWidth)+1


    # create memory target raster
    target_ds = gdal.GetDriverByName('MEM').Create('', xcount, ycount, gdal.GDT_Byte)
    target_ds.SetGeoTransform((
        xmin, pixelWidth, 0,
        ymax, 0, pixelHeight,
    ))

    # create for target raster the same projection as for the value raster
    raster_srs = osr.SpatialReference()
    raster_srs.ImportFromWkt(raster.GetProjectionRef())
    target_ds.SetProjection(raster_srs.ExportToWkt())

    # rasterize zone polygon to raster
    gdal.RasterizeLayer(target_ds, [1], lyr, burn_values=[1])

    # read raster as arrays
    banddataraster = raster.GetRasterBand(1)
    dataraster = banddataraster.ReadAsArray(xoff, yoff, xcount, ycount).astype(numpy.float)

    bandmask = target_ds.GetRasterBand(1)
    datamask = bandmask.ReadAsArray(0, 0, xcount, ycount).astype(numpy.float)

    # mask zone of raster
    zoneraster = numpy.ma.masked_array(dataraster,  numpy.logical_not(datamask))

    # calculate mean of zonal raster
    return numpy.mean(zoneraster) 

However, when I implement it, the output (numpy.mean(zoneraster)) is just a number and not an array or a raster. What am I missing? Is there something wrong with the files imput? The code?

2 comments

@ninnghazad 2014-03-18 11:08:47

With multiprocessing, for fastness! Has a little different output-formatting.

#!/usr/bin/python
import gdal, ogr, osr, numpy, sys
from multiprocessing import Pool


# Raster dataset
input_value_raster = sys.argv[1]

# Vector dataset(zones)
input_zone_polygon = sys.argv[2]

# Open data
rast = gdal.Open(input_value_raster)
shp = ogr.Open(input_zone_polygon)

# Get raster georeference info
transform = rast.GetGeoTransform()
xOrigin = transform[0]
yOrigin = transform[3]
pixelWidth = transform[1]
pixelHeight = transform[5]

def proc(fid):

    # Each process needs its own pointer it seems.
    shape = ogr.Open(input_zone_polygon)
    lyr = shape.GetLayer()
    feat = lyr.GetFeature(fid)
    raster = gdal.Open(input_value_raster)

    # Get extent of feat
    geom = feat.GetGeometryRef()
    if (geom.GetGeometryName() == 'MULTIPOLYGON'):
        count = 0
        pointsX = []; pointsY = []
        for polygon in geom:
            geomInner = geom.GetGeometryRef(count)    
            ring = geomInner.GetGeometryRef(0)
            numpoints = ring.GetPointCount()
            for p in range(numpoints):
                    lon, lat, z = ring.GetPoint(p)
                    pointsX.append(lon)
                    pointsY.append(lat)    
            count += 1
    elif (geom.GetGeometryName() == 'POLYGON'):
        ring = geom.GetGeometryRef(0)
        numpoints = ring.GetPointCount()
        pointsX = []; pointsY = []
        for p in range(numpoints):
                lon, lat, z = ring.GetPoint(p)
                pointsX.append(lon)
                pointsY.append(lat)

    else:
        sys.exit()

    xmin = min(pointsX)
    xmax = max(pointsX)
    ymin = min(pointsY)
    ymax = max(pointsY)

    # Specify offset and rows and columns to read
    xoff = int((xmin - xOrigin)/pixelWidth)
    yoff = int((yOrigin - ymax)/pixelWidth)
    xcount = int((xmax - xmin)/pixelWidth)+1
    ycount = int((ymax - ymin)/pixelWidth)+1

    # Create memory target raster
    target_ds = gdal.GetDriverByName('MEM').Create('', xcount, ycount, gdal.GDT_Byte)
    target_ds.SetGeoTransform((
        xmin, pixelWidth, 0,
        ymax, 0, pixelHeight,
    ))

    # Create for target raster the same projection as for the value raster
    raster_srs = osr.SpatialReference()
    raster_srs.ImportFromWkt(raster.GetProjectionRef())
    target_ds.SetProjection(raster_srs.ExportToWkt())

    # Rasterize zone polygon to raster
    gdal.RasterizeLayer(target_ds, [1], lyr, burn_values=[1])

    # Read raster as arrays
    banddataraster = raster.GetRasterBand(1)
    dataraster = banddataraster.ReadAsArray(xoff, yoff, xcount, ycount).astype(numpy.float)

    bandmask = target_ds.GetRasterBand(1)
    datamask = bandmask.ReadAsArray(0, 0, xcount, ycount).astype(numpy.float)

    # Mask zone of raster
    zoneraster = numpy.ma.masked_array(dataraster,  numpy.logical_not(datamask))

    # Calculate statistics of zonal raster
    value = numpy.max(zoneraster)
    #print value
    return str(fid)+": "+str(value)


# Start the processes
layer = shp.GetLayer()
featList = range(layer.GetFeatureCount())
pool = Pool(processes=24)   
print pool.map(proc,featList,8)




@ustroetz 2013-11-20 17:44:53

If you want to get zonal statistics for several features in one shapefile, you have to loop over the zonal_stats function. You can write the results of the loop for example to a dictionary. Below is the modified zonal_stats function together with a loop, looping over the input shapefile. As an output you get a Dictionary containing for each Feature ID the mean of the covered raster.

For your sample dataset the dictionary for the first five features looks like this

{0: 114.57909872798288, 1: 21.889622561136882, 2: 287.79623686237102, 3: 35.350804240486838, 4: 19.63043032511781}

The two functions:

import gdal, ogr, osr, numpy


def zonal_stats(feat, input_zone_polygon, input_value_raster):

    # Open data
    raster = gdal.Open(input_value_raster)
    shp = ogr.Open(input_zone_polygon)
    lyr = shp.GetLayer()

    # Get raster georeference info
    transform = raster.GetGeoTransform()
    xOrigin = transform[0]
    yOrigin = transform[3]
    pixelWidth = transform[1]
    pixelHeight = transform[5]

    # Get extent of feat
    geom = feat.GetGeometryRef()
    if (geom.GetGeometryName() == 'MULTIPOLYGON'):
        count = 0
        pointsX = []; pointsY = []
        for polygon in geom:
            geomInner = geom.GetGeometryRef(count)    
            ring = geomInner.GetGeometryRef(0)
            numpoints = ring.GetPointCount()
            for p in range(numpoints):
                    lon, lat, z = ring.GetPoint(p)
                    pointsX.append(lon)
                    pointsY.append(lat)    
            count += 1
    elif (geom.GetGeometryName() == 'POLYGON'):
        ring = geom.GetGeometryRef(0)
        numpoints = ring.GetPointCount()
        pointsX = []; pointsY = []
        for p in range(numpoints):
                lon, lat, z = ring.GetPoint(p)
                pointsX.append(lon)
                pointsY.append(lat)

    else:
        sys.exit()

    xmin = min(pointsX)
    xmax = max(pointsX)
    ymin = min(pointsY)
    ymax = max(pointsY)

    # Specify offset and rows and columns to read
    xoff = int((xmin - xOrigin)/pixelWidth)
    yoff = int((yOrigin - ymax)/pixelWidth)
    xcount = int((xmax - xmin)/pixelWidth)+1
    ycount = int((ymax - ymin)/pixelWidth)+1

    # Create memory target raster
    target_ds = gdal.GetDriverByName('MEM').Create('', xcount, ycount, gdal.GDT_Byte)
    target_ds.SetGeoTransform((
        xmin, pixelWidth, 0,
        ymax, 0, pixelHeight,
    ))

    # Create for target raster the same projection as for the value raster
    raster_srs = osr.SpatialReference()
    raster_srs.ImportFromWkt(raster.GetProjectionRef())
    target_ds.SetProjection(raster_srs.ExportToWkt())

    # Rasterize zone polygon to raster
    gdal.RasterizeLayer(target_ds, [1], lyr, burn_values=[1])

    # Read raster as arrays
    banddataraster = raster.GetRasterBand(1)
    dataraster = banddataraster.ReadAsArray(xoff, yoff, xcount, ycount).astype(numpy.float)

    bandmask = target_ds.GetRasterBand(1)
    datamask = bandmask.ReadAsArray(0, 0, xcount, ycount).astype(numpy.float)

    # Mask zone of raster
    zoneraster = numpy.ma.masked_array(dataraster,  numpy.logical_not(datamask))

    # Calculate statistics of zonal raster
    return numpy.mean(zoneraster)


def loop_zonal_stats(input_zone_polygon, input_value_raster):

    shp = ogr.Open(input_zone_polygon)
    lyr = shp.GetLayer()
    featList = range(lyr.GetFeatureCount())
    statDict = {}

    for FID in featList:
        feat = lyr.GetFeature(FID)
        meanValue = zonal_stats(feat, input_zone_polygon, input_value_raster)
        statDict[FID] = meanValue
    return statDict



# Raster dataset
input_value_raster = 'popc_0ADProj.tif'
# Vector dataset(zones)
input_zone_polygon = 'borders_tribes.shp'

print loop_zonal_stats(input_zone_polygon, input_value_raster)

@Doon_Bogan 2013-11-20 18:55:57

I have another issue with the code in the section #get extent of geometry. My features can be multiple polygons. Here's my output for geom.GetGeomRef(i), where i' is each value of geom.GetGeometryCount(): DATABASE: 7 LINEARRING (0.980468987882861 8.28964924042946, ..., 0.980468987882861 8.28964924042946) DATABASE: 8 POLYGON ((43.148330027378911 11.721389243497216,..., 11.72443924429683,43.148330027378911 11.721389243497216))` I modified the code to deal with these special cases (code written in next comment) but typically, it won't deal with polygons like "Database 8".

@Doon_Bogan 2013-11-20 18:57:07

My modified section is as follows: # Get extent of geometry geonumbers=geom.GetGeometryCount() xmin_all=[] (...) ymax_all=[] for i in range(geonumbers): ring_i = geom.GetGeometryRef(i) numpoints = ring_i.GetPointCount() print ring_i pointsX_i = []; pointsY_i = [] for p in range(numpoints): lon, lat, z = ring_i.GetPoint(p) pointsX_i.append(lon) pointsY_i.append(lat) xmin_all.append(min(pointsX_i)) xmax_all.append(max(pointsX_i)) ymin_all.append(min(pointsY_i)) ymax_all.append(max(pointsY_i)) xmin=min(xmin_all) xmax=max(xmax_all) ymin=min(ymin_all) ymax=max(ymax_all)

@ustroetz 2013-11-21 05:34:16

@Doon_Bogan I modified the function, so that it accepts polygons and multipolygons. I test run it with your sample dataset and it worked fine. Thanks for catching that bug. Please let me know if it works now for you.

@Deep 2014-07-18 17:26:01

How to the "# Get extent of feat " step is different than using GetEnvelope() on feature geometry object ? I think it will automatically take care of "Polygon" or "MultiPolygon".

Related Questions

Sponsored Content

0 Answered Questions

Layer spatial reference system units EPSG::32633

  • 2019-02-01 14:12:56
  • RemoteSensing
  • 33 View
  • 1 Score
  • 0 Answer
  • Tags:   qgis python gdalogr

2 Answered Questions

1 Answered Questions

[SOLVED] Processing vector to raster faster with R

0 Answered Questions

Unexpected elevation values from SRTM using Zonal statistics

0 Answered Questions

On-the-fly zonal statistics in QGIS

2 Answered Questions

1 Answered Questions

[SOLVED] Multiple Subsetting

  • 2014-10-30 18:18:21
  • user37082
  • 173 View
  • 0 Score
  • 1 Answer
  • Tags:   python r

0 Answered Questions

Change coordinates of raster file

1 Answered Questions

[SOLVED] Extract results from zonal statistics

0 Answered Questions

Zonal Statistics using Gdal and Python

Sponsored Content