By mbcaradima


2014-09-06 21:24:08 8 Comments

I am running ArcGIS 10.1 with 64-bit GP on Windows 8.1 x64.

I want to calculate the zonal statistics of buffer polygons based on parcels and a kernel density raster. The 4.7 million polygons overlap and are stored in a single feature class, making the problem particularly difficult. I've tried to use the supplemental Spatial Analyst Tools developed by ESRI in various ways, but the script isn't fast enough. I've tried to compute the zonal statistics piece by piece, using small census areas (with several hundred buffer polygons each) and large ones (thousands of buffer polygons). None of these implementations perform adequately. I've included the following features in a Python script:

  • loading buffer polygons into the in_memory workspace before loading into the ZS tool
  • creating feature layers where needed
  • writing the results to a single text file

The code for performing ZS one polygon at a time is posted below; however, the tool completes only about one parcel per minute. I'm wondering if anyone has insight on how to improve zonal statistics performance, or the script in general.

import arcpy, os, string, csv
from arcpy import env

# arcpy.ImportToolbox("D:\ClusterAnalysis_Deployment\SpatialAnalystSupplementalTools\Spatial Analyst Supplemental Tools.pyt")
arcpy.CheckOutExtension("Spatial")

# LOCATIONS #
# Local machine paths
gdb = r"D:\path\to\geodatabase.gdb"
results = r"\path\to\results"

# Virtual machine paths

# INPUT DATA #
b = gdb + "\\" + "ParcelBuffers"
kd = gdb + "\\" + "kdensity750"

env.workspace = gdb
env.overwriteOutput = True

# TEMPORARY DATA #
buffers = arcpy.MakeFeatureLayer_management(b, "bdata")
kdensity = arcpy.Raster(kd)

parcelList = []

cursor = arcpy.da.SearchCursor(buffers, ("PIN",))  # field list is a tuple; ("PIN") would just be the string "PIN"
for row in cursor:
    parcelList.append(row[0])
del cursor

countDict = {}
countDict["Count:"] = 0

print "Script setup... done"

# GEOPROCESSING #
for PIN in parcelList:
    parcel_ram = "in_memory" + "\\" + "parcel"
    zs_table = results + "\\" + "zs_table_" + str(PIN) + ".dbf"
    solution = results + "\\" + "ZS_solutions.txt"

    arcpy.SelectLayerByAttribute_management(buffers, "NEW_SELECTION", "\"PIN\" = \'" + str(PIN) + "\'")
    count = int(arcpy.GetCount_management(buffers).getOutput(0))
    arcpy.CopyFeatures_management(buffers, parcel_ram)
    arcpy.SelectLayerByAttribute_management(buffers, "CLEAR_SELECTION")
    arcpy.gp.ZonalStatisticsAsTable_sa(parcel_ram, "PIN", kdensity, zs_table, "DATA", "ALL")

    table = arcpy.gp.MakeTableView_management(zs_table, "zs_table_lyr")

    fields = arcpy.ListFields(table)
    field_names = map(lambda n:n.name, fields)
    header = string.join(field_names, "\t")

    # Append so results from earlier parcels are not overwritten;
    # write the header only once, when the file is first created
    writeHeader = not os.path.exists(solution)
    with open(solution, 'a') as x:
        if writeHeader:
            x.write(header + "\n")
        cursor = arcpy.SearchCursor(table)
        for row in cursor:
            values = string.join(map(lambda n: str(row.getValue(n)), field_names), "\t")
            x.write(values + "\n")
        del cursor

    countDict["Count:"] = countDict["Count:"] + count
    print "Zonal statistics computed and saved for parcel: " + str(countDict["Count:"])

    arcpy.Delete_management(parcel_ram)
    arcpy.Delete_management(zs_table)
    arcpy.Delete_management(table)
    del count

After some further testing of Felix's script, I think the way my data sets are set up is wrong, leading the spatial join feature query to pull up polygons that do overlap. Here are the data sets and their relevant attributes:

  • fishnet feature (ID'd by long integer "FnID")
  • minBound feature (ID'd by long integer "FnID")
  • parentLR (Join FID is FnID and Target FID is parID, where parID is a long integer field created for the buffer polygons)
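To sanity-check this setup, the parentLR join table can be grouped by FnID in plain Python before running the heavy tools. A small sketch (the function name and sample rows are invented for illustration; the field names follow the list above):

```python
def group_by_fnid(rows):
    """rows: iterable of (FnID, parID) pairs, e.g. read from a SearchCursor.
    Returns a dict mapping FnID -> sorted list of parIDs in that cell."""
    groups = {}
    for fnid, parid in rows:
        groups.setdefault(fnid, []).append(parid)
    return {k: sorted(v) for k, v in groups.items()}

# Example: two parcels joined to fishnet cell 1, one to cell 2
print(group_by_fnid([(1, 10), (1, 11), (2, 12)]))
# -> {1: [10, 11], 2: [12]}
```

Inspecting these groups makes it easy to spot cells whose polygon counts look wrong before any zonal statistics are run.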

The script:

##  Zonal statistics on a set of overlapping polygons
import arcpy, traceback, os, sys, time
from arcpy import env

arcpy.CheckOutExtension("Spatial")

env.workspace = r"C:\Geoprocessing\ClusterAnalysis"
env.overwriteOutput = True

# Field name of fishnet ID
parID="FnID"
parID2="FnID_1"

# Location of input map document containing the minBound (minimum bounding geometry)
# layer and parentLR (spatial join of fishnet to polygons) layer

mapDocument = env.workspace + "\\ClusterAnalysis.mxd"
raster = env.workspace + "\\ClusterAnalysis.gdb\\intKD750"
rast = env.workspace + "\\ClusterAnalysis.gdb\\parcelRaster"
kdensity = arcpy.Raster(raster)

dbf="stat.dbf" # output zonal statistics table
joinLR="SD.shp" # intermediate data set for determining non-overlapping polygons
subset="subset"

##try:
def showPyMessage():
    arcpy.AddMessage(str(time.ctime()) + " - " + message)
def Get_V(aKey):
    try:
        return smallDict[aKey]
    except:
        return (-1)
def pgonsCount(aLayer):
        result=arcpy.GetCount_management(aLayer)
        return int(result.getOutput(0))

# Define variables based on layers in ArcMap document
mxd = arcpy.mapping.MapDocument(mapDocument)
layers = arcpy.mapping.ListLayers(mxd)
minBound,parentLR=layers[0],layers[1]

# Query parentLR polygons based on fishnet grid of minBound polygon
with arcpy.da.SearchCursor(minBound, ("SHAPE@", "FnID")) as clipper:
    for rcrd in clipper:
        feat=rcrd[0]
        env.extent=feat.extent
        fp='"FnID"='+str(rcrd[1])
        parentLR.definitionQuery=fp
        nF=pgonsCount(parentLR)
        arcpy.AddMessage("Processing subset %i containing %i polygons" %(rcrd[1],nF))
        arcpy.AddMessage("Defining neighbours...")
        arcpy.SpatialJoin_analysis(parentLR, parentLR, joinLR, "JOIN_ONE_TO_MANY")
        arcpy.AddMessage("Creating empty dictionary")

        dictFeatures = {}
        with arcpy.da.SearchCursor(parentLR, parID) as cursor:
            for row in cursor:
                dictFeatures[row[0]]=()
            del row, cursor

        arcpy.AddMessage("Assigning neighbours...")
        nF=pgonsCount(joinLR)
        arcpy.SetProgressor("step", "", 0, nF,1)
        with arcpy.da.SearchCursor(joinLR, (parID,parID2)) as cursor:
            for row in cursor:
                aKey=row[0]
                aList=dictFeatures[aKey]
                aList+=(row[1],)
                dictFeatures[aKey]=aList
                arcpy.SetProgressorPosition()
            del row, cursor
        arcpy.AddMessage("Defining non-overlapping subsets...")
        runNo=0
        while (True):
            parentLR.definitionQuery=fp
            toShow,toHide=(),()
            nF=len(dictFeatures)
            arcpy.SetProgressor("step", "", 0, nF,1)
            for item in dictFeatures:
                if item not in toShow and item not in toHide:
                    toShow+=(item,)
                    toHide+=(dictFeatures[item])
                arcpy.SetProgressorPosition()
            m=len(toShow)
            quer='"parID" IN '+str(toShow)+ " AND "+fp
            if m==1:
                quer='"parID" = '+str(toShow[0])+ " AND "+fp
            parentLR.definitionQuery=quer
            runNo+=1
            arcpy.AddMessage("Run %i, %i polygon(s) found" % (runNo,m))
            arcpy.AddMessage("Running Statistics...")
            arcpy.gp.ZonalStatisticsAsTable_sa(parentLR, "PIN", kdensity, dbf,"DATA", "MEAN")
            arcpy.AddMessage("Data transfer...")
            smallDict={}
            with arcpy.da.SearchCursor(dbf, (parID,"MEAN")) as cursor:
                for row in cursor:
                    smallDict[row[0]]=row[1]
                del row, cursor
            with arcpy.da.UpdateCursor(parentLR, (parID,"MEAN")) as cursor:
                for row in cursor:
                    aKey=row[0]
                    row[1]=Get_V(aKey)
                    cursor.updateRow(row)
                del row, cursor
            for item in toShow:
                del dictFeatures[item]
            m=len(dictFeatures)
            if m==0:
                break
        parentLR.definitionQuery=fp
        del smallDict, dictFeatures
parentLR.definitionQuery=''

This produces zonal statistics ERROR 010423: http://resources.arcgis.com/en/help/main/10.1/index.html#//00vq0000000s010423


@FelixIP 2014-09-09 03:16:41

Pre-processing:

  1. Create fishnet - polygons with extent=extent of overlapping polygons. Calculate field ID = FID.
  2. Spatial join overlapping polygons with fishnet polygons using HAVE_THEIR_CENTER_IN.
  3. Dissolve polygons, case field fishnet's ID
  4. Use Minimum Bounding geometry on step 3 output.

Black polygons in the picture below are outputs of step 4 (minBound in the script). Colored polygons (unique fishnet IDs) are outputs of step 2 (parentLR in the script).
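A rough idea of what step 2 computes, sketched in plain Python (the function and its parameters are hypothetical, not the ESRI tool): a HAVE_THEIR_CENTER_IN join effectively assigns each polygon to the fishnet cell containing its centroid.

```python
def fishnet_cell_id(cx, cy, x0, y0, cell, ncols):
    """Return the fishnet cell ID for a centroid (cx, cy), given the
    fishnet origin (x0, y0), the cell size, and the number of columns.
    Cells are numbered row-major starting at the origin corner."""
    col = int((cx - x0) // cell)
    row = int((cy - y0) // cell)
    return row * ncols + col

# A centroid at (15, 25) on a 10-unit fishnet with 4 columns
# falls in column 1, row 2 -> cell 2*4 + 1 = 9
print(fishnet_cell_id(15, 25, 0, 0, 10, 4))  # -> 9
```

This is only a conceptual stand-in; the actual spatial join also handles polygons whose centroids sit exactly on cell boundaries according to its own tie-breaking rules.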

The script assumes:

  • minBound is 1st layer in TOC
  • parentLR is 2nd layer
  • Unique identifier of parentLR is parID - long integer
  • Unique identifier of minBound is ID - long integer

Change the raster name/path to point to your own sample raster.

minBound is used to browse the area of interest and narrow the processing extent and the number of polygons processed at a time, by modifying the parentLR definition query. The critical part is selecting non-overlapping groups of polygons, accomplished by analysing the spatial join table (ONE_TO_MANY option) and modifying the definition query of parentLR. Performance is more than 2,000 polygons per minute, thus less than 2 days to process 4.7 million polygons :). Windows 7, 8 GB RAM, 64-bit.
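The "select non-overlapping groups" step can be sketched without ArcGIS at all. A minimal pure-Python imitation (the function name and sample data are invented for illustration), assuming the neighbour dictionary built from the ONE_TO_MANY self join maps each polygon ID to the tuple of IDs it overlaps, itself included:

```python
def nonoverlapping_passes(neighbours):
    """Split polygon IDs into passes of mutually non-overlapping IDs.

    neighbours: dict mapping each ID to a tuple of overlapping IDs
    (a self spatial join lists each polygon as its own neighbour).
    Returns a list of sets, one set of IDs per processing pass."""
    remaining = dict(neighbours)
    passes = []
    while remaining:
        to_show, to_hide = set(), set()
        for item in remaining:
            # Pick the polygon only if none of its neighbours
            # have been picked in this pass yet
            if item not in to_show and item not in to_hide:
                to_show.add(item)
                to_hide.update(remaining[item])
        for item in to_show:
            del remaining[item]
        passes.append(to_show)
    return passes

# Polygons 1 and 2 overlap each other; 3 stands alone
print(nonoverlapping_passes({1: (1, 2), 2: (1, 2), 3: (3,)}))
# -> [{1, 3}, {2}]
```

Each pass can then be turned into a definition query (`"parID" IN (...)`) and fed to Zonal Statistics as Table, which is exactly what the script's while loop does.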

##  Zonal statistics on a set of overlapping polygons
import arcpy, traceback, os, sys, time
from arcpy import env
env.overwriteOutput = True
parID="parID"
parID2="parID_1"
dem=r'C:\From_MXD\ARC2MDEM_Clip'
env.workspace = "C:\\From_MXD"
dbf="stat.dbf"
joinLR="SD.shp"
subset="subset"

try:
    def showPyMessage():
        arcpy.AddMessage(str(time.ctime()) + " - " + message)
    def Get_V(aKey):
        try:
            return smallDict[aKey]
        except:
            return (-1)
    def pgonsCount(aLayer):
            result=arcpy.GetCount_management(aLayer)
            return int(result.getOutput(0))    
    mxd = arcpy.mapping.MapDocument("CURRENT")
    layers = arcpy.mapping.ListLayers(mxd)
    minBound,parentLR=layers[0],layers[1]
    with arcpy.da.SearchCursor(minBound, ("SHAPE@", "ID")) as clipper:
        for rcrd in clipper:
            feat=rcrd[0]
            env.extent=feat.extent
            fp='"ID"='+str(rcrd[1])
            parentLR.definitionQuery=fp
            nF=pgonsCount(parentLR)  
            arcpy.AddMessage("Processing subset %i containing %i polygons" %(rcrd[1],nF))
            arcpy.AddMessage("Defining neighbours...")
            arcpy.SpatialJoin_analysis(parentLR, parentLR, joinLR,"JOIN_ONE_TO_MANY")
            arcpy.AddMessage("Creating empty dictionary")
            dictFeatures = {}
            with arcpy.da.SearchCursor(parentLR, parID) as cursor:
                for row in cursor:
                    dictFeatures[row[0]]=()
                del row, cursor
            arcpy.AddMessage("Assigning neighbours...")
            nF=pgonsCount(joinLR)
            arcpy.SetProgressor("step", "", 0, nF,1)
            with arcpy.da.SearchCursor(joinLR, (parID,parID2)) as cursor:
                for row in cursor:
                    aKey=row[0]
                    aList=dictFeatures[aKey]
                    aList+=(row[1],)
                    dictFeatures[aKey]=aList
                    arcpy.SetProgressorPosition()
                del row, cursor    
            arcpy.AddMessage("Defining non-overlapping subsets...")
            runNo=0
            while (True):
                parentLR.definitionQuery=fp
                toShow,toHide=(),()
                nF=len(dictFeatures)    
                arcpy.SetProgressor("step", "", 0, nF,1)
                for item in dictFeatures:
                    if item not in toShow and item not in toHide:
                        toShow+=(item,)
                        toHide+=(dictFeatures[item])
                    arcpy.SetProgressorPosition()
                m=len(toShow)
                quer='"parID" IN '+str(toShow)+ " AND "+fp
                if m==1:
                    quer='"parID" = '+str(toShow[0])+ " AND "+fp
                parentLR.definitionQuery=quer
                runNo+=1
                arcpy.AddMessage("Run %i, %i polygon(s) found" % (runNo,m))
                arcpy.AddMessage("Running Statistics...")
                arcpy.gp.ZonalStatisticsAsTable_sa(parentLR, parID, dem, dbf, "DATA", "MEAN")
                arcpy.AddMessage("Data transfer...")
                smallDict={}
                with arcpy.da.SearchCursor(dbf, (parID,"MEAN")) as cursor:
                    for row in cursor:
                        smallDict[row[0]]=row[1]
                    del row, cursor    
                with arcpy.da.UpdateCursor(parentLR, (parID,"MEAN")) as cursor:
                    for row in cursor:
                        aKey=row[0]
                        row[1]=Get_V(aKey)
                        cursor.updateRow(row)
                    del row, cursor
                for item in toShow:
                    del dictFeatures[item]
                m=len(dictFeatures)
                if m==0:
                    break
            parentLR.definitionQuery=fp
            del smallDict, dictFeatures
    parentLR.definitionQuery=''
except:
    message = "\n*** PYTHON ERRORS *** "; showPyMessage()
    message = "Python Traceback Info: " + traceback.format_tb(sys.exc_info()[2])[0]; showPyMessage()
    message = "Python Error Info: " +  str(sys.exc_type)+ ": " + str(sys.exc_value) + "\n"; showPyMessage()            

You might increase the speed by removing the multiple progressor lines from the script. I suggest testing it on a smaller subset of polygons, e.g. 100,000. I also think the size of the fishnet has to be optimised. Please note I am deriving just the mean value from the raster. I've tried to document the script to the best of my ability.

@mbcaradima 2014-09-13 20:43:26

Hi Felix, thank you for your response. I worked through a subset of my data using your workflow but dissolving the polygons by fishnet grid is intensive. I'm wondering whether it would be more efficient to obtain the minimum bounding geometry on an individual fishnet grid (and perhaps adjacent ones as well to contain all polygons with areas outside the grid). Dissolving 160,000

@FelixIP 2014-09-13 22:34:22

Correct. That might be even faster. Make the fishnet the 1st layer. Set env.extent = parentLR.extent just before zonal statistics. All this is to make the ZS task easier, so it works on a tiny subset of the raster.

@mbcaradima 2014-09-14 00:01:59

Felix, I have one more question regarding your script. The parID field identifies the parentLR data set (polygons spatially joined with the fishnet grids), but I'm not sure what data/field the parID2 variable refers to. I also believe this script will overwrite the DBF table being output. At the moment I am pre-processing the data required for the script, but any clarification on these variables would be welcome. Thank you.

@FelixIP 2014-09-14 00:33:21

Note that I spatially join parentLR to itself. Because the output (yes, recreated every time) cannot have duplicate field names, ArcGIS adds the suffix _1 to the second field. Thus, if your unique ID field is named 'prno', make parID2 = 'prno_1'.
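The renaming rule Felix describes can be imitated in a few lines of plain Python (a hedged sketch of the idea, not ArcGIS's actual implementation):

```python
def dedupe_field_names(names):
    """Append _1, _2, ... to repeated field names, the way a join
    output disambiguates duplicated columns."""
    seen = {}
    out = []
    for n in names:
        if n not in seen:
            seen[n] = 0
            out.append(n)
        else:
            seen[n] += 1
            out.append("%s_%d" % (n, seen[n]))
    return out

# A self join duplicates the ID field; the second copy gets _1
print(dedupe_field_names(["parID", "Shape", "parID"]))
# -> ['parID', 'Shape', 'parID_1']
```

So whatever your join field is called in the input, the second copy in the join output is that name plus `_1`, and that is the name parID2 must hold.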

@FelixIP 2014-09-14 00:44:08

Yes, the DBF is replaced every time, but it holds all the records required to update the hugely 'thinned' parent layer.

@mbcaradima 2014-09-14 19:06:38

I've almost fully replicated your workflow, however the zonal statistics tool returns an error that is unrelated to the topic at hand: ERROR 010423 (help.arcgis.com/EN/arcgisdesktop/10.0/help/index.html#//…). I have checked that the value raster meets the conditions for the zonal statistics tool, however the selected parentLR records are questionable. I think the script solves the problem I posted about, but I would like to post a commented and updated copy for future users.

@FelixIP 2014-09-14 19:54:38

Based on my experience, error messages very often have nothing to do with the true issue. It would be nice to have a subset of that raster to play with. Remember, you cannot compute majority and median statistics unless the raster is integer...

@mbcaradima 2014-09-14 20:33:10

Hi Felix, I've posted an update on my testing in my original post above. Thanks for your help.

@FelixIP 2014-09-06 21:43:25

  1. Convert the raster to points
  2. Intersect the points with the polygons, output type POINT
  3. Statistics on the raster value, using the polygon ID as the case field. You might consider splitting the area if the raster is too big.
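Conceptually, once the raster cells are points tagged with a polygon ID, step 3 reduces to a grouped mean. A plain-Python sketch (function name and inputs invented for illustration):

```python
def zonal_mean(samples):
    """samples: iterable of (polygon_id, raster_value) pairs, i.e. the
    output of intersecting raster points with polygons.
    Returns a dict mapping polygon_id -> mean of its sampled values."""
    sums, counts = {}, {}
    for pid, val in samples:
        sums[pid] = sums.get(pid, 0.0) + val
        counts[pid] = counts.get(pid, 0) + 1
    return {pid: sums[pid] / counts[pid] for pid in sums}

# Polygon 1 covers cells valued 2.0 and 4.0; polygon 2 covers one cell
print(zonal_mean([(1, 2.0), (1, 4.0), (2, 5.0)]))
# -> {1: 3.0, 2: 5.0}
```

Because the points carry the polygon ID, overlapping polygons are no problem here: a point that falls in two polygons simply contributes a pair for each.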

@whuber 2014-09-06 22:51:52

Given the magnitude of the problem, I doubt there is any chance this will work. Have you tested it on any datasets? How about on datasets this large?

@FelixIP 2014-09-07 00:13:10

I use this approach on a regular basis. Yes, with that many polygons and a very big raster it is unlikely to produce a result in one go. It can be done, though, with very little programming effort, much less than in the above script. This is why I suggested splitting the area. Smart slicing of the raster is required (a set of overlapping rectangles), so as to process only those polygons that are completely within an individual clipping rectangle.

@whuber 2014-09-07 13:33:37

Thank you--it is very good to read about your own experience because it lends credibility to your proposal. In re-reading the question, it appears to me that the OP has actually tried these techniques of breaking the problem into smaller ones and found them wanting with his/her data: "none of these implementations perform adequately."

@mbcaradima 2014-09-07 20:08:52

Felix, the study area of the kernel density raster is quite large and would require about 4 billion points to represent as vectors. I don't think this would simplify the workflow. It is an interesting suggestion, though.

@FelixIP 2014-09-07 20:37:54

Guys, I am withdrawing my suggestion, sorry. I ran a test at home using 30,000 overlapping polygons and a 7000*7000 raster. I made all the possible improvements: working with GRID (not FGDB), shapefiles (not FC in FGDB), dictionaries (not built-in statistics). I managed to achieve only 100 polygons/minute. Yes, it's an out-of-date, slow PC, but still, 100/min is not fast enough. I am thinking of 1) creating non-overlapping polygons, 2) a one-to-many spatial join with the parents, 3) merging the pieces' statistics (no SD though). I do like the 'smart' slicing of polygons created in the process.

@FelixIP 2014-09-09 03:19:10

Please try it, I'd love to see if it works at all.
