The Commercial arcpy Library#
Geospatial analyses with ESRI’s commercial ArcGIS software and arcpy package. This notebook cannot be run in Jupyter and code blocks require the commercial arcpy library from ESRI.
Introduction#
ESRI’s ArcGIS Pro software comes with its own conda environment that enables working with the commercial arcpy
library. arcpy
is accessible either directly in ArcGIS Pro
or by using ESRI’s conda environment, which can be managed through ArcGIS Pro. Since arcpy
is tied to the use of a commercial environment, it can only be accessed in jupyter notebooks directly hosted at esri.com (however, these will not work with the examples in this section). This page explains the basic usage of arcpy
in external IDEs (e.g., PyCharm, Atom, or Spyder), the integration of licenses, and basic functionality of arcpy
.
Requirements
Make sure to understand the difference between rasters and shapefiles as explained in the geospatial data section.
Note
The development of ArcMap with its Python2 executable is discontinued. For this reason, only the usage of ArcGIS Pro with its Python3 environment is explained here.
Linux Users
ArcGIS Pro and the wrappers that constitute the arcpy
library are Windows only applications. At the time of writing this introduction ArcGIS Pro and arcpy
cannot be used on Linux or macOS platforms.
Background#
Why can working with arcpy
in Python still be a powerful tool even though there are many license and platform restrictions?
If you are working for a company with engineering services, the company likely uses ArcGIS because of its popularity and commercial support service. In this case, a company (or research institution) covers the license fees and it is equally likely that a Windows operating system is used for similar reasons. Among the popular functionalities of ArcGIS software are Raster Calculator, Shapefile Feature manager, or tools for statistical analysis of geo-databases. Thanks to arcpy
, such popular ArcGIS tools can be embedded in Python scripts, thus enabling workflows to be automated and efficiency to be significantly increased.
This page shows the basics for manipulating raster and shapefile data in Python with arcpy
. There are many more methods implemented in arcpy
and only the fundamentals are explained here.
Use arcpy
with External IDEs#
Setup Interpreter#
The Python installation section explains how to set up PyCharm IDE with a conda environment. To create a new conda environment in PyCharm with ESRI’s conda environment, the Location must be defined differently (see the original screenshot):
If ArcGIS Pro was globally installed by the system administrator, use:
%PROGRAMFILES%\ArcGIS\Pro\bin\Python\Scripts\propy
If ArcGIS Pro was installed for individual users, use:
%LOCALAPPDATA%\Programs\ArcGIS\Pro\bin\Python\Scripts\propy
It may occur that it is not necessary to add the propy
directory. Moreover, find the conda.exe
or python.exe
in the above directories and define it in the field Conda executable. Finalize the creation of the new environment with a click on Create.
To install more packages follow the descriptions provided by Esri.
Tip
If you are struggling with the above instructions, have a look at the developer’s docs on running stand-alone scripts (look for instructions using propy
rather than python27
).
Import arcpy and its Modules#
arcpy
comes with own classes for geospatial data objects (e.g., arcpy.Raster
for gridded data) and modules for mapping (arcpy.mp
), emulation of the Spatial Analyst (arcpy.sa
), or accessing data (arcpy.da
). A full list is provided on the developer’s website. This page features code blocks using arcpy
and arcpy.sa
, where Spatial Analyst objects are imported using *
to enable direct access to, for example, Con()
(instead of arcpy.sa.Con()
), which corresponds to arcpy
’s conditional statement for rasters.
import arcpy
from arcpy.sa import *
Setup arcpy Workspace#
Geospatial calculations may produce many side products, which can be heavy in size. To better control which data is generated by arcpy
and where it is a good idea to define a workspace for each arcpy script with:
arcpy.env.workspace = "C:\\workspace\\" # or use os.path.dirname(__file__) to go to the script directorys
Important
Avoid blanks in the workspace directory.
It can also be useful to enable overwriting of already existing files with the same name because of the file size. This behavior may or may not be desired and can be controlled as follows.
arcpy.gp.overwriteOutput = True # enable overwriting
arcpy.gp.overwriteOutput = False # disable overwriting
Set Spatial Extents#
Geospatial datasets may have large extents without any data written to large parts. Processing void data cells may lead to unnecessarily long calculation time. Therefore, it is advisable to limit the calculation extent with:
arcpy.env.extent = "MAXOF" # uses the combined extent of all input datasets
arcpy.env.extent = "MINOF" # uses only the overlap of all input datasets
arcpy.env.extent = arcpy.Extent(arcpy.Raster("base.tif")) # uses the controlled extent of a raster
arcpy.env.extent = "Xmin YMin XMax Ymax" # imposes user-defined minimum and maximum coordinates
Checkout Licenses#
Many arcpy
methods need licenses such as Spatial Analyst or 3D. In stand-alone scripts, licenses can be activate (checked out) with arcpy.CheckOutExtension('NAME')
and deactivated (checked in) with arcpy.CheckInExtension('NAME')
. In light of the object orientation, arcpy
operations should be embedded in functions or methods of classes. Therefore, it is recommended to wrap functions or methods using arcpy
with decorators that activate necessary licenses. The following code block provides a wrapper to activate a Spatial Analyst license for a function.
def spatial_license(func):
def wrapper(*args, **kwargs):
arcpy.CheckOutExtension('Spatial')
func(*args, **kwargs)
arcpy.CheckInExtension('Spatial')
return wrapper
Track Errors#
The Python Errors, Logging, and Debugging section provides useful instructions for troubleshooting errors in code or code usage. To identify problems in object-oriented arcpy
scripts, an additional wrapper function is recommended, which writes arcpy
errors to a log file (logger
). The logger.*(MESSAGE)
expressions can also be replaced with print(MESSAGE)
.
def err_info(func):
def wrapper(*args, **kwargs):
arcpy.gp.overwriteOutput = True
logger = logging.getLogger("logfile")
try:
func(*args, **kwargs)
except arcpy.ExecuteError:
logger.info(arcpy.GetMessages(2))
arcpy.AddError(arcpy.GetMessages(2))
except Exception as e:
logger.info(e.args[0])
arcpy.AddError(e.args[0])
except:
logger.info(arcpy.GetMessages())
return wrapper
Spatial Analyst & Raster Operations#
Basics#
arcpy
provides different options to load geospatial rasters (gridded data), which may have different formats such as Esri Grid (no ending, the raster is a folder with other files), GeoTIFF, DAT, and many more. The following script loads a flow depth Grid raster h
and a flow velocity Grid raster u
:
h = arcpy.Raster("geodata/input/rasters/h")
u = arcpy.Raster("geodata/input/rasters/u")
The Froude number can be calculated pixel by pixel from the flow depth and velocity and the gravity constant \(g\)=9.81m/s. The following script calculates the Froude number for all pixels where the flow depth is at least 0.1 m. The raster comparison is achieved through the Spatial Analyst’s Con(if_condition, then, else)
conditional statement. The two rasters (u
and h
) are passed as arcpy.sa.Float()
objects to ensure that the script uses the correct pixel data type.
froude = Con(Float(h) > 0.1, Float(u) / SquareRoot(Float(h) * Float(9.81)))
Better than the 0.1-meter criterion is to calculate the Froude number everywhere where the flow depth and the velocity have a numerical value. arcpy.sa.IsNull()
evaluates where pixels are non-numeric. However, we are interested in the opposite (i.e., pixels that are not non-numeric), which we get through the ~
(not) sign. The below-functionalized calculation of the Froude number takes advantage of the possibility to nest multiple Con()
expressions to check both rasters (u
and h
) for numeric pixels.
In addition, we need a Spatial Analyst license to run this script. Therefore, it makes sense to rewrite the above code block into a function that uses the spatial_license
wrapper function from the above checkout Licenses section. To make the code robust, we also add the above-defined *err_info* wrapper function.
@err_info
@spatial_license
def calculate_froude(h, u):
return Con(~IsNull(h), Con(~IsNull(u), Float(u) / SquareRoot(Float(h) * Float(9.81))))
froude = calculate_froude(h, u)
Find out more about Raster Calculator and Map Algebra on the developer’s website (esri).
Tip
The functions provided with Raster Calculator can also be performed with the open-access libraries gdal
and numpy
. All you need to do is:
Read raster as array.
Use numpy and/or pandas to run typical Raster Calculator operations and many (many many) more on the array.
Write the array back to a raster.
Cell Statistics#
The evaluation of numerical model data often involves calculating statistical values (e.g., minimum or maximum) of one or more rasters. The comparison of several similar rasters is useful, for example, if the same parameter was calculated with two different models or at different dates (e.g., to assess the morphodynamic evolution of rivers). arcpy.sa.CellStatistics([Raster1, Raster2, ... RasterN], TYPE, ...)
enables such statistical evaluations. The following code block illustrates the comparison of flow velocities calculated with two different hydrodynamic numerical models through the calculation of the MEAN
(average) and standard deviation (STD
).
u_basement = arcpy.Raster("geodata/bm/velocity.tif")
u_tuflow = arcpy.Raster("geodata/tf/velocity.dat")
u_mean = CellStatistics([u_basement, u_tuflow], "MEAN")
u_stdv = CellStatistics([u_basement, u_tuflow], "STD")
Read more options statistics types and handling non-numeric data on the developer’s website (Esri).
Tip
An open access alternative to arcpy
’s CellStatistics
is the refrasterstats library <zonal>
(usage: rasterstats.zonal_stats(zone, raster_file_name, stats=['min', 'max', 'median', 'majority', 'sum', '...many more...'])
).
Shapefile Operations#
The geospatial shapefile vector format is an Esri invention. No wonder, arcpy
is good at handling this vector data format. In hydraulic engineering, however, we usually create (draw) shapefiles manually either directly with ArcGIS or its open-source competitor QGIS to delineate, for example, particular flow regions. Examples can be found in the BASEMENT tutorial (explore the creation of elevation point, boundary polygon, and breakline polyline shapefiles). In codes, the processing of shapefiles only becomes important in the analysis of the output of numerical models (e.g., to classify morphological unit features, exactly calculate patch areas, or automatically place reinforcements in construction plans). At this stage, raster data (output of numerical models) must first be converted into shapefiles. This is why this tutorial starts with the conversion of raster data to shapefiles along with the illustration of other functions such as calculating patch area and accessing shapefile attribute tables.
Rasters can be converted to polygon and other shapefile types (e.g., point). The following example features the conversion of a raster to a polygon shapefile. It uses an integer raster of all pixels where the flow depth and velocity are smaller than 1.4 m and 0.15 m/s, respectively. Such shallow and slow-flow regions are called slackwater (according to Wyrick and Pasternack [WP14]). Because slackwater is designated preferred habitat of some aquatic species, we are wondering now how much slackwater area the numerical model predicts in the simulated river section. For this purpose, we convert the slackwater raster to a shapefile and calculate the surface area of the shapefile with the following arcpy
methods:
Convert the raster to a shapefile with
arcpy.RasterToPolygon_conversion()
with arguments:in_raster
is anarcpy.Raster()
object of integer values (using aFloat
raster results in an error!).out_polygon_features
is a string of the output file name and directory.simplify
is an optional string that can be either"NO_SIMPLIFY"
to force exact drawing of polygon boundaries along pixel border, or"SIMPLIFY"
to enable polygon boundaries crossing pixels.
Add a new field to the new polygon shapefile with
arcpy.AddField_management()
with arguments:field_name
can be any string without blanks.field_type
is a string defining if the field is numeric (e.g.,"FLOAT"
or"LONG"
for integer), date/time ("DATE"
),"TEXT"
, or"RASTER"
.field_precision
is an (optional) integer (Long) defining the number of digits that can be stored in the new field.More optional arguments can be set to define the number of decimals or characters, an alternative field name, enable
NULL
, a field domain, or if a field is required.
Calculate patch area with
arcpy.CalculateGeometryAttributes_management()
with arguments:in_features
is a string defining the directory and name of a feature layer.geometry_property
is a nested list of[[Target-field-name, Property], [Another-target-field-name, Another-property], ...]
for calculating geometric properties such as"AREA"
,"HOLE_COUNT"
, or"PART_COUNT"
(and many more).area_unit
can be"SQUARE_METERS"
or"SQUARE_KILOMETERS"
(and many other options for U.S. customary units).More optional arguments can be set to define length units (for perimeter assessments), or a coordinate system and format.
The below code block additionally features the application of these methods and also illustrates how the area value can be read row-by-row from the attribute table of a shapefile with arcpy.da.UpdateCursor(shapefile-name, field-name)
.
# create a slackwater raster that is arcpy.sa.Int(1) where h and u criteria are true and NULL elsewhere
slackwater = Con(Float(h) <= 1.4 & Float(u) <= 0.15, Int(1))
# define directory and name of the new shapefile
new_shp_file = "geodata/shapefiles/slackwater.shp"
# convert slackwater raster to polygon shapefile
arcpy.RasterToPolygon_conversion(in_raster=slackwater, out_polygon_features=new_shp_file, simplify="NO_SIMPLIFY")
# add a new field to the new shapefile's attribute table (name)
arcpy.AddField_management(shp_name, field_name="F_AREA", field_type="FLOAT", field_precision=9)
# calculate area of all polygons in attribute table
area_unit="SQUARE_METERS"
arcpy.CalculateGeometryAttributes_management(in_features=shp_name, geometry_property=[["F_AREA", "AREA"]], area_unit=area_unit)
with arcpy.da.UpdateCursor(shp_name, "F_AREA") as cursor:
for row in cursor:
try:
area += float(row[0])
except ValueError:
print("WARNING: Patch with invalid area value (%s)." % str(row))
print("Sum of all patches = {0} {1}".format(str(area), area_unit))
Warning
Never calculate areas directly from rasters by multiplying the number (quantity) of pixels that fulfill a certain criterion with the pixel size (e.g., 1mx1m). This calculation often fails in practice because of erroneous internal assignments of cell sizes, which can hardly be controlled robustly (in particular when switching between U.S. customary and S.I. units).
Tip
Shapefile handling, deriving geometric properties and many more operations can also be performed with the ogr
library, which comes along with gdal
. Read more in the shapefile handling section.