Vectorize and Rasterize#
This section introduces geospatial dataset conversion with Python. In particular, the goal of this section is to guide to an understanding of conversions from raster to vector data formats and vice versa. For interactive reading and executing code blocks and find geo03-conversion.ipynb, or install Python and JupyterLab locally.
Requirements
Make sure to understand handling of gridded raster data and shapefile handling
Understand the creation of the least cost path raster dataset.
Tips
The functions featured in this section are introduced are partially also implemented in flusstools.
To use those functions, make sure flusstools is installed and import it as follows:
from flusstools import geotools
. Some of the functions shown in this tutorial can then be used withgeotools.function_name()
.
Watch this section as a video
Watch this section as a video on the @Hydro-Morphodynamics channel on YouTube.
Import relevant Libraries#
Make sure to import the relevant packages for handling rasters, shapefiles, and geospatial references:
from osgeo import gdal
from osgeo import osr
from osgeo import ogr
Vectorize#
Raster to Line#
In this section, we convert the least cost path raster dataset (least_cost.tif) into a (poly) line shapefile. For this purpose, we first write a function called offset2coords()
, which represents the inverse of the coords2offset() function, and converts x/y offset (in integer pixel numbers) to coordinates of a geospatial dataset’s geo-transformation:
def offset2coords(geo_transform, offset_x, offset_y):
# get origin and pixel dimensions from geo_transform (osgeo.gdal.Dataset.GetGeoTransform() object)
origin_x = geo_transform[0]
origin_y = geo_transform[3]
pixel_width = geo_transform[1]
pixel_height = geo_transform[5]
# calculate x and y coordinates
coord_x = origin_x + pixel_width * (offset_x + 0.5)
coord_y = origin_y + pixel_height * (offset_y + 0.5)
# return x and y coordinates
return coord_x, coord_y
Note
The offset is added 0.5 pixels in both x and y directions to meet the center of the pixel rather than the top-left pixel corner.
Next, we can write a core function to convert a raster dataset to a line shapefile. We name this function raster2line()
and it builds on the following workflow:
Open a
raster
, its band asarray
, andgeo_transform
(geo-transformation) defined with theraster_file_name
argument in the open_raster function from the raster section.Calculate the maximum distance (
max_distance
) between two pixels that are considered being connect-able, based on the hypothesis that the pixel height Δy and width Δx are the same:Get the
trajectory
of pixels that have a user parameter-definedpixel_value
(e.g.,1
to trace 1-pixels in the binary least_cost.tif) and throw an error if the trajectory is empty (i.e.,np.count_nonzero(trajectory) is 0
).Use the above-defined
offset2coords
function to append point coordinates to apoints
list.Create a
multi_line
object (instance ofogr.Geometry(ogr.wkbMultiLineString)
), which represents the (void) final least cost path.Iterate through all possible combinations of points (excluding combinations of points with themselves) with
itertools.combinations(iterable, r=number-of-combinations=2
).Points are stored in the
points
list.point1
andpoint2
are required to get the distance between pairs of points.If the
distance
between the points is smaller thanmax_distance
, the function creates a line object from the two points and appends it to themulti_line
object.
Create a new shapefile (named
out_shp_fn
) using the create_shp() function (with integrated shapefile name length verification from flusstoolsgeotools.create_shp()
).Add the
multi_line
object as a new feature to the shapefile (according to the descriptions in the shapefile section).Create a
.prj
projection file (recall descriptions in the shapefile section) using the spatial reference system of the inputraster
with the get_srs() function.
The raster2line
function is also implemented in the flusstools.geotools.geotools
script.
from flusstools.geotools import raster2array
def raster2line(raster_file_name, out_shp_fn, pixel_value):
"""
Convert a raster to a line shapefile, where pixel_value determines line start and end points
:param raster_file_name: STR of input raster file name, including directory; must end on ".tif"
:param out_shp_fn: STR of target shapefile name, including directory; must end on ".shp"
:param pixel_value: INT/FLOAT of a pixel value
:return: None (writes new shapefile).
"""
# calculate max. distance between points
# ensures correct neighbourhoods for start and end pts of lines
raster, array, geo_transform = raster2array(raster_file_name)
pixel_width = geo_transform[1]
max_distance = np.ceil(np.sqrt(2 * pixel_width**2))
# extract pixels with the user-defined pixel value from the raster array
trajectory = np.where(array == pixel_value)
if np.count_nonzero(trajectory) is 0:
print("ERROR: The defined pixel_value (%s) does not occur in the raster band." % str(pixel_value))
return None
# convert pixel offset to coordinates and append to nested list of points
points = []
count = 0
for offset_y in trajectory[0]:
offset_x = trajectory[1][count]
points.append(offset2coords(geo_transform, offset_x, offset_y))
count += 1
# create multiline (write points dictionary to line geometry (wkbMultiLineString)
multi_line = ogr.Geometry(ogr.wkbMultiLineString)
for i in itertools.combinations(points, 2):
point1 = ogr.Geometry(ogr.wkbPoint)
point1.AddPoint(i[0][0], i[0][1])
point2 = ogr.Geometry(ogr.wkbPoint)
point2.AddPoint(i[1][0], i[1][1])
distance = point1.Distance(point2)
if distance < max_distance:
line = ogr.Geometry(ogr.wkbLineString)
line.AddPoint(i[0][0], i[0][1])
line.AddPoint(i[1][0], i[1][1])
multi_line.AddGeometry(line)
# write multiline (wkbMultiLineString2shp) to shapefile
new_shp = create_shp(out_shp_fn, layer_name="raster_pts", layer_type="line")
lyr = new_shp.GetLayer()
feature_def = lyr.GetLayerDefn()
new_line_feat = ogr.Feature(feature_def)
new_line_feat.SetGeometry(multi_line)
lyr.CreateFeature(new_line_feat)
# create projection file
srs = get_srs(raster)
make_prj(out_shp_fn, int(srs.GetAuthorityCode(None)))
print("Success: Wrote %s" % str(out_shp_fn))
The raster2line()
function can be called as follows to convert the least cost path from pixel (raster) to line (vector) format:
source_raster_fn = r"" + os.path.abspath("") + "/geodata/rasters/least-cost.tif"
target_shp_fn = r"" + os.path.abspath("") + "/geodata/shapefiles/least-cost.shp"
pixel_value = 1
raster2line(source_raster_fn, target_shp_fn, pixel_value)
Success: Wrote /home/schwindt/jupyter/geodata/shapefiles/least_cost.shp
Challenge
There is a small error in the least_cost
line. Can you find the error? What can be done to fix the error?
Note
Network routing is the functional core of the NetworkX library (see *Open source libraries*). Read more about network analyses on Michael Diener’s GitHub pages.
Raster to Polygon#
gdal
comes with the powerful Polygonize
feature for the easy conversion of a raster dataset to a polygon shapefile. While gdal.Polygonize
enables writing a simple raster2polygon()
Python function, it has the drawback that it can only handle integer values and it merely randomly attributes FID
(automatically attributed identifiers without physical meaning) values by default. Because the FID
values are not meaningful, we will create a float2int()
function to preserve the original value range (uses the raster2array() and create_raster() functions from the raster section):
def float2int(raster_file_name, band_number=1):
"""
:param raster_file_name: STR of target file name, including directory; must end on ".tif"
:param band_number: INT of the raster band number to open (default: 1)
:output: new_raster_file_name (STR)
"""
# use raster2array function to get raster, np.array and the geo transformation
raster, array, geo_transform = raster2array(raster_file_name, band_number=band_number)
# convert np.array to integers
try:
array = array.astype(int)
except ValueError:
print("ERROR: Invalid raster pixel values.")
return raster_file_name
# get spatial reference system
src_srs = get_srs(raster)
# create integer raster
new_name = raster_file_name.split(".tif")[0] + "_int.tif"
create_raster(new_name, array, epsg=int(src_srs.GetAuthorityCode(None)),
rdtype=gdal.GDT_Int32, geo_info=geo_transform)
# return name of integer raster
return new_name
Next, we will create the raster2polygon()
function that implements the following workflow:
Use the
float2int()
function to ensure that any rasterfile_name
provided can be converted to purely integer values.Create a new shapefile (named
out_shp_fn
) using the create_shp() function (also available from flusstools:geotools.create_shp()
).Add a new
ogr.OFTInteger
field (recall how field creation works) in the shapefile section) named by the optionalfield_name
input argument.Run
gdal.Polygonize
with:hSrcBand=raster_band
hMaskBand=None
(optional raster band to define polygons)hOutLayer=dst_layer
iPixValField=0
(if no field was added, set to-1
in order to create anFID
field; if more fields were added, set to1
,2
, … )papszOptions=[]
(no effect forESRI Shapefile
driver type)callback=None
for not using the reporting algorithm (GDALProgressFunc()
)
Create a
.prj
projection file (recall descriptions in the shapefile section) using the spatial reference system of the inputraster
with the get_srs() function.
def raster2polygon(file_name, out_shp_fn, band_number=1, field_name="values"):
"""
Convert a raster to polygon
:param file_name: STR of target file name, including directory; must end on ".tif"
:param out_shp_fn: STR of a shapefile name (with directory e.g., "C:/temp/poly.shp")
:param band_number: INT of the raster band number to open (default: 1)
:param field_name: STR of the field where raster pixel values will be stored (default: "values")
:return: None
"""
# ensure that the input raster contains integer values only and open the input raster
file_name = float2int(file_name)
raster, raster_band = open_raster(file_name, band_number=band_number)
# create new shapefile with the create_shp function
new_shp = create_shp(out_shp_fn, layer_name="raster_data", layer_type="polygon")
dst_layer = new_shp.GetLayer()
# create new field to define values
new_field = ogr.FieldDefn(field_name, ogr.OFTInteger)
dst_layer.CreateField(new_field)
# Polygonize(band, hMaskBand[optional]=None, destination lyr, field ID, papszOptions=[], callback=None)
gdal.Polygonize(raster_band, None, dst_layer, 0, [], callback=None)
# create projection file
srs = get_srs(raster)
make_prj(out_shp_fn, int(srs.GetAuthorityCode(None)))
print("Success: Wrote %s" % str(out_shp_fn))
Note
Polygonize
can also be run from terminal/Anaconda prompt by typinggdal_polygonize
.Both the
float2int()
and theraster2polygon()
functions are also available in flusstools withflusstools.geotools.float2int()
andflusstools.geotools.raster2polygon()
respectively (have a look at the geotools.py script).
The raster2polygon()
function can be implemented, for example, to convert the water depth raster for 1000 CFS (h001000.tif from the River Architect sample datasets) to a polygon shapefile:
src_raster = r"" + os.path.abspath("") + "/geodata/rasters/h001000.tif"
tar_shp = r"" + os.path.abspath("") + "/geodata/shapefiles/h_poly_cls.shp"
raster2polygon(src_raster, tar_shp)
Success: Wrote /home/schwindt/jupyter/geodata/shapefiles/h_poly_cls.shp
Rasterize (Vector Shapefile to Raster)#
Similar to gdal.Polygonize
, gdal.RasterizeLayer
represents a handy option to convert a shapefile into a raster. However, to be precise, a shapefile is not really converted into a raster but burned onto a raster. Tus, values stored in a field of a shapefile feature are used (burned) as pixel values for creating a new raster. Attention is required to ensure that the correct values and data types are used. To this end, the below shown rasterize()
function implements the following workflow that avoids potential conversion headaches:
Open the user-provided input shapefile name and layer.
Read the spatial extent of the layer.
Derive the x-y resolution as a function of the spatial extent and a user-defined
pixel_size
(optional keyword argument with default value).Create a new GeoTIFF raster using the
user-defined
output_raster_file_name
,calculated x and y resolution, and
eType
(default isgdal.GDT_Float32
- recall all data type options listed in the raster section.
Apply the geotransformation defined by the source layer extents and the
pixel_size
.Create one raster
band
, fill theband
with the user-definedno_data_value
(default is-9999
), and set theno_data_value
.Set the spatial reference system of the raster to the same as the source shapefile.
Apply
gdal.RasterizeLayer
withdataset=target_ds
(target raster dataset),bands=[1]
(list(integer) - increase to defined more raster bands and assign other values, for example, from other fields of the source shapefile),layer=source_lyr
(layer with features to burn to the raster),pfnTransformer=None
(read more in the gdal docs),pTransformArg=None
(read more in the gdal docs),burn_values=[0]
(a default value that is burned to the raster),options=["ALL_TOUCHED=TRUE"]
defines that all pixels touched by a polygon get the polygon’s field value - if not set: only pixels that are entirely in the polygon get a value assigned,options=["ATTRIBUTE=" + str(kwargs.get("field_name"))]
defines the field name with values to burn.
def rasterize(in_shp_file_name, out_raster_file_name, pixel_size=10, no_data_value=-9999,
rdtype=gdal.GDT_Float32, **kwargs):
"""
Converts any shapefile to a raster
:param in_shp_file_name: STR of a shapefile name (with directory e.g., "C:/temp/poly.shp")
:param out_raster_file_name: STR of target file name, including directory; must end on ".tif"
:param pixel_size: INT of pixel size (default: 10)
:param no_data_value: Numeric (INT/FLOAT) for no-data pixels (default: -9999)
:param rdtype: gdal.GDALDataType raster data type - default=gdal.GDT_Float32 (32 bit floating point)
:kwarg field_name: name of the shapefile's field with values to burn to the raster
:return: produces the shapefile defined with in_shp_file_name
"""
# open data source
try:
source_ds = ogr.Open(in_shp_file_name)
except RuntimeError as e:
print("Error: Could not open %s." % str(in_shp_file_name))
return None
source_lyr = source_ds.GetLayer()
# read extent
x_min, x_max, y_min, y_max = source_lyr.GetExtent()
# get x and y resolution
x_res = int((x_max - x_min) / pixel_size)
y_res = int((y_max - y_min) / pixel_size)
# create destination data source (GeoTIff raster)
target_ds = gdal.GetDriverByName('GTiff').Create(out_raster_file_name, x_res, y_res, 1, eType=rdtype)
target_ds.SetGeoTransform((x_min, pixel_size, 0, y_max, 0, -pixel_size))
band = target_ds.GetRasterBand(1)
band.Fill(no_data_value)
band.SetNoDataValue(no_data_value)
# get spatial reference system and assign to raster
srs = get_srs(source_ds)
try:
srs.ImportFromEPSG(int(srs.GetAuthorityCode(None)))
except RuntimeError as e:
print(e)
return None
target_ds.SetProjection(srs.ExportToWkt())
# RasterizeLayer(Dataset dataset, int bands, Layer layer, pfnTransformer=None, pTransformArg=None,
# int burn_values=0, options=None, GDALProgressFunc callback=0, callback_data=None)
gdal.RasterizeLayer(target_ds, [1], source_lyr, None, None, burn_values=[0],
options=["ALL_TOUCHED=TRUE", "ATTRIBUTE=" + str(kwargs.get("field_name"))])
# release raster band
band.FlushCache()
Tip
Rasterize
can also be run from terminal/Anaconda prompt with gdal_rasterize
.
Finally, the rasterize()
function can be called to convert the polygonized water depth polygon shapefile /geodata/shapefiles/h_poly_cls.shp (download it as a zip file) back to a raster (this is practically useless but an illustrative exercise). Pay attention to the data type, which is gdal.GDT_Int32
in combination with the correctly defined field_name
argument.
src_shp = r"" + os.path.abspath("") + "/geodata/shapefiles/h_poly_cls.shp"
tar_ras = r"" + os.path.abspath("") + "/geodata/rasters/h_re_rastered.tif"
rasterize(src_shp, tar_ras, pixel_size=5, rdtype=gdal.GDT_Int32, field_name="values")
Exercise
Familiarize with the conversion of rasters and shapefiles in the geospatial ecohydraulics exercise.