Flood Return Periods

Get ready by cloning the exercise repository:

git clone https://github.com/Ecohydraulics/Exercise-FloodReturn.git

Figure 1: Flood at the Mangfall River in Bavaria (source: KSS 2020).

Terminology

Flood frequency analysis uses a series of discharge data (e.g., from a gauging station) and evaluates the occurrence probability of a particular discharge. Thus, the occurrence probability defines the frequency of a discharge, which is important for two reasons:

  1. Flood safety: Many legal frameworks use a recurrence interval (i.e., a return period or frequency in units of years) to define safety levels that buildings and infrastructure must meet.

  2. Ecohydraulics: In arid areas in particular, it is important to know how long the discharge stays below certain thresholds, at which many aquatic habitats may be too shallow, too hot, or disconnected from the main channel. Therefore, we want to know the exceedance probability of a given discharge.

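The relationship between the exceedance probability $Pr$ and the recurrence interval $T$ results from the definition of both terms: the recurrence interval (in years) is the inverse of the annual exceedance probability:

$$T = 1 / Pr$$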

The calculation concept of the return period makes two elementary assumptions. First, it is assumed that the individual flow events have a stationary peak. Second, statistical independence of individual events is assumed. The assumption of statistical independence means that this year a 100-year flood occurs with the same probability as next year, regardless of whether or not a 100-year flood actually occurred this year. Thus, for any given year, the probability of a 100-year flood occurring is 1/100 (or 1/50 for a 50-year flood and so on).

The Probability of a 100-year Flood Occurring in 100 Years is 63%

As engineers, we often want to know how likely it is that a 100-year flood will occur within the next 2, 5, 10, ... or 100 years (i.e., what are the likely costs of flood damage associated with a 100-year flood?). The answer to that question is the complement of the probability that no 100-year flood occurs within that period. Mathematically, the occurrence probability $Pr$ of an event with a recurrence interval of $T = 100$ years over an observation period of $\Delta t \in \{2, 5, 10, 100\}$ years is:

$$Pr(T, \Delta t) = 1 - (1 - 1/T)^{\Delta t}$$

Table 1 shows solutions to the probability function $Pr(T, \Delta t)$ for observation periods $\Delta t$ of 2, 5, 10, and 100 years, as well as recurrence intervals $T$ of 10, 50, and 100 years.

Table 1: Solutions to the probability function $Pr(T, \Delta t)$ for selected observation periods $\Delta t$ (in years).

| $Pr(T, \Delta t)$ | $\Delta t$ = 2 | $\Delta t$ = 5 | $\Delta t$ = 10 | $\Delta t$ = 100 |
|---|---|---|---|---|
| $T$ = 10 | 19.00% | 40.95% | 65.13% | 100.00% |
| $T$ = 50 | 3.96% | 9.61% | 18.29% | 86.74% |
| $T$ = 100 | 1.99% | 4.90% | 9.56% | 63.40% |
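The values in Table 1 can be reproduced with a few lines of Python. The following is a minimal sketch; the function name occurrence_probability and the variable names are chosen for illustration only and are not part of the exercise repository.

```python
def occurrence_probability(return_period, observation_period):
    """Probability of at least one T-year event within the observation period (years)."""
    annual_probability = 1.0 / return_period
    # complement of "no such event in any of the observed years"
    return 1.0 - (1.0 - annual_probability) ** observation_period


return_periods = [10, 50, 100]          # T in years
observation_periods = [2, 5, 10, 100]   # delta t in years

# one table row per return period, one entry per observation period
table = {
    T: [occurrence_probability(T, dt) for dt in observation_periods]
    for T in return_periods
}

for T, row in table.items():
    print(f"T = {T:>3}: " + "  ".join(f"{100 * p:6.2f}%" for p in row))
```

Each printed row corresponds to one row of Table 1, expressed in percent with two decimals.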

Visit the USGS water science school to learn more about flood (and drought) recurrence interval.

Get Discharge Data

Discharge Data Sources

Flow data can be retrieved from gauging stations. In Germany, the “Gewässerkundliches Jahrbuch” provides a compiled overview of statistical data from gauging stations. Note that, as in many other countries, many gauging stations are managed by state authorities and only a small share of data is available from federal institutions. For example, gauge data for Baden-Württemberg are available at the geo portal of the State Institute for the Environment, Survey and Nature Conservation (LUBW). The following list provides more sources for discharge data around the globe.

import hydrofunctions as hf
hf.draw_map()  # only runs in JupyterLab

Load Data with pandas

Create a new Python file (e.g., discharge_analysis.py) and import pandas as pd at the beginning. Read the provided flow data series file "daily-flow-series.csv" with pd.read_csv. The header (column names) is in row 36, but we do not use the column names from the csv file and overwrite them with the names argument: "Date" and "Q (CMS)" (for Cubic Meters per Second). Alternatively, we could use the skiprows argument to indicate where the data content starts in the file. With sep=";", we indicate that columns are separated by a semicolon. The usecols=[0, 2] argument specifies that we only want to read columns 0 (date) and 2 (discharge) because column 1 (time) is not relevant for daily discharge. The parse_dates=[0] argument lets pandas know that column 0 contains date-formatted values. Alternatively, we could use a dtype={"Date": ... } dictionary to specify the data formats of columns, but this would require importing datetime and add unnecessary complexity. The index_col argument defines the column that serves as the row index, which needs to have a date format for the later analyses. Finally, use the optional keyword argument encoding="latin1" because the provided data file contains some special characters that cannot be decoded with the standard utf-8 encoding.

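import pandas as pd

# read date (column 0) and discharge (column 2) of the semicolon-separated file
df = pd.read_csv("flow-data/daily-flow-series.csv",
                 header=36,
                 sep=";",
                 names=["Date", "Q (CMS)"],
                 usecols=[0, 2],
                 parse_dates=[0],
                 index_col="Date",
                 encoding="latin1")  # the file contains non-utf-8 characters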

Did everything work? Verify the loaded data frame with print(df.head()).

If your CSV file has special characters (e.g., ³), you may need to use the optional keyword argument encoding="latin1" because some special characters cannot be decoded with the standard utf-8 encoding.
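To see how header, names, and usecols interact, and how skiprows can serve as an alternative, the following sketch reads a tiny in-memory CSV instead of the provided file. The file content (two metadata lines, one column-name line, three data rows) is made up for illustration, and the date column is addressed by name in parse_dates for clarity.

```python
import io

import pandas as pd

# hypothetical miniature of a gauge file: metadata, column names, then data
raw = "\n".join([
    "station: hypothetical gauge",
    "unit: cubic meters per second",
    "date;time;discharge",
    "1826-01-01;00:00;12.5",
    "1826-01-02;00:00;14.0",
    "1826-01-03;00:00;13.2",
])

common = dict(sep=";", names=["Date", "Q (CMS)"], usecols=[0, 2],
              parse_dates=["Date"], index_col="Date")

# variant 1: point to the row holding the column names and overwrite them
df_header = pd.read_csv(io.StringIO(raw), header=2, **common)

# variant 2: skip everything above the data and declare that no header is left
df_skiprows = pd.read_csv(io.StringIO(raw), skiprows=3, header=None, **common)

print(df_header)
print(df_skiprows)
```

Both variants should yield the same data frame with a DatetimeIndex named "Date" and a single "Q (CMS)" column.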

Plot the Data

Plotting data is not the focus of this exercise. For this reason, a ready-to-use function is available in the plot_discharge.py script. Make sure that plot_discharge.py is in the same directory as the above discharge_analysis.py Python script (recall how to load Packages, Modules and Libraries). Use the plot_discharge function in plot_discharge.py as follows:

from plot_discharge import plot_discharge
plot_discharge(df.index, df["Q (CMS)"], title="Daily Flows 1826 - 2016")

On a side note, plot_discharge uses the Matplotlib library.

Construct Series of Annual Maximum Discharge

Flood event recurrence intervals result from statistics of the annual maximum discharge. Therefore, use pandas’ resample function to find annual maximum values. The resample function requires a DatetimeIndex, which we already defined with the index_col argument when we loaded the data. The first (and only required) argument of the resample function is the rule defining the length of the time frame to which re-sampling applies. Here, we use "A" for annual statistics. For 2-year or 5-year periods, we could use the rules "2A" or "5A", respectively. More rules can be found in the pandas docs. In addition, we use the argument kind="period" because we are only interested in the year in which the discharge occurred. Finally, we apply .max() to run maximum statistics on the data frame. The re-sampled object supports the usual aggregation methods. That is, instead of max() we can as well use min(), sum(), median(), mean() and so on (see pandas dataframe methods).

annual_max_df = df.resample(rule="A", kind="period").max()

Because we use kind="period", the row indices of annual_max_df correspond to time periods of years. For instance, the row index 1826 corresponds to the period 1826-01-01 through 1826-12-31. However, we need integer numbers of years rather than periods for the calculation of return periods. To get integer formats of years, we transfer the year of each period into a new column of the data frame and reset the row indices. Resetting the row indices to default integer indices with reset_index (drop=True discards the old period index instead of keeping it as a column) is not strictly necessary, but keeps the data frame consistent. The argument inplace=True replaces the indices inside annual_max_df (otherwise, we would need to write annual_max_df = annual_max_df.reset_index(drop=True)).

annual_max_df["year"] = annual_max_df.index.year
annual_max_df.reset_index(inplace=True, drop=True)
print(annual_max_df.head())
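In recent pandas releases (2.2 and later), the "A" rule and the kind argument of resample are deprecated, so the above lines may emit warnings depending on the installed version. As a version-independent alternative (an assumption of this sketch, not the approach of the exercise scripts), the annual maxima can also be derived by grouping on the year of the DatetimeIndex. The synthetic demo_df below only stands in for the loaded discharge data.

```python
import numpy as np
import pandas as pd

# synthetic daily series (2000-2002) that increases steadily, so that
# every annual maximum occurs on December 31 of the respective year
dates = pd.date_range("2000-01-01", "2002-12-31", freq="D")
demo_df = pd.DataFrame({"Q (CMS)": np.arange(len(dates), dtype=float)},
                       index=dates)

# group all rows by calendar year and keep the maximum discharge per year
annual_max_demo = demo_df.groupby(demo_df.index.year).max()

# move the integer years from the index into a regular "year" column
annual_max_demo.index.name = "year"
annual_max_demo = annual_max_demo.reset_index()

print(annual_max_demo)
```

The result has one row per year with an integer "year" column, which is the same layout that the following steps expect.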

Optionally, plot the annual maxima with:

plot_discharge(annual_max_df["year"], annual_max_df["Q (CMS)"], title="Annual Flows 1826 - 2016")

Calculate Exceedance Probability and Recurrence Intervals

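The exceedance probability $Pr$ of a particular event within the observation period is:

$$Pr(i) = (N - i + 1) / (N + 1)$$

where $N$ is the number of observations (here: the number of annual maxima) and $i$ is the rank of the event, with events ranked from the smallest ($i = 1$) to the largest ($i = N$) discharge.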

To rank the events, we first need to sort the maximum annual discharge data frame (annual_max_df) from the smallest to the largest discharge value (rather than in time):

annual_max_df_sorted = annual_max_df.sort_values(by="Q (CMS)")

Then, we derive the number of observations $N$ (n = annual_max_df_sorted.shape[0]) and add a "rank" column, in which we simply enumerate the rows using Python's built-in range function.

n = annual_max_df_sorted.shape[0]
annual_max_df_sorted.insert(0, "rank", range(1, 1 + n))

Now, we have all ingredients to calculate the probability of every event with the $Pr(i)$ formula shown above, where $i$ is the rank.

annual_max_df_sorted["pr"] = (n - annual_max_df_sorted["rank"] + 1) / (n + 1)

Recall that the recurrence interval (here: return period in years) is the inverse of the exceedance probability. We can add it to the data frame with:

annual_max_df_sorted["return-period"] = 1 / annual_max_df_sorted["pr"]

Check the resulting highest discharge and its return period:

print(annual_max_df_sorted.tail())
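The ranking steps can be checked on a small example where the results are easy to verify by hand. The sketch below uses four made-up annual maxima (hypothetical values, not taken from the provided data) and applies the same sorting, ranking, and probability calculations.

```python
import pandas as pd

# four made-up annual maxima (m3/s) for hypothetical years
demo = pd.DataFrame({"year": [2001, 2002, 2003, 2004],
                     "Q (CMS)": [300.0, 120.0, 450.0, 210.0]})

# sort from the smallest to the largest discharge and rank the events
demo_sorted = demo.sort_values(by="Q (CMS)")
n = demo_sorted.shape[0]
demo_sorted.insert(0, "rank", range(1, 1 + n))

# exceedance probability and return period of every ranked event
demo_sorted["pr"] = (n - demo_sorted["rank"] + 1) / (n + 1)
demo_sorted["return-period"] = 1 / demo_sorted["pr"]

print(demo_sorted)
```

With this formula, the largest observed event always gets the exceedance probability 1 / (N + 1), that is, a return period of N + 1 years. This illustrates why the empirical method cannot assign return periods much longer than the observation period.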

Plot the resulting probability and return curves with the plot functions provided in the plot_result.py Python script:

from plot_result import plot_q_freq, plot_q_return_period
plot_q_freq(annual_max_df_sorted)
plot_q_return_period(annual_max_df_sorted)

Outside the Box

The method shown here only interpolates within the observed data. For extrapolating return periods beyond the length of the observation period (e.g., for extreme events such as a 1000-year flood), a prediction model is necessary (e.g., extrapolation with a Gumbel distribution).
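To give an idea of what such an extrapolation may look like, the following sketch fits a Gumbel distribution to annual maxima with the method of moments and evaluates the discharge for a given return period. The fitting method, the function name gumbel_quantile, and the sample values are assumptions for illustration; they are not part of the exercise, and practical applications typically compare several distributions and fitting methods.

```python
import math

import numpy as np

EULER_MASCHERONI = 0.5772156649


def gumbel_quantile(annual_maxima, return_period):
    """Discharge of a given return period from a moment-fitted Gumbel distribution."""
    x = np.asarray(annual_maxima, dtype=float)
    # method of moments: scale from the standard deviation, location from the mean
    scale = x.std(ddof=1) * math.sqrt(6) / math.pi
    location = x.mean() - EULER_MASCHERONI * scale
    # invert the Gumbel CDF at the non-exceedance probability 1 - 1/T
    non_exceedance = 1.0 - 1.0 / return_period
    return location - scale * math.log(-math.log(non_exceedance))


# made-up annual maxima (m3/s), only for demonstration
demo_maxima = [310.0, 450.0, 280.0, 520.0, 390.0, 610.0, 340.0, 470.0]

q100 = gumbel_quantile(demo_maxima, 100)
q1000 = gumbel_quantile(demo_maxima, 1000)
print(f"Q100 = {q100:.1f} m3/s, Q1000 = {q1000:.1f} m3/s")
```

With real data, annual_max_df["Q (CMS)"] could be passed instead of demo_maxima. Note that estimates for return periods far beyond the length of the record are highly uncertain.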

After all, software that calculates return periods already exists and is freely available from the U.S. Army Corps of Engineers Hydrologic Engineering Center (HEC): HEC-SSP (U.S. Army Corps of Engineers, 2016). HEC-SSP enables the calculation of flow event frequencies and return periods according to US standards. So if you are not working in or for the United States, you may still want to have your own code ready. Moreover, HEC-SSP requires pre-processing of discharge data (i.e., it only works with annual maxima).

References
  1. U.S. Army Corps of Engineers. (2016). Hydrologic Engineering Centers River Analysis System (HEC-RAS). U.S. Army Corps of Engineers (USACE). http://www.hec.usace.army.mil/software/hec-ras/