{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Data Files, NumPy & Pandas\n", "\n", "Basic (text) file handling, *NumPy*, *pandas*, and *DateTime*. For interactive reading and executing code blocks, [open the course on Binder](https://mybinder.org/v2/gh/hydro-informatics/jupyter-python-course/main) and find *b06-pybum.ipynb*, or {ref}`install-python` locally along with {ref}`jupyter`.\n", "\n", "```{admonition} Watch this section in video format\n", ":class: tip, dropdown\n", "\n", "
Watch this section as a video on the @hydroinformatics channel on YouTube.
\n", "```\n", "\n", "# Load and Write Basic Data Files\n", "\n", "Data can be stored in many different (text) file formats such as *txt* or *csv* files. Python provides the `open(file)` function and the file object's `write(...)` method to read and write data from and to nearly every text file format. In addition, there are packages such as `csv` (for *csv* files), which simplify handling specific file types. The following sections illustrate the use of `open(file)` and `write(...)`. The *pandas* module, introduced later, provides more functions to import and export numeric data along with row and column headers.\n", "\n", "(open-modes)=\n", "## Load (Open) Text File Data \n", "\n", "The `open` command loads text files as a file object in Python. The syntax of the `open` command is: " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```python\n", "open(\"file-name\", \"mode\")\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "where:\n", "\n", "* `file-name` is the file to open (e.g., `\"data.txt\"`); if the file is not in the script directory, `file-name` needs to include the full path to the data file (e.g., `\"C:/experiment1/data.txt\"`).\n", "* `mode` defines the access type and can take the following values:\n", "    - `\"r\"` - read-only (default value if no `\"mode\"` value is provided); the file can be neither modified nor overwritten.\n", "    - `\"rb\"` - read-only in binary format; the binary format is advantageous if the file is not a text file but media such as pictures or videos.\n", "    - `\"r+\"` - read and write.\n", "    - `\"w\"` - write-only; an existing file with the provided `file-name` is overwritten, and a new file is created if it does not yet exist.\n", "    - `\"wb\"` - write-only in binary mode.\n", "    - `\"w+\"` - create, write and read.\n", "    - `\"wb+\"` - write and read in binary mode.\n", "    - `\"a\"` - append new data to a file; the write-pointer is placed at the end of the file and a new file is created if a file with the provided `file-name` does not 
yet exist.\n", "    - `\"ab\"` - append new data in binary mode.\n", "    - `\"a+\"` - both append (write at the end) and read.\n", "    - `\"ab+\"` - append and read data in binary mode.\n", "\n", "When `\"r\"` or `\"w\"` modes are used, the file pointer (i.e., the position at which reading or writing starts, comparable to the blinking cursor in a Word document) is placed at the beginning of the file. For `\"a\"` modes, the file pointer is placed at the end of the file.\n", "\n", "It is good practice to read and write data from and to a file within a `with` statement: the file is then closed automatically, even if an error occurs, which avoids file lock issues. For example, the following code block creates a new text file within a `with` statement:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "with open(\"data/new.csv\", mode=\"w+\") as file:\n", "    file.write(\"And yet it moves.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(read)=\n", "## Read-only\n", "\n", "Once the file object is created, we can parse the file and copy the file data content to a desired Python {ref}`data type ` (e.g., a list, tuple or dictionary). Parsing the data works with {ref}`for-loopsWatch this section as a video on the @hydroinformatics channel on YouTube.
\n", "```\n", "\n", "## Installation\n", "\n", "*NumPy* can be installed through *Anaconda* ({ref}`recall instructionsWatch this section as a video on the @hydroinformatics channel on YouTube.
\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Installation\n", "\n", "*pandas* can be installed through *Anaconda* ({ref}`recall instructions\n", "|  | Test 1 | Test 2 | Test 3 | Test 4 |\n", "
|---|---|---|---|---|\n", "
| count | 18.000000 | 16.000000 | 15.000000 | 18.000000 |\n", "
| mean | 4.111111 | 4.250000 | 4.533333 | 5.555556 |\n", "
| std | 2.298053 | 2.792848 | 2.386470 | 2.617188 |\n", "
| min | 1.000000 | 1.000000 | 1.000000 | 1.000000 |\n", "
| 25% | 2.250000 | 2.000000 | 3.000000 | 4.000000 |\n", "
| 50% | 4.000000 | 4.000000 | 4.000000 | 5.500000 |\n", "
| 75% | 5.000000 | 6.000000 | 6.000000 | 7.000000 |\n", "
| max | 9.000000 | 10.000000 | 9.000000 | 10.000000 |\n", "