Visualization

Replicate a plot

Your predecessor created the graph shown below for a publication. Since then, things have changed and you need to reproduce the plot with new data. However, the code that produced it was lost: it only ever existed in an interactive IPython session, and the computer on which it was run has long since been replaced. In addition, the person who created the plot is no longer in academia and cannot be reached.

Looking at the plot, you notice several features:

  • There are two subplots (plt.subplot or plt.subplots?). The lower one is smaller (gridspec) and the two subplots share an $x$-axis.
  • The top subplot shows both the PDF and a normalized histogram of $N = 1000$ randomly generated values of (presumably) a normal distribution (scipy.stats.norm). It has a meaningful title.
  • The top plot also contains the corresponding CDF on a separate y-axis (ax.twinx). It is a different color than the other plots.
  • The bottom subplot shows the residual between the histogram and the PDF, drawn with plt.step so that it matches the binning of the histogram.
  • The overall plot style is not the default one. Hopefully a preset style was used. The top plot contains a grid matching the right $y$-axis, and the bottom plot has a grid matching both the $x$- and $y$-axes.
  • In addition, the number of events is added to the plot (plt.text), as well as a plt.legend in the upper left corner, which even has a title. The legend only contains the labels for the histogram and the PDF.
  • The axes are all properly labeled. The $x$-axis even has a fancy $\LaTeX$ label.
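The residual is the subtlest feature: plt.step only lines up with the histogram bars if both use the same bin edges. A minimal sketch, assuming 50 equal-width bins on [-5, 5] (the binning and all variable names here are illustrative assumptions, not recovered from the lost code), evaluates the PDF at the bin centers and draws the residual with `where="post"` so that each value spans exactly one bin:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

rng = np.random.default_rng(13)
gauss = stats.norm(loc=0, scale=1)
sample = gauss.rvs(size=1000, random_state=rng)

# Assumed binning: 50 equal-width bins on [-5, 5]
edges = np.linspace(-5, 5, 51)
hist, _ = np.histogram(sample, bins=edges, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
residual = hist - gauss.pdf(centers)  # one value per bin

fig, (ax_top, ax_bottom) = plt.subplots(
    2, sharex=True, gridspec_kw={"height_ratios": [2, 1]}
)
ax_top.hist(sample, bins=edges, density=True)
# where="post": residual[i] is drawn from edges[i] to edges[i + 1];
# the last value is repeated so the final bin gets a full horizontal segment
(step_line,) = ax_bottom.step(edges, np.append(residual, residual[-1]), where="post")
```

Passing the same `edges` array to both `hist` and `step` is what guarantees the alignment; letting each call choose its own bins (for example `bins=50` on the raw data) spreads the bins over the sample's min/max instead of [-5, 5].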

Try to replicate the plot as closely as possible.

In [1]:
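from scipy import stats
import numpy as np
import matplotlib.pyplot as plt  # pyplot, not the top-level matplotlib package

np.random.seed(13)

N = 1000
gauss = stats.norm(loc=0, scale=1)
x = np.linspace(-5, 5, 200)          # fine grid for the smooth PDF and CDF curves
pdf_values = gauss.pdf(x)
cdf_values = gauss.cdf(x)
generated = gauss.rvs(size=N)

# Fixed bin edges over the plotted range, shared by the histogram and the residual
bins = np.linspace(-5, 5, 51)
hist, _ = np.histogram(generated, bins=bins, density=True)
centers = 0.5 * (bins[:-1] + bins[1:])
diff = hist - gauss.pdf(centers)     # residual per bin: histogram minus PDF

with plt.style.context("ggplot"):
    fig, (ax1, ax2) = plt.subplots(2, sharex=True, figsize=(10, 10),
                                   gridspec_kw={'height_ratios': [2, 1]})
    ax1.set_title("A very important measurement")
    ax1.grid(False)                  # the top grid follows the CDF axis instead
    ax1.plot(x, pdf_values, label="PDF")
    ax1.hist(generated, bins=bins, density=True, label="Histogram")
    ax1.set_ylim(0)                  # after plotting, so tall bars are not clipped
    ax1.set_ylabel("Values")
    ax1.legend(loc="upper left", title="Legend")
    ax1.text(3, 0.2, f"$N = {N}$")

    # CDF on a second y-axis sharing the same x-axis
    ax3 = ax1.twinx()
    ax3.set_ylabel("CDF")
    ax3.plot(x, cdf_values, "g", label="CDF")
    ax3.set_ylim(0)

    # Repeat the last value so the final bin is drawn as a full step
    ax2.step(bins, np.append(diff, diff[-1]), where="post")
    ax2.set_xlim(-5, 5)
    ax2.set_ylabel("Residual")
    ax2.set_xlabel(r"$x_i$")

    plt.savefig("plot_replication.png")
    plt.show()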

Geospatial data

Use geopandas to create an interesting visualization of data on a map.

You can find a shapefile for the boroughs of London at the link https://data.london.gov.uk/dataset/statistical-gis-boundary-files-london. Download the file statistical-gis-boundaries-london.zip.

For a wide range of statistics per borough, visit https://data.london.gov.uk/dataset/london-borough-profiles and download the CSV file.

Think about how to best join the two dataframes.
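Before plotting, it helps to check which keys actually match. A minimal sketch with made-up placeholder frames (the codes X1 to X4 and all values are hypothetical, not real borough data) uses `DataFrame.merge` with `indicator=True` to flag rows that exist on only one side, and `validate="one_to_one"` to fail loudly if a code appears twice:

```python
import pandas as pd

# Hypothetical stand-ins for the shapefile table and the borough-profiles CSV
shapes = pd.DataFrame({"GSS_CODE": ["X1", "X2", "X3"], "NAME": ["A", "B", "C"]})
profiles = pd.DataFrame({"Code": ["X1", "X2", "X4"], "Population": [10, 20, 99]})

merged = shapes.merge(
    profiles,
    left_on="GSS_CODE", right_on="Code",
    how="left",             # keep every shape, even without statistics
    validate="one_to_one",  # raise if either key column has duplicates
    indicator=True,         # adds a "_merge" column: "both" or "left_only" here
)
unmatched = merged.loc[merged["_merge"] == "left_only", "GSS_CODE"].tolist()
print(unmatched)  # ['X3']
```

The solution cell instead indexes both frames by their code and uses `DataFrame.join`, which performs the same left join on the index; the indicator column is just a quick way to see which boroughs would end up without data.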

Use the plot method of the geopandas.GeoDataFrame to plot some statistics per borough. An example of what this can look like is shown below.

In [3]:
import geopandas
import pandas as pd
import matplotlib.pyplot as plt
In [17]:
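# Borough boundaries, indexed by their GSS code
df = geopandas.read_file(
    "statistical-gis-boundaries-london/ESRI/London_Borough_Excluding_MHW.shp"
).set_index("GSS_CODE")
# Borough profiles; the "Code" column holds the same codes, so join on the index
profiles = pd.read_csv("london-borough-profiles.csv", header=0, encoding="iso-8859-1")
df = df.join(profiles.set_index("Code"))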
In [18]:
fig, ax = plt.subplots(1, figsize=(10, 6))
column = "GLA_Population_Estimate_2017"
df[column] /= 1000  # show the population in thousands
df.plot(column=column, cmap="Blues", linewidth=0.8, ax=ax, edgecolor="0.8")
ax.axis("off")
ax.set_title("London population estimate 2017", fontdict={"fontsize": 25, "fontweight": 3})
# Build the colorbar by hand from a ScalarMappable with the same colormap and range
vmin, vmax = df[column].min(), df[column].max()
sm = plt.cm.ScalarMappable(cmap="Blues", norm=plt.Normalize(vmin=vmin, vmax=vmax))
sm.set_array([])  # public replacement for the old `sm._A = []` workaround
fig.colorbar(sm, ax=ax, label="Population (thousands)")
plt.savefig("geopandas_population.png")
plt.show()

More tools

Web scraping

Use the requests library and bs4.BeautifulSoup to parse the course homepage and find the material for each lecture. Use your browser's developer tools to figure out the names and attributes of the relevant elements. You can use CSS selectors via soup.select, or operate on the tags directly with soup.find/soup.find_all.
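To see that the two approaches are interchangeable, here is a small sketch on a hand-written HTML snippet. The markup is made up; only the `internal` class mirrors the selector used in the solution cell. It uses Python's built-in "html.parser" so that it does not need lxml:

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for the course page
html = """
<ul>
  <li><a class="internal" href="lecture_oop">OOP</a></li>
  <li><a class="internal" href="lecture_tdp">Testing</a></li>
  <li><a href="https://example.org">External</a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# CSS selector: every <a> element with class "internal"
via_select = [a["href"] for a in soup.select("a.internal")]

# Equivalent tag search; class_ avoids clashing with the Python keyword `class`
via_find_all = [a["href"] for a in soup.find_all("a", class_="internal")]

print(via_select)  # ['lecture_oop', 'lecture_tdp']
```

CSS selectors are usually more compact for nested conditions (e.g. `"div.content a.download"`), while `find`/`find_all` are convenient when you build the filter programmatically.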

Write a download function that automatically downloads the material to a specified directory. For this, use response.content instead of response.text, and open(file_name, "wb") to write the binary content directly to a file. Make sure to create the directory if it does not exist, and handle file names that would be illegal (for example, file names containing /).
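Replacing "/" is enough on Unix-like systems, but Windows additionally forbids characters such as `\ : * ? " < > |`. A slightly more defensive sketch, built around a hypothetical helper `safe_file_name` (not part of any library), sanitizes the name and writes bytes into a directory that is created on demand. It uses only the standard library, and a literal bytes object stands in for response.content:

```python
import re
import tempfile
from pathlib import Path

# Characters that are illegal in file names on Windows ("/" is illegal everywhere)
_ILLEGAL = re.compile(r'[\\/:*?"<>|]')


def safe_file_name(name: str) -> str:
    """Hypothetical helper: replace illegal characters with underscores."""
    cleaned = _ILLEGAL.sub("_", name).strip()
    return cleaned or "unnamed"


def write_bytes(content: bytes, file_name: str, directory) -> Path:
    """Write raw bytes to directory/file_name, creating the directory if needed."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / safe_file_name(file_name)
    with open(target, "wb") as f:  # "wb": write bytes, no text encoding involved
        f.write(content)
    return target


with tempfile.TemporaryDirectory() as tmp:
    path = write_bytes(b"%PDF-1.4", "slides/oop_slides.pdf", Path(tmp) / "lecture_oop")
    print(path.name)  # slides_oop_slides.pdf
```

`mkdir(parents=True, exist_ok=True)` also creates missing intermediate directories, which a single `os.mkdir` call does not.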

In [88]:
import requests
from bs4 import BeautifulSoup
from pathlib import Path

def get_soup(session, url):
    """Convenience function to use a session to get the soup of a webpage."""
    response = session.get(url)
    response.raise_for_status()
    return BeautifulSoup(response.text, "lxml")

def download(url, file_name, directory):
    """Download `url` and write its raw bytes to `directory/file_name`."""
    response = requests.get(url)
    response.raise_for_status()
    directory = Path(directory)
    # Create the directory (and any missing parents); no error if it already exists
    directory.mkdir(parents=True, exist_ok=True)
    # "/" is a path separator, so it cannot appear in a single file name
    file_name = file_name.replace("/", "_")
    with open(directory / file_name, "wb") as f:
        f.write(response.content)  # binary content, not decoded text
In [96]:
session = requests.Session()
base_url = "https://www.physik.uzh.ch/~python/python"
soup = get_soup(session, f"{base_url}/programme.php")
links = [a['href'] for a in soup.select("a.internal")]
In [97]:
for link in links:
    soup = get_soup(session, f"{base_url}/{link}")
    print(soup.title.text.split(" - ")[-1])
    for file_name in [a['href'] for a in soup.select("a.download")]:
        url = f"{base_url}/{link}/{file_name}"
        print(url)
        # uncomment this to actually download the files
        # download(url, file_name, ".")
    print()
Best Practice / git
https://www.physik.uzh.ch/~python/python/lecture_bp+git/best-practice.pdf
https://www.physik.uzh.ch/~python/python/lecture_bp+git/git-tutorial.pdf
https://www.physik.uzh.ch/~python/python/lecture_bp+git/exercise_single_local.txt
https://www.physik.uzh.ch/~python/python/lecture_bp+git/exercise_multi_remote.txt
https://www.physik.uzh.ch/~python/python/lecture_bp+git/single_local.txt
https://www.physik.uzh.ch/~python/python/lecture_bp+git/multi_remote.txt

Object-Oriented-Programming
https://www.physik.uzh.ch/~python/python/lecture_oop/slides/oop_slides.pdf
https://www.physik.uzh.ch/~python/python/lecture_oop/oopex.tar.gz
https://www.physik.uzh.ch/~python/python/lecture_oop/oopsolution.tar.gz

Test, Debug, Profile
https://www.physik.uzh.ch/~python/python/lecture_tdp/test_debug_profile.pdf
https://www.physik.uzh.ch/~python/python/lecture_tdp/demo_code.zip
https://www.physik.uzh.ch/~python/python/lecture_tdp/tdp_exercise.pdf
https://www.physik.uzh.ch/~python/python/lecture_tdp/maxima.py
https://www.physik.uzh.ch/~python/python/lecture_tdp/tdp_solutions.zip
https://www.physik.uzh.ch/~python/python/lecture_tdp/software_carpentry_cheatsheets_v1_6.pdf
https://www.physik.uzh.ch/~python/python/lecture_tdp/additional.pdf
https://www.physik.uzh.ch/~python/python/lecture_tdp/carpentry_exercise.zip

Pandas Tutorial
https://www.physik.uzh.ch/~python/python/lecture_pandas/Refugees.ipynb
https://www.physik.uzh.ch/~python/python/lecture_pandas/Refugees.html
https://www.physik.uzh.ch/~python/python/lecture_pandas/Refugees.pdf
https://www.physik.uzh.ch/~python/python/lecture_pandas/data.zip

Data Structures: NumPy, Pandas, and beyond
https://www.physik.uzh.ch/~python/python/lecture_data/che_data_structures.pdf
https://www.physik.uzh.ch/~python/python/lecture_data/Lecture.html
https://www.physik.uzh.ch/~python/python/lecture_data/material_Data_lec.zip
https://www.physik.uzh.ch/~python/python/lecture_data/DataStructures_ex.pdf
https://www.physik.uzh.ch/~python/python/lecture_data/material_Data_ex.zip

Scientific Programming: Analytics
https://www.physik.uzh.ch/~python/python/lecture_analysis/che_analytics.pdf
https://www.physik.uzh.ch/~python/python/lecture_analysis/material_analysis_lec.zip
https://www.physik.uzh.ch/~python/python/lecture_analysis/Analytics_ex.pdf
https://www.physik.uzh.ch/~python/python/lecture_analysis/material_analysis_ex.zip
https://www.physik.uzh.ch/~python/python/lecture_analysis/solution_analysis_ex.zip

Python meets C/C++
https://www.physik.uzh.ch/~python/python/lecture_c++/che_pythoncpp.pdf
https://www.physik.uzh.ch/~python/python/lecture_c++/material_Python_C_lec.zip
https://www.physik.uzh.ch/~python/python/lecture_c++/Python_C_ex.pdf
https://www.physik.uzh.ch/~python/python/lecture_c++/material_Python_C_ex.zip
https://www.physik.uzh.ch/~python/python/lecture_c++/solution_Python_C_ex.zip

Hardware-assisted speed-up techniques
https://www.physik.uzh.ch/~python/python/lecture_hwaccel/ex/Python_hwaccel_ex.pdf
https://www.physik.uzh.ch/~python/python/lecture_hwaccel/ex/hwaccelex.tar.gz