Your predecessor created the graph shown below for a publication. Since then things have changed and you need to reproduce the plot with new data. However, the code that produced it was lost (it only ever existed in an interactive IPython session, and the computer on which it was run has long since been replaced). In addition, the person who created the plot is no longer in academia and cannot be reached.
From looking at the plot you notice multiple features:

- There are two subplots (plt.subplot or plt.subplots?). The lower one is smaller (gridspec) and the two subplots share an $x$-axis.
- The upper plot shows a histogram of the data together with the PDF of a normal distribution (scipy.stats.norm). It has a meaningful title.
- The CDF is drawn on a secondary $y$-axis (ax.twinx). It is a different color than the other plots.
- The lower plot shows the residual between histogram and PDF, drawn with plt.step in order to match the binning of the histogram.
- There is a text annotation (plt.text) as well as a plt.legend in the upper left corner, which even has a title. It only contains the labels for the histogram and the PDF.

Try to replicate the plot as closely as possible.
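A minimal sketch of just the layout machinery implied by these observations, with placeholder names and no data (the 2:1 height ratio is an assumption read off the plot):

import matplotlib.pyplot as plt

# two rows sharing an x-axis; the upper panel is twice as tall as the lower
fig, (top, bottom) = plt.subplots(2, sharex=True,
                                  gridspec_kw={"height_ratios": [2, 1]})
right = top.twinx()  # secondary y-axis on the upper panel, e.g. for the CDF

The full reconstruction below fills this skeleton with the data, labels, and styling.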
from scipy import stats
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(13)
gauss = stats.norm(loc=0, scale=1)
x = np.linspace(-5, 5)
pdf_values = gauss.pdf(x)
generated = gauss.rvs(1000)
hist, bins = np.histogram(generated, bins=len(x), density=True)
diff = hist - pdf_values
cdf_values = gauss.cdf(x)
with plt.style.context("ggplot"):
    fig, (ax1, ax2) = plt.subplots(2, sharex=True, figsize=(10, 10),
                                   gridspec_kw={'height_ratios': [2, 1]})
    ax1.set_title("A very important measurement")
    ax1.grid(False)
    ax1.plot(x, pdf_values, label="PDF")
    ax1.set_ylim(0)
    ax1.set_ylabel("Values")
    ax1.hist(generated, bins=len(x), density=True, label="Histogram")
    ax1.legend(loc=2, title="Legend")  # loc=2 is the upper left corner
    ax1.text(3, 0.2, "$N = 1000$")
    # secondary y-axis for the CDF, in a different color
    ax3 = ax1.twinx()
    ax3.set_ylabel("CDF")
    ax3.plot(x, cdf_values, "g", label="CDF")
    ax3.set_ylim(0)
    # residual as a step plot so it matches the histogram binning
    ax2.step(bins[:-1], diff)
    ax2.set_xlim(-5, 5)
    ax2.set_ylabel("Residual")
    ax2.set_xlabel(r"$x_i$")
    plt.savefig("plot_replication.png")
    plt.show()
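Note that the legend is built from ax1 alone, so the CDF living on the twin axis ax3 is deliberately absent from it, exactly as observed above; loc=2 corresponds to the upper left corner.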
Use geopandas to create an interesting visualization of data on a map.

You can find a shapefile for the boroughs of London at https://data.london.gov.uk/dataset/statistical-gis-boundary-files-london. Download the file statistical-gis-boundaries-london.zip.

For plenty of statistical data per borough, visit https://data.london.gov.uk/dataset/london-borough-profiles and download the csv file.

Think about how to best join the two dataframes (see the sketch below).

Use the plot method of the geopandas.GeoDataFrame to plot some statistics per borough. An example of what this can look like can be found below.
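Both tables identify a borough by its GSS code, which makes that code the natural join key. A minimal sketch of the join, assuming the column names GSS_CODE (shapefile) and Code (profiles csv) that also appear in the solution below:

import geopandas
import pandas as pd

# keep the GeoDataFrame on the left so the result keeps its geometry column
boroughs = geopandas.read_file("London_Borough_Excluding_MHW.shp").set_index("GSS_CODE")
profiles = pd.read_csv("london-borough-profiles.csv", encoding="iso-8859-1").set_index("Code")
merged = boroughs.join(profiles)  # left join on the shared index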
import geopandas
import pandas as pd
import matplotlib.pyplot as plt

df = geopandas.read_file("statistical-gis-boundaries-london/ESRI/London_Borough_Excluding_MHW.shp").set_index("GSS_CODE")
df = df.join(pd.read_csv("london-borough-profiles.csv", header=0, encoding='iso-8859-1').set_index("Code"))
fig, ax = plt.subplots(1, figsize=(10, 6))
# convert to thousands for a more readable colorbar
df["GLA_Population_Estimate_2017"] /= 1000
df.plot(column="GLA_Population_Estimate_2017", cmap="Blues", linewidth=0.8, ax=ax, edgecolor='0.8')
ax.axis('off')
ax.set_title("London population estimate 2017", fontdict={"fontsize": 25, "fontweight": 3})
# build the colorbar by hand from a ScalarMappable spanning the data range
vmin, vmax = df["GLA_Population_Estimate_2017"].min(), df["GLA_Population_Estimate_2017"].max()
sm = plt.cm.ScalarMappable(cmap='Blues', norm=plt.Normalize(vmin=vmin, vmax=vmax))
sm.set_array([])  # the mappable needs a (dummy) array before colorbar() accepts it
cbar = fig.colorbar(sm, ax=ax, label="Thousand")
plt.savefig("geopandas_population.png")
plt.show()
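As an aside: newer geopandas releases can draw the colorbar themselves through the legend keyword of plot, which makes the manual ScalarMappable construction above unnecessary. A sketch, reusing the same df, ax, and column, with legend_kwds passed through to the underlying colorbar call:

df.plot(column="GLA_Population_Estimate_2017", cmap="Blues", linewidth=0.8,
        ax=ax, edgecolor='0.8', legend=True, legend_kwds={"label": "Thousand"})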
Use the requests library and bs4.BeautifulSoup to parse our homepage for the material of each lecture. Use the developer tools of your browser to figure out the names and attributes of the relevant elements. You can use CSS selectors via soup.select, or operate directly on the tags with soup.find/soup.find_all (both approaches are sketched below).

Write a download function that automatically downloads the material to a specified directory. For this, use response.content instead of response.text, and open(file_name, "wb"), in order to write the binary content directly to a file. Make sure to create the directory if it does not exist and to deal with file names that would be illegal (for example file names containing /).
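Both ways of selecting tags find the same anchors; a small sketch, assuming the links carry the class internal as in the solution below:

links_css = soup.select("a.internal")                # CSS selector
links_find = soup.find_all("a", class_="internal")   # equivalent find_all call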
import requests
from bs4 import BeautifulSoup
from pathlib import Path

def get_soup(session, url):
    """Convenience function to use a session to get the soup of a webpage."""
    response = session.get(url)
    response.raise_for_status()
    return BeautifulSoup(response.text, "lxml")

def download(url, file_name, directory):
    response = requests.get(url)
    response.raise_for_status()
    directory = Path(directory)
    # create the target directory (and any parents) if it does not exist yet
    directory.mkdir(parents=True, exist_ok=True)
    # replace characters that are illegal in file names, such as "/"
    file_name = file_name.replace("/", "_")
    with open(directory / file_name, "wb") as f:
        f.write(response.content)

session = requests.Session()
base_url = "https://www.physik.uzh.ch/~python/python"
soup = get_soup(session, f"{base_url}/programme.php")
links = [a['href'] for a in soup.select("a.internal")]
for link in links:
    soup = get_soup(session, f"{base_url}/{link}")
    print(soup.title.text.split(" - ")[-1])
    for file_name in [a['href'] for a in soup.select("a.download")]:
        url = f"{base_url}/{link}/{file_name}"
        print(url)
        # uncomment this to actually download the files
        # download(url, file_name, ".")
    print()
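The call to download is commented out on purpose: a first run then only prints the lecture titles and the URLs that were found, and the actual downloads from the course server happen only once you uncomment that line.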