Calculating your 5km travel buffer

Over the weekend Victoria, Australia, and especially Melbourne, have come into higher levels of lockdown due to the COVID-19 pandemic. This includes a restriction that the furthest you can travel from home is 5km (with exceptions if your nearest shops are more than 5km away).

Here we'll show how you can calculate your 5km travel bubble from home using some of the best Python tools for spatial analysis.

Setup

First some setup. In Python, we rely on building on third party libraries for our analysis (and hopefully contributing back some of our own). Installation of the spatial libraries can sometimes be a challenge due to the compilation requirements but if you're using the Anaconda Python distribution it is quite simple to get started. First do some installations with the conda install tool, followed by some extras with pip:

conda install gdal fiona geopandas rtree descartes pyproj

pip install -U alphashape geopy osmnx folium

Imports

The following are the imports we'll use for this example. For now don't worry about the individual imports - some may be familiar, and some may not - we'll discuss them in more depth as we go through the notebook.

import alphashape
from descartes import PolygonPatch
import folium
import geopandas as gpd
from geopy.geocoders import Nominatim
from ipywidgets import interact, fixed, widgets
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
import osmnx as ox
import pandas as pd
from shapely import geometry
%config InlineBackend.figure_format = 'retina'
plt.rcParams['figure.figsize'] = (10, 10)

Getting the data

Geocoding an address

First we need a location. To do this we perform a process called "geocoding". This is the process of matching an address to a location on the ground. i.e. where is your house? You can use any address that you like, but for now I'll use my local Melbourne train station: Footscray.

The geopy library is a really convenient library that wraps up a bunch of online geocoding services and APIs into a simple consistent package. This means if you have an API key for Google you can use the Google Maps API via Python to get the location of an address, or if you have a Bing API key you can do the same. Importantly the address will be returned in a consistent way.

Normally I'd use the Here maps API - it's a good one with a generous amount of free address searches when you sign up. For this example though I'll use the open Nominatim geocoding API built on Open Street Map as it doesn't require any signup - you just need to set the user agent to tell them who you are.

address = 'Footscray Railway Station Victoria, 3011, Australia'
geocoder = Nominatim(user_agent='Isochrone calculator')
location = geocoder.geocode(address)
location
Location(Footscray, Hyde Street, Footscray, City of Maribyrnong, Victoria, 3011, Australia, (-37.8015202, 144.9025869, 0.0))

This works well and gives me be a Location that I can use to get the latitude and longitude values for the address easily.

location.latitude, location.longitude
(-37.8015202, 144.9025869)

As a side note, I can use this with the folium library to embed a quick interactive map in my notebook so I can see the address is in the place that I expect:

m = folium.Map((location.latitude, location.longitude), max_zoom=20, zoom_start=16)
folium.Marker((location.latitude, location.longitude), popup=address).add_to(m)
m

At this point I could go and create a geometry and build this into a point I can use to help me find my 5km radius, but there are some nice shortcuts using Pandas and GeoPandas. First I create a Pandas DataFrame that contains the address I wish to geocode:

home = pd.DataFrame([{'address': address}])
home
address
0 Footscray Railway Station Victoria, 3011, Aust...

Not very exciting, but if we wanted to we could use this to find the address of many points of interest. For example you might want to calculate the 5km travel buffer around your home, your parents' home and your brothers and sisters' homes. For now we'll stick with one.

The GeoPandas library is a Python library that extends the Pandas dataframe to work with spatial vector data (points, lines and polygons, or for example: addresses, roads, and suburb boundaries). It includes tools that will let you geocode your data using the geopy library as the back-end:

home = gpd.tools.geocode(home['address'], Nominatim, user_agent='Isochrone calculator')

Note here that we supply the geocoder class object and any parameters it requires as keyword arguments to this function. This returns a GeoPandas GeoDataFrame - a dataframe with a geometry that represents the point for each address.

home
geometry address
0 POINT (144.90259 -37.80152) Footscray, Hyde Street, Footscray, City of Mar...

That's really handy, and we can easily put this on the same map as above:

m = folium.Map((location.latitude, location.longitude), max_zoom=20, zoom_start=16)
folium.GeoJson(home, tooltip=folium.GeoJsonTooltip(['address'])).add_to(m)
m

Note that in this case the tooltip can be more complex: if we have more fields we want to see we can place them all in the list and display it on our map.

Getting the street network within 5km of your home

We can move 5km from our homes. However this not 5km as the crow flies (in a straight line), it is 5km of travel. So it's actually 5km as we move through a street network. To calculate what this is we need to download some of that street network data. First we'll need to find out what a boundary is around our home to download just the data we need.

How do we get this 5km bubble? In GIS, a buffer is a standard operation to expand out from a geometry by a set value. However we have a bit of an issue here: our data is in latitude and longitude (which is not metres) - if we try to buffer by 5000 we will get a very wrong answer! First we must re-project the data to a projection based in metres, perform the buffer, the project back.

A later blog post will discuss map projections and how to use them in Python, but for now we will create our bubble by projecting to the GDA2020 MGA Zone 55 projection (EPSG code 7855) to create the buffer before re-projecting back to the original GCS WGS84 projection.

buffer = home.to_crs(epsg=7855).buffer(5000).to_crs(epsg=4326)

It's definitely worth seeing this on a map - this will be a 5000 metre circle around your home. If for example you used the interactive tool from the Age newspaper you would get the same map.

m = folium.Map((location.latitude, location.longitude), max_zoom=20, zoom_start=12)
folium.GeoJson(buffer).add_to(m)
folium.GeoJson(home, tooltip=folium.GeoJsonTooltip(['address'])).add_to(m)
m

Sure looks like I can go a long way! If only. Don't forget: every map is lying to you, even if it's not deliberate!

We'll use the bounds from this along with the osmnx library to extract the road network data from Open Street Map. The osmnx library is a really powerful tool to extract and use data in Python from Open Street Map (the Wikipedia of maps). We'll talk more about how it works as we go on but for now let's grab the data and draw a simple plot:

bounds = buffer.bounds.loc[0]
bounds
minx    144.845830
miny    -37.846556
maxx    144.959347
maxy    -37.756484
Name: 0, dtype: float64
region = ox.graph_from_bbox(bounds['maxy'], bounds['miny'], bounds['minx'], bounds['maxx'])
ox.plot_graph(region);

png

Note that this might take a couple of minutes to download from the Open Street Map servers depending on your internet connection. There are options to just download the road network or the walking network if you prefer, but we're looking at anywhere you can reach within 5km.

Calculating your 5km travel radius

Conceptually there are two steps to calculating your 5km travel radius. 1. Calculate how much of the network that we can reach within 5km; 2. Draw an accurate boundary around this region.

Getting all the nodes in the graph using network algorithms

As part ofcalculating the region of our street network graph we can reach first we need to get the nearest node in our graph to our starting point.

As a bonus this will be very simple as the geocoder we used was Nominatim which is built on top of Open Street Map - as long as we found an address we will be able to match it to a node.

center_node = ox.get_nearest_node(region, (home.loc[0, 'geometry'].y, home.loc[0, 'geometry'].x))

center_node  # this is the node ID
4809590392

Again the data we have extracted from Open Street Map is uses latitude and longitude as its current coordinate system. We need to project this data so that it's in a system that uses metres as its unit of measurement. Once again we'll use EPSG code 7855 - the GDA2020 MGA Zone 55 projection.

region = ox.project_graph(region, 7855)

And now we can choose some distances of interest. Because it looks good we might increment all the way up to 5,000 metres in increments of 500 metres. Ultimately we will be interested in just the 5,000 metres. The methodology here was adapted from the osmnx Isochrones example.

distances = np.arange(0, 5001, 500)
distances
array([   0,  500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000])
# get one color for each isochrone
iso_colors = ox.plot.get_colors(n=len(distances), cmap='plasma', start=0, return_hex=True)
node_colors = {}

for trip_time, color in zip(sorted(distances, reverse=True), iso_colors):
    subgraph = nx.ego_graph(region, center_node, radius=trip_time, distance='length')
    for node in subgraph.nodes():
        node_colors[node] = color

nc = [node_colors[node] if node in node_colors else 'none' for node in region.nodes()]
ns = [15 if node in node_colors else 0 for node in region.nodes()]
fig, ax = ox.plot_graph(region, node_color=nc, node_size=ns, node_alpha=0.8,
                        edge_linewidth=0.2, edge_color='#999999')

png

From this visualisation you can see that I definitely can't go out as far as 5km as the crow flies. You can see as well that I can't cross the Yarra River to the South as I have to travel too far around to cross the bridges. So how does this work? To understand we need to take a closer look at what our region data actually is:

type(region)
networkx.classes.multidigraph.MultiDiGraph

The osmnx library builds on the NetworkX library to provide a graph object. In this case a MultiDiGraph. This is not a kind of graph like a bar-chart (though you can certainly visualise this data!), rather it is a mathematical graph or network. A network is made up of nodes linked by edges. A directed graph or network is made up of nodes linked by edges which you can only traverse in a single direction, and a multi-network is shorthand for saying there may be multiple edges between a pair of nodes.

The data we have downloaded is data from Open Street Map which represents a road network. Nodes are places along the network (like your home address) and the edges are the roads linking those locations. This is extremely useful for us as we have access to many well-defined functions on networks.

The function we're going to use is called the Ego Graph. This is a method to extract a subgraph (a subset of the nodes from the original graph and only the edges connecting those nodes) based on the distance from a source point. Normally without any parameters this number is counted as the number of edges you traverse. A future blog will discuss how this works in more detail.

For now: how does that help us for working out what is within 5km of our houses? The Ego Graph function take an additional parameter which is the name of an edge attribute that contains a value to use as the distance between nodes instead of the default 1. Since we downloaded our spatial data from Open Street Map each of our edges has a length attribute already, and because we have projected our data this length is in metres. For example we can look at the attributes of an edge with:

next(iter(region.edges(data=True)))
(277872640,
 277872709,
 {'osmid': 237324746,
  'name': 'Albion Street',
  'highway': 'tertiary',
  'oneway': False,
  'length': 42.719,
  'geometry': <shapely.geometry.linestring.LineString at 0x7fa2134b8610>})

In our data the two nodes (the first two numbers 277872640 and 277872809) are connected by an edge with a length of 42.719 metres.

So if we want to find our region we can use the Ego Graph function to select our nodes and work from there:

subgraph = nx.ego_graph(region, center_node, radius=5000, distance='length')
nc = ['white' if node in subgraph else 'none' for node in region.nodes()]
ns = [15 if node in subgraph else 0 for node in region.nodes()]
fig, ax = ox.plot_graph(
    region,
    node_color=nc,
    node_size=ns,
    node_alpha=0.8,
    edge_linewidth=0.2,
    edge_color='#999999'
)

png

Building the exterior polygon

So now we have a region of interest defined by the nodes. Let's get those points into a GeoDataFrame so we can take a look at our data as a whole. First we can create a simple list of x and y values:

points = pd.DataFrame([(data['x'], data['y']) for _, data in subgraph.nodes(data=True)], columns=['x', 'y'])
point_geoms = gpd.GeoSeries(points.apply(geometry.Point, axis='columns'))
point_geoms.crs='epsg:7855'
point_geoms.head()
0    POINT (314225.925 5816949.709)
1    POINT (314241.982 5816930.565)
2    POINT (314248.090 5816991.443)
3    POINT (314207.225 5816933.744)
4    POINT (319722.144 5813820.488)
dtype: geometry

The GeoPandas library uses geometries defined in the Shapely library to represent geometries. Given those geometries we can easily merge them and calculate a slightly better boundary: a convex hull (of course without forgetting projecting the data back to latitudes and longitudes).

hull = point_geoms.to_crs(epsg=4326).unary_union.convex_hull
m = folium.Map((location.latitude, location.longitude), max_zoom=20, zoom_start=12)
folium.GeoJson(buffer).add_to(m)
folium.GeoJson(hull).add_to(m)
folium.GeoJson(home, tooltip=folium.GeoJsonTooltip(['address'])).add_to(m)
m

We can see this area is a better estimation compared to the buffer with a smaller region, but it does include the area on the other side of the Yarra River which is not correct. So ideally we'd like to get a better fit to our data which we can do with the alphashape library.

An Alpha Shape (sometimes called a concave hull) is a method of building a minimum geometry to surround a set of points. It requires an alpha value that is used to calculate the edges. From the wikipedia entry:

Then an edge of the alpha-shape is drawn between two members of the finite point set whenever there exists a generalized disk of radius containing none of the point set and which has the property that the two points lie on its boundary.

When then the result is a convex hull. Higher values will tighten the hull, while lower values will tend towards the larger shape. Our coordinates are measured in metres, so when the radius of the disk used to carve the edges is 1km.

bounding_poly = alphashape.alphashape(point_geoms, 0.001)
bounding_poly

svg

Raising the value too high will result in disjoined polygons, or worse still, will result in there being no geometry at all (here the radius of the disk is 10 metres, meaning points have to be very close together to be linked):

bounding_poly = alphashape.alphashape(point_geoms, 0.1)
bounding_poly

svg

You can try it interactively in a Jupyter notebook (or Jupyter lab) with the following code (note that the hull can take a few seconds to generate, thus the continuous update is set False to only update when you've finished moving the slider).

interact(
    alphashape.alphashape,
    points=fixed(point_geoms),
    alpha=widgets.FloatSlider(value=0.005, min=0, max=0.02, step=0.001, readout_format='.3f', continuous_update=False)
)

Of course generating the optimum value of can be tricky, but if you don't supply a value to the alphashape function it will generate an optimum value on your behalf.

Warning: the amount of time to generate the alpha shape from a large group of points can be quite high so be prepared to grab a cup of coffee while you wait!

bounding_poly = alphashape.alphashape(point_geoms)
bounding_poly

svg

Putting it all together

So now we have our optimum hull. It would be useful once again to see this on a map so that we can compare this boundary to the original 5km buffer and the 5km convex hull. To do this we'll put all the geometries into a GeoPandas GeoDataFrame:

boundaries = gpd.GeoDataFrame(
    [
        {'description': '5km buffer', 'geometry': buffer.to_crs(epsg=7855).loc[0]},
        {'description': 'Convex hull', 'geometry': point_geoms.unary_union.convex_hull},
        {'description': 'Alpha shape', 'geometry': bounding_poly}
    ],
    crs='epsg:7855',
    geometry='geometry'
)

We might first look at the area in which you can move for each polygon to compare them for size:

boundaries['area'] = boundaries.area / 1_000_000 # go from square metres to square km
boundaries
description geometry area
0 5km buffer POLYGON ((320347.175 5814134.273, 320323.099 5... 78.413712
1 Convex hull POLYGON ((314896.061 5809394.102, 314563.137 5... 55.449025
2 Alpha shape POLYGON ((314948.024 5809399.148, 314896.061 5... 47.742697

Finally we'll show this on a map by projecting the data:

m = folium.Map((home.loc[0, 'geometry'].y, home.loc[0, 'geometry'].x), max_zoom=20, zoom_start=13)
folium.GeoJson(boundaries, tooltip=folium.GeoJsonTooltip(['description'])).add_to(m)
folium.GeoJson(home, tooltip=folium.GeoJsonTooltip(['address'])).add_to(m)
m

Here we can see the differences in area: the original 5km circular buffer is fine, but it substantially overestimates the distance we can actually travel. The convex hull is a better fit, but it too has it's flaws especially when you look at crossing the river. The alpha shape, while more time consuming to calculate is still easy to obtain thanks to Python's extensive 3rd party library ecosystem, and gives us a much more realistic result (and sadly a much smaller home range where we can travel.

References:

And of course the great work of the Python spatial and scientific stack authors and developers