Constructing Hexagon Maps with H3 and Plotly: A Comprehensive Tutorial

Unlocking the Potential of Hexagon Maps for Data Analysis

Amanda Iglesias Moreno
Towards Data Science


Photo by Sam Balye on Unsplash

Typically, when we want to visualize a variable across a territory with a choropleth map, we use well-known administrative boundaries. For instance, to see unemployment rates across Europe, we could plot them by the administrative regions within each country.

However, administrative regions are often irregular in shape and vary widely in size. A useful alternative is to divide the territory into hexagons. Hexagons tile the plane without gaps, which gives complete territorial coverage, and their regular geometry makes comparisons between areas fairer. Because every hexagon covers roughly the same area, hexagonal maps also reduce visual bias: with administrative boundaries, large or irregularly shaped regions can dominate the map and distort the perception of the data, regardless of the values they hold.

In this article, we explain step by step how to create hexagonal maps in Python using two libraries that streamline the map-building process: H3 and Plotly.

Analysis Data: Barcelona City Hotel Dataset

The dataset used in this article is available on the open data portal of the city of Barcelona, which hosts freely accessible demographic, economic, and sociological data about the city. It contains information on all the hotels in Barcelona, including their locations. You can download the file from the following link.

The number of hotels in each hexagon is the variable we will visualize on the hexagonal map. The following sections explain, step by step, how to create this visualization.

Data Reading and Cleaning

After downloading the file, the first step is to read and clean the data. The dataset contains many columns that are not relevant to our analysis, so we select only the hotel’s name, its geographical location (latitude and longitude), and a few location attributes (which we keep but won’t use in this case). We then rename these columns with simpler names, and the dataset is ready for visualization.
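A minimal sketch of this select-and-rename step, using only the standard library’s csv module (the article itself works with pandas DataFrames). The source column names in SOURCE_COLUMNS are placeholders, not the actual headers of the Barcelona export, so check the downloaded file’s header row and adjust the mapping. The function name read_hotels is hypothetical.

```python
import csv
from typing import Dict, List, TextIO

# Placeholder source headers (assumptions) mapped to the simpler names used in
# the rest of the article. Adjust the keys to the real CSV header row.
SOURCE_COLUMNS: Dict[str, str] = {
    "name": "Name",
    "latitude": "Latitude",
    "longitude": "Longitude",
    "district": "District",
}


def read_hotels(file: TextIO, columns: Dict[str, str] = SOURCE_COLUMNS) -> List[dict]:
    """Keep only the mapped columns, rename them, and parse the coordinates."""
    hotels = []
    for row in csv.DictReader(file):
        record = {new: row.get(old, "") for old, new in columns.items()}
        # A hotel without usable coordinates cannot be placed in a hexagon.
        try:
            record["Latitude"] = float(record["Latitude"])
            record["Longitude"] = float(record["Longitude"])
        except (TypeError, ValueError):
            continue
        hotels.append(record)
    return hotels
```

Usage would look like `with open("hotels.csv", newline="", encoding="utf-8") as f: hotels = read_hotels(f)`, where the file name is hypothetical. In pandas, the same step is roughly `pd.read_csv(path)[list(SOURCE_COLUMNS)].rename(columns=SOURCE_COLUMNS)`.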

Hotel Data Set (Image created by the author)

Hexagon Grid Generation Using H3

To visualize the data on a hexagonal map, we first need to create the grid. For this, we use the H3 library, developed by Uber. The get_hexagon_grid function creates the hexagonal grid as a GeoDataFrame. It starts by creating a hexagon at a specific location (latitude and longitude), in this case the center of Barcelona. The size of this hexagon is set by the resolution parameter. The function then generates additional hexagons of the same resolution in concentric rings around the central hexagon, and the ring_size parameter determines how many rings are created. Finally, the collection of hexagons is converted into a GeoDataFrame in which each hexagon is identified by its H3 index, the unique ID that the library assigns to every cell.
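A minimal sketch of such a grid builder, assuming the h3-py 4.x API (`latlng_to_cell`, `grid_disk`, `cell_to_boundary`; in h3-py 3.x the equivalents are `geo_to_h3`, `k_ring`, and `h3_to_geo_boundary`). This is not the author’s implementation. Instead of a GeoDataFrame, it returns a GeoJSON FeatureCollection, the format Plotly’s choropleth functions consume. If you need a GeoDataFrame, `geopandas.GeoDataFrame.from_features(grid["features"])` can convert it. The `h3` import sits inside the function, so the pure conversion helper runs without H3 installed.

```python
from typing import Dict, Sequence, Tuple

LatLng = Tuple[float, float]


def boundaries_to_geojson(boundaries: Dict[str, Sequence[LatLng]]) -> dict:
    """Turn {cell_id: [(lat, lng), ...]} into a GeoJSON FeatureCollection.

    GeoJSON expects [lng, lat] order and a closed ring (first point repeated
    at the end), so both are handled here.
    """
    features = []
    for cell_id, boundary in boundaries.items():
        ring = [[lng, lat] for lat, lng in boundary]
        ring.append(list(ring[0]))  # close the polygon ring
        features.append({
            "type": "Feature",
            "id": cell_id,  # matched by Plotly through featureidkey="id"
            "properties": {"Hexagon_ID": cell_id},
            "geometry": {"type": "Polygon", "coordinates": [ring]},
        })
    return {"type": "FeatureCollection", "features": features}


def get_hexagon_grid(latitude: float, longitude: float, resolution: int, ring_size: int) -> dict:
    """Hexagon containing (latitude, longitude) plus `ring_size` rings around it."""
    import h3  # assumes h3-py >= 4.0 is installed

    center = h3.latlng_to_cell(latitude, longitude, resolution)
    cells = h3.grid_disk(center, ring_size)  # center cell and all rings up to ring_size
    # cell_to_boundary returns (lat, lng) pairs in h3-py 4.x
    return boundaries_to_geojson({cell: h3.cell_to_boundary(cell) for cell in cells})
```

For example, `get_hexagon_grid(41.3874, 2.1686, resolution=9, ring_size=45)` builds a grid around a point in central Barcelona. These coordinates are an illustrative assumption, since the article does not list the exact center it used.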

We won’t go into the specifics of every H3 function used to build the grid; interested readers can refer to the library’s documentation for details.

The following figure illustrates how the resolution and ring_size parameters shape the resulting grid. The resolution controls the size of the hexagons: higher resolutions produce smaller hexagons. The ring_size parameter sets how many concentric rings of hexagons are created around the central hexagon. All the plots in the figure share the same axis limits, so you can see that covering the same area at a higher resolution requires more rings. Every hexagon has the same size as the central one, so smaller hexagons need more rings to span the same distance.
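Two rules of thumb quantify this trade-off. First, a grid disk with k rings holds 1 + 3k(k + 1) hexagons: one central cell plus 6i cells in ring i. The cell count therefore grows quadratically with ring_size. Second, H3 is an aperture-7 system, so each finer resolution has cells of roughly one seventh the area. Edge lengths shrink by about √7 ≈ 2.65, and about that many times more rings are needed to span the same distance. The sketch below encodes both. The helper names are hypothetical, and both formulas ignore H3’s pentagons and the variation in cell size across the globe.

```python
import math


def cells_in_disk(ring_size: int) -> int:
    """Hexagons in a disk of `ring_size` rings: 1 center + 6*i cells in ring i."""
    return 1 + 3 * ring_size * (ring_size + 1)


def rings_for_same_extent(ring_size: int, from_res: int, to_res: int) -> int:
    """Approximate rings needed at `to_res` to span what `ring_size` rings span at `from_res`.

    Assumes the edge length scales by 1/sqrt(7) per resolution step (aperture 7).
    """
    return math.ceil(ring_size * math.sqrt(7) ** (to_res - from_res))
```

With the values used in this article, `cells_in_disk(45)` gives 6,211 hexagons. Moving the same extent to resolution 10 would need `rings_for_same_extent(45, 9, 10)` = 120 rings, or `cells_in_disk(120)` = 43,561 hexagons, which shows how quickly finer grids grow.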

Hexagon Grids at Various Resolutions and Ring Sizes

The choice of resolution depends on how much the variable changes across the area: if it varies over short distances, a higher resolution (smaller hexagons) is appropriate. In this case, we selected a resolution of 9. The ring_size, in turn, depends on the region to cover and on the chosen resolution; here, a ring_size of 45 is sufficient to cover the entire city of Barcelona. We won’t go into the details of how we arrived at this value. In general terms, we obtained the bounding box of the Barcelona city polygon and determined the number of rings needed to cover it.
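One way to turn a bounding box into a ring_size is sketched below. It is a geometric approximation and not necessarily how the value of 45 was obtained. Adjacent hexagon centers lie √3·e apart, where e is the edge length. A disk of k rings therefore forms a large hexagon whose inradius is about √3·k·e·cos 30° = 1.5·k·e. The sketch picks the smallest k whose inradius reaches the farthest bounding-box corner. The helper names are hypothetical, and edge_km is an input you would look up in the H3 documentation’s table of cell statistics for the chosen resolution.

```python
import math
from typing import Tuple

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lng) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def estimate_ring_size(
    center_lat: float,
    center_lng: float,
    bbox: Tuple[float, float, float, float],
    edge_km: float,
) -> int:
    """Smallest ring_size whose disk roughly covers the bounding box.

    bbox = (min_lat, min_lng, max_lat, max_lng). A k-ring disk is close to a
    large hexagon with inradius ~1.5 * k * edge_km; we require that inradius
    to reach the farthest corner of the box.
    """
    min_lat, min_lng, max_lat, max_lng = bbox
    corners = [(min_lat, min_lng), (min_lat, max_lng), (max_lat, min_lng), (max_lat, max_lng)]
    farthest = max(haversine_km(center_lat, center_lng, lat, lng) for lat, lng in corners)
    return math.ceil(farthest / (1.5 * edge_km))
```

The result errs on the side of slightly too many rings, which only adds a few empty hexagons at the edges. Cell sizes also vary with location, so it is worth checking that every hotel falls inside the generated grid.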

Using the parameters described above, the get_hexagon_grid function creates the hexagon grid as a GeoDataFrame.

The GeoDataFrame returned by get_hexagon_grid has two columns: the first holds the unique ID that the H3 library assigns to each hexagon, and the second, named geometry, contains the hexagon polygon itself.

Assignment of Each Hotel to Its Respective Hexagon

After creating the hexagonal grid, it is necessary to assign each hotel to the hexagon it belongs to. The calculate_hexagon_ids function computes the hexagon to which each hotel belongs and creates a new column called Hexagon_ID to store this information.
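A minimal sketch of what calculate_hexagon_ids could look like. This is not the author’s code. The point-to-cell function is passed in as a parameter, so the same helper works with `h3.latlng_to_cell(lat, lng, res)` from h3-py 4.x or `h3.geo_to_h3(lat, lng, resolution)` from 3.x. The optional grid_ids check is an addition: it drops hotels whose hexagon lies outside the grid, which happens when ring_size is too small.

```python
from typing import Callable, Iterable, List, Optional, Set

# Signature of a point-to-cell function such as h3.latlng_to_cell(lat, lng, res).
ToCell = Callable[[float, float, int], str]


def calculate_hexagon_ids(
    hotels: Iterable[dict],
    to_cell: ToCell,
    resolution: int,
    grid_ids: Optional[Set[str]] = None,
) -> List[dict]:
    """Return copies of the hotel records with an added 'Hexagon_ID' key.

    `resolution` must match the grid's resolution, otherwise the IDs will not
    line up with the grid cells. If `grid_ids` is given, hotels whose cell is
    not part of the grid are dropped.
    """
    result = []
    for hotel in hotels:
        cell = to_cell(hotel["Latitude"], hotel["Longitude"], resolution)
        if grid_ids is not None and cell not in grid_ids:
            continue
        result.append({**hotel, "Hexagon_ID": cell})  # copy; input left unchanged
    return result
```

With h3-py 4.x installed, a call would be `calculate_hexagon_ids(hotels, h3.latlng_to_cell, 9)`, using the same resolution as the grid.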

Hotel Data Set with ‘Hexagon_ID’ Column

The hotel dataset now also records the hexagon in which each hotel is located, stored in the Hexagon_ID column as an alphanumeric H3 identifier.

Data Grouping Based on the Variables to Be Visualized

Once each hotel has a hexagon ID, we compute the data to visualize. Here, we want to display the number of hotels in each hexagon, so we group the rows by Hexagon_ID and count them. We also want a hover feature that shows the names of the hotels in each hexagon. To achieve this, we concatenate the hotel names within each group, separating them with the HTML <br> tag; Plotly uses HTML to format its hover texts, so <br> renders as a line break.
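A standard-library sketch of this aggregation, assuming each hotel record carries Name and Hexagon_ID keys as in this article. A single pass collects the names per hexagon, and the count and the <br>-joined hover string are then derived from each list. The function name group_by_hexagon is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_hexagon(hotels: Iterable[dict]) -> List[dict]:
    """One record per hexagon: its ID, the hotel count, and the <br>-joined names."""
    names_by_hexagon: Dict[str, List[str]] = defaultdict(list)
    for hotel in hotels:
        names_by_hexagon[hotel["Hexagon_ID"]].append(hotel["Name"])
    return [
        {"Hexagon_ID": hexagon_id, "Count": len(names), "Hotels": "<br>".join(names)}
        for hexagon_id, names in names_by_hexagon.items()
    ]
```

In pandas, roughly the same result comes from `df.groupby("Hexagon_ID").agg(Count=("Name", "count"), Hotels=("Name", "<br>".join)).reset_index()`.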

DataFrame Resulting from Aggregation

The grouped DataFrame has three columns: (1) Hexagon_ID, which contains the unique hexagon identifier; (2) Count, which holds the number of hotels in that hexagon; and (3) Hotels, a single string containing the names of the hotels in the hexagon, separated by <br> tags.

Data Visualization: Cartographic Representation of Hotels in Barcelona Using Hexagons

Once the data has been grouped, we can proceed to the final step, which is the creation of the hexagon map using Plotly.

The create_choropleth_map function is responsible for processing the grouped dataset and the dataset containing the geometries of each hexagon to generate the hexagon map. This map allows us to visualize which areas of the city have a higher concentration of hotels.

Hotel Distribution Heatmap in Barcelona City (Image created by the author)

To create the map, we use the choropleth_mapbox function from Plotly Express. It draws the defined geometry (here, the set of hexagons created earlier) and colors each hexagon according to the number of hotels it contains, using the continuous color scale chosen by the user. When you hover the mouse over a hexagon, you can see the list of hotels located within it.
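A sketch of a create_choropleth_map along these lines. It is not the author’s implementation. It assumes the grid is a GeoJSON FeatureCollection whose features carry the H3 ID as their top-level "id", and that the grouped data is a list of records with the Hexagon_ID, Count, and Hotels keys described above. Plotly silently skips locations it cannot match to a feature, so the pure helper unmatched_locations checks for that first. The opacity and default zoom are illustrative choices. Plotly is imported inside the function, so the helper runs without it.

```python
from typing import Dict, Iterable, List


def unmatched_locations(hexagon_ids: Iterable[str], geojson: dict) -> List[str]:
    """IDs with no GeoJSON feature of the same top-level 'id'.

    A non-empty result usually means mismatched resolutions or a grid whose
    ring_size is too small.
    """
    known = {feature.get("id") for feature in geojson["features"]}
    return [h for h in hexagon_ids if h not in known]


def create_choropleth_map(
    records: List[dict],
    geojson: dict,
    center: Dict[str, float],
    color_scale: str = "Viridis",
    map_style: str = "carto-positron",
    zoom: float = 11,
):
    """Hexagon map colored by hotel count; hovering lists the hotel names."""
    import plotly.express as px  # requires plotly (and pandas) to be installed

    missing = unmatched_locations((r["Hexagon_ID"] for r in records), geojson)
    if missing:
        raise ValueError(f"{len(missing)} hexagon IDs are not present in the grid")

    # Plotly Express accepts a dict of equal-length columns as data_frame.
    columns = {key: [r[key] for r in records] for key in ("Hexagon_ID", "Count", "Hotels")}
    return px.choropleth_mapbox(
        columns,
        geojson=geojson,
        locations="Hexagon_ID",
        featureidkey="id",  # match records to features by the feature's "id"
        color="Count",
        color_continuous_scale=color_scale,
        mapbox_style=map_style,
        center=center,  # e.g. {"lat": ..., "lon": ...}
        zoom=zoom,
        opacity=0.6,  # illustrative value
        hover_data={"Hotels": True},  # names joined with <br> show one per line
    )
```

Switching to the Reds scale and the open-street-map style only changes the arguments, for example `create_choropleth_map(grouped, grid, {"lat": 41.39, "lon": 2.17}, color_scale="Reds", map_style="open-street-map").show()`, where the center coordinates are illustrative. Recent Plotly releases also provide `px.choropleth_map` as the MapLibre-based successor to `choropleth_mapbox`.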

Here, the background map is carto-positron, but this parameter can easily be changed to a style that makes city streets and points of interest easier to identify, such as open-street-map. We can also use a different color scale: the first map above uses the Viridis color scale, while the map below uses the Reds color scale.

Hotel Distribution Heatmap in Barcelona City (Image created by the author)

The map is interactive, so we can zoom in on any area of interest. Zooming in on the area with the reddest hues shows that the highest concentration of hotels in Barcelona is around Plaça de Catalunya.

The Area with the Highest Hotel Concentration in Barcelona (Image created by the author)

Summary

Choropleth maps based on administrative regions are a valuable way to visualize how a variable is distributed within a geographic area. However, the irregular shapes and varying sizes of administrative regions can bias the visualization of that distribution. Hexagonal maps, with their regular geometric shapes, are therefore a highly useful alternative for analyzing distributions across a territory. In this article, we have explained in detail how to create a hexagonal grid with Uber’s H3 library and how to use that grid in a Plotly visualization of the distribution of hotels in Barcelona.
