Using {gt}, {gtExtras} and {openair} to present air quality monitoring data

Presented for the 2021 RStudio Table Contest

Author

Jack Davison

Published

September 3, 2022

This is the second iteration of this tutorial. While much of the content has remained the same, a few key user improvements have been made. Many of these – such as code linking/copying/collapsing – owe to the tutorial’s conversion from rmarkdown to quarto. The styling of the openair plots has also been improved, taking lessons from the in-development openairmaps package.

1 Introduction

This tutorial will take you through three “levels” of summarising air quality monitoring data.

  1. First, we will begin to use the gt package, a powerful package that is to tables what ggplot2 is to figures.

  2. Then, we will incorporate the gtExtras package to further theme and add in-table visualisations.

  3. Finally, we will investigate how to incorporate openair plots - plots extensively around the world by the air quality community - into our gt tables.

That last level is perhaps easier said than done - to quote the openair book:

openair is a product of the time it was started and used the highly capable lattice package for plotting. This was a time before ggplot2 and the tidyverse.

gt has functions that help with the inclusion of ggplot2, but not lattice, so this will require some modification of pre-existing gt functions that will be covered in the third part of the tutorial.

This tutorial assumes a general understanding of the tidyverse and openair, but is suitable for gt novices. The ideal target is an air quality professional or researcher who is interested in exploring new tools for presenting their data.

  • If you would like to know more about openair and reading plots, the best place to look is the openair book, available online.

  • If you would like to know more about the tidyverse, there are many resources available including R for Data Science.

All code in this document is reproducible; even the data used comes straight from the openair package (see the “Installation and Data” tab).

The tidyverse, openair and gt are all available on CRAN, so can be downloaded using install.packages().

Code

gtExtras is still under active development, so can be downloaded directly from it’s GitHub repository. I would also recommend keeping up to date with it on its website.

Code
# if needed install.packages("remotes")
remotes::install_github("jthomasmock/gtExtras")

The data that is going to be used in this tutorial is adapted from openair::mydata.

Code
head(mydata)
# A tibble: 6 × 10
  date                   ws    wd   nox   no2    o3  pm10   so2    co  pm25
  <dttm>              <dbl> <int> <int> <int> <int> <int> <dbl> <dbl> <int>
1 1998-01-01 00:00:00  0.6    280   285    39     1    29  4.72  3.37    NA
2 1998-01-01 01:00:00  2.16   230    NA    NA    NA    37 NA    NA       NA
3 1998-01-01 02:00:00  2.76   190    NA    NA     3    34  6.83  9.60    NA
4 1998-01-01 03:00:00  2.16   170   493    52     3    35  7.66 10.2     NA
5 1998-01-01 04:00:00  2.4    180   468    78     2    34  8.07  8.91    NA
6 1998-01-01 05:00:00  3      190   264    42     0    16  5.50  3.05    NA

The goal for this tutorial is to produce a summary table of air quality pollutants at a single site for a single year (feel free to click “Show me the final table” to see). The principles behind this could produce much different table structures however (e.g., a pollutant per row, a year per row, a site per row, some combination of the three, some other idea…).

For this table, the initial transformations that need to be done are as follows:

  • Filter for a single year (2003 is chosen arbitrarily, feel free to pick any year in the mydata dataset).
  • Reshape the data from a “wide” to “long” format, such that the concentrations are all in one column.
  • Turn the “species” column into a factor in the order we would like them to appear in the final table.
Code
data = mydata |>
  mutate(year = lubridate::year(date)) |>
  filter(year == 2003) |>
  pivot_longer(nox:pm25, names_to = "species", values_to = "conc") |>
  mutate(species = factor(species, c(
    "co", "nox", "no2", "o3", "so2", "pm25", "pm10"
  )))

The final structure is now:

Code
head(data, n = 10)
# A tibble: 10 × 6
   date                   ws    wd  year species  conc
   <dttm>              <dbl> <int> <dbl> <fct>   <dbl>
 1 2003-01-01 00:00:00   5.2   160  2003 co      0.675
 2 2003-01-01 01:00:00   4.6   140  2003 co      0.967
 3 2003-01-01 02:00:00   3.6   140  2003 co      1.2  
 4 2003-01-01 03:00:00   4.6   140  2003 co      1.15 
 5 2003-01-01 04:00:00   5.7   140  2003 co      0.725
 6 2003-01-01 05:00:00   4.6   140  2003 co      0.575
 7 2003-01-01 06:00:00   4.1   160  2003 co      0.525
 8 2003-01-01 07:00:00   5.2   160  2003 co      0.8  
 9 2003-01-01 08:00:00   3.6   160  2003 co      1    
10 2003-01-01 09:00:00   3.6   180  2003 co      0.925

Below you can see the table we’re aiming for by the end of Level 3.

Marylebone 2003 | Air Quality Monitoring Summary
The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite.
Species Name Formula Statistics Visualisations
p25 median mean p75 max missing trend1 Poll. Rose Polar Plt.
Carbon Monoxide

CO

0.600 0.975 1.12 1.50 4.30 1.63% 1.1
Oxides of Nitrogen

NOx

76.0 139 164 226 749 6.27% 164.1
Nitrogen Dioxide

NO2

36.0 52.0 56.0 72.0 206 6.27% 53.5
Ozone

O3

2.00 4.00 7.67 11.0 70.0 3.68% 5.7
Sulfur Dioxide

SO2

2.00 3.75 4.40 5.75 44.2 3.86% 3.3
Particulate Matter

PM2.5

11.0 17.0 19.1 25.0 83.0 6.71% 17.3

PM10

23.0 34.0 37.0 48.0 235 1.26% 30.7
The data were obtained using the openair package.
1 Monthly average. The lowest and highest months are indicated with green and red, respectively.

2 Tutorials

Level 1: Using {gt} to present summary statistics

Much as a ggplot2 plot begins with ggplot2::ggplot(), a gt table begins with gt::gt(). However, while frequently a ggplot2::ggplot() function can be fed all of our raw data, it can be easier to provide gt::gt() with a pre-made summary. Therefore our first step is to summarise our air quality data in some way. In real life, the summary statistics you chose will likely align with the legislation of your country or the sorts of statistics a client wants, but here some simple statistics are chosen.

The below code calculates the 25th percentile, median, mean, 7th percentile and maximum of each pollutant, as well as the percentage of data that is missing. Note the trick used for that last statistic - sum(is.na(conc)) checks whether each value in the column is NA and then sums that column of TRUE/FALSE values, where TRUE = 1 and FALSE = 0. dplyr::n() gives the number of rows in the group, meaning that the division of the two is effectively “number of missing divided by total”.

Summary statistics could be calculated however we’d like - including using openair functions - but this code does the job.

Code
averages = data |>
  group_by(species) |>
  summarise(
    p25 = quantile(conc, .25, na.rm = T),
    median = median(conc, na.rm = T),
    mean = mean(conc, na.rm = T),
    p75 = quantile(conc, .75, na.rm = T),
    max = max(conc, na.rm = T),
    missing = sum(is.na(conc)) / n()
  )

averages
# A tibble: 7 × 7
  species   p25  median   mean    p75   max missing
  <fct>   <dbl>   <dbl>  <dbl>  <dbl> <dbl>   <dbl>
1 co        0.6   0.975   1.12   1.5    4.3  0.0163
2 nox      76   139     164.   226.   749    0.0627
3 no2      36    52      56.0   72    206    0.0627
4 o3        2     4       7.67  11     70    0.0368
5 so2       2     3.75    4.40   5.75  44.2  0.0386
6 pm25     11    17      19.1   25     83    0.0671
7 pm10     23    34      37.0   48    235    0.0126

Now we’re at the position where we can begin a gt table. As alluded to earlier, we’ll begin by providing the gt::gt() function with our data.

Code
our_table = gt(averages)

our_table
species p25 median mean p75 max missing
co 0.6 0.975 1.118069 1.50 4.30 0.01632420
nox 76.0 139.000 163.941055 225.50 749.00 0.06267123
no2 36.0 52.000 55.964682 72.00 206.00 0.06267123
o3 2.0 4.000 7.673975 11.00 70.00 0.03675799
so2 2.0 3.750 4.398658 5.75 44.25 0.03858447
pm25 11.0 17.000 19.071341 25.00 83.00 0.06712329
pm10 23.0 34.000 37.009133 48.00 235.00 0.01255708

gt has produced a HTML table which would look excellent in an rmarkdown or quarto report, but it clearly could do with some finessing. For example, it may be useful to format the numeric columns to be a little shorter - we don’t need that many significant figures. gt has a collection of fmt_*() functions for this purpose. In our case, gt::fmt_number().

Note that, unlike ggplot2 where additional geom_*() functions are added, in gt additional features like fmt_*() are piped.

fmt_*() functions require columns to be specified. This argument works on tidyselect principles, so when we provide the argument where(is.numeric) we are telling it to format all numeric columns (in this case, all columns besides “species”). We could specify a number of decimal places or significant figures - in this case the latter - and we can also tell the column to drop trailing zeroes. This means that the median of the NOx row won’t read “163.000…” just because the CO row goes to more decimal places.

Code
our_table = our_table |>
  fmt_number(
    columns = where(is.numeric),
    n_sigfig = 3,
    drop_trailing_zeros = T
  )

our_table
species p25 median mean p75 max missing
co 0.600 0.975 1.12 1.50 4.30 0.0163
nox 76.0 139 164 226 749 0.0627
no2 36.0 52.0 56.0 72.0 206 0.0627
o3 2.00 4.00 7.67 11.0 70.0 0.0368
so2 2.00 3.75 4.40 5.75 44.2 0.0386
pm25 11.0 17.0 19.1 25.0 83.0 0.0671
pm10 23.0 34.0 37.0 48.0 235 0.0126

This isn’t quite right as the “missing” column is actually a percentage. We can override the previous gt::fmt_number() column by piping this new output into gt::fmt_percent() and specifying the “missing” column. Note that, much like in the tidyverse, columns are provided unquoted.

Code
our_table = our_table |>
  fmt_percent(columns = missing)

our_table
species p25 median mean p75 max missing
co 0.600 0.975 1.12 1.50 4.30 0.0163
nox 76.0 139 164 226 749 0.0627
no2 36.0 52.0 56.0 72.0 206 0.0627
o3 2.00 4.00 7.67 11.0 70.0 0.0368
so2 2.00 3.75 4.40 5.75 44.2 0.0386
pm25 11.0 17.0 19.1 25.0 83.0 0.0671
pm10 23.0 34.0 37.0 48.0 235 0.0126

The “species” column could also be refined - currently the second row reads “nox” when it should read “NOx”. gt supports markdown syntax for achieving things like subscripts, bold text and the like. In this case, we’ll change our “species” column to be written as markdown. Once again, we can use a fmt_*() function, this time to format the column as “markdown”.

Just for completion, while we’re using dplyr::mutate() let’s insert the English names of the pollutants too, and put them at the far left of the table.

Code
tab_data = averages |>
  mutate(
    name = case_when(
      species == "co" ~ "Carbon Monoxide",
      species == "nox" ~ "Oxides of Nitrogen",
      species == "no2" ~ "Nitrogen Dioxide",
      species == "o3"  ~ "Ozone",
      species == "so2" ~ "Sulfur Dioxide",
      species == "pm25" ~ "Particulate Matter",
      species == "pm10" ~ ""
    ),
    species = case_when(
      species == "co" ~ "CO",
      species == "nox" ~ "NO<sub>x</sub>",
      species == "no2" ~ "NO<sub>2</sub>",
      species == "o3"  ~ "O<sub>3</sub>",
      species == "so2" ~ "SO<sub>2</sub>",
      species == "pm25" ~ "PM<sub>2.5</sub>",
      species == "pm10" ~ "PM<sub>10</sub>"
    )
  ) |>
  relocate(name, .before = species)

our_table = gt(tab_data) |>
  fmt_number(
    columns = where(is.numeric),
    n_sigfig = 3,
    drop_trailing_zeros = T
  ) |>
  fmt_percent(columns = missing) |>
  fmt_markdown(columns = species)

our_table
name species p25 median mean p75 max missing
Carbon Monoxide

CO

0.600 0.975 1.12 1.50 4.30 0.0163
Oxides of Nitrogen

NOx

76.0 139 164 226 749 0.0627
Nitrogen Dioxide

NO2

36.0 52.0 56.0 72.0 206 0.0627
Ozone

O3

2.00 4.00 7.67 11.0 70.0 0.0368
Sulfur Dioxide

SO2

2.00 3.75 4.40 5.75 44.2 0.0386
Particulate Matter

PM2.5

11.0 17.0 19.1 25.0 83.0 0.0671

PM10

23.0 34.0 37.0 48.0 235 0.0126

By default, gt simply carries over the column names to be the header names, but much like the axis labels that ggplot2 defaults to these will often want changing. Column names in data are often written to be easy to use, but column names in a table should be written to be easy to read. It may be useful at this stage to refer to the “elements of a table” diagram from the gt website.

“The elements of a gt table.”

We are wanting to control something to do with columns so we’ll use a cols_*() function. This family of functions can move columns around, merge them, hide them and - important right now - rename them. The syntax of gt::cols_label() is almost the inverse of dplyr::rename() - instead of “new = old”, “old = new” is provided. The new name should also be given as a string.

Code
our_table = our_table |>
  cols_label(name = "Species Name",
             species = "Formula")

our_table
Species Name Formula p25 median mean p75 max missing
Carbon Monoxide

CO

0.600 0.975 1.12 1.50 4.30 0.0163
Oxides of Nitrogen

NOx

76.0 139 164 226 749 0.0627
Nitrogen Dioxide

NO2

36.0 52.0 56.0 72.0 206 0.0627
Ozone

O3

2.00 4.00 7.67 11.0 70.0 0.0368
Sulfur Dioxide

SO2

2.00 3.75 4.40 5.75 44.2 0.0386
Particulate Matter

PM2.5

11.0 17.0 19.1 25.0 83.0 0.0671

PM10

23.0 34.0 37.0 48.0 235 0.0126

This table is already looking great, but we can refine it on a whole table level. To make large scale changes, we can use the a tab_*() function. These could be things like adding a title, subtitle, footnote, or a “spanner title” that groups certain columns together. In the next code chunk we’ll implement all of these things. Note that we can format elements like the figure title using markdown by wrapping the string in the gt::md() function.

Code
our_table = our_table |>
  tab_spanner(label = "Statistics", columns = p25:missing) |>
  tab_header(
    title = md("<b>Marylebone 2003</b> | Air Quality Monitoring Summary"),
    subtitle = md(
      "The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite."
    )
  ) |>
  tab_source_note(source_note = md("The data were obtained using the <code>openair</code> package."))

our_table
Marylebone 2003 | Air Quality Monitoring Summary
The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite.
Species Name Formula Statistics
p25 median mean p75 max missing
Carbon Monoxide

CO

0.600 0.975 1.12 1.50 4.30 0.0163
Oxides of Nitrogen

NOx

76.0 139 164 226 749 0.0627
Nitrogen Dioxide

NO2

36.0 52.0 56.0 72.0 206 0.0627
Ozone

O3

2.00 4.00 7.67 11.0 70.0 0.0368
Sulfur Dioxide

SO2

2.00 3.75 4.40 5.75 44.2 0.0386
Particulate Matter

PM2.5

11.0 17.0 19.1 25.0 83.0 0.0671

PM10

23.0 34.0 37.0 48.0 235 0.0126
The data were obtained using the openair package.

This is a good stopping point to review the code we’ve used so far. Notice what we have done:

  • We have created a summary data frame, and then turned it into a HTML table using gt().
  • We used fmt_*() functions to format numeric and character columns.
  • We used a col_*() function to change a property of a column (in this case, it’s name)
  • We used tab_*() functions to make table-wide changes - giving it a title, spanner headers, etc.

Review the full code from this level in the “Code” tab. Once you’re comfortable with it, feel free to move onto Level 2 where gtExtras will be incorporated to enhance the table’s theme and introduce in-line visualisations.

Code
data = mydata |>
  mutate(year = lubridate::year(date)) |>
  filter(year == 2003) |>
  pivot_longer(nox:pm25, names_to = "species", values_to = "conc") |>
  mutate(species = factor(species, c(
    "co", "nox", "no2", "o3", "so2", "pm25", "pm10"
  ))) |>
  arrange(species)

averages = data |>
  group_by(species) |>
  summarise(
    p25 = quantile(conc, .25, na.rm = T),
    median = median(conc, na.rm = T),
    mean   = mean(conc, na.rm = T),
    p75 = quantile(conc, .75, na.rm = T),
    max    = max(conc, na.rm = T),
    missing = sum(is.na(conc)) / n()
  )

tab_data = averages |>
  mutate(
    name = case_when(
      species == "co" ~ "Carbon Monoxide",
      species == "nox" ~ "Oxides of Nitrogen",
      species == "no2" ~ "Nitrogen Dioxide",
      species == "o3"  ~ "Ozone",
      species == "so2" ~ "Sulfur Dioxide",
      species == "pm25" ~ "Particulate Matter",
      species == "pm10" ~ ""
    ),
    species = case_when(
      species == "co" ~ "CO",
      species == "nox" ~ "NO<sub>x</sub>",
      species == "no2" ~ "NO<sub>2</sub>",
      species == "o3"  ~ "O<sub>3</sub>",
      species == "so2" ~ "SO<sub>2</sub>",
      species == "pm25" ~ "PM<sub>2.5</sub>",
      species == "pm10" ~ "PM<sub>10</sub>"
    )
  ) |>
  relocate(name, .before = species)

our_table = gt(tab_data) |>
  
  # Format columns
  fmt_number(
    columns = where(is.numeric),
    n_sigfig = 3,
    drop_trailing_zeros = T
  ) |>
  fmt_percent(columns = missing) |>
  fmt_markdown(columns = species) |>
  
  # Column labels
  cols_label(name = "Species Name",
             species = "Formula") |>
  
  # Format table (add spanners, title, footnotes, etc.)
  tab_spanner(label = "Statistics", columns = p25:missing) |>
  tab_header(
    title = md("<b>Marylebone 2003</b> | Air Quality Monitoring Summary"),
    subtitle = md(
      "The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite."
    )
  ) |>
  tab_source_note(source_note = md("The data were obtained using the <code>openair</code> package."))
Marylebone 2003 | Air Quality Monitoring Summary
The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite.
Species Name Formula Statistics
p25 median mean p75 max missing
Carbon Monoxide

CO

0.600 0.975 1.12 1.50 4.30 0.0163
Oxides of Nitrogen

NOx

76.0 139 164 226 749 0.0627
Nitrogen Dioxide

NO2

36.0 52.0 56.0 72.0 206 0.0627
Ozone

O3

2.00 4.00 7.67 11.0 70.0 0.0368
Sulfur Dioxide

SO2

2.00 3.75 4.40 5.75 44.2 0.0386
Particulate Matter

PM2.5

11.0 17.0 19.1 25.0 83.0 0.0671

PM10

23.0 34.0 37.0 48.0 235 0.0126
The data were obtained using the openair package.

Level 2: Using {gtExtras} to add in-line visualisations

NB: The gtExtras package is currently under extremely active development, so be prepared for breaking changes until functionality is added to gt or gtExtras gets a stable release.

The gtExtras package provides some helper functions to simplify the use of gt somewhat. This section will focus in on some of its key functionality, particularly the addition of “sparklines” to represent trends in a pollutant.

First, however, we’ll look at the use of the gt_theme_*() functions, which - much like ggplot2::theme_*() functions - make large changes to the appearance of our table. The themes are based on many popular websites, such as FiveThirtyEight, the Guardian newspaper, the New York Times, and more. Personally, I think the ESPN theme looks nice and clean.

Code
our_table = our_table |>
  gt_theme_espn()

our_table
Marylebone 2003 | Air Quality Monitoring Summary
The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite.
Species Name Formula Statistics
p25 median mean p75 max missing
Carbon Monoxide

CO

0.600 0.975 1.12 1.50 4.30 0.0163
Oxides of Nitrogen

NOx

76.0 139 164 226 749 0.0627
Nitrogen Dioxide

NO2

36.0 52.0 56.0 72.0 206 0.0627
Ozone

O3

2.00 4.00 7.67 11.0 70.0 0.0368
Sulfur Dioxide

SO2

2.00 3.75 4.40 5.75 44.2 0.0386
Particulate Matter

PM2.5

11.0 17.0 19.1 25.0 83.0 0.0671

PM10

23.0 34.0 37.0 48.0 235 0.0126
The data were obtained using the openair package.

On the topic of changing the appearence of a plot, gtExtras contains the gtExtras::gt_highlight_rows() function, which could be useful to draw attention to a certain row of a table. This could be a pollutant that is of particular interest, sites that exceed emission limits, or the current year (depending on how your table is formatted). There is also the equivalent gtExtras::gt_highlight_cols() function.

This functionality is demonstrated below, but we won’t be taking it any further in this case. The “rows” argument can take a logical expression (here checking that the row is either NO2 or NOx). Certain columns can also be “targeted”, meaning that only certain cells are put in bold.

Code
our_table |>
  gt_highlight_rows(
    rows = name %in% c("Nitrogen Dioxide", "Oxides of Nitrogen"),
    fill = "lightyellow",
    target_col = c(name, species),
    bold_target_only = T
  )
Marylebone 2003 | Air Quality Monitoring Summary
The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite.
Species Name Formula Statistics
p25 median mean p75 max missing
Carbon Monoxide

CO

0.600 0.975 1.12 1.50 4.30 0.0163
Oxides of Nitrogen

NOx

76.0 139 164 226 749 0.0627
Nitrogen Dioxide

NO2

36.0 52.0 56.0 72.0 206 0.0627
Ozone

O3

2.00 4.00 7.67 11.0 70.0 0.0368
Sulfur Dioxide

SO2

2.00 3.75 4.40 5.75 44.2 0.0386
Particulate Matter

PM2.5

11.0 17.0 19.1 25.0 83.0 0.0671

PM10

23.0 34.0 37.0 48.0 235 0.0126
The data were obtained using the openair package.

The most interesting functionality, perhaps, is the idea of in-line visualisations. These allow us to, at a glance, understand the distribution of some numeric data or - in our case - the trend in a time series. Plotting a time series is a classic part of air quality monitoring data analysis - openair itself has a dedicated function for it.

Code
timePlot(
  data,
  pollutant = "conc",
  type = "species",
  y.relation = "free",
  key = F,
  avg.time = "week"
)

To get a “sparkline” into our table, there are two main steps:

  1. Create a list column which contains a list of the data to be plotted as a sparkline. This means that there will be a column where each row is a list of data.

  2. Use gtExtras to format this column as a sparkline.

The code outlined below will achieve Step 1. openair is used to time average our data to be monthly - this will ensure our sparkline won’t be totally unreadable. The most unusual step is perhaps the dplyr::summarise(trend = list(conc)) step. This is creating a data frame - one row per “species” as per the dplyr::group_by() - which has the column “trend” which contains a list of the concentration data for the corresponding species. Notice the structure below - the “trend” column is populated by lists of 12 “dbl” values - one for each month of the year.

Code
spark = data |>
  timeAverage(avg.time = "month", type = "species") |>
  group_by(species) |>
  summarise(trend = list(conc))

all_data = left_join(averages, spark)
Joining, by = "species"
Code
all_data
# A tibble: 7 × 8
  species   p25  median   mean    p75   max missing trend     
  <fct>   <dbl>   <dbl>  <dbl>  <dbl> <dbl>   <dbl> <list>    
1 co        0.6   0.975   1.12   1.5    4.3  0.0163 <dbl [12]>
2 nox      76   139     164.   226.   749    0.0627 <dbl [12]>
3 no2      36    52      56.0   72    206    0.0627 <dbl [12]>
4 o3        2     4       7.67  11     70    0.0368 <dbl [12]>
5 so2       2     3.75    4.40   5.75  44.2  0.0386 <dbl [12]>
6 pm25     11    17      19.1   25     83    0.0671 <dbl [12]>
7 pm10     23    34      37.0   48    235    0.0126 <dbl [12]>

With our new data - now including the list column - let’s re-apply all of the code we have so far and see what the “raw” gt output is. Note that the tutorial_level_1() function is a stand-in for all the formatting we applied in the previous section. Notice that all of the values of each list have now been printed - not what we’re after.

Code
tab_data = all_data |>
  mutate(
    name = case_when(
      species == "co" ~ "Carbon Monoxide",
      species == "nox" ~ "Oxides of Nitrogen",
      species == "no2" ~ "Nitrogen Dioxide",
      species == "o3"  ~ "Ozone",
      species == "so2" ~ "Sulfur Dioxide",
      species == "pm25" ~ "Particulate Matter",
      species == "pm10" ~ ""
    ),
    species = case_when(
      species == "co" ~ "CO",
      species == "nox" ~ "NO<sub>x</sub>",
      species == "no2" ~ "NO<sub>2</sub>",
      species == "o3"  ~ "O<sub>3</sub>",
      species == "so2" ~ "SO<sub>2</sub>",
      species == "pm25" ~ "PM<sub>2.5</sub>",
      species == "pm10" ~ "PM<sub>10</sub>"
    )
  ) |>
  relocate(name, .before = species)

our_table = gt(tab_data) |>
  tutorial_level_1() |>
  gt_theme_espn()

our_table
Marylebone 2003 | Air Quality Monitoring Summary
The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite.
Species Name Formula Statistics trend
p25 median mean p75 max missing
Carbon Monoxide

CO

0.600 0.975 1.12 1.50 4.30 0.0163 1.1156836, 1.2255305, 1.1926563, 0.9887262, 1.2467706, 1.1988858, 1.1338768, 0.8610005, 1.1812325, 0.8946839, 1.2881151, 1.0920393
Oxides of Nitrogen

NOx

76.0 139 164 226 749 0.0627 155.2497, 196.4437, 164.6905, 132.4206, 179.0217, 163.5314, 142.0042, 109.2338, 190.7626, 147.3095, 205.6243, 164.0558
Nitrogen Dioxide

NO2

36.0 52.0 56.0 72.0 206 0.0627 44.21981, 54.23906, 57.30270, 50.46797, 64.14130, 61.18131, 54.07547, 48.20223, 68.16201, 49.19054, 64.55587, 53.47755
Ozone

O3

2.00 4.00 7.67 11.0 70.0 0.0368 5.921516, 3.308892, 7.297844, 13.272981, 8.668478, 10.086351, 8.446549, 13.558459, 5.318267, 5.821221, 3.563282, 5.674324
Sulfur Dioxide

SO2

2.00 3.75 4.40 5.75 44.2 0.0386 4.314640, 5.468944, 5.189721, 4.552185, 3.750152, 5.050094, 4.054082, 3.540692, 4.437384, 4.481669, 4.598770, 3.341903
Particulate Matter

PM2.5

11.0 17.0 19.1 25.0 83.0 0.0671 14.61111, 22.92192, 23.17655, 20.02368, 16.84564, 19.50485, 13.86406, 21.49507, 22.29248, 16.26497, 20.96106, 17.31579

PM10

23.0 34.0 37.0 48.0 235 0.0126 30.16986, 44.92042, 42.60434, 37.45582, 35.37959, 39.36376, 37.55267, 38.15724, 40.13624, 30.71409, 37.85273, 30.74763
The data were obtained using the openair package.

To accomplish Step 2, we use the gtExtras function gtExtras::gt_plt_sparkline(). Much like other functions we have used, this takes a column (we’ll give it “trend”) and attempts to transform it into a sparkline. One important option to set is “same_limit” to be FALSE - this is the equivalent of setting “scales = ‘free’” in a ggplot2::facet_wrap() function.

The sparkline also highlights the maximum and minimum values using coloured dots. We can specify their colours using the “palette” argument.

Notice also that gtExtras::gt_plt_sparkline() prints the final value of the sparkline - this could be useful if we were showing yearly data, as that value would correspond to the most recent year in our data set! Here it just represents the average value in December so it could be removed if we wanted to drop it - for this we’d set “label” to be FALSE.

Code
our_table = our_table |>
  gt_plt_sparkline(
    column = trend,
    palette = c("black", "black", "chartreuse3", "red", "black"),
    same_limit = F
  )

our_table
Marylebone 2003 | Air Quality Monitoring Summary
The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite.
Species Name Formula Statistics trend
p25 median mean p75 max missing
Carbon Monoxide

CO

0.600 0.975 1.12 1.50 4.30 0.0163 1.1
Oxides of Nitrogen

NOx

76.0 139 164 226 749 0.0627 164.1
Nitrogen Dioxide

NO2

36.0 52.0 56.0 72.0 206 0.0627 53.5
Ozone

O3

2.00 4.00 7.67 11.0 70.0 0.0368 5.7
Sulfur Dioxide

SO2

2.00 3.75 4.40 5.75 44.2 0.0386 3.3
Particulate Matter

PM2.5

11.0 17.0 19.1 25.0 83.0 0.0671 17.3

PM10

23.0 34.0 37.0 48.0 235 0.0126 30.7
The data were obtained using the openair package.

gtExtras also has functions to add density functions, histograms, bar charts and bullet charts to your tables, so it is worth investigating these too. For some, like gtExtras::gt_plt_bar(), creating a list column isn’t even necessary. The key thing is that none of the gtExtras “plotting” functions will require anything more complex than making a list column.

It would be good for readers to understand what our sparkline represents. This may be a good opportunity to do some more formatting of our table using another tab_*() function, gt::tab_footnote(). This is similar to gt::tab_source_note() which we used before, but prints the label which it is provided with a superscript number associated with a specific part of our table. This could be the title, subtitle, a column header, or more.

To target specific parts of a gt table, there is a family of cells_*() functions. In this case, we use the gt::cells_column_labels() function, which targets a specific column header. Notice that “TREND” now possesses a superscript “1”, which is associated with the footnote at the bottom of the table.

Code
our_table = our_table |>
  tab_footnote(locations = cells_column_labels(columns = trend),
               footnote = "Monthly average. The lowest and highest months are indicated with green and red, respectively.")

our_table
Marylebone 2003 | Air Quality Monitoring Summary
The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite.
Species Name Formula Statistics trend1
p25 median mean p75 max missing
Carbon Monoxide

CO

0.600 0.975 1.12 1.50 4.30 0.0163 1.1
Oxides of Nitrogen

NOx

76.0 139 164 226 749 0.0627 164.1
Nitrogen Dioxide

NO2

36.0 52.0 56.0 72.0 206 0.0627 53.5
Ozone

O3

2.00 4.00 7.67 11.0 70.0 0.0368 5.7
Sulfur Dioxide

SO2

2.00 3.75 4.40 5.75 44.2 0.0386 3.3
Particulate Matter

PM2.5

11.0 17.0 19.1 25.0 83.0 0.0671 17.3

PM10

23.0 34.0 37.0 48.0 235 0.0126 30.7
The data were obtained using the openair package.
1 Monthly average. The lowest and highest months are indicated with green and red, respectively.

This is another good stopping point to review the code we’ve used so far. Notice what we have done:

  • We have used gtExtras to quickly add a theme to our gt table.
  • We have explored how gtExtras can be used to highlight rows or columns of interest.
  • We have used gtExtras and list columns to add attractive “sparklines” in our table to, at a glance, see trends in our time series.

Review the full code from this level in the “Code” tab. Once you’re comfortable with it, feel free to move onto Level 3 where openair plots will be inserted in-line, similar to the sparkline we’ve already seen.

Code
# Calculate averages and get data for sparkline, and combine

data = mydata |>
  mutate(year = lubridate::year(date)) |>
  filter(year == 2003) |>
  pivot_longer(nox:pm25, names_to = "species", values_to = "conc") |>
  mutate(species = factor(species, c(
    "co", "nox", "no2", "o3", "so2", "pm25", "pm10"
  ))) |>
  arrange(species)

averages = data |>
  group_by(species) |>
  summarise(
    p25 = quantile(conc, .25, na.rm = T),
    median = median(conc, na.rm = T),
    mean   = mean(conc, na.rm = T),
    p75 = quantile(conc, .75, na.rm = T),
    max    = max(conc, na.rm = T),
    missing = sum(is.na(conc)) / n()
  )

spark = data |>
  timeAverage(avg.time = "month", type = "species") |>
  group_by(species) |>
  summarise(trend = list(conc))

all_data = left_join(averages, spark)

# Pre-format table data

tab_data = all_data |>
  mutate(
    name = case_when(
      species == "co" ~ "Carbon Monoxide",
      species == "nox" ~ "Oxides of Nitrogen",
      species == "no2" ~ "Nitrogen Dioxide",
      species == "o3"  ~ "Ozone",
      species == "so2" ~ "Sulfur Dioxide",
      species == "pm25" ~ "Particulate Matter",
      species == "pm10" ~ ""
    ),
    species = case_when(
      species == "co" ~ "CO",
      species == "nox" ~ "NO<sub>x</sub>",
      species == "no2" ~ "NO<sub>2</sub>",
      species == "o3"  ~ "O<sub>3</sub>",
      species == "so2" ~ "SO<sub>2</sub>",
      species == "pm25" ~ "PM<sub>2.5</sub>",
      species == "pm10" ~ "PM<sub>10</sub>"
    )
  ) |>
  relocate(name, .before = species)

# Tabulate

our_table = gt(tab_data) |>
  
  # Set theme as ESPN
  gt_theme_espn() |>
  
  # Format columns
  fmt_number(
    columns = where(is.numeric),
    n_sigfig = 3,
    drop_trailing_zeros = T
  ) |>
  fmt_percent(columns = missing) |>
  fmt_markdown(columns = species) |>
  
  # Column labels
  cols_label(name = "Species Name",
             species = "Formula") |>
  
  # Format table (add spanners, title, footnotes, etc.)
  tab_spanner(label = "Statistics", columns = p25:missing) |>
  tab_header(
    title = md("<b>Marylebone 2003</b> | Air Quality Monitoring Summary"),
    subtitle = md(
      "The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite."
    )
  ) |>
  tab_source_note(source_note = md("The data were obtained using the <code>openair</code> package.")) |>
  tab_footnote(locations = cells_column_labels(columns = trend),
               footnote = "Monthly average. The lowest and highest months are indicated with green and red, respectively.") |>
  
  # Format sparkline
  gt_plt_sparkline(
    column = trend,
    palette = c("black", "black", "chartreuse3", "red", "black"),
    same_limit = F
  )
Marylebone 2003 | Air Quality Monitoring Summary
The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite.
Species Name Formula Statistics trend1
p25 median mean p75 max missing
Carbon Monoxide

CO

0.600 0.975 1.12 1.50 4.30 0.0163 1.1
Oxides of Nitrogen

NOx

76.0 139 164 226 749 0.0627 164.1
Nitrogen Dioxide

NO2

36.0 52.0 56.0 72.0 206 0.0627 53.5
Ozone

O3

2.00 4.00 7.67 11.0 70.0 0.0368 5.7
Sulfur Dioxide

SO2

2.00 3.75 4.40 5.75 44.2 0.0386 3.3
Particulate Matter

PM2.5

11.0 17.0 19.1 25.0 83.0 0.0671 17.3

PM10

23.0 34.0 37.0 48.0 235 0.0126 30.7
The data were obtained using the openair package.
1 Monthly average. The lowest and highest months are indicated with green and red, respectively.

Level 3: Incorporating {openair} into a {gt} table

openair creates plots used the world over but, as previously discussed, is built using lattice. This is unfortunate as the gt::ggplot_image() function is designed to insert only ggplot2 graphics into gt. We can still employ the principles behind gt::ggplot_image(), we will just need to rewrite it a bit to work with lattice graphics. Let’s consider what it is doing. The key thing it is achieving is to take some list of ggplot2 objects, save them in a local temporary directory, use the gt function gt::local_image() to read them back in as images and insert them into the table, and then delete the temporary files.

Code
function (plot_object,
          height = 100,
          aspect_ratio = 1)
{
  if (is.numeric(height)) {
    height <- paste0(height, "px")
  }
  if (inherits(plot_object, "gg")) {
    plot_object <- list(plot_object)
  }
  vapply(
    seq_along(plot_object),
    FUN.VALUE = character(1),
    USE.NAMES = FALSE,
    FUN = function(x) {
      filename <- paste0("temp_ggplot_", formatC(x, width = 4,
                                                 flag = "0"), ".png")
      ggplot2::ggsave(
        filename = filename,
        plot = plot_object[[x]],
        device = "png",
        dpi = 100,
        width = 5 * aspect_ratio,
        height = 5
      )
      on.exit(file.remove(filename))
      local_image(filename = filename, height = height)
    }
  )
}

The key things that need changing are to change “gg” to “trellis”, and to change the ggplot2::ggsave() step to a more lattice-friendly way to save plots.

Code
openair_image <- function(plot_object, height) {
  if (is.numeric(height)) {
    height <- paste0(height, "px")
  }
  if (inherits(plot_object, "trellis")) {
    plot_object <- list(plot_object)
  }
  invisible(vapply(
    seq_along(plot_object),
    FUN.VALUE = character(1),
    USE.NAMES = FALSE,
    FUN = function(x) {
      filename <- paste0("temp_openair_", formatC(x,
        width = 4,
        flag = "0"
      ), ".png")

      png(
        filename = filename,
        res = 300,
        width = 4 * 300,
        height = 4 * 300,
        bg = "transparent"
      )
      print(plot_object[[x]])
      dev.off()

      on.exit(file.remove(filename))
      local_image(filename = filename, height = height)
    }
  ))
}

Now we have a function that can take a list of lattice plots and insert them into our gt table, we need to actually make our lattice plots and ensure that there are columns in our data for them to occupy.

To quickly produce openair plots for each pollutant, we can use a dplyr workflow. The below code may be an unfamiliar way of working, so to unpack:

  1. The data is “nested by” the “species” column. This means that the data frame becomes a data frame of seven rows - one for each species - and two columns - the retained “species” column, and a list-column called “data” containing all of the other data. This can be conceptually difficult, but effectively we are creating a column in a data frame which itself contains data frames. dplyr::nest_by() also “groups” by each row - any function applied to this new data frame will be applied to each row individually.

  2. dplyr::mutate() is then used to create more list-columns, this time containing the lattice plot outputs of the openair openair::pollutionRose and openair::polarPlot functions. Note the use of list() to signify that we are creating list-columns.

  3. The whole data frame is arranged by “species”. This is to ensure the plots are in the same order as our final table, and why turning them into factors right at the start was so important.

Note that this may take a minute to run!

Code
data_plots = data |>
  nest_by(species) |>
  mutate(polrose = list(
    pollutionRose(
      data,
      pollutant = "conc", 
      key = FALSE,
      par.settings = list(axis.line = list(col = "transparent"))
    )$plot
  ),
  polplot = list(polarPlot(
    data,
    pollutant = "conc", 
    key = FALSE,
    par.settings = list(axis.line = list(col = "transparent"))
  )$plot)) |>
  arrange(species)

data_plots

We now have a function to turn a column into plots, and the plots themselves. We now just need an empty couple of columns in our gt table to insert these plots into. We’ll create some empty columns in our pre-existing tab_data data frame.

Code
tab_data = tab_data |>
  mutate(polrose = NA,
         polplot = NA)

tab_data
# A tibble: 7 × 11
  name   species   p25  median   mean    p75   max missing trend polrose polplot
  <chr>  <chr>   <dbl>   <dbl>  <dbl>  <dbl> <dbl>   <dbl> <lis> <lgl>   <lgl>  
1 "Carb… CO        0.6   0.975   1.12   1.5    4.3  0.0163 <dbl> NA      NA     
2 "Oxid… NO<sub…  76   139     164.   226.   749    0.0627 <dbl> NA      NA     
3 "Nitr… NO<sub…  36    52      56.0   72    206    0.0627 <dbl> NA      NA     
4 "Ozon… O<sub>…   2     4       7.67  11     70    0.0368 <dbl> NA      NA     
5 "Sulf… SO<sub…   2     3.75    4.40   5.75  44.2  0.0386 <dbl> NA      NA     
6 "Part… PM<sub…  11    17      19.1   25     83    0.0671 <dbl> NA      NA     
7 ""     PM<sub…  23    34      37.0   48    235    0.0126 <dbl> NA      NA     

Let’s see how this looks now. Note that, once again, the tutorial_level_2() function is a stand-in for all the formatting we applied in the previous section. We now have two empty columns.

Code
our_table = gt(tab_data) |> 
  tutorial_level_2()

our_table
Marylebone 2003 | Air Quality Monitoring Summary
The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite.
Species Name Formula Statistics trend1 polrose polplot
p25 median mean p75 max missing
Carbon Monoxide

CO

0.600 0.975 1.12 1.50 4.30 0.0163 1.1 NA NA
Oxides of Nitrogen

NOx

76.0 139 164 226 749 0.0627 164.1 NA NA
Nitrogen Dioxide

NO2

36.0 52.0 56.0 72.0 206 0.0627 53.5 NA NA
Ozone

O3

2.00 4.00 7.67 11.0 70.0 0.0368 5.7 NA NA
Sulfur Dioxide

SO2

2.00 3.75 4.40 5.75 44.2 0.0386 3.3 NA NA
Particulate Matter

PM2.5

11.0 17.0 19.1 25.0 83.0 0.0671 17.3 NA NA

PM10

23.0 34.0 37.0 48.0 235 0.0126 30.7 NA NA
The data were obtained using the openair package.
1 Monthly average. The lowest and highest months are indicated with green and red, respectively.

The final step is a complex one, so let’s unpack it.

  1. The function we use is gt::text_transform(). This does targeted text transformations. In this case we are transforming an empty cell into a plot.

  2. gt::text_transform(), much like gt::tab_footnote(), needs a target specified by a cells_*() function. Here we are targeting the body of the cells themselves, hence the use of gt::cells_body(). We specify our empty column (“polrose” or “polplot”).

  3. gt::text_transform() also takes a function used to transform the cells it is targeting. Here we purrr::map() over the list of plots in our data_plots data frame using our openair_image() function. This effectively transforms each NA value in the column with the corresponding plot in the data_plots data frame.

This process is why it was important that the order of the data_plots data frame was the same as the order of the table itself - if they were in the wrong order the plot would be with the wrong pollutant.

Code
our_table = our_table |>
  text_transform(
    locations = cells_body(columns = polrose),
    fn = function(x) {
      map(data_plots$polrose, openair_image, height = px(70))
    }
  ) |>
  text_transform(
    locations = cells_body(columns = polplot),
    fn = function(x) {
      map(data_plots$polplot, openair_image, height = px(70))
    }
  )

our_table
Marylebone 2003 | Air Quality Monitoring Summary
The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite.
Species Name Formula Statistics trend1 polrose polplot
p25 median mean p75 max missing
Carbon Monoxide

CO

0.600 0.975 1.12 1.50 4.30 0.0163 1.1
Oxides of Nitrogen

NOx

76.0 139 164 226 749 0.0627 164.1
Nitrogen Dioxide

NO2

36.0 52.0 56.0 72.0 206 0.0627 53.5
Ozone

O3

2.00 4.00 7.67 11.0 70.0 0.0368 5.7
Sulfur Dioxide

SO2

2.00 3.75 4.40 5.75 44.2 0.0386 3.3
Particulate Matter

PM2.5

11.0 17.0 19.1 25.0 83.0 0.0671 17.3

PM10

23.0 34.0 37.0 48.0 235 0.0126 30.7
The data were obtained using the openair package.
1 Monthly average. The lowest and highest months are indicated with green and red, respectively.

The table is nearing completion, so there is just a couple of final refinements that could be made. Some personal choices as last minute refinements are:

Code
our_table = our_table |>
  tab_options(table.font.size = 12) |>
  tab_spanner(label = "Visualisations", columns = trend:polplot) |>
  cols_label(polrose = "Poll. Rose",
             polplot = "Polar Plt.") |>
  cols_width(c(p25, median, mean, p75, max, missing) ~ px(50),
             name ~ px(120)) |>
  cols_align(align = "center", columns = 2)

our_table
Marylebone 2003 | Air Quality Monitoring Summary
The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite.
Species Name Formula Statistics Visualisations
p25 median mean p75 max missing trend1 Poll. Rose Polar Plt.
Carbon Monoxide

CO

0.600 0.975 1.12 1.50 4.30 0.0163 1.1
Oxides of Nitrogen

NOx

76.0 139 164 226 749 0.0627 164.1
Nitrogen Dioxide

NO2

36.0 52.0 56.0 72.0 206 0.0627 53.5
Ozone

O3

2.00 4.00 7.67 11.0 70.0 0.0368 5.7
Sulfur Dioxide

SO2

2.00 3.75 4.40 5.75 44.2 0.0386 3.3
Particulate Matter

PM2.5

11.0 17.0 19.1 25.0 83.0 0.0671 17.3

PM10

23.0 34.0 37.0 48.0 235 0.0126 30.7
The data were obtained using the openair package.
1 Monthly average. The lowest and highest months are indicated with green and red, respectively.

These are last-minute, personal refinements, but demonstrate the depth of the gt package. It is a good idea to explore the functions of gt and gtExtras further to really get down to what they both can do!

The key achievement of this section was to add the openair plots to our gt table. The “Code” tab contains all the code needed to recreate the table “Table” tab, slightly refined for readability (putting things in a better order, and with comments).

Code
# Calculate averages and get data for sparkline, and combine

data = mydata |>
  mutate(year = lubridate::year(date)) |>
  filter(year == 2003) |>
  pivot_longer(nox:pm25, names_to = "species", values_to = "conc") |>
  mutate(species = factor(species, c(
    "co", "nox", "no2", "o3", "so2", "pm25", "pm10"
  ))) |>
  arrange(species)

averages = data |>
  group_by(species) |>
  summarise(
    p25 = quantile(conc, .25, na.rm = T),
    median = median(conc, na.rm = T),
    mean   = mean(conc, na.rm = T),
    p75 = quantile(conc, .75, na.rm = T),
    max    = max(conc, na.rm = T),
    missing = sum(is.na(conc)) / n()
  )

spark = data |>
  timeAverage(avg.time = "month", type = "species") |>
  group_by(species) |>
  summarise(trend = list(conc))

all_data = left_join(averages, spark) |>
  mutate(polrose = NA,
         polplot = NA)

# Function to tabulate plots

openair_image <- function(plot_object, height) {
  if (is.numeric(height)) {
    height <- paste0(height, "px")
  }
  if (inherits(plot_object, "trellis")) {
    plot_object <- list(plot_object)
  }
  invisible(vapply(
    seq_along(plot_object),
    FUN.VALUE = character(1),
    USE.NAMES = FALSE,
    FUN = function(x) {
      filename <- paste0("temp_openair_", formatC(x,
        width = 4,
        flag = "0"
      ), ".png")

      png(
        filename = filename,
        res = 300,
        width = 4 * 300,
        height = 4 * 300,
        bg = "transparent"
      )
      print(plot_object[[x]])
      dev.off()

      on.exit(file.remove(filename))
      local_image(filename = filename, height = height)
    }
  ))
}

# List of plots

data_plots = data |>
  nest_by(species) |>
  mutate(polrose = list(
    pollutionRose(
      data,
      pollutant = "conc", 
      key = FALSE,
      par.settings = list(axis.line = list(col = "transparent"))
    )$plot
  ),
  polplot = list(polarPlot(
    data,
    pollutant = "conc", 
    key = FALSE,
    par.settings = list(axis.line = list(col = "transparent"))
  )$plot)) |>
  arrange(species)

# Pre-format table data

tab_data = all_data |>
  mutate(
    name = case_when(
      species == "co" ~ "Carbon Monoxide",
      species == "nox" ~ "Oxides of Nitrogen",
      species == "no2" ~ "Nitrogen Dioxide",
      species == "o3"  ~ "Ozone",
      species == "so2" ~ "Sulfur Dioxide",
      species == "pm25" ~ "Particulate Matter",
      species == "pm10" ~ ""
    ),
    species = case_when(
      species == "co" ~ "CO",
      species == "nox" ~ "NO<sub>x</sub>",
      species == "no2" ~ "NO<sub>2</sub>",
      species == "o3"  ~ "O<sub>3</sub>",
      species == "so2" ~ "SO<sub>2</sub>",
      species == "pm25" ~ "PM<sub>2.5</sub>",
      species == "pm10" ~ "PM<sub>10</sub>"
    )
  ) |>
  relocate(name, .before = species)

# Tabulate

our_table = gt(tab_data) |>
  
  # Set theme as ESPN
  gt_theme_espn() |>
  
  # Format columns
  fmt_number(
    columns = where(is.numeric),
    n_sigfig = 3,
    drop_trailing_zeros = T
  ) |>
  fmt_percent(columns = missing) |>
  fmt_markdown(columns = species) |>
  
  # Column labels, widths, and alignments
  cols_label(
    name = "Species Name",
    species = "Formula",
    polrose = "Poll. Rose",
    polplot = "Polar Plt."
  ) |>
  cols_width(c(p25, median, mean, p75, max, missing) ~ px(50),
             name ~ px(120)) |>
  cols_align(align = "left", columns = 1) |>
  cols_align(align = "center", columns = 2) |>
  
  # Format table (add spanners, title, footnotes, set font size, etc.)
  tab_options(table.font.size = 12) |>
  tab_spanner(label = "Statistics", columns = p25:missing) |>
  tab_spanner(label = "Visualisations", columns = trend:polplot) |>
  tab_header(
    title = md("<b>Marylebone 2003</b> | Air Quality Monitoring Summary"),
    subtitle = md(
      "The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite."
    )
  ) |>
  tab_source_note(source_note = md("The data were obtained using the <code>openair</code> package.")) |>
  tab_footnote(locations = cells_column_labels(columns = trend),
               footnote = "Monthly average. The lowest and highest months are indicated with green and red, respectively.") |>
  
  # Format sparkline
  gt_plt_sparkline(
    column = trend,
    palette = c("black", "black", "chartreuse3", "red", "black"),
    same_limit = F
  ) |>
  
  # replace empty columns with plots
  text_transform(
    locations = cells_body(polrose),
    fn = function(x) {
      map(data_plots$polrose, openair_image, height = px(70))
    }
  ) |>
  text_transform(
    locations = cells_body(polplot),
    fn = function(x) {
      map(data_plots$polplot, openair_image, height = px(70))
    }
  )
Marylebone 2003 | Air Quality Monitoring Summary
The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite.
Species Name Formula Statistics Visualisations
p25 median mean p75 max missing trend1 Poll. Rose Polar Plt.
Carbon Monoxide

CO

0.600 0.975 1.12 1.50 4.30 0.0163 1.1
Oxides of Nitrogen

NOx

76.0 139 164 226 749 0.0627 164.1
Nitrogen Dioxide

NO2

36.0 52.0 56.0 72.0 206 0.0627 53.5
Ozone

O3

2.00 4.00 7.67 11.0 70.0 0.0368 5.7
Sulfur Dioxide

SO2

2.00 3.75 4.40 5.75 44.2 0.0386 3.3
Particulate Matter

PM2.5

11.0 17.0 19.1 25.0 83.0 0.0671 17.3

PM10

23.0 34.0 37.0 48.0 235 0.0126 30.7
The data were obtained using the openair package.
1 Monthly average. The lowest and highest months are indicated with green and red, respectively.

3 Conclusion

gt is an incredibly powerful tool to create impressive HTML tables, with features well suited to presenting air quality data. We’ve progressed through three levels of complexity:

  1. Getting to grips with gt, clearly presenting air quality monitoring statistics in an attractive way.

  2. Incorporating visualisations into the plot, combining the classic time series plot with our HTML table, allowing for at-a-glance understanding of data trends.

  3. Adding industry-standard openair lattice-based plots to our table, enabling clients to explore more closely the pollutants in which they are most interested.

Even in these three sections we have just scratched the surface of what gt can do for presenting air quality data. It would be great to see more gt tables out in the wild, being used to great effect by the global air quality community.

R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.utf8 
[2] LC_CTYPE=English_United Kingdom.utf8   
[3] LC_MONETARY=English_United Kingdom.utf8
[4] LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] gtExtras_0.4.1  gt_0.7.0        openair_2.10-0  forcats_0.5.2  
 [5] stringr_1.4.1   dplyr_1.0.9     purrr_0.3.4     readr_2.1.2    
 [9] tidyr_1.2.0     tibble_3.1.8    ggplot2_3.3.6   tidyverse_1.3.2

loaded via a namespace (and not attached):
 [1] nlme_3.1-157        fs_1.5.2            lubridate_1.8.0    
 [4] fontawesome_0.3.0   RColorBrewer_1.1-3  httr_1.4.4         
 [7] tools_4.2.1         backports_1.4.1     utf8_1.2.2         
[10] R6_2.5.1            DBI_1.1.3           mgcv_1.8-40        
[13] colorspace_2.0-3    withr_2.5.0         tidyselect_1.1.2   
[16] compiler_4.2.1      textshaping_0.3.6   cli_3.3.0          
[19] rvest_1.0.3         xml2_1.3.3          sass_0.4.2         
[22] labeling_0.4.2      scales_1.2.1        hexbin_1.28.2      
[25] systemfonts_1.0.4   commonmark_1.8.0    digest_0.6.29      
[28] svglite_2.1.0       rmarkdown_2.16      base64enc_0.1-3    
[31] jpeg_0.1-9          pkgconfig_2.0.3     htmltools_0.5.3    
[34] dbplyr_2.2.1        fastmap_1.1.0       maps_3.4.0         
[37] htmlwidgets_1.5.4   rlang_1.0.4         readxl_1.4.1       
[40] rstudioapi_0.14     farver_2.1.1        generics_0.1.3     
[43] jsonlite_1.8.0      googlesheets4_1.0.1 magrittr_2.0.3     
[46] interp_1.1-3        Matrix_1.4-1        Rcpp_1.0.9         
[49] munsell_0.5.0       fansi_1.0.3         lifecycle_1.0.1    
[52] stringi_1.7.8       yaml_2.3.5          MASS_7.3-57        
[55] grid_4.2.1          paletteer_1.4.1     crayon_1.5.1       
[58] deldir_1.0-6        lattice_0.20-45     haven_2.5.1        
[61] splines_4.2.1       mapproj_1.2.8       hms_1.1.2          
[64] knitr_1.40          pillar_1.8.1        reprex_2.0.2       
[67] glue_1.6.2          evaluate_0.16       latticeExtra_0.6-30
[70] modelr_0.1.9        png_0.1-7           vctrs_0.4.1        
[73] tzdb_0.3.0          cellranger_1.1.0    gtable_0.3.0       
[76] rematch2_2.1.2      assertthat_0.2.1    xfun_0.32          
[79] broom_1.0.0         ragg_1.2.2          googledrive_2.0.0  
[82] gargle_1.2.0        cluster_2.1.3       ellipsis_0.3.2