2022 Update

This is the second iteration of this tutorial. While much of the content has remained the same, a few key user improvements have been made. Many of these – such as code linking/copying/collapsing – owe to the tutorial’s conversion from rmarkdown to quarto. The styling of the openair plots has also been improved, taking lessons from the in-development openairmaps package.

1 Introduction

This tutorial will take you through three “levels” of summarising air quality monitoring data.

First, we will begin to use the gt package, a powerful package that is to tables what ggplot2 is to figures.
Then, we will incorporate the gtExtras package to further theme and add in-table visualisations.
Finally, we will investigate how to incorporate openair plots - plots extensively around the world by the air quality community - into our gt tables.

That last level is perhaps easier said than done - to quote the openair book:

openair is a product of the time it was started and used the highly capable lattice package for plotting. This was a time before ggplot2 and the tidyverse.

gt has functions that help with the inclusion of ggplot2, but not lattice, so this will require some modification of pre-existing gt functions that will be covered in the third part of the tutorial.

This tutorial assumes a general understanding of the tidyverse and openair, but is suitable for gt novices. The ideal target is an air quality professional or researcher who is interested in exploring new tools for presenting their data.

If you would like to know more about openair and reading plots, the best place to look is the openair book, available online.
If you would like to know more about the tidyverse, there are many resources available including R for Data Science.

All code in this document is reproducible; even the data used comes straight from the openair package (see the “Installation and Data” tab).

The tidyverse, openair and gt are all available on CRAN, so can be downloaded using install.packages().

Code

install.packages(tidyverse)
install.packages(openair)
install.packages(openair)

gtExtras is still under active development, so can be downloaded directly from it’s GitHub repository. I would also recommend keeping up to date with it on its website.

Code

# if needed install.packages("remotes")
remotes::install_github("jthomasmock/gtExtras")

The data that is going to be used in this tutorial is adapted from openair::mydata.

Code

head(mydata)

# A tibble: 6 × 10
  date                   ws    wd   nox   no2    o3  pm10   so2    co  pm25
  <dttm>              <dbl> <int> <int> <int> <int> <int> <dbl> <dbl> <int>
1 1998-01-01 00:00:00  0.6    280   285    39     1    29  4.72  3.37    NA
2 1998-01-01 01:00:00  2.16   230    NA    NA    NA    37 NA    NA       NA
3 1998-01-01 02:00:00  2.76   190    NA    NA     3    34  6.83  9.60    NA
4 1998-01-01 03:00:00  2.16   170   493    52     3    35  7.66 10.2     NA
5 1998-01-01 04:00:00  2.4    180   468    78     2    34  8.07  8.91    NA
6 1998-01-01 05:00:00  3      190   264    42     0    16  5.50  3.05    NA

The goal for this tutorial is to produce a summary table of air quality pollutants at a single site for a single year (feel free to click “Show me the final table” to see). The principles behind this could produce much different table structures however (e.g., a pollutant per row, a year per row, a site per row, some combination of the three, some other idea…).

For this table, the initial transformations that need to be done are as follows:

Filter for a single year (2003 is chosen arbitrarily, feel free to pick any year in the mydata dataset).
Reshape the data from a “wide” to “long” format, such that the concentrations are all in one column.
Turn the “species” column into a factor in the order we would like them to appear in the final table.

Code

data = mydata |>
  mutate(year = lubridate::year(date)) |>
  filter(year == 2003) |>
  pivot_longer(nox:pm25, names_to = "species", values_to = "conc") |>
  mutate(species = factor(species, c(
    "co", "nox", "no2", "o3", "so2", "pm25", "pm10"
  )))

The final structure is now:

Code

head(data, n = 10)

# A tibble: 10 × 6
   date                   ws    wd  year species  conc
   <dttm>              <dbl> <int> <dbl> <fct>   <dbl>
 1 2003-01-01 00:00:00   5.2   160  2003 co      0.675
 2 2003-01-01 01:00:00   4.6   140  2003 co      0.967
 3 2003-01-01 02:00:00   3.6   140  2003 co      1.2  
 4 2003-01-01 03:00:00   4.6   140  2003 co      1.15 
 5 2003-01-01 04:00:00   5.7   140  2003 co      0.725
 6 2003-01-01 05:00:00   4.6   140  2003 co      0.575
 7 2003-01-01 06:00:00   4.1   160  2003 co      0.525
 8 2003-01-01 07:00:00   5.2   160  2003 co      0.8  
 9 2003-01-01 08:00:00   3.6   160  2003 co      1    
10 2003-01-01 09:00:00   3.6   180  2003 co      0.925

Below you can see the table we’re aiming for by the end of Level 3.

Species Name	Formula	Statistics						Visualisations
Marylebone 2003 \| Air Quality Monitoring Summary
The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite.
Species Name	Formula	p25	median	mean	p75	max	missing	trend¹	Poll. Rose	Polar Plt.
Carbon Monoxide	CO	0.600	0.975	1.12	1.50	4.30	1.63%
Oxides of Nitrogen	NO_x	76.0	139	164	226	749	6.27%
Nitrogen Dioxide	NO₂	36.0	52.0	56.0	72.0	206	6.27%
Ozone	O₃	2.00	4.00	7.67	11.0	70.0	3.68%
Sulfur Dioxide	SO₂	2.00	3.75	4.40	5.75	44.2	3.86%
Particulate Matter	PM_2.5	11.0	17.0	19.1	25.0	83.0	6.71%
	PM₁₀	23.0	34.0	37.0	48.0	235	1.26%
The data were obtained using the `openair` package.
¹ Monthly average. The lowest and highest months are indicated with green and red, respectively.

2 Tutorials

Level 1: Using `{gt}` to present summary statistics

Much as a ggplot2 plot begins with ggplot2::ggplot(), a gt table begins with gt::gt(). However, while frequently a ggplot2::ggplot() function can be fed all of our raw data, it can be easier to provide gt::gt() with a pre-made summary. Therefore our first step is to summarise our air quality data in some way. In real life, the summary statistics you chose will likely align with the legislation of your country or the sorts of statistics a client wants, but here some simple statistics are chosen.

The below code calculates the 25th percentile, median, mean, 7th percentile and maximum of each pollutant, as well as the percentage of data that is missing. Note the trick used for that last statistic - sum(is.na(conc)) checks whether each value in the column is NA and then sums that column of TRUE/FALSE values, where TRUE = 1 and FALSE = 0. dplyr::n() gives the number of rows in the group, meaning that the division of the two is effectively “number of missing divided by total”.

Summary statistics could be calculated however we’d like - including using openair functions - but this code does the job.

Code

averages = data |>
  group_by(species) |>
  summarise(
    p25 = quantile(conc, .25, na.rm = T),
    median = median(conc, na.rm = T),
    mean = mean(conc, na.rm = T),
    p75 = quantile(conc, .75, na.rm = T),
    max = max(conc, na.rm = T),
    missing = sum(is.na(conc)) / n()
  )

averages

# A tibble: 7 × 7
  species   p25  median   mean    p75   max missing
  <fct>   <dbl>   <dbl>  <dbl>  <dbl> <dbl>   <dbl>
1 co        0.6   0.975   1.12   1.5    4.3  0.0163
2 nox      76   139     164.   226.   749    0.0627
3 no2      36    52      56.0   72    206    0.0627
4 o3        2     4       7.67  11     70    0.0368
5 so2       2     3.75    4.40   5.75  44.2  0.0386
6 pm25     11    17      19.1   25     83    0.0671
7 pm10     23    34      37.0   48    235    0.0126

Now we’re at the position where we can begin a gt table. As alluded to earlier, we’ll begin by providing the gt::gt() function with our data.

Code

our_table = gt(averages)

our_table

species	p25	median	mean	p75	max	missing
co	0.6	0.975	1.118069	1.50	4.30	0.01632420
nox	76.0	139.000	163.941055	225.50	749.00	0.06267123
no2	36.0	52.000	55.964682	72.00	206.00	0.06267123
o3	2.0	4.000	7.673975	11.00	70.00	0.03675799
so2	2.0	3.750	4.398658	5.75	44.25	0.03858447
pm25	11.0	17.000	19.071341	25.00	83.00	0.06712329
pm10	23.0	34.000	37.009133	48.00	235.00	0.01255708

gt has produced a HTML table which would look excellent in an rmarkdown or quarto report, but it clearly could do with some finessing. For example, it may be useful to format the numeric columns to be a little shorter - we don’t need that many significant figures. gt has a collection of fmt_*() functions for this purpose. In our case, gt::fmt_number().

Note that, unlike ggplot2 where additional geom_*() functions are added, in gt additional features like fmt_*() are piped.

fmt_*() functions require columns to be specified. This argument works on tidyselect principles, so when we provide the argument where(is.numeric) we are telling it to format all numeric columns (in this case, all columns besides “species”). We could specify a number of decimal places or significant figures - in this case the latter - and we can also tell the column to drop trailing zeroes. This means that the median of the NO_x row won’t read “163.000…” just because the CO row goes to more decimal places.

Code

our_table = our_table |>
  fmt_number(
    columns = where(is.numeric),
    n_sigfig = 3,
    drop_trailing_zeros = T
  )

our_table

species	p25	median	mean	p75	max	missing
co	0.600	0.975	1.12	1.50	4.30	0.0163
nox	76.0	139	164	226	749	0.0627
no2	36.0	52.0	56.0	72.0	206	0.0627
o3	2.00	4.00	7.67	11.0	70.0	0.0368
so2	2.00	3.75	4.40	5.75	44.2	0.0386
pm25	11.0	17.0	19.1	25.0	83.0	0.0671
pm10	23.0	34.0	37.0	48.0	235	0.0126

This isn’t quite right as the “missing” column is actually a percentage. We can override the previous gt::fmt_number() column by piping this new output into gt::fmt_percent() and specifying the “missing” column. Note that, much like in the tidyverse, columns are provided unquoted.

Code

our_table = our_table |>
  fmt_percent(columns = missing)

our_table

species	p25	median	mean	p75	max	missing
co	0.600	0.975	1.12	1.50	4.30	0.0163
nox	76.0	139	164	226	749	0.0627
no2	36.0	52.0	56.0	72.0	206	0.0627
o3	2.00	4.00	7.67	11.0	70.0	0.0368
so2	2.00	3.75	4.40	5.75	44.2	0.0386
pm25	11.0	17.0	19.1	25.0	83.0	0.0671
pm10	23.0	34.0	37.0	48.0	235	0.0126

The “species” column could also be refined - currently the second row reads “nox” when it should read “NO_x”. gt supports markdown syntax for achieving things like subscripts, bold text and the like. In this case, we’ll change our “species” column to be written as markdown. Once again, we can use a fmt_*() function, this time to format the column as “markdown”.

Just for completion, while we’re using dplyr::mutate() let’s insert the English names of the pollutants too, and put them at the far left of the table.

Code

tab_data = averages |>
  mutate(
    name = case_when(
      species == "co" ~ "Carbon Monoxide",
      species == "nox" ~ "Oxides of Nitrogen",
      species == "no2" ~ "Nitrogen Dioxide",
      species == "o3"  ~ "Ozone",
      species == "so2" ~ "Sulfur Dioxide",
      species == "pm25" ~ "Particulate Matter",
      species == "pm10" ~ ""
    ),
    species = case_when(
      species == "co" ~ "CO",
      species == "nox" ~ "NO<sub>x</sub>",
      species == "no2" ~ "NO<sub>2</sub>",
      species == "o3"  ~ "O<sub>3</sub>",
      species == "so2" ~ "SO<sub>2</sub>",
      species == "pm25" ~ "PM<sub>2.5</sub>",
      species == "pm10" ~ "PM<sub>10</sub>"
    )
  ) |>
  relocate(name, .before = species)

our_table = gt(tab_data) |>
  fmt_number(
    columns = where(is.numeric),
    n_sigfig = 3,
    drop_trailing_zeros = T
  ) |>
  fmt_percent(columns = missing) |>
  fmt_markdown(columns = species)

our_table

name	species	p25	median	mean	p75	max	missing
Carbon Monoxide	CO	0.600	0.975	1.12	1.50	4.30	0.0163
Oxides of Nitrogen	NO_x	76.0	139	164	226	749	0.0627
Nitrogen Dioxide	NO₂	36.0	52.0	56.0	72.0	206	0.0627
Ozone	O₃	2.00	4.00	7.67	11.0	70.0	0.0368
Sulfur Dioxide	SO₂	2.00	3.75	4.40	5.75	44.2	0.0386
Particulate Matter	PM_2.5	11.0	17.0	19.1	25.0	83.0	0.0671
	PM₁₀	23.0	34.0	37.0	48.0	235	0.0126

By default, gt simply carries over the column names to be the header names, but much like the axis labels that ggplot2 defaults to these will often want changing. Column names in data are often written to be easy to use, but column names in a table should be written to be easy to read. It may be useful at this stage to refer to the “elements of a table” diagram from the gt website.

fig:The elements of a gt table. — “The elements of a gt table.”

We are wanting to control something to do with columns so we’ll use a cols_*() function. This family of functions can move columns around, merge them, hide them and - important right now - rename them. The syntax of gt::cols_label() is almost the inverse of dplyr::rename() - instead of “new = old”, “old = new” is provided. The new name should also be given as a string.

Code

our_table = our_table |>
  cols_label(name = "Species Name",
             species = "Formula")

our_table

Species Name	Formula	p25	median	mean	p75	max	missing
Carbon Monoxide	CO	0.600	0.975	1.12	1.50	4.30	0.0163
Oxides of Nitrogen	NO_x	76.0	139	164	226	749	0.0627
Nitrogen Dioxide	NO₂	36.0	52.0	56.0	72.0	206	0.0627
Ozone	O₃	2.00	4.00	7.67	11.0	70.0	0.0368
Sulfur Dioxide	SO₂	2.00	3.75	4.40	5.75	44.2	0.0386
Particulate Matter	PM_2.5	11.0	17.0	19.1	25.0	83.0	0.0671
	PM₁₀	23.0	34.0	37.0	48.0	235	0.0126

This table is already looking great, but we can refine it on a whole table level. To make large scale changes, we can use the a tab_*() function. These could be things like adding a title, subtitle, footnote, or a “spanner title” that groups certain columns together. In the next code chunk we’ll implement all of these things. Note that we can format elements like the figure title using markdown by wrapping the string in the gt::md() function.

Code

our_table = our_table |>
  tab_spanner(label = "Statistics", columns = p25:missing) |>
  tab_header(
    title = md("<b>Marylebone 2003</b> | Air Quality Monitoring Summary"),
    subtitle = md(
      "The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite."
    )
  ) |>
  tab_source_note(source_note = md("The data were obtained using the <code>openair</code> package."))

our_table

Species Name	Formula	Statistics
Marylebone 2003 \| Air Quality Monitoring Summary
The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite.
Species Name	Formula	p25	median	mean	p75	max	missing
Carbon Monoxide	CO	0.600	0.975	1.12	1.50	4.30	0.0163
Oxides of Nitrogen	NO_x	76.0	139	164	226	749	0.0627
Nitrogen Dioxide	NO₂	36.0	52.0	56.0	72.0	206	0.0627
Ozone	O₃	2.00	4.00	7.67	11.0	70.0	0.0368
Sulfur Dioxide	SO₂	2.00	3.75	4.40	5.75	44.2	0.0386
Particulate Matter	PM_2.5	11.0	17.0	19.1	25.0	83.0	0.0671
	PM₁₀	23.0	34.0	37.0	48.0	235	0.0126
The data were obtained using the `openair` package.

This is a good stopping point to review the code we’ve used so far. Notice what we have done:

We have created a summary data frame, and then turned it into a HTML table using gt().
We used fmt_*() functions to format numeric and character columns.
We used a col_*() function to change a property of a column (in this case, it’s name)
We used tab_*() functions to make table-wide changes - giving it a title, spanner headers, etc.

Review the full code from this level in the “Code” tab. Once you’re comfortable with it, feel free to move onto Level 2 where gtExtras will be incorporated to enhance the table’s theme and introduce in-line visualisations.

Code

data = mydata |>
  mutate(year = lubridate::year(date)) |>
  filter(year == 2003) |>
  pivot_longer(nox:pm25, names_to = "species", values_to = "conc") |>
  mutate(species = factor(species, c(
    "co", "nox", "no2", "o3", "so2", "pm25", "pm10"
  ))) |>
  arrange(species)

averages = data |>
  group_by(species) |>
  summarise(
    p25 = quantile(conc, .25, na.rm = T),
    median = median(conc, na.rm = T),
    mean   = mean(conc, na.rm = T),
    p75 = quantile(conc, .75, na.rm = T),
    max    = max(conc, na.rm = T),
    missing = sum(is.na(conc)) / n()
  )

tab_data = averages |>
  mutate(
    name = case_when(
      species == "co" ~ "Carbon Monoxide",
      species == "nox" ~ "Oxides of Nitrogen",
      species == "no2" ~ "Nitrogen Dioxide",
      species == "o3"  ~ "Ozone",
      species == "so2" ~ "Sulfur Dioxide",
      species == "pm25" ~ "Particulate Matter",
      species == "pm10" ~ ""
    ),
    species = case_when(
      species == "co" ~ "CO",
      species == "nox" ~ "NO<sub>x</sub>",
      species == "no2" ~ "NO<sub>2</sub>",
      species == "o3"  ~ "O<sub>3</sub>",
      species == "so2" ~ "SO<sub>2</sub>",
      species == "pm25" ~ "PM<sub>2.5</sub>",
      species == "pm10" ~ "PM<sub>10</sub>"
    )
  ) |>
  relocate(name, .before = species)

our_table = gt(tab_data) |>
  
  # Format columns
  fmt_number(
    columns = where(is.numeric),
    n_sigfig = 3,
    drop_trailing_zeros = T
  ) |>
  fmt_percent(columns = missing) |>
  fmt_markdown(columns = species) |>
  
  # Column labels
  cols_label(name = "Species Name",
             species = "Formula") |>
  
  # Format table (add spanners, title, footnotes, etc.)
  tab_spanner(label = "Statistics", columns = p25:missing) |>
  tab_header(
    title = md("<b>Marylebone 2003</b> | Air Quality Monitoring Summary"),
    subtitle = md(
      "The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite."
    )
  ) |>
  tab_source_note(source_note = md("The data were obtained using the <code>openair</code> package."))

Species Name	Formula	Statistics
Marylebone 2003 \| Air Quality Monitoring Summary
The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite.
Species Name	Formula	p25	median	mean	p75	max	missing
Carbon Monoxide	CO	0.600	0.975	1.12	1.50	4.30	0.0163
Oxides of Nitrogen	NO_x	76.0	139	164	226	749	0.0627
Nitrogen Dioxide	NO₂	36.0	52.0	56.0	72.0	206	0.0627
Ozone	O₃	2.00	4.00	7.67	11.0	70.0	0.0368
Sulfur Dioxide	SO₂	2.00	3.75	4.40	5.75	44.2	0.0386
Particulate Matter	PM_2.5	11.0	17.0	19.1	25.0	83.0	0.0671
	PM₁₀	23.0	34.0	37.0	48.0	235	0.0126
The data were obtained using the `openair` package.

Level 2: Using `{gtExtras}` to add in-line visualisations

NB: The gtExtras package is currently under extremely active development, so be prepared for breaking changes until functionality is added to gt or gtExtras gets a stable release.

The gtExtras package provides some helper functions to simplify the use of gt somewhat. This section will focus in on some of its key functionality, particularly the addition of “sparklines” to represent trends in a pollutant.

First, however, we’ll look at the use of the gt_theme_*() functions, which - much like ggplot2::theme_*() functions - make large changes to the appearance of our table. The themes are based on many popular websites, such as FiveThirtyEight, the Guardian newspaper, the New York Times, and more. Personally, I think the ESPN theme looks nice and clean.

Code

our_table = our_table |>
  gt_theme_espn()

our_table

Species Name	Formula	Statistics
Marylebone 2003 \| Air Quality Monitoring Summary
The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite.
Species Name	Formula	p25	median	mean	p75	max	missing
Carbon Monoxide	CO	0.600	0.975	1.12	1.50	4.30	0.0163
Oxides of Nitrogen	NO_x	76.0	139	164	226	749	0.0627
Nitrogen Dioxide	NO₂	36.0	52.0	56.0	72.0	206	0.0627
Ozone	O₃	2.00	4.00	7.67	11.0	70.0	0.0368
Sulfur Dioxide	SO₂	2.00	3.75	4.40	5.75	44.2	0.0386
Particulate Matter	PM_2.5	11.0	17.0	19.1	25.0	83.0	0.0671
	PM₁₀	23.0	34.0	37.0	48.0	235	0.0126
The data were obtained using the `openair` package.

On the topic of changing the appearence of a plot, gtExtras contains the gtExtras::gt_highlight_rows() function, which could be useful to draw attention to a certain row of a table. This could be a pollutant that is of particular interest, sites that exceed emission limits, or the current year (depending on how your table is formatted). There is also the equivalent gtExtras::gt_highlight_cols() function.

This functionality is demonstrated below, but we won’t be taking it any further in this case. The “rows” argument can take a logical expression (here checking that the row is either NO₂ or NO_x). Certain columns can also be “targeted”, meaning that only certain cells are put in bold.

Code

our_table |>
  gt_highlight_rows(
    rows = name %in% c("Nitrogen Dioxide", "Oxides of Nitrogen"),
    fill = "lightyellow",
    target_col = c(name, species),
    bold_target_only = T
  )

Species Name	Formula	Statistics
Marylebone 2003 \| Air Quality Monitoring Summary
The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite.
Species Name	Formula	p25	median	mean	p75	max	missing
Carbon Monoxide	CO	0.600	0.975	1.12	1.50	4.30	0.0163
Oxides of Nitrogen	NO_x	76.0	139	164	226	749	0.0627
Nitrogen Dioxide	NO₂	36.0	52.0	56.0	72.0	206	0.0627
Ozone	O₃	2.00	4.00	7.67	11.0	70.0	0.0368
Sulfur Dioxide	SO₂	2.00	3.75	4.40	5.75	44.2	0.0386
Particulate Matter	PM_2.5	11.0	17.0	19.1	25.0	83.0	0.0671
	PM₁₀	23.0	34.0	37.0	48.0	235	0.0126
The data were obtained using the `openair` package.

The most interesting functionality, perhaps, is the idea of in-line visualisations. These allow us to, at a glance, understand the distribution of some numeric data or - in our case - the trend in a time series. Plotting a time series is a classic part of air quality monitoring data analysis - openair itself has a dedicated function for it.

Code

timePlot(
  data,
  pollutant = "conc",
  type = "species",
  y.relation = "free",
  key = F,
  avg.time = "week"
)

To get a “sparkline” into our table, there are two main steps:

Create a list column which contains a list of the data to be plotted as a sparkline. This means that there will be a column where each row is a list of data.
Use gtExtras to format this column as a sparkline.

The code outlined below will achieve Step 1. openair is used to time average our data to be monthly - this will ensure our sparkline won’t be totally unreadable. The most unusual step is perhaps the dplyr::summarise(trend = list(conc)) step. This is creating a data frame - one row per “species” as per the dplyr::group_by() - which has the column “trend” which contains a list of the concentration data for the corresponding species. Notice the structure below - the “trend” column is populated by lists of 12 “dbl” values - one for each month of the year.

Code

spark = data |>
  timeAverage(avg.time = "month", type = "species") |>
  group_by(species) |>
  summarise(trend = list(conc))

all_data = left_join(averages, spark)

Joining, by = "species"

Code

all_data

# A tibble: 7 × 8
  species   p25  median   mean    p75   max missing trend     
  <fct>   <dbl>   <dbl>  <dbl>  <dbl> <dbl>   <dbl> <list>    
1 co        0.6   0.975   1.12   1.5    4.3  0.0163 <dbl [12]>
2 nox      76   139     164.   226.   749    0.0627 <dbl [12]>
3 no2      36    52      56.0   72    206    0.0627 <dbl [12]>
4 o3        2     4       7.67  11     70    0.0368 <dbl [12]>
5 so2       2     3.75    4.40   5.75  44.2  0.0386 <dbl [12]>
6 pm25     11    17      19.1   25     83    0.0671 <dbl [12]>
7 pm10     23    34      37.0   48    235    0.0126 <dbl [12]>

With our new data - now including the list column - let’s re-apply all of the code we have so far and see what the “raw” gt output is. Note that the tutorial_level_1() function is a stand-in for all the formatting we applied in the previous section. Notice that all of the values of each list have now been printed - not what we’re after.

Code

tab_data = all_data |>
  mutate(
    name = case_when(
      species == "co" ~ "Carbon Monoxide",
      species == "nox" ~ "Oxides of Nitrogen",
      species == "no2" ~ "Nitrogen Dioxide",
      species == "o3"  ~ "Ozone",
      species == "so2" ~ "Sulfur Dioxide",
      species == "pm25" ~ "Particulate Matter",
      species == "pm10" ~ ""
    ),
    species = case_when(
      species == "co" ~ "CO",
      species == "nox" ~ "NO<sub>x</sub>",
      species == "no2" ~ "NO<sub>2</sub>",
      species == "o3"  ~ "O<sub>3</sub>",
      species == "so2" ~ "SO<sub>2</sub>",
      species == "pm25" ~ "PM<sub>2.5</sub>",
      species == "pm10" ~ "PM<sub>10</sub>"
    )
  ) |>
  relocate(name, .before = species)

our_table = gt(tab_data) |>
  tutorial_level_1() |>
  gt_theme_espn()

our_table

Species Name	Formula	Statistics						trend
Marylebone 2003 \| Air Quality Monitoring Summary
The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite.
Species Name	Formula	p25	median	mean	p75	max	missing	trend
Carbon Monoxide	CO	0.600	0.975	1.12	1.50	4.30	0.0163	1.1156836, 1.2255305, 1.1926563, 0.9887262, 1.2467706, 1.1988858, 1.1338768, 0.8610005, 1.1812325, 0.8946839, 1.2881151, 1.0920393
Oxides of Nitrogen	NO_x	76.0	139	164	226	749	0.0627	155.2497, 196.4437, 164.6905, 132.4206, 179.0217, 163.5314, 142.0042, 109.2338, 190.7626, 147.3095, 205.6243, 164.0558
Nitrogen Dioxide	NO₂	36.0	52.0	56.0	72.0	206	0.0627	44.21981, 54.23906, 57.30270, 50.46797, 64.14130, 61.18131, 54.07547, 48.20223, 68.16201, 49.19054, 64.55587, 53.47755
Ozone	O₃	2.00	4.00	7.67	11.0	70.0	0.0368	5.921516, 3.308892, 7.297844, 13.272981, 8.668478, 10.086351, 8.446549, 13.558459, 5.318267, 5.821221, 3.563282, 5.674324
Sulfur Dioxide	SO₂	2.00	3.75	4.40	5.75	44.2	0.0386	4.314640, 5.468944, 5.189721, 4.552185, 3.750152, 5.050094, 4.054082, 3.540692, 4.437384, 4.481669, 4.598770, 3.341903
Particulate Matter	PM_2.5	11.0	17.0	19.1	25.0	83.0	0.0671	14.61111, 22.92192, 23.17655, 20.02368, 16.84564, 19.50485, 13.86406, 21.49507, 22.29248, 16.26497, 20.96106, 17.31579
	PM₁₀	23.0	34.0	37.0	48.0	235	0.0126	30.16986, 44.92042, 42.60434, 37.45582, 35.37959, 39.36376, 37.55267, 38.15724, 40.13624, 30.71409, 37.85273, 30.74763
The data were obtained using the `openair` package.

To accomplish Step 2, we use the gtExtras function gtExtras::gt_plt_sparkline(). Much like other functions we have used, this takes a column (we’ll give it “trend”) and attempts to transform it into a sparkline. One important option to set is “same_limit” to be FALSE - this is the equivalent of setting “scales = ‘free’” in a ggplot2::facet_wrap() function.

The sparkline also highlights the maximum and minimum values using coloured dots. We can specify their colours using the “palette” argument.

Notice also that gtExtras::gt_plt_sparkline() prints the final value of the sparkline - this could be useful if we were showing yearly data, as that value would correspond to the most recent year in our data set! Here it just represents the average value in December so it could be removed if we wanted to drop it - for this we’d set “label” to be FALSE.

Code

our_table = our_table |>
  gt_plt_sparkline(
    column = trend,
    palette = c("black", "black", "chartreuse3", "red", "black"),
    same_limit = F
  )

our_table

Species Name	Formula	Statistics						trend
Marylebone 2003 \| Air Quality Monitoring Summary
The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite.
Species Name	Formula	p25	median	mean	p75	max	missing	trend
Carbon Monoxide	CO	0.600	0.975	1.12	1.50	4.30	0.0163
Oxides of Nitrogen	NO_x	76.0	139	164	226	749	0.0627
Nitrogen Dioxide	NO₂	36.0	52.0	56.0	72.0	206	0.0627
Ozone	O₃	2.00	4.00	7.67	11.0	70.0	0.0368
Sulfur Dioxide	SO₂	2.00	3.75	4.40	5.75	44.2	0.0386
Particulate Matter	PM_2.5	11.0	17.0	19.1	25.0	83.0	0.0671
	PM₁₀	23.0	34.0	37.0	48.0	235	0.0126
The data were obtained using the `openair` package.

gtExtras also has functions to add density functions, histograms, bar charts and bullet charts to your tables, so it is worth investigating these too. For some, like gtExtras::gt_plt_bar(), creating a list column isn’t even necessary. The key thing is that none of the gtExtras “plotting” functions will require anything more complex than making a list column.

It would be good for readers to understand what our sparkline represents. This may be a good opportunity to do some more formatting of our table using another tab_*() function, gt::tab_footnote(). This is similar to gt::tab_source_note() which we used before, but prints the label which it is provided with a superscript number associated with a specific part of our table. This could be the title, subtitle, a column header, or more.

To target specific parts of a gt table, there is a family of cells_*() functions. In this case, we use the gt::cells_column_labels() function, which targets a specific column header. Notice that “TREND” now possesses a superscript “1”, which is associated with the footnote at the bottom of the table.

Code

our_table = our_table |>
  tab_footnote(locations = cells_column_labels(columns = trend),
               footnote = "Monthly average. The lowest and highest months are indicated with green and red, respectively.")

our_table

Species Name	Formula	Statistics						trend¹
Marylebone 2003 \| Air Quality Monitoring Summary
The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite.
Species Name	Formula	p25	median	mean	p75	max	missing	trend¹
Carbon Monoxide	CO	0.600	0.975	1.12	1.50	4.30	0.0163
Oxides of Nitrogen	NO_x	76.0	139	164	226	749	0.0627
Nitrogen Dioxide	NO₂	36.0	52.0	56.0	72.0	206	0.0627
Ozone	O₃	2.00	4.00	7.67	11.0	70.0	0.0368
Sulfur Dioxide	SO₂	2.00	3.75	4.40	5.75	44.2	0.0386
Particulate Matter	PM_2.5	11.0	17.0	19.1	25.0	83.0	0.0671
	PM₁₀	23.0	34.0	37.0	48.0	235	0.0126
The data were obtained using the `openair` package.
¹ Monthly average. The lowest and highest months are indicated with green and red, respectively.

This is another good stopping point to review the code we’ve used so far. Notice what we have done:

We have used gtExtras to quickly add a theme to our gt table.
We have explored how gtExtras can be used to highlight rows or columns of interest.
We have used gtExtras and list columns to add attractive “sparklines” in our table to, at a glance, see trends in our time series.

Review the full code from this level in the “Code” tab. Once you’re comfortable with it, feel free to move onto Level 3 where openair plots will be inserted in-line, similar to the sparkline we’ve already seen.

Code

# Calculate averages and get data for sparkline, and combine

data = mydata |>
  mutate(year = lubridate::year(date)) |>
  filter(year == 2003) |>
  pivot_longer(nox:pm25, names_to = "species", values_to = "conc") |>
  mutate(species = factor(species, c(
    "co", "nox", "no2", "o3", "so2", "pm25", "pm10"
  ))) |>
  arrange(species)

averages = data |>
  group_by(species) |>
  summarise(
    p25 = quantile(conc, .25, na.rm = T),
    median = median(conc, na.rm = T),
    mean   = mean(conc, na.rm = T),
    p75 = quantile(conc, .75, na.rm = T),
    max    = max(conc, na.rm = T),
    missing = sum(is.na(conc)) / n()
  )

spark = data |>
  timeAverage(avg.time = "month", type = "species") |>
  group_by(species) |>
  summarise(trend = list(conc))

all_data = left_join(averages, spark)

# Pre-format table data

tab_data = all_data |>
  mutate(
    name = case_when(
      species == "co" ~ "Carbon Monoxide",
      species == "nox" ~ "Oxides of Nitrogen",
      species == "no2" ~ "Nitrogen Dioxide",
      species == "o3"  ~ "Ozone",
      species == "so2" ~ "Sulfur Dioxide",
      species == "pm25" ~ "Particulate Matter",
      species == "pm10" ~ ""
    ),
    species = case_when(
      species == "co" ~ "CO",
      species == "nox" ~ "NO<sub>x</sub>",
      species == "no2" ~ "NO<sub>2</sub>",
      species == "o3"  ~ "O<sub>3</sub>",
      species == "so2" ~ "SO<sub>2</sub>",
      species == "pm25" ~ "PM<sub>2.5</sub>",
      species == "pm10" ~ "PM<sub>10</sub>"
    )
  ) |>
  relocate(name, .before = species)

# Tabulate

our_table = gt(tab_data) |>
  
  # Set theme as ESPN
  gt_theme_espn() |>
  
  # Format columns
  fmt_number(
    columns = where(is.numeric),
    n_sigfig = 3,
    drop_trailing_zeros = T
  ) |>
  fmt_percent(columns = missing) |>
  fmt_markdown(columns = species) |>
  
  # Column labels
  cols_label(name = "Species Name",
             species = "Formula") |>
  
  # Format table (add spanners, title, footnotes, etc.)
  tab_spanner(label = "Statistics", columns = p25:missing) |>
  tab_header(
    title = md("<b>Marylebone 2003</b> | Air Quality Monitoring Summary"),
    subtitle = md(
      "The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite."
    )
  ) |>
  tab_source_note(source_note = md("The data were obtained using the <code>openair</code> package.")) |>
  tab_footnote(locations = cells_column_labels(columns = trend),
               footnote = "Monthly average. The lowest and highest months are indicated with green and red, respectively.") |>
  
  # Format sparkline
  gt_plt_sparkline(
    column = trend,
    palette = c("black", "black", "chartreuse3", "red", "black"),
    same_limit = F
  )

Species Name	Formula	Statistics						trend¹
Marylebone 2003 \| Air Quality Monitoring Summary
The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite.
Species Name	Formula	p25	median	mean	p75	max	missing	trend¹
Carbon Monoxide	CO	0.600	0.975	1.12	1.50	4.30	0.0163
Oxides of Nitrogen	NO_x	76.0	139	164	226	749	0.0627
Nitrogen Dioxide	NO₂	36.0	52.0	56.0	72.0	206	0.0627
Ozone	O₃	2.00	4.00	7.67	11.0	70.0	0.0368
Sulfur Dioxide	SO₂	2.00	3.75	4.40	5.75	44.2	0.0386
Particulate Matter	PM_2.5	11.0	17.0	19.1	25.0	83.0	0.0671
	PM₁₀	23.0	34.0	37.0	48.0	235	0.0126
The data were obtained using the `openair` package.
¹ Monthly average. The lowest and highest months are indicated with green and red, respectively.

Level 3: Incorporating `{openair}` into a `{gt}` table

openair creates plots used the world over but, as previously discussed, is built using lattice. This is unfortunate as the gt::ggplot_image() function is designed to insert only ggplot2 graphics into gt. We can still employ the principles behind gt::ggplot_image(), we will just need to rewrite it a bit to work with lattice graphics. Let’s consider what it is doing. The key thing it is achieving is to take some list of ggplot2 objects, save them in a local temporary directory, use the gt function gt::local_image() to read them back in as images and insert them into the table, and then delete the temporary files.

Code

function (plot_object,
          height = 100,
          aspect_ratio = 1)
{
  if (is.numeric(height)) {
    height <- paste0(height, "px")
  }
  if (inherits(plot_object, "gg")) {
    plot_object <- list(plot_object)
  }
  vapply(
    seq_along(plot_object),
    FUN.VALUE = character(1),
    USE.NAMES = FALSE,
    FUN = function(x) {
      filename <- paste0("temp_ggplot_", formatC(x, width = 4,
                                                 flag = "0"), ".png")
      ggplot2::ggsave(
        filename = filename,
        plot = plot_object[[x]],
        device = "png",
        dpi = 100,
        width = 5 * aspect_ratio,
        height = 5
      )
      on.exit(file.remove(filename))
      local_image(filename = filename, height = height)
    }
  )
}

The key things that need changing are to change “gg” to “trellis”, and to change the ggplot2::ggsave() step to a more lattice-friendly way to save plots.

Code

openair_image <- function(plot_object, height) {
  if (is.numeric(height)) {
    height <- paste0(height, "px")
  }
  if (inherits(plot_object, "trellis")) {
    plot_object <- list(plot_object)
  }
  invisible(vapply(
    seq_along(plot_object),
    FUN.VALUE = character(1),
    USE.NAMES = FALSE,
    FUN = function(x) {
      filename <- paste0("temp_openair_", formatC(x,
        width = 4,
        flag = "0"
      ), ".png")

      png(
        filename = filename,
        res = 300,
        width = 4 * 300,
        height = 4 * 300,
        bg = "transparent"
      )
      print(plot_object[[x]])
      dev.off()

      on.exit(file.remove(filename))
      local_image(filename = filename, height = height)
    }
  ))
}

Now we have a function that can take a list of lattice plots and insert them into our gt table, we need to actually make our lattice plots and ensure that there are columns in our data for them to occupy.

To quickly produce openair plots for each pollutant, we can use a dplyr workflow. The below code may be an unfamiliar way of working, so to unpack:

The data is “nested by” the “species” column. This means that the data frame becomes a data frame of seven rows - one for each species - and two columns - the retained “species” column, and a list-column called “data” containing all of the other data. This can be conceptually difficult, but effectively we are creating a column in a data frame which itself contains data frames. dplyr::nest_by() also “groups” by each row - any function applied to this new data frame will be applied to each row individually.
dplyr::mutate() is then used to create more list-columns, this time containing the lattice plot outputs of the openair openair::pollutionRose and openair::polarPlot functions. Note the use of list() to signify that we are creating list-columns.
The whole data frame is arranged by “species”. This is to ensure the plots are in the same order as our final table, and why turning them into factors right at the start was so important.

Note that this may take a minute to run!

Code

data_plots = data |>
  nest_by(species) |>
  mutate(polrose = list(
    pollutionRose(
      data,
      pollutant = "conc", 
      key = FALSE,
      par.settings = list(axis.line = list(col = "transparent"))
    )$plot
  ),
  polplot = list(polarPlot(
    data,
    pollutant = "conc", 
    key = FALSE,
    par.settings = list(axis.line = list(col = "transparent"))
  )$plot)) |>
  arrange(species)

data_plots

We now have a function to turn a column into plots, and the plots themselves. We now just need an empty couple of columns in our gt table to insert these plots into. We’ll create some empty columns in our pre-existing tab_data data frame.

Code

tab_data = tab_data |>
  mutate(polrose = NA,
         polplot = NA)

tab_data

# A tibble: 7 × 11
  name   species   p25  median   mean    p75   max missing trend polrose polplot
  <chr>  <chr>   <dbl>   <dbl>  <dbl>  <dbl> <dbl>   <dbl> <lis> <lgl>   <lgl>  
1 "Carb… CO        0.6   0.975   1.12   1.5    4.3  0.0163 <dbl> NA      NA     
2 "Oxid… NO<sub…  76   139     164.   226.   749    0.0627 <dbl> NA      NA     
3 "Nitr… NO<sub…  36    52      56.0   72    206    0.0627 <dbl> NA      NA     
4 "Ozon… O<sub>…   2     4       7.67  11     70    0.0368 <dbl> NA      NA     
5 "Sulf… SO<sub…   2     3.75    4.40   5.75  44.2  0.0386 <dbl> NA      NA     
6 "Part… PM<sub…  11    17      19.1   25     83    0.0671 <dbl> NA      NA     
7 ""     PM<sub…  23    34      37.0   48    235    0.0126 <dbl> NA      NA

Let’s see how this looks now. Note that, once again, the tutorial_level_2() function is a stand-in for all the formatting we applied in the previous section. We now have two empty columns.

Code

our_table = gt(tab_data) |> 
  tutorial_level_2()

our_table

Species Name	Formula	Statistics						trend¹	polrose	polplot
Marylebone 2003 \| Air Quality Monitoring Summary
The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite.
Species Name	Formula	p25	median	mean	p75	max	missing	trend¹	polrose	polplot
Carbon Monoxide	CO	0.600	0.975	1.12	1.50	4.30	0.0163		NA	NA
Oxides of Nitrogen	NO_x	76.0	139	164	226	749	0.0627		NA	NA
Nitrogen Dioxide	NO₂	36.0	52.0	56.0	72.0	206	0.0627		NA	NA
Ozone	O₃	2.00	4.00	7.67	11.0	70.0	0.0368		NA	NA
Sulfur Dioxide	SO₂	2.00	3.75	4.40	5.75	44.2	0.0386		NA	NA
Particulate Matter	PM_2.5	11.0	17.0	19.1	25.0	83.0	0.0671		NA	NA
	PM₁₀	23.0	34.0	37.0	48.0	235	0.0126		NA	NA
The data were obtained using the `openair` package.
¹ Monthly average. The lowest and highest months are indicated with green and red, respectively.

The final step is a complex one, so let’s unpack it.

The function we use is gt::text_transform(). This does targeted text transformations. In this case we are transforming an empty cell into a plot.
gt::text_transform(), much like gt::tab_footnote(), needs a target specified by a cells_*() function. Here we are targeting the body of the cells themselves, hence the use of gt::cells_body(). We specify our empty column (“polrose” or “polplot”).
gt::text_transform() also takes a function used to transform the cells it is targeting. Here we purrr::map() over the list of plots in our data_plots data frame using our openair_image() function. This effectively transforms each NA value in the column with the corresponding plot in the data_plots data frame.

This process is why it was important that the order of the data_plots data frame was the same as the order of the table itself - if they were in the wrong order the plot would be with the wrong pollutant.

Code

our_table = our_table |>
  text_transform(
    locations = cells_body(columns = polrose),
    fn = function(x) {
      map(data_plots$polrose, openair_image, height = px(70))
    }
  ) |>
  text_transform(
    locations = cells_body(columns = polplot),
    fn = function(x) {
      map(data_plots$polplot, openair_image, height = px(70))
    }
  )

our_table

Species Name	Formula	Statistics						trend¹	polrose	polplot
Marylebone 2003 \| Air Quality Monitoring Summary
The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite.
Species Name	Formula	p25	median	mean	p75	max	missing	trend¹	polrose	polplot
Carbon Monoxide	CO	0.600	0.975	1.12	1.50	4.30	0.0163
Oxides of Nitrogen	NO_x	76.0	139	164	226	749	0.0627
Nitrogen Dioxide	NO₂	36.0	52.0	56.0	72.0	206	0.0627
Ozone	O₃	2.00	4.00	7.67	11.0	70.0	0.0368
Sulfur Dioxide	SO₂	2.00	3.75	4.40	5.75	44.2	0.0386
Particulate Matter	PM_2.5	11.0	17.0	19.1	25.0	83.0	0.0671
	PM₁₀	23.0	34.0	37.0	48.0	235	0.0126
The data were obtained using the `openair` package.
¹ Monthly average. The lowest and highest months are indicated with green and red, respectively.

The table is nearing completion, so there is just a couple of final refinements that could be made. Some personal choices as last minute refinements are:

Using gt::tab_options() to set the font size to be a little smaller. gt::tab_options() is the gt equivalent to ggplot2::theme().
Adding a new spanner using gt::tab_spanner(), and renaming the new columns using gt::cols_label().
Fixing some column widths using gt::cols_width(). Making all of the statistics columns the same width makes the table look a little more attractive!
Using gt::cols_align() to align the “formula” column to be centred, which looks a little nicer.

Code

our_table = our_table |>
  tab_options(table.font.size = 12) |>
  tab_spanner(label = "Visualisations", columns = trend:polplot) |>
  cols_label(polrose = "Poll. Rose",
             polplot = "Polar Plt.") |>
  cols_width(c(p25, median, mean, p75, max, missing) ~ px(50),
             name ~ px(120)) |>
  cols_align(align = "center", columns = 2)

our_table

Species Name	Formula	Statistics						Visualisations
Marylebone 2003 \| Air Quality Monitoring Summary
The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite.
Species Name	Formula	p25	median	mean	p75	max	missing	trend¹	Poll. Rose	Polar Plt.
Carbon Monoxide	CO	0.600	0.975	1.12	1.50	4.30	0.0163
Oxides of Nitrogen	NO_x	76.0	139	164	226	749	0.0627
Nitrogen Dioxide	NO₂	36.0	52.0	56.0	72.0	206	0.0627
Ozone	O₃	2.00	4.00	7.67	11.0	70.0	0.0368
Sulfur Dioxide	SO₂	2.00	3.75	4.40	5.75	44.2	0.0386
Particulate Matter	PM_2.5	11.0	17.0	19.1	25.0	83.0	0.0671
	PM₁₀	23.0	34.0	37.0	48.0	235	0.0126
The data were obtained using the `openair` package.
¹ Monthly average. The lowest and highest months are indicated with green and red, respectively.

These are last-minute, personal refinements, but demonstrate the depth of the gt package. It is a good idea to explore the functions of gt and gtExtras further to really get down to what they both can do!

The key achievement of this section was to add the openair plots to our gt table. The “Code” tab contains all the code needed to recreate the table “Table” tab, slightly refined for readability (putting things in a better order, and with comments).

Code

# Calculate averages and get data for sparkline, and combine

data = mydata |>
  mutate(year = lubridate::year(date)) |>
  filter(year == 2003) |>
  pivot_longer(nox:pm25, names_to = "species", values_to = "conc") |>
  mutate(species = factor(species, c(
    "co", "nox", "no2", "o3", "so2", "pm25", "pm10"
  ))) |>
  arrange(species)

averages = data |>
  group_by(species) |>
  summarise(
    p25 = quantile(conc, .25, na.rm = T),
    median = median(conc, na.rm = T),
    mean   = mean(conc, na.rm = T),
    p75 = quantile(conc, .75, na.rm = T),
    max    = max(conc, na.rm = T),
    missing = sum(is.na(conc)) / n()
  )

spark = data |>
  timeAverage(avg.time = "month", type = "species") |>
  group_by(species) |>
  summarise(trend = list(conc))

all_data = left_join(averages, spark) |>
  mutate(polrose = NA,
         polplot = NA)

# Function to tabulate plots

openair_image <- function(plot_object, height) {
  if (is.numeric(height)) {
    height <- paste0(height, "px")
  }
  if (inherits(plot_object, "trellis")) {
    plot_object <- list(plot_object)
  }
  invisible(vapply(
    seq_along(plot_object),
    FUN.VALUE = character(1),
    USE.NAMES = FALSE,
    FUN = function(x) {
      filename <- paste0("temp_openair_", formatC(x,
        width = 4,
        flag = "0"
      ), ".png")

      png(
        filename = filename,
        res = 300,
        width = 4 * 300,
        height = 4 * 300,
        bg = "transparent"
      )
      print(plot_object[[x]])
      dev.off()

      on.exit(file.remove(filename))
      local_image(filename = filename, height = height)
    }
  ))
}

# List of plots

data_plots = data |>
  nest_by(species) |>
  mutate(polrose = list(
    pollutionRose(
      data,
      pollutant = "conc", 
      key = FALSE,
      par.settings = list(axis.line = list(col = "transparent"))
    )$plot
  ),
  polplot = list(polarPlot(
    data,
    pollutant = "conc", 
    key = FALSE,
    par.settings = list(axis.line = list(col = "transparent"))
  )$plot)) |>
  arrange(species)

# Pre-format table data

tab_data = all_data |>
  mutate(
    name = case_when(
      species == "co" ~ "Carbon Monoxide",
      species == "nox" ~ "Oxides of Nitrogen",
      species == "no2" ~ "Nitrogen Dioxide",
      species == "o3"  ~ "Ozone",
      species == "so2" ~ "Sulfur Dioxide",
      species == "pm25" ~ "Particulate Matter",
      species == "pm10" ~ ""
    ),
    species = case_when(
      species == "co" ~ "CO",
      species == "nox" ~ "NO<sub>x</sub>",
      species == "no2" ~ "NO<sub>2</sub>",
      species == "o3"  ~ "O<sub>3</sub>",
      species == "so2" ~ "SO<sub>2</sub>",
      species == "pm25" ~ "PM<sub>2.5</sub>",
      species == "pm10" ~ "PM<sub>10</sub>"
    )
  ) |>
  relocate(name, .before = species)

# Tabulate

our_table = gt(tab_data) |>
  
  # Set theme as ESPN
  gt_theme_espn() |>
  
  # Format columns
  fmt_number(
    columns = where(is.numeric),
    n_sigfig = 3,
    drop_trailing_zeros = T
  ) |>
  fmt_percent(columns = missing) |>
  fmt_markdown(columns = species) |>
  
  # Column labels, widths, and alignments
  cols_label(
    name = "Species Name",
    species = "Formula",
    polrose = "Poll. Rose",
    polplot = "Polar Plt."
  ) |>
  cols_width(c(p25, median, mean, p75, max, missing) ~ px(50),
             name ~ px(120)) |>
  cols_align(align = "left", columns = 1) |>
  cols_align(align = "center", columns = 2) |>
  
  # Format table (add spanners, title, footnotes, set font size, etc.)
  tab_options(table.font.size = 12) |>
  tab_spanner(label = "Statistics", columns = p25:missing) |>
  tab_spanner(label = "Visualisations", columns = trend:polplot) |>
  tab_header(
    title = md("<b>Marylebone 2003</b> | Air Quality Monitoring Summary"),
    subtitle = md(
      "The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite."
    )
  ) |>
  tab_source_note(source_note = md("The data were obtained using the <code>openair</code> package.")) |>
  tab_footnote(locations = cells_column_labels(columns = trend),
               footnote = "Monthly average. The lowest and highest months are indicated with green and red, respectively.") |>
  
  # Format sparkline
  gt_plt_sparkline(
    column = trend,
    palette = c("black", "black", "chartreuse3", "red", "black"),
    same_limit = F
  ) |>
  
  # replace empty columns with plots
  text_transform(
    locations = cells_body(polrose),
    fn = function(x) {
      map(data_plots$polrose, openair_image, height = px(70))
    }
  ) |>
  text_transform(
    locations = cells_body(polplot),
    fn = function(x) {
      map(data_plots$polplot, openair_image, height = px(70))
    }
  )

Species Name	Formula	Statistics						Visualisations
Marylebone 2003 \| Air Quality Monitoring Summary
The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite.
Species Name	Formula	p25	median	mean	p75	max	missing	trend¹	Poll. Rose	Polar Plt.
Carbon Monoxide	CO	0.600	0.975	1.12	1.50	4.30	0.0163
Oxides of Nitrogen	NO_x	76.0	139	164	226	749	0.0627
Nitrogen Dioxide	NO₂	36.0	52.0	56.0	72.0	206	0.0627
Ozone	O₃	2.00	4.00	7.67	11.0	70.0	0.0368
Sulfur Dioxide	SO₂	2.00	3.75	4.40	5.75	44.2	0.0386
Particulate Matter	PM_2.5	11.0	17.0	19.1	25.0	83.0	0.0671
	PM₁₀	23.0	34.0	37.0	48.0	235	0.0126
The data were obtained using the `openair` package.
¹ Monthly average. The lowest and highest months are indicated with green and red, respectively.

3 Conclusion

gt is an incredibly powerful tool to create impressive HTML tables, with features well suited to presenting air quality data. We’ve progressed through three levels of complexity:

Getting to grips with gt, clearly presenting air quality monitoring statistics in an attractive way.
Incorporating visualisations into the plot, combining the classic time series plot with our HTML table, allowing for at-a-glance understanding of data trends.
Adding industry-standard openair lattice-based plots to our table, enabling clients to explore more closely the pollutants in which they are most interested.

Even in these three sections we have just scratched the surface of what gt can do for presenting air quality data. It would be great to see more gt tables out in the wild, being used to great effect by the global air quality community.

Click for Session Info

R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.utf8 
[2] LC_CTYPE=English_United Kingdom.utf8   
[3] LC_MONETARY=English_United Kingdom.utf8
[4] LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] gtExtras_0.4.1  gt_0.7.0        openair_2.10-0  forcats_0.5.2  
 [5] stringr_1.4.1   dplyr_1.0.9     purrr_0.3.4     readr_2.1.2    
 [9] tidyr_1.2.0     tibble_3.1.8    ggplot2_3.3.6   tidyverse_1.3.2

loaded via a namespace (and not attached):
 [1] nlme_3.1-157        fs_1.5.2            lubridate_1.8.0    
 [4] fontawesome_0.3.0   RColorBrewer_1.1-3  httr_1.4.4         
 [7] tools_4.2.1         backports_1.4.1     utf8_1.2.2         
[10] R6_2.5.1            DBI_1.1.3           mgcv_1.8-40        
[13] colorspace_2.0-3    withr_2.5.0         tidyselect_1.1.2   
[16] compiler_4.2.1      textshaping_0.3.6   cli_3.3.0          
[19] rvest_1.0.3         xml2_1.3.3          sass_0.4.2         
[22] labeling_0.4.2      scales_1.2.1        hexbin_1.28.2      
[25] systemfonts_1.0.4   commonmark_1.8.0    digest_0.6.29      
[28] svglite_2.1.0       rmarkdown_2.16      base64enc_0.1-3    
[31] jpeg_0.1-9          pkgconfig_2.0.3     htmltools_0.5.3    
[34] dbplyr_2.2.1        fastmap_1.1.0       maps_3.4.0         
[37] htmlwidgets_1.5.4   rlang_1.0.4         readxl_1.4.1       
[40] rstudioapi_0.14     farver_2.1.1        generics_0.1.3     
[43] jsonlite_1.8.0      googlesheets4_1.0.1 magrittr_2.0.3     
[46] interp_1.1-3        Matrix_1.4-1        Rcpp_1.0.9         
[49] munsell_0.5.0       fansi_1.0.3         lifecycle_1.0.1    
[52] stringi_1.7.8       yaml_2.3.5          MASS_7.3-57        
[55] grid_4.2.1          paletteer_1.4.1     crayon_1.5.1       
[58] deldir_1.0-6        lattice_0.20-45     haven_2.5.1        
[61] splines_4.2.1       mapproj_1.2.8       hms_1.1.2          
[64] knitr_1.40          pillar_1.8.1        reprex_2.0.2       
[67] glue_1.6.2          evaluate_0.16       latticeExtra_0.6-30
[70] modelr_0.1.9        png_0.1-7           vctrs_0.4.1        
[73] tzdb_0.3.0          cellranger_1.1.0    gtable_0.3.0       
[76] rematch2_2.1.2      assertthat_0.2.1    xfun_0.32          
[79] broom_1.0.0         ragg_1.2.2          googledrive_2.0.0  
[82] gargle_1.2.0        cluster_2.1.3       ellipsis_0.3.2

1 Introduction

2 Tutorials

Level 1: Using {gt} to present summary statistics

Level 2: Using {gtExtras} to add in-line visualisations

Level 3: Incorporating {openair} into a {gt} table

3 Conclusion

Level 1: Using `{gt}` to present summary statistics

Level 2: Using `{gtExtras}` to add in-line visualisations

Level 3: Incorporating `{openair}` into a `{gt}` table