Code
install.packages(tidyverse)
install.packages(openair)
install.packages(openair)
This tutorial will take you through three “levels” of summarising air quality monitoring data.
First, we will begin to use the gt package, a powerful package that is to tables what ggplot2 is to figures.
Then, we will incorporate the gtExtras package to further theme and add in-table visualisations.
Finally, we will investigate how to incorporate openair plots - plots extensively around the world by the air quality community - into our gt tables.
That last level is perhaps easier said than done - to quote the openair book:
openair is a product of the time it was started and used the highly capable lattice package for plotting. This was a time before ggplot2 and the tidyverse.
gt has functions that help with the inclusion of ggplot2, but not lattice, so this will require some modification of pre-existing gt functions that will be covered in the third part of the tutorial.
This tutorial assumes a general understanding of the tidyverse and openair, but is suitable for gt novices. The ideal target is an air quality professional or researcher who is interested in exploring new tools for presenting their data.
If you would like to know more about openair and reading plots, the best place to look is the openair book, available online.
If you would like to know more about the tidyverse, there are many resources available including R for Data Science.
All code in this document is reproducible; even the data used comes straight from the openair package (see the “Installation and Data” tab).
The tidyverse, openair and gt are all available on CRAN, so can be downloaded using install.packages()
.
install.packages(tidyverse)
install.packages(openair)
install.packages(openair)
gtExtras is still under active development, so can be downloaded directly from it’s GitHub repository. I would also recommend keeping up to date with it on its website.
# if needed install.packages("remotes")
remotes::install_github("jthomasmock/gtExtras")
The data that is going to be used in this tutorial is adapted from openair::mydata
.
head(mydata)
# A tibble: 6 × 10
date ws wd nox no2 o3 pm10 so2 co pm25
<dttm> <dbl> <int> <int> <int> <int> <int> <dbl> <dbl> <int>
1 1998-01-01 00:00:00 0.6 280 285 39 1 29 4.72 3.37 NA
2 1998-01-01 01:00:00 2.16 230 NA NA NA 37 NA NA NA
3 1998-01-01 02:00:00 2.76 190 NA NA 3 34 6.83 9.60 NA
4 1998-01-01 03:00:00 2.16 170 493 52 3 35 7.66 10.2 NA
5 1998-01-01 04:00:00 2.4 180 468 78 2 34 8.07 8.91 NA
6 1998-01-01 05:00:00 3 190 264 42 0 16 5.50 3.05 NA
The goal for this tutorial is to produce a summary table of air quality pollutants at a single site for a single year (feel free to click “Show me the final table” to see). The principles behind this could produce much different table structures however (e.g., a pollutant per row, a year per row, a site per row, some combination of the three, some other idea…).
For this table, the initial transformations that need to be done are as follows:
mydata
dataset).The final structure is now:
head(data, n = 10)
# A tibble: 10 × 6
date ws wd year species conc
<dttm> <dbl> <int> <dbl> <fct> <dbl>
1 2003-01-01 00:00:00 5.2 160 2003 co 0.675
2 2003-01-01 01:00:00 4.6 140 2003 co 0.967
3 2003-01-01 02:00:00 3.6 140 2003 co 1.2
4 2003-01-01 03:00:00 4.6 140 2003 co 1.15
5 2003-01-01 04:00:00 5.7 140 2003 co 0.725
6 2003-01-01 05:00:00 4.6 140 2003 co 0.575
7 2003-01-01 06:00:00 4.1 160 2003 co 0.525
8 2003-01-01 07:00:00 5.2 160 2003 co 0.8
9 2003-01-01 08:00:00 3.6 160 2003 co 1
10 2003-01-01 09:00:00 3.6 180 2003 co 0.925
Below you can see the table we’re aiming for by the end of Level 3.
Marylebone 2003 | Air Quality Monitoring Summary | ||||||||||
The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite. | ||||||||||
Species Name | Formula | Statistics | Visualisations | |||||||
---|---|---|---|---|---|---|---|---|---|---|
p25 | median | mean | p75 | max | missing | trend1 | Poll. Rose | Polar Plt. | ||
Carbon Monoxide | CO |
0.600 | 0.975 | 1.12 | 1.50 | 4.30 | 1.63% | |||
Oxides of Nitrogen | NOx |
76.0 | 139 | 164 | 226 | 749 | 6.27% | |||
Nitrogen Dioxide | NO2 |
36.0 | 52.0 | 56.0 | 72.0 | 206 | 6.27% | |||
Ozone | O3 |
2.00 | 4.00 | 7.67 | 11.0 | 70.0 | 3.68% | |||
Sulfur Dioxide | SO2 |
2.00 | 3.75 | 4.40 | 5.75 | 44.2 | 3.86% | |||
Particulate Matter | PM2.5 |
11.0 | 17.0 | 19.1 | 25.0 | 83.0 | 6.71% | |||
PM10 |
23.0 | 34.0 | 37.0 | 48.0 | 235 | 1.26% | ||||
The data were obtained using the openair package. |
||||||||||
1 Monthly average. The lowest and highest months are indicated with green and red, respectively. |
{gt}
to present summary statisticsMuch as a ggplot2 plot begins with ggplot2::ggplot()
, a gt table begins with gt::gt()
. However, while frequently a ggplot2::ggplot()
function can be fed all of our raw data, it can be easier to provide gt::gt()
with a pre-made summary. Therefore our first step is to summarise our air quality data in some way. In real life, the summary statistics you chose will likely align with the legislation of your country or the sorts of statistics a client wants, but here some simple statistics are chosen.
The below code calculates the 25th percentile, median, mean, 7th percentile and maximum of each pollutant, as well as the percentage of data that is missing. Note the trick used for that last statistic - sum(is.na(conc))
checks whether each value in the column is NA
and then sums that column of TRUE
/FALSE
values, where TRUE
= 1 and FALSE
= 0. dplyr::n()
gives the number of rows in the group, meaning that the division of the two is effectively “number of missing divided by total”.
Summary statistics could be calculated however we’d like - including using openair functions - but this code does the job.
# A tibble: 7 × 7
species p25 median mean p75 max missing
<fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 co 0.6 0.975 1.12 1.5 4.3 0.0163
2 nox 76 139 164. 226. 749 0.0627
3 no2 36 52 56.0 72 206 0.0627
4 o3 2 4 7.67 11 70 0.0368
5 so2 2 3.75 4.40 5.75 44.2 0.0386
6 pm25 11 17 19.1 25 83 0.0671
7 pm10 23 34 37.0 48 235 0.0126
Now we’re at the position where we can begin a gt table. As alluded to earlier, we’ll begin by providing the gt::gt()
function with our data.
our_table = gt(averages)
our_table
species | p25 | median | mean | p75 | max | missing |
---|---|---|---|---|---|---|
co | 0.6 | 0.975 | 1.118069 | 1.50 | 4.30 | 0.01632420 |
nox | 76.0 | 139.000 | 163.941055 | 225.50 | 749.00 | 0.06267123 |
no2 | 36.0 | 52.000 | 55.964682 | 72.00 | 206.00 | 0.06267123 |
o3 | 2.0 | 4.000 | 7.673975 | 11.00 | 70.00 | 0.03675799 |
so2 | 2.0 | 3.750 | 4.398658 | 5.75 | 44.25 | 0.03858447 |
pm25 | 11.0 | 17.000 | 19.071341 | 25.00 | 83.00 | 0.06712329 |
pm10 | 23.0 | 34.000 | 37.009133 | 48.00 | 235.00 | 0.01255708 |
gt has produced a HTML table which would look excellent in an rmarkdown or quarto report, but it clearly could do with some finessing. For example, it may be useful to format the numeric columns to be a little shorter - we don’t need that many significant figures. gt has a collection of fmt_*()
functions for this purpose. In our case, gt::fmt_number()
.
Note that, unlike ggplot2 where additional geom_*()
functions are added, in gt additional features like fmt_*()
are piped.
fmt_*()
functions require columns to be specified. This argument works on tidyselect principles, so when we provide the argument where(is.numeric)
we are telling it to format all numeric columns (in this case, all columns besides “species”). We could specify a number of decimal places or significant figures - in this case the latter - and we can also tell the column to drop trailing zeroes. This means that the median of the NOx row won’t read “163.000…” just because the CO row goes to more decimal places.
our_table = our_table |>
fmt_number(
columns = where(is.numeric),
n_sigfig = 3,
drop_trailing_zeros = T
)
our_table
species | p25 | median | mean | p75 | max | missing |
---|---|---|---|---|---|---|
co | 0.600 | 0.975 | 1.12 | 1.50 | 4.30 | 0.0163 |
nox | 76.0 | 139 | 164 | 226 | 749 | 0.0627 |
no2 | 36.0 | 52.0 | 56.0 | 72.0 | 206 | 0.0627 |
o3 | 2.00 | 4.00 | 7.67 | 11.0 | 70.0 | 0.0368 |
so2 | 2.00 | 3.75 | 4.40 | 5.75 | 44.2 | 0.0386 |
pm25 | 11.0 | 17.0 | 19.1 | 25.0 | 83.0 | 0.0671 |
pm10 | 23.0 | 34.0 | 37.0 | 48.0 | 235 | 0.0126 |
This isn’t quite right as the “missing” column is actually a percentage. We can override the previous gt::fmt_number()
column by piping this new output into gt::fmt_percent()
and specifying the “missing” column. Note that, much like in the tidyverse, columns are provided unquoted.
our_table = our_table |>
fmt_percent(columns = missing)
our_table
species | p25 | median | mean | p75 | max | missing |
---|---|---|---|---|---|---|
co | 0.600 | 0.975 | 1.12 | 1.50 | 4.30 | 0.0163 |
nox | 76.0 | 139 | 164 | 226 | 749 | 0.0627 |
no2 | 36.0 | 52.0 | 56.0 | 72.0 | 206 | 0.0627 |
o3 | 2.00 | 4.00 | 7.67 | 11.0 | 70.0 | 0.0368 |
so2 | 2.00 | 3.75 | 4.40 | 5.75 | 44.2 | 0.0386 |
pm25 | 11.0 | 17.0 | 19.1 | 25.0 | 83.0 | 0.0671 |
pm10 | 23.0 | 34.0 | 37.0 | 48.0 | 235 | 0.0126 |
The “species” column could also be refined - currently the second row reads “nox” when it should read “NOx”. gt supports markdown syntax for achieving things like subscripts, bold text and the like. In this case, we’ll change our “species” column to be written as markdown. Once again, we can use a fmt_*()
function, this time to format the column as “markdown”.
Just for completion, while we’re using dplyr::mutate()
let’s insert the English names of the pollutants too, and put them at the far left of the table.
tab_data = averages |>
mutate(
name = case_when(
species == "co" ~ "Carbon Monoxide",
species == "nox" ~ "Oxides of Nitrogen",
species == "no2" ~ "Nitrogen Dioxide",
species == "o3" ~ "Ozone",
species == "so2" ~ "Sulfur Dioxide",
species == "pm25" ~ "Particulate Matter",
species == "pm10" ~ ""
),
species = case_when(
species == "co" ~ "CO",
species == "nox" ~ "NO<sub>x</sub>",
species == "no2" ~ "NO<sub>2</sub>",
species == "o3" ~ "O<sub>3</sub>",
species == "so2" ~ "SO<sub>2</sub>",
species == "pm25" ~ "PM<sub>2.5</sub>",
species == "pm10" ~ "PM<sub>10</sub>"
)
) |>
relocate(name, .before = species)
our_table = gt(tab_data) |>
fmt_number(
columns = where(is.numeric),
n_sigfig = 3,
drop_trailing_zeros = T
) |>
fmt_percent(columns = missing) |>
fmt_markdown(columns = species)
our_table
name | species | p25 | median | mean | p75 | max | missing |
---|---|---|---|---|---|---|---|
Carbon Monoxide | CO |
0.600 | 0.975 | 1.12 | 1.50 | 4.30 | 0.0163 |
Oxides of Nitrogen | NOx |
76.0 | 139 | 164 | 226 | 749 | 0.0627 |
Nitrogen Dioxide | NO2 |
36.0 | 52.0 | 56.0 | 72.0 | 206 | 0.0627 |
Ozone | O3 |
2.00 | 4.00 | 7.67 | 11.0 | 70.0 | 0.0368 |
Sulfur Dioxide | SO2 |
2.00 | 3.75 | 4.40 | 5.75 | 44.2 | 0.0386 |
Particulate Matter | PM2.5 |
11.0 | 17.0 | 19.1 | 25.0 | 83.0 | 0.0671 |
PM10 |
23.0 | 34.0 | 37.0 | 48.0 | 235 | 0.0126 |
By default, gt simply carries over the column names to be the header names, but much like the axis labels that ggplot2 defaults to these will often want changing. Column names in data are often written to be easy to use, but column names in a table should be written to be easy to read. It may be useful at this stage to refer to the “elements of a table” diagram from the gt website.
We are wanting to control something to do with columns so we’ll use a cols_*()
function. This family of functions can move columns around, merge them, hide them and - important right now - rename them. The syntax of gt::cols_label()
is almost the inverse of dplyr::rename()
- instead of “new = old”, “old = new” is provided. The new name should also be given as a string.
our_table = our_table |>
cols_label(name = "Species Name",
species = "Formula")
our_table
Species Name | Formula | p25 | median | mean | p75 | max | missing |
---|---|---|---|---|---|---|---|
Carbon Monoxide | CO |
0.600 | 0.975 | 1.12 | 1.50 | 4.30 | 0.0163 |
Oxides of Nitrogen | NOx |
76.0 | 139 | 164 | 226 | 749 | 0.0627 |
Nitrogen Dioxide | NO2 |
36.0 | 52.0 | 56.0 | 72.0 | 206 | 0.0627 |
Ozone | O3 |
2.00 | 4.00 | 7.67 | 11.0 | 70.0 | 0.0368 |
Sulfur Dioxide | SO2 |
2.00 | 3.75 | 4.40 | 5.75 | 44.2 | 0.0386 |
Particulate Matter | PM2.5 |
11.0 | 17.0 | 19.1 | 25.0 | 83.0 | 0.0671 |
PM10 |
23.0 | 34.0 | 37.0 | 48.0 | 235 | 0.0126 |
This table is already looking great, but we can refine it on a whole table level. To make large scale changes, we can use the a tab_*()
function. These could be things like adding a title, subtitle, footnote, or a “spanner title” that groups certain columns together. In the next code chunk we’ll implement all of these things. Note that we can format elements like the figure title using markdown by wrapping the string in the gt::md()
function.
our_table = our_table |>
tab_spanner(label = "Statistics", columns = p25:missing) |>
tab_header(
title = md("<b>Marylebone 2003</b> | Air Quality Monitoring Summary"),
subtitle = md(
"The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite."
)
) |>
tab_source_note(source_note = md("The data were obtained using the <code>openair</code> package."))
our_table
Marylebone 2003 | Air Quality Monitoring Summary | |||||||
The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite. | |||||||
Species Name | Formula | Statistics | |||||
---|---|---|---|---|---|---|---|
p25 | median | mean | p75 | max | missing | ||
Carbon Monoxide | CO |
0.600 | 0.975 | 1.12 | 1.50 | 4.30 | 0.0163 |
Oxides of Nitrogen | NOx |
76.0 | 139 | 164 | 226 | 749 | 0.0627 |
Nitrogen Dioxide | NO2 |
36.0 | 52.0 | 56.0 | 72.0 | 206 | 0.0627 |
Ozone | O3 |
2.00 | 4.00 | 7.67 | 11.0 | 70.0 | 0.0368 |
Sulfur Dioxide | SO2 |
2.00 | 3.75 | 4.40 | 5.75 | 44.2 | 0.0386 |
Particulate Matter | PM2.5 |
11.0 | 17.0 | 19.1 | 25.0 | 83.0 | 0.0671 |
PM10 |
23.0 | 34.0 | 37.0 | 48.0 | 235 | 0.0126 | |
The data were obtained using the openair package. |
This is a good stopping point to review the code we’ve used so far. Notice what we have done:
gt()
.fmt_*()
functions to format numeric and character columns.col_*()
function to change a property of a column (in this case, it’s name)tab_*()
functions to make table-wide changes - giving it a title, spanner headers, etc.Review the full code from this level in the “Code” tab. Once you’re comfortable with it, feel free to move onto Level 2 where gtExtras will be incorporated to enhance the table’s theme and introduce in-line visualisations.
data = mydata |>
mutate(year = lubridate::year(date)) |>
filter(year == 2003) |>
pivot_longer(nox:pm25, names_to = "species", values_to = "conc") |>
mutate(species = factor(species, c(
"co", "nox", "no2", "o3", "so2", "pm25", "pm10"
))) |>
arrange(species)
averages = data |>
group_by(species) |>
summarise(
p25 = quantile(conc, .25, na.rm = T),
median = median(conc, na.rm = T),
mean = mean(conc, na.rm = T),
p75 = quantile(conc, .75, na.rm = T),
max = max(conc, na.rm = T),
missing = sum(is.na(conc)) / n()
)
tab_data = averages |>
mutate(
name = case_when(
species == "co" ~ "Carbon Monoxide",
species == "nox" ~ "Oxides of Nitrogen",
species == "no2" ~ "Nitrogen Dioxide",
species == "o3" ~ "Ozone",
species == "so2" ~ "Sulfur Dioxide",
species == "pm25" ~ "Particulate Matter",
species == "pm10" ~ ""
),
species = case_when(
species == "co" ~ "CO",
species == "nox" ~ "NO<sub>x</sub>",
species == "no2" ~ "NO<sub>2</sub>",
species == "o3" ~ "O<sub>3</sub>",
species == "so2" ~ "SO<sub>2</sub>",
species == "pm25" ~ "PM<sub>2.5</sub>",
species == "pm10" ~ "PM<sub>10</sub>"
)
) |>
relocate(name, .before = species)
our_table = gt(tab_data) |>
# Format columns
fmt_number(
columns = where(is.numeric),
n_sigfig = 3,
drop_trailing_zeros = T
) |>
fmt_percent(columns = missing) |>
fmt_markdown(columns = species) |>
# Column labels
cols_label(name = "Species Name",
species = "Formula") |>
# Format table (add spanners, title, footnotes, etc.)
tab_spanner(label = "Statistics", columns = p25:missing) |>
tab_header(
title = md("<b>Marylebone 2003</b> | Air Quality Monitoring Summary"),
subtitle = md(
"The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite."
)
) |>
tab_source_note(source_note = md("The data were obtained using the <code>openair</code> package."))
Marylebone 2003 | Air Quality Monitoring Summary | |||||||
The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite. | |||||||
Species Name | Formula | Statistics | |||||
---|---|---|---|---|---|---|---|
p25 | median | mean | p75 | max | missing | ||
Carbon Monoxide | CO |
0.600 | 0.975 | 1.12 | 1.50 | 4.30 | 0.0163 |
Oxides of Nitrogen | NOx |
76.0 | 139 | 164 | 226 | 749 | 0.0627 |
Nitrogen Dioxide | NO2 |
36.0 | 52.0 | 56.0 | 72.0 | 206 | 0.0627 |
Ozone | O3 |
2.00 | 4.00 | 7.67 | 11.0 | 70.0 | 0.0368 |
Sulfur Dioxide | SO2 |
2.00 | 3.75 | 4.40 | 5.75 | 44.2 | 0.0386 |
Particulate Matter | PM2.5 |
11.0 | 17.0 | 19.1 | 25.0 | 83.0 | 0.0671 |
PM10 |
23.0 | 34.0 | 37.0 | 48.0 | 235 | 0.0126 | |
The data were obtained using the openair package. |
{gtExtras}
to add in-line visualisationsNB: The gtExtras package is currently under extremely active development, so be prepared for breaking changes until functionality is added to gt or gtExtras gets a stable release.
The gtExtras package provides some helper functions to simplify the use of gt somewhat. This section will focus in on some of its key functionality, particularly the addition of “sparklines” to represent trends in a pollutant.
First, however, we’ll look at the use of the gt_theme_*()
functions, which - much like ggplot2::theme_*()
functions - make large changes to the appearance of our table. The themes are based on many popular websites, such as FiveThirtyEight, the Guardian newspaper, the New York Times, and more. Personally, I think the ESPN theme looks nice and clean.
our_table = our_table |>
gt_theme_espn()
our_table
Marylebone 2003 | Air Quality Monitoring Summary | |||||||
The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite. | |||||||
Species Name | Formula | Statistics | |||||
---|---|---|---|---|---|---|---|
p25 | median | mean | p75 | max | missing | ||
Carbon Monoxide | CO |
0.600 | 0.975 | 1.12 | 1.50 | 4.30 | 0.0163 |
Oxides of Nitrogen | NOx |
76.0 | 139 | 164 | 226 | 749 | 0.0627 |
Nitrogen Dioxide | NO2 |
36.0 | 52.0 | 56.0 | 72.0 | 206 | 0.0627 |
Ozone | O3 |
2.00 | 4.00 | 7.67 | 11.0 | 70.0 | 0.0368 |
Sulfur Dioxide | SO2 |
2.00 | 3.75 | 4.40 | 5.75 | 44.2 | 0.0386 |
Particulate Matter | PM2.5 |
11.0 | 17.0 | 19.1 | 25.0 | 83.0 | 0.0671 |
PM10 |
23.0 | 34.0 | 37.0 | 48.0 | 235 | 0.0126 | |
The data were obtained using the openair package. |
On the topic of changing the appearence of a plot, gtExtras contains the gtExtras::gt_highlight_rows()
function, which could be useful to draw attention to a certain row of a table. This could be a pollutant that is of particular interest, sites that exceed emission limits, or the current year (depending on how your table is formatted). There is also the equivalent gtExtras::gt_highlight_cols()
function.
This functionality is demonstrated below, but we won’t be taking it any further in this case. The “rows” argument can take a logical expression (here checking that the row is either NO2 or NOx). Certain columns can also be “targeted”, meaning that only certain cells are put in bold.
Marylebone 2003 | Air Quality Monitoring Summary | |||||||
The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite. | |||||||
Species Name | Formula | Statistics | |||||
---|---|---|---|---|---|---|---|
p25 | median | mean | p75 | max | missing | ||
Carbon Monoxide | CO |
0.600 | 0.975 | 1.12 | 1.50 | 4.30 | 0.0163 |
Oxides of Nitrogen | NOx |
76.0 | 139 | 164 | 226 | 749 | 0.0627 |
Nitrogen Dioxide | NO2 |
36.0 | 52.0 | 56.0 | 72.0 | 206 | 0.0627 |
Ozone | O3 |
2.00 | 4.00 | 7.67 | 11.0 | 70.0 | 0.0368 |
Sulfur Dioxide | SO2 |
2.00 | 3.75 | 4.40 | 5.75 | 44.2 | 0.0386 |
Particulate Matter | PM2.5 |
11.0 | 17.0 | 19.1 | 25.0 | 83.0 | 0.0671 |
PM10 |
23.0 | 34.0 | 37.0 | 48.0 | 235 | 0.0126 | |
The data were obtained using the openair package. |
The most interesting functionality, perhaps, is the idea of in-line visualisations. These allow us to, at a glance, understand the distribution of some numeric data or - in our case - the trend in a time series. Plotting a time series is a classic part of air quality monitoring data analysis - openair itself has a dedicated function for it.
timePlot(
data,
pollutant = "conc",
type = "species",
y.relation = "free",
key = F,
avg.time = "week"
)
To get a “sparkline” into our table, there are two main steps:
Create a list column which contains a list of the data to be plotted as a sparkline. This means that there will be a column where each row is a list of data.
Use gtExtras to format this column as a sparkline.
The code outlined below will achieve Step 1. openair is used to time average our data to be monthly - this will ensure our sparkline won’t be totally unreadable. The most unusual step is perhaps the dplyr::summarise(trend = list(conc))
step. This is creating a data frame - one row per “species” as per the dplyr::group_by()
- which has the column “trend” which contains a list of the concentration data for the corresponding species. Notice the structure below - the “trend” column is populated by lists of 12 “dbl” values - one for each month of the year.
spark = data |>
timeAverage(avg.time = "month", type = "species") |>
group_by(species) |>
summarise(trend = list(conc))
all_data = left_join(averages, spark)
Joining, by = "species"
all_data
# A tibble: 7 × 8
species p25 median mean p75 max missing trend
<fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <list>
1 co 0.6 0.975 1.12 1.5 4.3 0.0163 <dbl [12]>
2 nox 76 139 164. 226. 749 0.0627 <dbl [12]>
3 no2 36 52 56.0 72 206 0.0627 <dbl [12]>
4 o3 2 4 7.67 11 70 0.0368 <dbl [12]>
5 so2 2 3.75 4.40 5.75 44.2 0.0386 <dbl [12]>
6 pm25 11 17 19.1 25 83 0.0671 <dbl [12]>
7 pm10 23 34 37.0 48 235 0.0126 <dbl [12]>
With our new data - now including the list column - let’s re-apply all of the code we have so far and see what the “raw” gt output is. Note that the tutorial_level_1()
function is a stand-in for all the formatting we applied in the previous section. Notice that all of the values of each list have now been printed - not what we’re after.
tab_data = all_data |>
mutate(
name = case_when(
species == "co" ~ "Carbon Monoxide",
species == "nox" ~ "Oxides of Nitrogen",
species == "no2" ~ "Nitrogen Dioxide",
species == "o3" ~ "Ozone",
species == "so2" ~ "Sulfur Dioxide",
species == "pm25" ~ "Particulate Matter",
species == "pm10" ~ ""
),
species = case_when(
species == "co" ~ "CO",
species == "nox" ~ "NO<sub>x</sub>",
species == "no2" ~ "NO<sub>2</sub>",
species == "o3" ~ "O<sub>3</sub>",
species == "so2" ~ "SO<sub>2</sub>",
species == "pm25" ~ "PM<sub>2.5</sub>",
species == "pm10" ~ "PM<sub>10</sub>"
)
) |>
relocate(name, .before = species)
our_table = gt(tab_data) |>
tutorial_level_1() |>
gt_theme_espn()
our_table
Marylebone 2003 | Air Quality Monitoring Summary | ||||||||
The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite. | ||||||||
Species Name | Formula | Statistics | trend | |||||
---|---|---|---|---|---|---|---|---|
p25 | median | mean | p75 | max | missing | |||
Carbon Monoxide | CO |
0.600 | 0.975 | 1.12 | 1.50 | 4.30 | 0.0163 | 1.1156836, 1.2255305, 1.1926563, 0.9887262, 1.2467706, 1.1988858, 1.1338768, 0.8610005, 1.1812325, 0.8946839, 1.2881151, 1.0920393 |
Oxides of Nitrogen | NOx |
76.0 | 139 | 164 | 226 | 749 | 0.0627 | 155.2497, 196.4437, 164.6905, 132.4206, 179.0217, 163.5314, 142.0042, 109.2338, 190.7626, 147.3095, 205.6243, 164.0558 |
Nitrogen Dioxide | NO2 |
36.0 | 52.0 | 56.0 | 72.0 | 206 | 0.0627 | 44.21981, 54.23906, 57.30270, 50.46797, 64.14130, 61.18131, 54.07547, 48.20223, 68.16201, 49.19054, 64.55587, 53.47755 |
Ozone | O3 |
2.00 | 4.00 | 7.67 | 11.0 | 70.0 | 0.0368 | 5.921516, 3.308892, 7.297844, 13.272981, 8.668478, 10.086351, 8.446549, 13.558459, 5.318267, 5.821221, 3.563282, 5.674324 |
Sulfur Dioxide | SO2 |
2.00 | 3.75 | 4.40 | 5.75 | 44.2 | 0.0386 | 4.314640, 5.468944, 5.189721, 4.552185, 3.750152, 5.050094, 4.054082, 3.540692, 4.437384, 4.481669, 4.598770, 3.341903 |
Particulate Matter | PM2.5 |
11.0 | 17.0 | 19.1 | 25.0 | 83.0 | 0.0671 | 14.61111, 22.92192, 23.17655, 20.02368, 16.84564, 19.50485, 13.86406, 21.49507, 22.29248, 16.26497, 20.96106, 17.31579 |
PM10 |
23.0 | 34.0 | 37.0 | 48.0 | 235 | 0.0126 | 30.16986, 44.92042, 42.60434, 37.45582, 35.37959, 39.36376, 37.55267, 38.15724, 40.13624, 30.71409, 37.85273, 30.74763 | |
The data were obtained using the openair package. |
To accomplish Step 2, we use the gtExtras function gtExtras::gt_plt_sparkline()
. Much like other functions we have used, this takes a column (we’ll give it “trend”) and attempts to transform it into a sparkline. One important option to set is “same_limit” to be FALSE
- this is the equivalent of setting “scales = ‘free’” in a ggplot2::facet_wrap()
function.
The sparkline also highlights the maximum and minimum values using coloured dots. We can specify their colours using the “palette” argument.
Notice also that gtExtras::gt_plt_sparkline()
prints the final value of the sparkline - this could be useful if we were showing yearly data, as that value would correspond to the most recent year in our data set! Here it just represents the average value in December so it could be removed if we wanted to drop it - for this we’d set “label” to be FALSE
.
our_table = our_table |>
gt_plt_sparkline(
column = trend,
palette = c("black", "black", "chartreuse3", "red", "black"),
same_limit = F
)
our_table
Marylebone 2003 | Air Quality Monitoring Summary | ||||||||
The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite. | ||||||||
Species Name | Formula | Statistics | trend | |||||
---|---|---|---|---|---|---|---|---|
p25 | median | mean | p75 | max | missing | |||
Carbon Monoxide | CO |
0.600 | 0.975 | 1.12 | 1.50 | 4.30 | 0.0163 | |
Oxides of Nitrogen | NOx |
76.0 | 139 | 164 | 226 | 749 | 0.0627 | |
Nitrogen Dioxide | NO2 |
36.0 | 52.0 | 56.0 | 72.0 | 206 | 0.0627 | |
Ozone | O3 |
2.00 | 4.00 | 7.67 | 11.0 | 70.0 | 0.0368 | |
Sulfur Dioxide | SO2 |
2.00 | 3.75 | 4.40 | 5.75 | 44.2 | 0.0386 | |
Particulate Matter | PM2.5 |
11.0 | 17.0 | 19.1 | 25.0 | 83.0 | 0.0671 | |
PM10 |
23.0 | 34.0 | 37.0 | 48.0 | 235 | 0.0126 | ||
The data were obtained using the openair package. |
gtExtras also has functions to add density functions, histograms, bar charts and bullet charts to your tables, so it is worth investigating these too. For some, like gtExtras::gt_plt_bar()
, creating a list column isn’t even necessary. The key thing is that none of the gtExtras “plotting” functions will require anything more complex than making a list column.
It would be good for readers to understand what our sparkline represents. This may be a good opportunity to do some more formatting of our table using another tab_*()
function, gt::tab_footnote()
. This is similar to gt::tab_source_note()
which we used before, but prints the label which it is provided with a superscript number associated with a specific part of our table. This could be the title, subtitle, a column header, or more.
To target specific parts of a gt table, there is a family of cells_*()
functions. In this case, we use the gt::cells_column_labels()
function, which targets a specific column header. Notice that “TREND” now possesses a superscript “1”, which is associated with the footnote at the bottom of the table.
our_table = our_table |>
tab_footnote(locations = cells_column_labels(columns = trend),
footnote = "Monthly average. The lowest and highest months are indicated with green and red, respectively.")
our_table
Marylebone 2003 | Air Quality Monitoring Summary | ||||||||
The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite. | ||||||||
Species Name | Formula | Statistics | trend1 | |||||
---|---|---|---|---|---|---|---|---|
p25 | median | mean | p75 | max | missing | |||
Carbon Monoxide | CO |
0.600 | 0.975 | 1.12 | 1.50 | 4.30 | 0.0163 | |
Oxides of Nitrogen | NOx |
76.0 | 139 | 164 | 226 | 749 | 0.0627 | |
Nitrogen Dioxide | NO2 |
36.0 | 52.0 | 56.0 | 72.0 | 206 | 0.0627 | |
Ozone | O3 |
2.00 | 4.00 | 7.67 | 11.0 | 70.0 | 0.0368 | |
Sulfur Dioxide | SO2 |
2.00 | 3.75 | 4.40 | 5.75 | 44.2 | 0.0386 | |
Particulate Matter | PM2.5 |
11.0 | 17.0 | 19.1 | 25.0 | 83.0 | 0.0671 | |
PM10 |
23.0 | 34.0 | 37.0 | 48.0 | 235 | 0.0126 | ||
The data were obtained using the openair package. |
||||||||
1 Monthly average. The lowest and highest months are indicated with green and red, respectively. |
This is another good stopping point to review the code we’ve used so far. Notice what we have done:
Review the full code from this level in the “Code” tab. Once you’re comfortable with it, feel free to move onto Level 3 where openair plots will be inserted in-line, similar to the sparkline we’ve already seen.
# Calculate averages and get data for sparkline, and combine
data = mydata |>
mutate(year = lubridate::year(date)) |>
filter(year == 2003) |>
pivot_longer(nox:pm25, names_to = "species", values_to = "conc") |>
mutate(species = factor(species, c(
"co", "nox", "no2", "o3", "so2", "pm25", "pm10"
))) |>
arrange(species)
averages = data |>
group_by(species) |>
summarise(
p25 = quantile(conc, .25, na.rm = T),
median = median(conc, na.rm = T),
mean = mean(conc, na.rm = T),
p75 = quantile(conc, .75, na.rm = T),
max = max(conc, na.rm = T),
missing = sum(is.na(conc)) / n()
)
spark = data |>
timeAverage(avg.time = "month", type = "species") |>
group_by(species) |>
summarise(trend = list(conc))
all_data = left_join(averages, spark)
# Pre-format table data
tab_data = all_data |>
mutate(
name = case_when(
species == "co" ~ "Carbon Monoxide",
species == "nox" ~ "Oxides of Nitrogen",
species == "no2" ~ "Nitrogen Dioxide",
species == "o3" ~ "Ozone",
species == "so2" ~ "Sulfur Dioxide",
species == "pm25" ~ "Particulate Matter",
species == "pm10" ~ ""
),
species = case_when(
species == "co" ~ "CO",
species == "nox" ~ "NO<sub>x</sub>",
species == "no2" ~ "NO<sub>2</sub>",
species == "o3" ~ "O<sub>3</sub>",
species == "so2" ~ "SO<sub>2</sub>",
species == "pm25" ~ "PM<sub>2.5</sub>",
species == "pm10" ~ "PM<sub>10</sub>"
)
) |>
relocate(name, .before = species)
# Tabulate
our_table = gt(tab_data) |>
# Set theme as ESPN
gt_theme_espn() |>
# Format columns
fmt_number(
columns = where(is.numeric),
n_sigfig = 3,
drop_trailing_zeros = T
) |>
fmt_percent(columns = missing) |>
fmt_markdown(columns = species) |>
# Column labels
cols_label(name = "Species Name",
species = "Formula") |>
# Format table (add spanners, title, footnotes, etc.)
tab_spanner(label = "Statistics", columns = p25:missing) |>
tab_header(
title = md("<b>Marylebone 2003</b> | Air Quality Monitoring Summary"),
subtitle = md(
"The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite."
)
) |>
tab_source_note(source_note = md("The data were obtained using the <code>openair</code> package.")) |>
tab_footnote(locations = cells_column_labels(columns = trend),
footnote = "Monthly average. The lowest and highest months are indicated with green and red, respectively.") |>
# Format sparkline
gt_plt_sparkline(
column = trend,
palette = c("black", "black", "chartreuse3", "red", "black"),
same_limit = F
)
Marylebone 2003 | Air Quality Monitoring Summary | ||||||||
The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite. | ||||||||
Species Name | Formula | Statistics | trend1 | |||||
---|---|---|---|---|---|---|---|---|
p25 | median | mean | p75 | max | missing | |||
Carbon Monoxide | CO |
0.600 | 0.975 | 1.12 | 1.50 | 4.30 | 0.0163 | |
Oxides of Nitrogen | NOx |
76.0 | 139 | 164 | 226 | 749 | 0.0627 | |
Nitrogen Dioxide | NO2 |
36.0 | 52.0 | 56.0 | 72.0 | 206 | 0.0627 | |
Ozone | O3 |
2.00 | 4.00 | 7.67 | 11.0 | 70.0 | 0.0368 | |
Sulfur Dioxide | SO2 |
2.00 | 3.75 | 4.40 | 5.75 | 44.2 | 0.0386 | |
Particulate Matter | PM2.5 |
11.0 | 17.0 | 19.1 | 25.0 | 83.0 | 0.0671 | |
PM10 |
23.0 | 34.0 | 37.0 | 48.0 | 235 | 0.0126 | ||
The data were obtained using the openair package. |
||||||||
1 Monthly average. The lowest and highest months are indicated with green and red, respectively. |
{openair}
into a {gt}
tableopenair creates plots used the world over but, as previously discussed, is built using lattice. This is unfortunate as the gt::ggplot_image()
function is designed to insert only ggplot2 graphics into gt. We can still employ the principles behind gt::ggplot_image()
, we will just need to rewrite it a bit to work with lattice graphics. Let’s consider what it is doing. The key thing it is achieving is to take some list of ggplot2 objects, save them in a local temporary directory, use the gt function gt::local_image()
to read them back in as images and insert them into the table, and then delete the temporary files.
function (plot_object,
height = 100,
aspect_ratio = 1)
{
if (is.numeric(height)) {
height <- paste0(height, "px")
}
if (inherits(plot_object, "gg")) {
plot_object <- list(plot_object)
}
vapply(
seq_along(plot_object),
FUN.VALUE = character(1),
USE.NAMES = FALSE,
FUN = function(x) {
filename <- paste0("temp_ggplot_", formatC(x, width = 4,
flag = "0"), ".png")
ggplot2::ggsave(
filename = filename,
plot = plot_object[[x]],
device = "png",
dpi = 100,
width = 5 * aspect_ratio,
height = 5
)
on.exit(file.remove(filename))
local_image(filename = filename, height = height)
}
)
}
The key things that need changing are to change “gg” to “trellis”, and to change the ggplot2::ggsave()
step to a more lattice-friendly way to save plots.
openair_image <- function(plot_object, height) {
if (is.numeric(height)) {
height <- paste0(height, "px")
}
if (inherits(plot_object, "trellis")) {
plot_object <- list(plot_object)
}
invisible(vapply(
seq_along(plot_object),
FUN.VALUE = character(1),
USE.NAMES = FALSE,
FUN = function(x) {
filename <- paste0("temp_openair_", formatC(x,
width = 4,
flag = "0"
), ".png")
png(
filename = filename,
res = 300,
width = 4 * 300,
height = 4 * 300,
bg = "transparent"
)
print(plot_object[[x]])
dev.off()
on.exit(file.remove(filename))
local_image(filename = filename, height = height)
}
))
}
Now we have a function that can take a list of lattice plots and insert them into our gt table, we need to actually make our lattice plots and ensure that there are columns in our data for them to occupy.
To quickly produce openair plots for each pollutant, we can use a dplyr workflow. The below code may be an unfamiliar way of working, so to unpack:
The data is “nested by” the “species” column. This means that the data frame becomes a data frame of seven rows - one for each species - and two columns - the retained “species” column, and a list-column called “data” containing all of the other data. This can be conceptually difficult, but effectively we are creating a column in a data frame which itself contains data frames. dplyr::nest_by()
also “groups” by each row - any function applied to this new data frame will be applied to each row individually.
dplyr::mutate()
is then used to create more list-columns, this time containing the lattice plot outputs of the openair openair::pollutionRose
and openair::polarPlot
functions. Note the use of list()
to signify that we are creating list-columns.
The whole data frame is arranged by “species”. This is to ensure the plots are in the same order as our final table, and why turning them into factors right at the start was so important.
Note that this may take a minute to run!
data_plots = data |>
nest_by(species) |>
mutate(polrose = list(
pollutionRose(
data,
pollutant = "conc",
key = FALSE,
par.settings = list(axis.line = list(col = "transparent"))
)$plot
),
polplot = list(polarPlot(
data,
pollutant = "conc",
key = FALSE,
par.settings = list(axis.line = list(col = "transparent"))
)$plot)) |>
arrange(species)
data_plots
We now have a function to turn a column into plots, and the plots themselves. We now just need an empty couple of columns in our gt table to insert these plots into. We’ll create some empty columns in our pre-existing tab_data
data frame.
tab_data = tab_data |>
mutate(polrose = NA,
polplot = NA)
tab_data
# A tibble: 7 × 11
name species p25 median mean p75 max missing trend polrose polplot
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <lis> <lgl> <lgl>
1 "Carb… CO 0.6 0.975 1.12 1.5 4.3 0.0163 <dbl> NA NA
2 "Oxid… NO<sub… 76 139 164. 226. 749 0.0627 <dbl> NA NA
3 "Nitr… NO<sub… 36 52 56.0 72 206 0.0627 <dbl> NA NA
4 "Ozon… O<sub>… 2 4 7.67 11 70 0.0368 <dbl> NA NA
5 "Sulf… SO<sub… 2 3.75 4.40 5.75 44.2 0.0386 <dbl> NA NA
6 "Part… PM<sub… 11 17 19.1 25 83 0.0671 <dbl> NA NA
7 "" PM<sub… 23 34 37.0 48 235 0.0126 <dbl> NA NA
Let’s see how this looks now. Note that, once again, the tutorial_level_2()
function is a stand-in for all the formatting we applied in the previous section. We now have two empty columns.
our_table = gt(tab_data) |>
tutorial_level_2()
our_table
Marylebone 2003 | Air Quality Monitoring Summary | ||||||||||
The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite. | ||||||||||
Species Name | Formula | Statistics | trend1 | polrose | polplot | |||||
---|---|---|---|---|---|---|---|---|---|---|
p25 | median | mean | p75 | max | missing | |||||
Carbon Monoxide | CO |
0.600 | 0.975 | 1.12 | 1.50 | 4.30 | 0.0163 | NA | NA | |
Oxides of Nitrogen | NOx |
76.0 | 139 | 164 | 226 | 749 | 0.0627 | NA | NA | |
Nitrogen Dioxide | NO2 |
36.0 | 52.0 | 56.0 | 72.0 | 206 | 0.0627 | NA | NA | |
Ozone | O3 |
2.00 | 4.00 | 7.67 | 11.0 | 70.0 | 0.0368 | NA | NA | |
Sulfur Dioxide | SO2 |
2.00 | 3.75 | 4.40 | 5.75 | 44.2 | 0.0386 | NA | NA | |
Particulate Matter | PM2.5 |
11.0 | 17.0 | 19.1 | 25.0 | 83.0 | 0.0671 | NA | NA | |
PM10 |
23.0 | 34.0 | 37.0 | 48.0 | 235 | 0.0126 | NA | NA | ||
The data were obtained using the openair package. |
||||||||||
1 Monthly average. The lowest and highest months are indicated with green and red, respectively. |
The final step is a complex one, so let’s unpack it.
The function we use is gt::text_transform()
. This does targeted text transformations. In this case we are transforming an empty cell into a plot.
gt::text_transform()
, much like gt::tab_footnote()
, needs a target specified by a cells_*()
function. Here we are targeting the body of the cells themselves, hence the use of gt::cells_body()
. We specify our empty column (“polrose” or “polplot”).
gt::text_transform()
also takes a function used to transform the cells it is targeting. Here we purrr::map()
over the list of plots in our data_plots
data frame using our openair_image()
function. This effectively transforms each NA value in the column with the corresponding plot in the data_plots
data frame.
This process is why it was important that the order of the data_plots
data frame was the same as the order of the table itself - if they were in the wrong order the plot would be with the wrong pollutant.
our_table = our_table |>
text_transform(
locations = cells_body(columns = polrose),
fn = function(x) {
map(data_plots$polrose, openair_image, height = px(70))
}
) |>
text_transform(
locations = cells_body(columns = polplot),
fn = function(x) {
map(data_plots$polplot, openair_image, height = px(70))
}
)
our_table
Marylebone 2003 | Air Quality Monitoring Summary | ||||||||||
The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite. | ||||||||||
Species Name | Formula | Statistics | trend1 | polrose | polplot | |||||
---|---|---|---|---|---|---|---|---|---|---|
p25 | median | mean | p75 | max | missing | |||||
Carbon Monoxide | CO |
0.600 | 0.975 | 1.12 | 1.50 | 4.30 | 0.0163 | |||
Oxides of Nitrogen | NOx |
76.0 | 139 | 164 | 226 | 749 | 0.0627 | |||
Nitrogen Dioxide | NO2 |
36.0 | 52.0 | 56.0 | 72.0 | 206 | 0.0627 | |||
Ozone | O3 |
2.00 | 4.00 | 7.67 | 11.0 | 70.0 | 0.0368 | |||
Sulfur Dioxide | SO2 |
2.00 | 3.75 | 4.40 | 5.75 | 44.2 | 0.0386 | |||
Particulate Matter | PM2.5 |
11.0 | 17.0 | 19.1 | 25.0 | 83.0 | 0.0671 | |||
PM10 |
23.0 | 34.0 | 37.0 | 48.0 | 235 | 0.0126 | ||||
The data were obtained using the openair package. |
||||||||||
1 Monthly average. The lowest and highest months are indicated with green and red, respectively. |
The table is nearing completion, so there is just a couple of final refinements that could be made. Some personal choices as last minute refinements are:
Using gt::tab_options()
to set the font size to be a little smaller. gt::tab_options()
is the gt equivalent to ggplot2::theme()
.
Adding a new spanner using gt::tab_spanner()
, and renaming the new columns using gt::cols_label()
.
Fixing some column widths using gt::cols_width()
. Making all of the statistics columns the same width makes the table look a little more attractive!
Using gt::cols_align()
to align the “formula” column to be centred, which looks a little nicer.
our_table = our_table |>
tab_options(table.font.size = 12) |>
tab_spanner(label = "Visualisations", columns = trend:polplot) |>
cols_label(polrose = "Poll. Rose",
polplot = "Polar Plt.") |>
cols_width(c(p25, median, mean, p75, max, missing) ~ px(50),
name ~ px(120)) |>
cols_align(align = "center", columns = 2)
our_table
Marylebone 2003 | Air Quality Monitoring Summary | ||||||||||
The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite. | ||||||||||
Species Name | Formula | Statistics | Visualisations | |||||||
---|---|---|---|---|---|---|---|---|---|---|
p25 | median | mean | p75 | max | missing | trend1 | Poll. Rose | Polar Plt. | ||
Carbon Monoxide | CO |
0.600 | 0.975 | 1.12 | 1.50 | 4.30 | 0.0163 | |||
Oxides of Nitrogen | NOx |
76.0 | 139 | 164 | 226 | 749 | 0.0627 | |||
Nitrogen Dioxide | NO2 |
36.0 | 52.0 | 56.0 | 72.0 | 206 | 0.0627 | |||
Ozone | O3 |
2.00 | 4.00 | 7.67 | 11.0 | 70.0 | 0.0368 | |||
Sulfur Dioxide | SO2 |
2.00 | 3.75 | 4.40 | 5.75 | 44.2 | 0.0386 | |||
Particulate Matter | PM2.5 |
11.0 | 17.0 | 19.1 | 25.0 | 83.0 | 0.0671 | |||
PM10 |
23.0 | 34.0 | 37.0 | 48.0 | 235 | 0.0126 | ||||
The data were obtained using the openair package. |
||||||||||
1 Monthly average. The lowest and highest months are indicated with green and red, respectively. |
These are last-minute, personal refinements, but demonstrate the depth of the gt package. It is a good idea to explore the functions of gt and gtExtras further to really get down to what they both can do!
The key achievement of this section was to add the openair plots to our gt table. The “Code” tab contains all the code needed to recreate the table “Table” tab, slightly refined for readability (putting things in a better order, and with comments).
# Calculate averages and get data for sparkline, and combine
data = mydata |>
mutate(year = lubridate::year(date)) |>
filter(year == 2003) |>
pivot_longer(nox:pm25, names_to = "species", values_to = "conc") |>
mutate(species = factor(species, c(
"co", "nox", "no2", "o3", "so2", "pm25", "pm10"
))) |>
arrange(species)
averages = data |>
group_by(species) |>
summarise(
p25 = quantile(conc, .25, na.rm = T),
median = median(conc, na.rm = T),
mean = mean(conc, na.rm = T),
p75 = quantile(conc, .75, na.rm = T),
max = max(conc, na.rm = T),
missing = sum(is.na(conc)) / n()
)
spark = data |>
timeAverage(avg.time = "month", type = "species") |>
group_by(species) |>
summarise(trend = list(conc))
all_data = left_join(averages, spark) |>
mutate(polrose = NA,
polplot = NA)
# Function to tabulate plots
openair_image <- function(plot_object, height) {
if (is.numeric(height)) {
height <- paste0(height, "px")
}
if (inherits(plot_object, "trellis")) {
plot_object <- list(plot_object)
}
invisible(vapply(
seq_along(plot_object),
FUN.VALUE = character(1),
USE.NAMES = FALSE,
FUN = function(x) {
filename <- paste0("temp_openair_", formatC(x,
width = 4,
flag = "0"
), ".png")
png(
filename = filename,
res = 300,
width = 4 * 300,
height = 4 * 300,
bg = "transparent"
)
print(plot_object[[x]])
dev.off()
on.exit(file.remove(filename))
local_image(filename = filename, height = height)
}
))
}
# List of plots
data_plots = data |>
nest_by(species) |>
mutate(polrose = list(
pollutionRose(
data,
pollutant = "conc",
key = FALSE,
par.settings = list(axis.line = list(col = "transparent"))
)$plot
),
polplot = list(polarPlot(
data,
pollutant = "conc",
key = FALSE,
par.settings = list(axis.line = list(col = "transparent"))
)$plot)) |>
arrange(species)
# Pre-format table data
tab_data = all_data |>
mutate(
name = case_when(
species == "co" ~ "Carbon Monoxide",
species == "nox" ~ "Oxides of Nitrogen",
species == "no2" ~ "Nitrogen Dioxide",
species == "o3" ~ "Ozone",
species == "so2" ~ "Sulfur Dioxide",
species == "pm25" ~ "Particulate Matter",
species == "pm10" ~ ""
),
species = case_when(
species == "co" ~ "CO",
species == "nox" ~ "NO<sub>x</sub>",
species == "no2" ~ "NO<sub>2</sub>",
species == "o3" ~ "O<sub>3</sub>",
species == "so2" ~ "SO<sub>2</sub>",
species == "pm25" ~ "PM<sub>2.5</sub>",
species == "pm10" ~ "PM<sub>10</sub>"
)
) |>
relocate(name, .before = species)
# Tabulate
our_table = gt(tab_data) |>
# Set theme as ESPN
gt_theme_espn() |>
# Format columns
fmt_number(
columns = where(is.numeric),
n_sigfig = 3,
drop_trailing_zeros = T
) |>
fmt_percent(columns = missing) |>
fmt_markdown(columns = species) |>
# Column labels, widths, and alignments
cols_label(
name = "Species Name",
species = "Formula",
polrose = "Poll. Rose",
polplot = "Polar Plt."
) |>
cols_width(c(p25, median, mean, p75, max, missing) ~ px(50),
name ~ px(120)) |>
cols_align(align = "left", columns = 1) |>
cols_align(align = "center", columns = 2) |>
# Format table (add spanners, title, footnotes, set font size, etc.)
tab_options(table.font.size = 12) |>
tab_spanner(label = "Statistics", columns = p25:missing) |>
tab_spanner(label = "Visualisations", columns = trend:polplot) |>
tab_header(
title = md("<b>Marylebone 2003</b> | Air Quality Monitoring Summary"),
subtitle = md(
"The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite."
)
) |>
tab_source_note(source_note = md("The data were obtained using the <code>openair</code> package.")) |>
tab_footnote(locations = cells_column_labels(columns = trend),
footnote = "Monthly average. The lowest and highest months are indicated with green and red, respectively.") |>
# Format sparkline
gt_plt_sparkline(
column = trend,
palette = c("black", "black", "chartreuse3", "red", "black"),
same_limit = F
) |>
# replace empty columns with plots
text_transform(
locations = cells_body(polrose),
fn = function(x) {
map(data_plots$polrose, openair_image, height = px(70))
}
) |>
text_transform(
locations = cells_body(polplot),
fn = function(x) {
map(data_plots$polplot, openair_image, height = px(70))
}
)
Marylebone 2003 | Air Quality Monitoring Summary | ||||||||||
The data contains hourly measurements of air pollutant concentrations, wind speed and wind direction collected at the Marylebone (London) air quality monitoring supersite. | ||||||||||
Species Name | Formula | Statistics | Visualisations | |||||||
---|---|---|---|---|---|---|---|---|---|---|
p25 | median | mean | p75 | max | missing | trend1 | Poll. Rose | Polar Plt. | ||
Carbon Monoxide | CO |
0.600 | 0.975 | 1.12 | 1.50 | 4.30 | 0.0163 | |||
Oxides of Nitrogen | NOx |
76.0 | 139 | 164 | 226 | 749 | 0.0627 | |||
Nitrogen Dioxide | NO2 |
36.0 | 52.0 | 56.0 | 72.0 | 206 | 0.0627 | |||
Ozone | O3 |
2.00 | 4.00 | 7.67 | 11.0 | 70.0 | 0.0368 | |||
Sulfur Dioxide | SO2 |
2.00 | 3.75 | 4.40 | 5.75 | 44.2 | 0.0386 | |||
Particulate Matter | PM2.5 |
11.0 | 17.0 | 19.1 | 25.0 | 83.0 | 0.0671 | |||
PM10 |
23.0 | 34.0 | 37.0 | 48.0 | 235 | 0.0126 | ||||
The data were obtained using the openair package. |
||||||||||
1 Monthly average. The lowest and highest months are indicated with green and red, respectively. |
gt is an incredibly powerful tool to create impressive HTML tables, with features well suited to presenting air quality data. We’ve progressed through three levels of complexity:
Getting to grips with gt, clearly presenting air quality monitoring statistics in an attractive way.
Incorporating visualisations into the plot, combining the classic time series plot with our HTML table, allowing for at-a-glance understanding of data trends.
Adding industry-standard openair lattice-based plots to our table, enabling clients to explore more closely the pollutants in which they are most interested.
Even in these three sections we have just scratched the surface of what gt can do for presenting air quality data. It would be great to see more gt tables out in the wild, being used to great effect by the global air quality community.