Fitted-model labels with ‘ggpmisc’ and ‘gganimate’

Animated ggplots with model equations, R2, F, P, etc.

Plotting examples
Author

Pedro J. Aphalo

Published

2023-06-24

Modified

2023-06-24

Abstract

Example R code for animated plots based on package ggplot2 using geometries defined in package ggpp and statistics from package ggpmisc together with package gganimate. Functions from package gganimate implement different types of animations based on extension of the Grammar of Graphics.

Keywords

ggplot2 pkg, ggpp pkg, ggpmisc pkg, gganimate pkg, plot annotations, model equations

Warning

Animations are possible with ‘ggpmisc’ (>= 0.5.3). In earlier versions the statistics that generate labels of model equations and various parameters were incompatible with package ‘gganimate’.

Package ‘gganimate’ is widely compatible with ggplot2 and extensions, however, before version 0.5.3, ‘ggpmisc’ assumed that ggplot2’s variable group was always integer. However, ‘gganimate’ changes it into character to add indexing information about scenes and transitions. A different approach to decoding the original groups’ indices is now used when needed to ensure compatibility.

Tip

In this page code chunks are “folded” so as to decrease the clutter when searching for examples. Above each plot you will find a small triangle followed by “Code”. Clicking on the triangle “unfolds” the code chunk making visible the R code used to produce the plot. Except for the loading of packages shown in section Preliminaries code examples are in most cases self contained. When they are not, this is indicated by a comment.

All “words” defined in base R or in extension packages are linked to the corresponding HTML-rendered help pages.

The code in the chunks can be copied by clicking on the top right corner, where an icon appears when the mouse cursor hovers over the code listing.

Important

One needs to always check that annotations do not occlude anything significant, such as observations in the base plot. This needs special care when using annotations together with batch plotting. Either ensure that the scale limits of the base plot are expanded to avoid overlap or that the layer with the equations is the lowest one, i.e., added to the plot first.

1 Annotations showing parameters estimates

Fitted model equations and other related estimates can be useful in plots as they ensure that graphical representation as a curve and numerical values for parameters that cannot be read from the curve itself are displayed as text on the same plot.

See page ggplot fitted-model equations with ‘ggpmisc’ for explanations about the annotations. Here I focus on how to animate similar plots to those described earlier in this other page. Although the examples include annoations the animation code is applicable many other plots built with ‘ggplot2’ together with many other extension packages.

Note

When adding annotations one should be aware that they add clutter to a plot, and clutter can make it difficult to see the patterns of interest in the data represented as points or curves.

So, as usual, less is more, include those annotations that are relevant to the message conveyed by a plot and nothing more.

Animations that gradually add layers to a plot can help solve this problem. One can first show observations then overplot the prediction from a fitted model and finally show annotations with numeriacal values.

2 Preliminaries

The code used is shown on-demand above each plot and can be copied. We first load the packages we will use.

When package ‘ggpmisc’ is loaded and attached, packages ‘ggpp’ and ‘ggplot2’ are also attached. The only function from ‘ggplot2’ that is redefined by ‘ggpp’ is annotate(), which remains backwards compatible with ‘ggplot2’.

3 Linear correlation

I first generate some artificial data to use in the plotting examples.

Code
set.seed(94321)
x <- (1:100) / 10
yA <- x + rnorm(length(x), sd = 2)
yB <- x + rnorm(length(x), sd = 8)
my.data <- data.frame(x = rep(x, 2),
                      y = c(yA, yB),
                      group = rep(c("A", "B"), rep(length(x), 2L)))

The examples in this section make use of stat_correlation() from package ‘ggpmisc’. Please, see its help page for details.

Pearson correlation shown but Kendall and Spearman methods are also implemented.

Using group aesthetic we create two discrete scenes.

Code
p <- 
  ggplot(my.data, 
       aes(x, y, group = group)) +
  geom_point() +
  stat_correlation(vstep = 0, 
                   mapping = use_label(c("R", "P", "n")),
                   label.x = "right",
                   label.y = "bottom") +  
  transition_states(group,
                    transition_length = 1,
                    state_length = 3) + 
  enter_fade() +
  exit_fade()

animate(p, 
        fps=8, 
        renderer = gifski_renderer(loop = TRUE), 
        dev = 'png',
        width = 8, height = 5, units = "in", res = 96, pointsize = 16)

Confidence interval for R. Using colour aesthetic we create two discrete scenes based on variable group.

Code
p <-
  ggplot(my.data, aes(x, y, colour = group)) +
  geom_point() +
  stat_correlation(mapping = use_label(c("R", "R.confint")),
                   r.conf.level = 0.99,
                   vstep = 0) +  
  scale_color_viridis_d(option = "A", begin = 0.33, end = 0.67) +
  expand_limits(y = 30) +
  transition_states(group,
                    transition_length = 1,
                    state_length = 3) + 
  enter_fade() +
  exit_fade()

animate(p, 
        fps=8, 
        renderer = gifski_renderer(loop = TRUE), 
        dev = 'png',
        width = 8, height = 5, units = "in", res = 96, pointsize = 16)

Highlighting based on estimates. In this example the colour depends on the estimated value for R, but it is possible to use other estimates like P-value to decide the colour used to display the R estimate. Here we first show the observations followed by the correlation estimate as a label. Transitions are based on layers rather than grouping.

Code
p <-
  ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_correlation(mapping = 
                     aes(color = ifelse(after_stat(cor) > 0.5,
                                        "red", "black"))) +
  scale_color_identity() +
  facet_wrap(~group)  +
  transition_layers(transition_length = 1,
                    layer_length = 3)

animate(p, 
        fps=8, end_pause = 10,
        renderer = gifski_renderer(loop = TRUE), 
        dev = 'png',
        width = 8, height = 5, units = "in", res = 96, pointsize = 16)

4 Polynomials

Polynomials are linear models, but many other linear models exist. The most common case is linear regression. Polynomials are very frequently used, and in the case of linear regression the role of parameters can be easily recognized.

The examples in this section make use of stat_poly_line() and stat_poly_eq() from package ‘ggpmisc’. Please, see the help page for stat_poly_line() and the help page for stat_poly_eq() for details.

I generate different artificial data to use in the polynomial regression examples.

Code
set.seed(4321)
# generate artificial data
x <- 1:100
y <- (x + x^2 + x^3) + rnorm(length(x), mean = 0, sd = mean(x^3) / 4)
y <- y / max(y)
my.data <- data.frame(x, 
                      y, 
                      group = c("A", "B"), 
                      y2 = y * c(1, 2) + c(0, 0.2),
                      block = c("a", "a", "b", "b"),
                      wt = sqrt(x))

Fitted model equation, is available for polynomials with no missing terms. Here I add different labels in different plot layers and animate the plot over layers. (In this example, the model is fitted four times to the same data.)

Code
formula <- y ~ poly(x, 3, raw = TRUE)
p <-
  ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_poly_line(formula = formula) +
  stat_poly_eq(mapping = use_label("eq"), formula = formula, label.y = 0.95) +
  stat_poly_eq(mapping = use_label(c("F", "P")), formula = formula, label.y = 0.87) +
  stat_poly_eq(mapping = use_label("R2"), formula = formula, label.y = 0.80) +
  transition_layers(transition_length = 3,
                    layer_length = 5)

animate(p, 
        fps=8, end_pause = 20,
        renderer = gifski_renderer(loop = TRUE), 
        dev = 'png',
        width = 8, height = 5, units = "in", res = 96, pointsize = 16)

A similar plot but animated by group, which can be a good way of highlighting differences between groups of observations.

Code
formula <- y ~ poly(x, 3, raw = TRUE)

p <-
  ggplot(my.data, aes(x, y2, group = group)) +
  geom_point() +
  stat_poly_line(formula = formula) +
  stat_poly_eq(aes(label = after_stat(eq.label)), 
               formula = formula, 
               vstep = 0) +
   transition_states(group,
                    transition_length = 1,
                    state_length = 3) + 
  enter_fade() +
  exit_fade()

animate(p, 
        fps=8, 
        renderer = gifski_renderer(loop = TRUE), 
        dev = 'png',
        width = 8, height = 5, units = "in", res = 96, pointsize = 16) 

Using colours to indicate the groups to which equations correspond is not always possible or if possible not the best design. Here I use labels to the left of each equation and build the plot one group at a time.

Code
formula <- y ~ poly(x, 3, raw = TRUE)

p <-
  ggplot(my.data, aes(x, y2, linetype = group, grp.label = group)) +
  geom_point() +
  stat_poly_line(formula = formula, color = "black") +
  stat_poly_eq(aes(label = after_stat(paste("bold(", grp.label, "*':')~~~", 
                                      eq.label, sep = ""))),
               formula = formula) +
   transition_states(group,
                    transition_length = 3,
                    state_length = 3) + 
  enter_fade() +
  exit_fade() +
  shadow_mark()

animate(p, 
        fps=8, end_pause = 16,
        renderer = gifski_renderer(loop = TRUE), 
        dev = 'png',
        width = 8, height = 5, units = "in", res = 96, pointsize = 16) 

Facets are also supported.

Code
formula <- y ~ poly(x, 3, raw = TRUE)

p <-
  ggplot(my.data, aes(x, y2, group = group)) +
  geom_point() +
  stat_poly_line(formula = formula) +
  stat_poly_eq(aes(label = after_stat(eq.label)),
               formula = formula) +
  facet_wrap(facets = vars(group), scales = "free_y", ncol = 1) +
   transition_states(group,
                    transition_length = 8,
                    state_length = 8) + 
  shadow_mark(alpha = 0.2, past = TRUE, future = TRUE)

animate(p, 
        fps=8,
        renderer = gifski_renderer(loop = TRUE), 
        dev = 'png',
        width = 8, height = 5, units = "in", res = 96, pointsize = 16) 

In the stats defined in ‘ggpmsic’ the fit method function can modify the model passed as argument because the one used to build the equation is retrieved from the fitted model object when possible. In this example, a linear regression is fitted if the slope is significant, but if not the mean is reported instead, both graphically and numerically. Here it is animated over plot layers. The animation takes place in parallel in all the panels in the plot.

Code
# user defined fit method removes the slope if 
# the slope is not significant
poly_or_mean <- function(formula, data, ...) {
   fm <- lm(formula = formula, data = data, ...)
   if (anova(fm)[["Pr(>F)"]][1] > 0.1) {
      lm(formula = y ~ 1, data = data, ...)
   } else {
      fm
   }
}

# we create a plot as usual, but with our method
p <-
  ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  stat_poly_line(method = "poly_or_mean") +
  stat_poly_eq(method = poly_or_mean,
               aes(label = after_stat(eq.label)),
               label.x = "right") +
  theme(legend.position = "bottom") +
  facet_wrap(~class, ncol = 2) +
  transition_layers(transition_length = 3,
                    layer_length = 5)

animate(p, 
        fps=8, end_pause = 20,
        renderer = gifski_renderer(loop = TRUE), 
        dev = 'png',
        width = 8, height = 10, units = "in", res = 96, pointsize = 16)

Faceting works also with free x and/or y scales in panels when the position of labels is set by a character string. Animation as above remains possible.

Code
formula <- y ~ poly(x, 3, raw = TRUE)

p <-
  ggplot(my.data, aes(x, y2, fill = block)) +
  geom_point(shape = 21, size = 3) +
  stat_poly_line(aes(colour = block), formula = formula) +
  stat_poly_eq(use_label("R2"), size = 3,
               geom = "label_npc", alpha = 0.33,
               formula = formula,
               label.x = "right", label.y = "bottom",
               rr.digits = 3) +
  stat_poly_eq(use_label(c("F", "P")),
               size = 3,
               geom = "label_npc", alpha = 0.33, 
               formula = formula) +
  facet_wrap(~group, scales = "free_y") +
  theme(legend.position = "top") +
  transition_layers(transition_length = 3,
                    layer_length = 5)

animate(p, 
        fps=8, end_pause = 20,
        renderer = gifski_renderer(loop = TRUE), 
        dev = 'png',
        width = 8, height = 5, units = "in", res = 96, pointsize = 16)

The same plot with multiple panels can be animated over groups, in this case using variable block to display one of two block at a time, using fading to make it easier to assess the differences.

Code
formula <- y ~ poly(x, 3, raw = TRUE)

p <-
  ggplot(my.data, aes(x, y2, fill = block)) +
  geom_point(shape = 21, size = 3) +
  stat_poly_line(aes(colour = block), formula = formula) +
  stat_poly_eq(use_label("R2"), size = 3,
               geom = "label_npc", alpha = 0.33,
               formula = formula,
               label.x = "right", label.y = "bottom",
               rr.digits = 3,
               vstep = 0) +
  stat_poly_eq(use_label(c("F", "P")),
               size = 3,
               geom = "label_npc", alpha = 0.33, 
               formula = formula,
               vstep = 0) +
  facet_wrap(~group, scales = "free_y") +
  theme(legend.position = "top") +
  transition_states(block,
                    transition_length = 8,
                    state_length = 8) +
  enter_fade() +
  exit_fade()
  
animate(p, 
        fps=8, end_pause = 20,
        renderer = gifski_renderer(loop = TRUE), 
        dev = 'png',
        width = 8, height = 5, units = "in", res = 96, pointsize = 16)

5 Major axis regression

If x and y are both subject to random errors, and none is clearly the cause of the other, we should use major axis regression instead of linear regression.

The examples in this section make use of stat_ma_line() and stat_ma_eq() from package ‘ggpmisc’. Please, see the help page for stat_ma_line() and the help page for stat_poly_eq() for details.

Another set of artificial data with rather high random variation on both x and y.

Code
set.seed(94321)
a <- (1:100) / 10
x <- a + rnorm(length(x), sd = 1)
y <- a + rnorm(length(x), sd = 2)
my_linear.data <- data.frame(x = x, y = y)

For this data set the most useful animations are over layers.

Code
p <-
  ggplot(my_linear.data, aes(x, y)) +
  geom_point() +
  stat_ma_line() +
  stat_ma_eq(mapping = use_label(c("R2", "eq"))) +
  transition_layers(transition_length = 3,
                    layer_length = 5)

animate(p, 
        fps=8, end_pause = 20,
        renderer = gifski_renderer(loop = TRUE), 
        dev = 'png',
        width = 8, height = 5, units = "in", res = 96, pointsize = 16)

6 Quantile regression

Quantile regression is frequently used is some disciplines like Economics but less frequently in other fields. It is robust and can be rather easily interpreted based on its similarity to the well known box plots.

The examples in this section make use of stat_quant_line(), stat_quant_band() and stat_quant_eq() from package ‘ggpmisc’. Please, see the help page for stat_quant_line(), help page for stat_quant_band() and the help page for stat_quant_eq() for details.

Caution

Quantile regression fits are done by numerical approximation, and frequently generate warnings for non-unique solutions or similar problems. Their importance needs to be assessed by users. In the examples below they seem mostly unimportant given the large number of observations and have been silenced in the output.

A median regression with a band limited by the upper and lower quartile regressions.

Code
# the default for formula is y ~ x
p <-
  ggplot(my_linear.data, aes(x, y)) +
  geom_point() +
  stat_quant_band() +
  stat_quant_eq(quantiles = 0.5) +
  transition_layers(transition_length = 3,
                    layer_length = 5)

animate(p, 
        fps=8, end_pause = 20,
        renderer = gifski_renderer(loop = TRUE), 
        dev = 'png',
        width = 8, height = 5, units = "in", res = 96, pointsize = 16)

Median regression through the origin.

Code
# model with intercept = 0
formula <- y ~ x + 0

p <-
  ggplot(my_linear.data, aes(x, y)) +
  geom_point() +
  stat_quant_band(formula = formula) +
  stat_quant_eq(formula = formula, quantiles = 0.5) +
  transition_layers(transition_length = 3,
                    layer_length = 5)

animate(p, 
        fps=8, end_pause = 20,
        renderer = gifski_renderer(loop = TRUE), 
        dev = 'png',
        width = 8, height = 5, units = "in", res = 96, pointsize = 16)

Quantile regressions of y on x and x on y.

Code
# the default for formula is y ~ x
p <-
  ggplot(my_linear.data, aes(x, y)) +
  geom_point() +
  stat_quant_band(color = "blue") +
  stat_quant_eq(quantiles = 0.5, color = "blue") +
  stat_quant_band(color = "red", 
                 orientation = "y") +
  stat_quant_eq(quantiles = 0.5, color = "red", 
               orientation = "y", label.y = 0.9) +
  transition_layers(transition_length = 3,
                    layer_length = 5)

animate(p, 
        fps=8, end_pause = 20,
        renderer = gifski_renderer(loop = TRUE), 
        dev = 'png',
        width = 8, height = 5, units = "in", res = 96, pointsize = 16)

Two quantiles are by default plotted as lines,

Code
p <-
  ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_quant_line(formula = y ~ poly(x, 2), quantiles = c(0.05, 0.95)) +
  stat_quant_eq(formula = y ~ poly(x, 2), quantiles = c(0.05, 0.95)) +
  transition_layers(transition_length = 3,
                    layer_length = 5)

animate(p, 
        fps=8, end_pause = 20,
        renderer = gifski_renderer(loop = TRUE), 
        dev = 'png',
        width = 8, height = 5, units = "in", res = 96, pointsize = 16)

Equations labelled by quantile.

Code
p <-
  ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_quant_band(formula = y ~ poly(x, 2), 
                  color = "black", fill = "grey60") +
  stat_quant_eq(aes(label = paste(after_stat(grp.label), "*\": \"*",
                                  after_stat(eq.label), sep = "")),
                formula = y ~ poly(x, 2)) +
  theme_classic()  +
  transition_layers(transition_length = 3,
                    layer_length = 5)

animate(p, 
        fps=8, end_pause = 20,
        renderer = gifski_renderer(loop = TRUE), 
        dev = 'png',
        width = 8, height = 5, units = "in", res = 96, pointsize = 16)

Equations labelled by group and quantile, animated over groups, based on variable group.

Code
p <-
  ggplot(my.data, aes(x, y, group = group, linetype = group, 
                    shape = group, grp.label = group)) +
  geom_point() +
  stat_quant_line(formula = y ~ poly(x, 2), 
                  quantiles = c(0.1, 0.9), 
                  color = "black") +
  stat_quant_eq(aes(label = paste(after_stat(grp.label), "*\": \"*",
                                  after_stat(eq.label), sep = "")),
                formula = y ~ poly(x, 2), 
                quantiles = c(0.1, 0.9)) +
  theme_classic() +
  transition_states(group,
                    transition_length = 0, # transition of lines fails
                    state_length = 8)

animate(p, 
        fps=0.1,
        nframes = 2,
        renderer = gifski_renderer(loop = TRUE), 
        dev = 'png',
        width = 8, height = 5, units = "in", res = 96, pointsize = 16)

7 Non-linear models

The examples in previous sections were based on ready assembled labels. Here we show examples of how to generate labels for equations and other parameter estimates with an approach applicable to many different types of models.

The examples in this section make use of stat_fit_tidy() from package ‘ggpmisc’. Please, see its help page for details. Package ‘broom’ must be installed before runing the examples below.

We use as example a fit of the Michaelis-Menthen equation of reaction kinetics, a function that is non-linear in its parameters.

# We use methods from package 'broom'
library(broom)

Animation works as ín the examples in previous sections, here over layers.

Code
micmen.formula <- y ~ SSmicmen(x, Vm, K) 

p <-
  ggplot(Puromycin, aes(conc, rate, colour = state)) +
  geom_point() +
  geom_smooth(method = "nls", 
              formula = micmen.formula,
              se = FALSE) +
  stat_fit_tidy(method = "nls", 
                method.args = list(formula = micmen.formula),
                label.x = "right",
                label.y = "bottom",
                aes(label = paste("V[m]~`=`~", signif(after_stat(Vm_estimate), digits = 3),
                                  "%+-%", signif(after_stat(Vm_se), digits = 2),
                                  "~~~~K~`=`~", signif(after_stat(K_estimate), digits = 3),
                                  "%+-%", signif(after_stat(K_se), digits = 2),
                                  sep = "")),
                parse = TRUE)  +
  transition_layers(transition_length = 3,
                    layer_length = 5)

animate(p, 
        fps=8, end_pause = 20,
        renderer = gifski_renderer(loop = TRUE), 
        dev = 'png',
        width = 8, height = 5, units = "in", res = 96, pointsize = 16)

The animation is in this variation of the plot over groups based on variable state.

Code
micmen.formula <- y ~ SSmicmen(x, Vm, K) 

p <-
  ggplot(Puromycin, aes(conc, rate, colour = state)) +
  geom_point() +
  geom_smooth(method = "nls", 
              formula = micmen.formula,
              se = FALSE) +
  stat_fit_tidy(method = "nls", 
                method.args = list(formula = micmen.formula),
                label.x = "center",
                label.y = "bottom",
                vstep = 0, hstep = -0.3,
                aes(label = paste("V~`=`~frac(", signif(after_stat(Vm_estimate), digits = 2), "~C,",
                                  signif(after_stat(K_estimate), digits = 2), "+C)",
                                  sep = "")),
                parse = TRUE) +
  labs(x = "C", y = "V")  +
  transition_states(state,
                    transition_length = 0, # transition of lines fails
                    state_length = 8)

animate(p, 
        fps=0.125,
        nframes = 2,
        renderer = gifski_renderer(loop = TRUE), 
        dev = 'png',
        width = 8, height = 5, units = "in", res = 96, pointsize = 16)

7.1 Alternatives

Tip

In all cases it is possible to do the model fitting before creating the plot, assembling labels in one’s own R code and passing them to ggplot() as data. The possibilities are nearly unlimited but implementing them, depending on one’s own familiarity with R and ‘ggplot2’ can become time consuming. I have in part written ‘ggpp’ and ‘ggpmisc’ to avoid as a user having to remind myself of how to write such code repeatedly. My hope is that others will also save time and effort by using the packages I have developed.