In this page code chunks are “folded” so as to decrease the clutter when searching for examples. Above each plot you will find one or more “folded” code chuncks signalled by a small triangle followed by “Code”. Clicking on the triangle “unfolds” the code chunk making visible the R code used to produce the plot.

The code in the chunks can be copied by clicking on the top right corner, where an icon appears when the mouse cursor hovers over the code listing.

The `</> Code`

drop down menu to the right of the page title makes it possible to unfold all code chunks and to view the Quarto source of the whole web page.

Names of functions and other R objects are linked to the corresponding on-line help pages. The names of R extension packages are linked to their documentation web sites when available.

# 1 Introduction

Here you will find examples of ggplots with labels for fitted model equations, various parameter estimates. The labels have been added to the plots with geometries defined in package ggpp and statistics from package ggpmisc both available at CRAN.

One needs to always check that annotations do not occlude anything significant, such as observations in the base plot. This needs special care when using annotations together with batch plotting. Either ensure that the scale limits of the base plot are expanded to avoid overlap or that the layer with the equations is the lowest one, i.e., added to the plot first.

# 2 Data labels and plot annotations

Data labels add textual information directly related to individual data points (shown as glyphs). Text position in this case is dependent on the scales used to represent data points. Text is usually displaced so that it does not occlude the glyph representing the data point and when the link to the data point is unclear, this link is signalled with a line segment or arrow. Data labels are distinct from annotations in that they contribute directly to the representation of data on a plot or map.

Annotations differ from data labels, in that their position is decoupled from their meaning. Insets can be thought as larger, but still self-contained annotations. In most cases the reading of inset tables and plots depends only weakly on the plot or map in which they are included.

In the case of annotations and insets the designer of a data visualization has the freedom to locate them anywhere, as long as they do not occlude features used to describe data. I will use the term annotation irrespective if the “labels” are textual or graphical.

The equations showing parameter estimates from models fit to data are normally displayed as annotations. However, it is also possible to consider them data labels “connected” to individual curves representing the corresponding model fits.

# 3 Annotations showing parameters estimates

Fitted model equations and other related estimates can be useful in plots as they ensure that graphical representation as a curve and numerical values for parameters that cannot be read from the curve itself are displayed as text on the same plot.

When adding annotations one should be aware that they add clutter to a plot, and clutter can make it difficult to see the patterns of interest in the data represented as points or curves.

So, as usual, less is more: *include those annotations that are relevant to the message conveyed by a plot and nothing more.*

Consider:

**Add model fit equations only if the values of the parameters are interpretable.**For example, in a linear regression slope and intercept are interpretable as is the reaction constant in the Michaelis-Menten equation.**If you add an estimate of \(R^2\) and/or \(P\)-value make sure that they correspond to the plotted curve.****Display observations in addition to fitted-model curves or smoothers whenever possible.**A curve not supported by observations is usually frowned at. However, if the number of observations is large, they may have to be described by their empirical distribution.**Confidence bands around curves can be used to describe the confidence we can have on the estimated line.**However, be aware that these bands are a property of the fitted model, not the data.**The number of observations, is usually given in the figure legend, and less frequently as an annotation in the figure.**A plot annotation can be automated and is, thus, less likely to not match the number of observations plotted.**Inset tables are not very common, but when computed when the plot is rendered, also ensure that the reported statistics or summaries match the observations shown in the plot.**

Fully automating the construction of fitted model equations is far from trivial for the general case, so I have implemented automation for specific types of models: polynomials fitted by OLS as linear models, polynomials fitted by quantile regression and major-axis regression fits. They are currently available as pairs of statistics with consistent interfaces, with each pair consisting in a curve-plotting statistics and an annotations statistics.

As shown below other model formulas can be rather easily assembled and added as annotations if one is familar with R’s expressions as used for `plotmath`

.

# 4 Preliminaries

The code used is shown on-demand above each plot and can be copied. We first load the packages we will use.

When package ggpmisc is loaded and attached, packages ggpp and ggplot2 are also attached. The only function from ggplot2 that is redefined by ggpp is `annotate()`

, which remains backwards compatible with ggplot2.

# 5 Linear correlation

## Code

```
# generate artifical data for a linear relationship
# two groups with different standard deviation around a slope = 1
set.seed(94321)
x <- (1:100) / 10
yA <- x + rnorm(length(x), sd = 2)
yB <- x + rnorm(length(x), sd = 8)
df1 <- data.frame(x = rep(x, 2),
y = c(yA, yB),
group = rep(c("A", "B"), rep(length(x), 2L)))
```

The examples in this section make use of `stat_correlation()`

from package ggpmisc.

Pearson correlation.

## Code

```
ggplot(subset(df1, group == "A"),
aes(x, y)) +
geom_point() +
stat_correlation()
```

## Code

```
ggplot(subset(df1, group == "A"),
aes(x, y)) +
geom_point() +
stat_correlation(small.r = TRUE)
```

Spearman correlation. (The default is to not compute the confidence interval.)

## Code

```
ggplot(subset(df1, group == "A"),
aes(x, y)) +
geom_point() +
stat_correlation(method = "spearman")
```

`Warning in compute_group(...): Skipping bootstrap estimation as 'boot.R' < 50`

Kendall correlation. (The default is to not compute the confidence interval.)

## Code

```
ggplot(subset(df1, group == "A"),
aes(x, y)) +
geom_point() +
stat_correlation(method = "kendall")
```

`Warning in compute_group(...): Skipping bootstrap estimation as 'boot.R' < 50`

Select which labels to show in plot.

## Code

```
ggplot(subset(df1, group == "A"), aes(x, y)) +
geom_point() +
stat_correlation(mapping = use_label(c("R", "P", "n")),
method = "kendall")
```

`Warning in compute_group(...): Skipping bootstrap estimation as 'boot.R' < 50`

Select which labels to show in plot.

## Code

```
ggplot(subset(df1, group == "A"), aes(x, y)) +
geom_point() +
stat_correlation(mapping = use_label(c("R", "t", "P", "n")))
```

Confidence interval for R (Pearson’s correlation).

## Code

```
ggplot(subset(df1, group == "A"), aes(x, y)) +
geom_point() +
stat_correlation(mapping = use_label(c("R", "R.confint")))
```

Confidence interval for Spearman’s correlation.

## Code

```
ggplot(subset(df1, group == "A"),
aes(x, y)) +
geom_point() +
stat_correlation(mapping = use_label(c("R", "R.confint")),
r.conf.level = 0.95,
method = "spearman")
```

Grouping supported.

## Code

```
ggplot(df1, aes(x, y, color = group)) +
geom_point() +
stat_correlation()
```

Facets supported.

## Code

```
ggplot(df1, aes(x, y)) +
geom_point() +
stat_correlation(label.x = "right", label.y = "bottom") +
facet_wrap(~group)
```

Highlighting based on estimates, here estimated *R* but it is possible to use other estimates like *P*-value.

## Code

```
ggplot(df1, aes(x, y)) +
geom_point() +
stat_correlation(mapping =
aes(color = ifelse(after_stat(cor) > 0.5,
"red", "black"))) +
scale_color_identity() +
facet_wrap(~group)
```

## 5.1 Alternatives

Package ‘ggpubr’ provides alternative approaches to correlation testing and reporting.

# 6 Polynomials

Polynomials are linear models, but many other linear models exist. The most common case is linear regression. They are very frequently used, and in the case of linear regression the role of parameters can be easily recognized.

The examples in this section make use of `stat_poly_line()`

and `stat_poly_eq()`

from package ggpmisc.

The model corresponding to lineal regression is a first degree polynomial.

## Code

```
ggplot(df1, aes(x, y)) +
geom_point() +
stat_poly_line() +
stat_poly_eq(use_label("eq"),
label.x = "right", label.y = "bottom") +
facet_wrap(~group)
```

Most likely the variables mapped onto aesthetics `x`

and `y`

have different names. If we use a symbol to represent these variables, we may want to use these symbols in the equation.

## Code

```
ggplot(df1, aes(x, y)) +
geom_point() +
stat_poly_line() +
stat_poly_eq(use_label("eq"),
eq.with.lhs = "italic(t)~`=`~",
eq.x.rhs = "~italic(E)",
label.x = "right", label.y = "bottom") +
labs(x = expression("Irradiance, "*italic(E)~~(W~m^{-2})),
y = expression("Temperature, "*italic(t)~~(degree*"C"))) +
facet_wrap(~group)
```

\(R^2\) is the default label, as its estimate is always available.

## Code

```
formula <- y ~ poly(x, 3, raw = TRUE)
ggplot(df2, aes(x, y)) +
geom_point() +
stat_poly_line(formula = formula) +
stat_poly_eq(formula = formula)
```

Fitted model equation, available for polynomials with no missing terms.

## Code

```
formula <- y ~ poly(x, 3, raw = TRUE)
ggplot(df2, aes(x, y)) +
geom_point() +
stat_poly_line(formula = formula) +
stat_poly_eq(mapping = use_label("eq"), formula = formula)
```

\(R_\mathrm{adj}^2\) and *P*-value added separately.

## Code

```
formula <- y ~ poly(x, 3, raw = TRUE)
ggplot(df2, aes(x, y)) +
geom_point() +
stat_poly_line(formula = formula) +
stat_poly_eq(mapping = use_label("adj.R2"), formula = formula) +
stat_poly_eq(mapping = use_label("P"), label.x = "right", label.y = "bottom", size = 3,
formula = formula)
```

Fitted model equation and \(R_\mathrm{adj}^2\).

## Code

```
formula <- y ~ poly(x, 3, raw = TRUE)
ggplot(df2, aes(x, y)) +
geom_point() +
stat_poly_line(formula = formula) +
stat_poly_eq(mapping = use_label(c("eq", "adj.R2"), sep = "~~italic(\"with\")~~"),
formula = formula)
```

Equation with no left-hand-side (lhs).

## Code

```
formula <- y ~ poly(x, 3, raw = TRUE)
ggplot(df2, aes(x, y)) +
geom_point() +
stat_poly_line(formula = formula) +
stat_poly_eq(mapping = use_label("eq"),
eq.with.lhs = FALSE,
formula = formula)
```

Variable names in the equation set to be meaningful.

## Code

```
formula <- y ~ poly(x, 3, raw = TRUE)
ggplot(df2, aes(x, y)) +
geom_point() +
stat_poly_line(formula = formula) +
stat_poly_eq(mapping = use_label(c("eq", "R2")),
eq.with.lhs = "italic(h)~`=`~",
eq.x.rhs = "~italic(z)",
formula = formula) +
labs(x = expression(italic(z)), y = expression(italic(h)))
```

Even Greek charactes and maths can be incorporated when needed.

## Code

```
formula <- y ~ poly(x, 2, raw = TRUE)
ggplot(df2, aes(x, log10(y + 1e6))) +
geom_point() +
stat_poly_line(formula = formula) +
stat_poly_eq(mapping = use_label("eq"),
eq.with.lhs = "plain(log)[10](italic(delta)+10^6)~`=`~",
eq.x.rhs = "~Omega",
formula = formula) +
labs(y = expression(plain(log)[10](italic(delta)+10^6)),
x = expression(Omega))
```

`stat_poly_eq()`

by default assumes that the fitted polynomial has the variable mapped to the `x`

or `y`

aesthetic as explanatory variable depending on the model formula and argument passed to parameter `orientation`

. It does not decode any transformations applied on the fly to `x`

or `y`

. In addition it does not function correctly with linear models that are not polynomials.

If a transformation is applied, the user is responsible for replacing as shown below the \(x\) (and/or \(y\)) symbol in the equation by setting an alternative value matching the transformation applied in the model formula.

## Code

```
formula <- y ~ I(x^2) # we apply a transformation to x within the formula
ggplot(df2, aes(x, y)) +
geom_point() +
stat_poly_line(formula = formula) +
stat_poly_eq(mapping = use_label("eq"),
eq.x.rhs = "~x^2",
formula = formula)
```

In some cases, like when applying a power as transformation and fitting higher degree polynomials, one may need to manually build the R expression representing the equation to simplify it.

The examples shown above also work with groups.

## Code

```
formula <- y ~ poly(x, 3, raw = TRUE)
ggplot(df2, aes(x, y2, colour = group)) +
geom_point() +
stat_poly_line(formula = formula) +
stat_poly_eq(aes(label = after_stat(eq.label)),
formula = formula,
vstep = 0.08) +
theme_bw()
```

Using colours to indicate the groups to which equations correspond is not always best. Here we use labels to the left of each equation.

## Code

```
formula <- y ~ poly(x, 3, raw = TRUE)
ggplot(df2, aes(x, y2, linetype = group, grp.label = group)) +
geom_point() +
stat_poly_line(formula = formula, color = "black") +
stat_poly_eq(aes(label = after_stat(paste("bold(", grp.label, "*':')~~~",
eq.label, sep = ""))),
formula = formula)
```

Facets are also supported.

## Code

```
formula <- y ~ poly(x, 3, raw = TRUE)
ggplot(df2, aes(x, y2)) +
geom_point() +
stat_poly_line(formula = formula) +
stat_poly_eq(aes(label = after_stat(eq.label)),
formula = formula) +
facet_wrap(facets = vars(group), scales = "free_y", ncol = 1)
```

In the stats defined in ggpmsic the fit method function can modify the model passed as argument because the one used to build the equation is retrieved from the fitted model object when possible. In this example, a linear regression is fitted if the slope is significant, but if not the mean is reported, both graphically and numerically.

## Code

```
# user defined fit method removes the slope if
# the slope is not significant
poly_or_mean <- function(formula, data, ...) {
fm <- lm(formula = formula, data = data, ...)
if (anova(fm)[["Pr(>F)"]][1] > 0.1) {
lm(formula = y ~ 1, data = data, ...)
} else {
fm
}
}
# we create a plot as usual, but with our method
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
stat_poly_line(method = "poly_or_mean") +
stat_poly_eq(method = poly_or_mean,
aes(label = after_stat(eq.label)),
label.x = "right") +
theme(legend.position = "bottom") +
facet_wrap(~class, ncol = 2)
```

Faceting works also with free *x* and/or *y* scales in panels.

## Code

```
formula <- y ~ poly(x, 3, raw = TRUE)
ggplot(df2, aes(x, y2, fill = block)) +
geom_point(shape = 21, size = 3) +
stat_poly_line(formula = formula) +
stat_poly_eq(aes(label = after_stat(rr.label)), size = 3,
geom = "label_npc", alpha = 0.33,
formula = formula) +
facet_wrap(~group, scales = "free_y") +
theme(legend.position = "top")
```

Faceting works also with free *x* and/or *y* scales in panels when the position of labels is set by a string.

## Code

```
formula <- y ~ poly(x, 3, raw = TRUE)
ggplot(df2, aes(x, y2, fill = block)) +
geom_point(shape = 21, size = 3) +
stat_poly_line(formula = formula) +
stat_poly_eq(use_label("R2"), size = 3,
geom = "label_npc", alpha = 0.33,
formula = formula,
label.x = "right", label.y = "bottom",
rr.digits = 3) +
stat_poly_eq(use_label(c("F", "P")),
size = 3,
geom = "label_npc", alpha = 0.33,
formula = formula) +
facet_wrap(~group, scales = "free_y") +
theme(legend.position = "top")
```

Some artificial data with rather high random variation on both *x* and *y*.

Linear regression through the origin.

## Code

```
# model with intercept = 0
formula <- y ~ x + 0
ggplot(df3, aes(x, y)) +
geom_point() +
stat_poly_line(formula = formula) +
stat_poly_eq(formula = formula,
mapping = use_label("eq"))
```

These two examples show how to fit only one of the two parameters of a linear regression using the statistics from ‘ggpmisc’. These are unusual use cases but serve as examples of how simple transformations included in the model formula using the identity function `I()`

can be “undone” at the `after_stat()`

stage. Both examples use function `stage()`

, available only in ‘ggplot2’ (>= 3.4.0).

Linear regression with a fixed (instead of fitted) *y*-intercept (*y* = 5 at *x* = 0) and a fitted slope.

## Code

```
ggplot(df3) +
stat_poly_line(mapping = aes(x = x,
y = stage(start = y5, after_stat = y + 5),
ymax = after_stat(ymax + 5),
ymin = after_stat(ymin + 5)),
formula = I(y - 5) ~ x + 0) +
stat_poly_eq(mapping = aes(x, y5,
label = after_stat(paste(eq.label, "~+~5*\", \"*", rr.label))),
orientation = "x",
formula = I(y - 5) ~ x + 0) +
geom_point(mapping = aes(x, y5), size = 2.5) +
ylab("y")
```

Linear regression with a fixed slope of 1:1 (instead of fitted) and a fitted *y*-intercept.

## Code

```
ggplot(df3) +
stat_poly_line(mapping = aes(x = x,
y = stage(start = y5, after_stat = y + x - mean(x)),
ymax = after_stat(ymax + x - mean(x)),
ymin = after_stat(ymin + x - mean(x))),
formula = y ~ 1) +
stat_poly_eq(mapping = aes(x, y5 - mean(x),
label = sprintf("y~`=`~%.3g~+~italic(x)",
after_stat(b_0))),
parse = TRUE,
output.type = "numeric",
formula = y ~ 1) +
geom_point(mapping = aes(x, y5), size = 2.5) +
ylab("y")
```

Regressions of *y* on *x* and *x* on *y* in the same plot. The lines are drawn for the range of the explanatory variable, which differs in this case between the two fits.

## Code

```
# the default for formula is y ~ x
ggplot(df3, aes(x, y)) +
geom_point() +
stat_poly_line(color = "blue") +
stat_poly_eq(mapping = use_label(c("R2", "eq")),
color = "blue") +
stat_poly_line(color = "red",
orientation = "y") +
stat_poly_eq(mapping = use_label(c("R2", "eq")),
color = "red",
orientation = "y",
label.y = 0.9)
```

## 6.1 Alternatives

Package ‘ggpubr’ provides a renamed copy of `stat_poly_eq()`

taken from an old version of ggpmisc. The version in ‘ggpubr’ is much more limited in its functionality and even contains bugs. The line drawing code in `stat_smooth()`

from ggplot2 was used as a basis for `stat_poly_line()`

, the main difference is that by default it uses `lm()`

as method irrespective of the data size while `stat_smooth()`

switches to using splines for large numbers of observations.

# 7 Major axis regression

If *x* and *y* are both subject to random errors, and none is clearly the cause of the other, we should use major axis regression instead of linear regression.

The examples in this section make use of `stat_ma_line()`

and `stat_ma_eq()`

from package ggpmisc.

## Code

```
ggplot(df3, aes(x, y)) +
geom_point() +
stat_ma_line() +
stat_ma_eq(mapping = use_label(c("R2", "eq")))
```

No alternatives I know of.

# 8 Quantile regression

Quantile regression is frequently used is some disciplines like Economics but less frequently in other fields. It is robust and can be rather easily interpreted based on its similarity to the well known box plots.

The examples in this section make use of `stat_quant_line()`

, `stat_quant_band()`

and `stat_quant_eq()`

from package ggpmisc.

Quantile regression fits are done by numerical approximation, and frequently generate warnings for non-unique solutions or similar problems. Their importance needs to be assessed by users. In the examples below they seem mostly unimportant given the large number of observations and have been silenced in the output.

A median regression with a band limited by the upper and lower quartile regressions.

## Code

```
# the default for formula is y ~ x
ggplot(df3, aes(x, y)) +
geom_point() +
stat_quant_band() +
stat_quant_eq(quantiles = 0.5)
```

Median regression through the origin.

## Code

```
# model with intercept = 0
formula <- y ~ x + 0
ggplot(df3, aes(x, y)) +
geom_point() +
stat_quant_band(formula = formula) +
stat_quant_eq(formula = formula, quantiles = 0.5)
```

Quantile regressions of *y* on *x* and *x* on *y*.

## Code

```
# the default for formula is y ~ x
ggplot(df3, aes(x, y)) +
geom_point() +
stat_quant_band(color = "blue") +
stat_quant_eq(quantiles = 0.5, color = "blue") +
stat_quant_band(color = "red",
orientation = "y") +
stat_quant_eq(quantiles = 0.5, color = "red",
orientation = "y", label.y = 0.9)
```

Fitting a polynomial by quantile regression.

## Code

```
ggplot(df2, aes(x, y)) +
geom_point() +
stat_quant_band(formula = y ~ poly(x, 2)) +
stat_quant_eq(formula = y ~ poly(x, 2), quantiles = 0.5)
```

Two quantiles are by default plotted as lines,

## Code

```
ggplot(df2, aes(x, y)) +
geom_point() +
stat_quant_line(formula = y ~ poly(x, 2), quantiles = c(0.05, 0.95)) +
stat_quant_eq(formula = y ~ poly(x, 2), quantiles = c(0.05, 0.95))
```

A single quantile, here the median, is plotted as a line plus a confidence band.

## Code

```
ggplot(df2, aes(x, y)) +
geom_point() +
stat_quant_line(formula = y ~ poly(x, 2), quantiles = 0.5) +
stat_quant_eq(formula = y ~ poly(x, 2), quantiles = 0.5)
```

Equations labelled by quantile.

## Code

```
ggplot(df2, aes(x, y)) +
geom_point() +
stat_quant_band(formula = y ~ poly(x, 2),
color = "black", fill = "grey60") +
stat_quant_eq(aes(label = paste(after_stat(grp.label), "*\": \"*",
after_stat(eq.label), sep = "")),
formula = y ~ poly(x, 2)) +
theme_classic()
```

Equations labelled by group and quantile.

## Code

```
ggplot(df2, aes(x, y, group = group, linetype = group,
shape = group, grp.label = group)) +
geom_point() +
stat_quant_line(formula = y ~ poly(x, 2),
quantiles = c(0.1, 0.9),
color = "black") +
stat_quant_eq(aes(label = paste(after_stat(grp.label), "*\": \"*",
after_stat(eq.label), sep = "")),
formula = y ~ poly(x, 2),
quantiles = c(0.1, 0.9)) +
theme_classic()
```

## 8.1 Alternatives

Package ggplot2 defines `stat_quantile()`

which can be used to plot as a line some types of quantile regression fits, but is less flexible than `stat_quant_line()`

and `stat_quant_band()`

.

# 9 Non-linear models

The examples in previous sections were based on ready assembled labels. Here we show examples of how to generate labels for equations and other parameter estimates with an approach applicable to many different types of models.

The examples in this section make use of `stat_fit_tidy()`

from package ggpmisc.

We use as example a fit of the Michaelis-Menthen equation of reaction kinetics, a function that is non-linear in its parameters.

## Code

```
micmen.formula <- y ~ SSmicmen(x, Vm, K)
ggplot(Puromycin, aes(conc, rate, colour = state)) +
geom_point() +
geom_smooth(method = "nls",
formula = micmen.formula,
se = FALSE) +
stat_fit_tidy(method = "nls",
method.args = list(formula = micmen.formula),
label.x = "right",
label.y = "bottom",
aes(label = paste("V[m]~`=`~", signif(after_stat(Vm_estimate), digits = 3),
"%+-%", signif(after_stat(Vm_se), digits = 2),
"~~~~K~`=`~", signif(after_stat(K_estimate), digits = 3),
"%+-%", signif(after_stat(K_se), digits = 2),
sep = "")),
parse = TRUE) +
theme_bw()
```

## Code

```
micmen.formula <- y ~ SSmicmen(x, Vm, K)
ggplot(Puromycin, aes(conc, rate, colour = state)) +
geom_point() +
geom_smooth(method = "nls",
formula = micmen.formula,
se = FALSE) +
stat_fit_tidy(method = "nls",
method.args = list(formula = micmen.formula),
label.x = "center",
label.y = "bottom",
vstep = 0, hstep = -0.3,
aes(label = paste("V~`=`~frac(", signif(after_stat(Vm_estimate), digits = 2), "~C,",
signif(after_stat(K_estimate), digits = 2), "+C)",
sep = "")),
parse = TRUE) +
labs(x = "C", y = "V") +
theme_bw()
```

## 9.1 Alternatives

In all cases it is possible to do the model fitting before creating the plot, assembling labels in one’s own R code and passing them to `ggplot()`

as data. The possibilities are nearly unlimited but implementing them, depending on one’s own familiarity with R and ‘ggplot2’ can become very time consuming. I have in part written ggpp and ggpmisc to avoid as a user having to remind myself of how to write such code repeatedly. My hope is that others will also save time and effort by using the packages I have developed.