Abstract

Example R code for plots with pairwise labels created with geom_text_pairwise() and geom_label_pairwise() from R package ‘ggpp’ demonstrating their use to display the results from multiple comparison tests, fitted contrasts, and ranges.

Warning

Version 0.5.4 of package ‘ggpp’ available through CRAN is required.

Version 0.5.4 of package ‘ggpp’ is the first to include geom_text_pairwise() and geom_label_pairwise() whose use is exemplified by this page.

Tip

In this page most code chunks are “folded” so as to decrease the clutter when searching for examples. A few code chunks that are reused across several plots are by default unfolded to make them more visible. Above each plot you will find one or more “folded” code chuncks signalled by a small triangle followed by “Code”. Clicking on the triangle “unfolds” the code chunk making visible the R code used to produce the plot.

The code in the chunks can be copied by clicking on the top right corner, where an icon appears when the mouse cursor hovers over the code listing.

The </> Code drop down menu to the right of the page title makes it possible to unfold all code chunks and to view the Quarto source of the whole web page.

Names of functions and other R objects are linked to the corresponding on-line help pages. The names of R extension packages are linked to their documentation web sites when available.

1 Introduction

Here you will find examples of ggplots with labeled segments as used to highlight pairwise comparisons among factor levels. The labelled segments have been added to the plots with geometries defined in package ggpp available at CRAN.

Package ggpmisc imports and reexports all definitions from ggpp as well as from ggplot2.

Important

One needs to always check that annotations do not occlude anything significant, such as observations in the base plot. This needs special care when using annotations together with batch plotting. Either ensure that the scale limits of the base plot are expanded to avoid overlap or that the layer with the equations is the lowest one, i.e., added to the plot first.

2 Data labels and plot annotations

Data labels add textual information directly related to individual data points (shown as glyphs). Text position in this case is dependent on the scales used to represent data points. Text is usually displaced so that it does not occlude the glyph representing the data point and when the link to the data point is unclear, this link is signalled with a line segment or arrow. Data labels are distinct from annotations in that they contribute directly to the representation of data on a plot or map.

Annotations differ from data labels, in that their position is decoupled from their meaning. Insets can be thought as larger, but still self-contained annotations. In most cases the reading of inset tables and plots depends only weakly on the plot or map in which they are included.

In the case of annotations and insets the designer of a data visualization has the freedom to locate them anywhere, as long as they do not occlude features used to describe data. I will use the term annotation irrespective if the “labels” are textual or graphical.

The labelled segments used to highlight pairwise comparisons are a special case as they behave as data labels along the axis onto which an explanatory factor has been mapped, usually x, but as annotation along the axis onto which a continuous numeric variable has been mapped, usually y.

3 Pairwise comparisons and labels

Pairwise comparisons are used with grouped observations, i.e., when a factor is mapped to the x (or y) aesthetic. Usually observations are summarized as mean and standard errors, either as points or columns, or using boxplots.

There are two approaches in common use to display significance of differences in this type of plot: use of single letters so that columns or points representing groups that are significantly different do not have any in common will those not significantly different from each other share at least one letter. This approach has a long tradition but is being replaced by annotated horizontal bars, showing either P-values, asterics, or some other in a label. This second approach is easier to read but can result in crowded figures.

Statistically, multiple comparisons are special in that they are not orthogonal and thus P-values must be adjusted or computed using special procedures. Uncorrected t-tests must be avoided. In case of a factor with more than two levels, multiple comparisons should be applied only if the overall significance of the effect of the factor or its interactions are significant.

From version 0.5.4 package ‘ggpp’ includes geometries geom_text_pairwise() and geom_label_pairwise(), that simplify the addition of labelled segments.

Load and attach the packages.

library(ggpmisc)
library(gginnards) # for debugging

3.1 Simple labelling examples

Make a simple plot to label.

Code

my.cars <- mtcars
my.cars$cyl <- factor(my.cars$cyl)

p1 <- ggplot(my.cars, aes(cyl, mpg)) +
  geom_boxplot(width = 0.33)
p1

The first, and simplest example with labelled segments, uses labels to identify the contrasts but, does not use estimates from tests of significance.

Warning

In ‘ggplot2’, when a factor is mapped to x (or y) the values used internally are the numerical representation of the levels in the factor, i.e., as integer values starting from 1. Thus, the leftmost factor level on the x axis coresponds to 1, the second factor level to 2, and so on, irrespective of the labels.

In this page labels are in all examples based on data assembled in user code. It is crucial to make sure that the values mapped to xmin and xmax and to label are correct and consistent.

We construct a data frame with the data for text labels.

labels.df <-
  data.frame(
    pair.label = c("4 vs. 6 cyl.", "4 vs. 8 cyl.", "6 vs. 8 cyl."),
    right.tip = c(2, 3, 3),
    left.tip = c(1, 1, 2),
    y.bar = c(35.5, 38, 33)
  )

Code

 p1 +
   geom_text_pairwise(data = labels.df,
                      aes(xmin = left.tip, 
                          xmax = right.tip,
                          y = y.bar,
                          label = pair.label)) +
  expand_limits(y = 40)

While I will use geom_text_pairwise() for most examples, I here use geom_label_pairwise(). Their differences are similar to those between ggplot2::geom_text() and ggplot2::geom_label(). Other examples later in this page can be easily changed to use geom_label_pairwise() instead of geom_text_pairwise().

Code

 p1 +
  geom_label_pairwise(data = labels.df,
                      aes(xmin = left.tip, 
                          xmax = right.tip,
                          y = y.bar,
                          label = pair.label),
                      segment.linewidth = 0.5,
                      linewidth = 0.5) +
  expand_limits(y = 40) +
  theme_bw()

Code

 p1 +
  geom_label_pairwise(data = labels.df,
                      aes(xmin = left.tip, 
                          xmax = right.tip,
                          y = y.bar,
                          label = pair.label),
                      fill = "black",
                      colour = "white",
                      colour.target = "text",
                      segment.linewidth = 0.75) +
  expand_limits(y = 40)

3.2 Labelling with dodged groups

For a change from the examples above we use stat_summary() with geom = "col" to obtain a column plot. In this case, we have two groupings, created by the mappings to the x and fill aesthetics. We use position = position_dodge2() to avoid the overlap of the columns.

Tip

We can use geom_debug() from package ‘gginnards’ to check the values returned by stat_summary(). The code below is not run.

Code

ggplot(mtcars, aes(factor(cyl), mpg, fill = factor(vs))) +
  # adds layer
  stat_summary(fun = mean, 
               position = position_dodge2(preserve = "single", width = 1), 
               width = 1, 
               geom = "col") +
  # prints data to the console by setting geom = "debug"
  stat_summary(fun = mean, 
               position = position_dodge2(preserve = "single", width = 1), 
               width = 1, 
               geom = "debug")

Using the pairwise geometries together with position_dodge() requires some ingenuity when creating the data frames. When a factor is mapped to x, the positions for the levels along the x axis, increase by one, starting at one. So in this case the tick labelled “4” corresponds to an x value of 1, “6” to 2, and “8” to 3. Be aware, though, that in this case the data are split into five groups.

Function position = position_dodge2() is not passed as argument in the calls to geom_label_pairwise()!

Code

ggplot(mtcars, aes(factor(cyl), mpg, fill = factor(vs))) +
  stat_summary(fun = mean, 
               position = position_dodge2(preserve = "single", width = 1), 
               width = 1, 
               geom = "col") +
  geom_label_pairwise(data = data.frame(xmin = c(1, 1, 2), 
                                        xmax = c(2, 3, 3), 
                                        label = rep("test1", 3), 
                                        y = 31 + (0:2) * 1.75), # start + shifts
                      mapping = aes(xmin = xmin, xmax = xmax, label = label, y = y),
                      fill = "yellow"
  ) +
  geom_label_pairwise(data = data.frame(xmin = 0.75 + (0:1), 
                                        xmax = 1.25 + (0:1), 
                                        label = rep("test2", 2), 
                                        y = c(28.5, 22.2)), # values given directly
                      mapping = aes(xmin = xmin, xmax = xmax, label = label, y = y)
  ) +
  scale_fill_grey()

If we change the width of the columns, we need to adjust the coordinates of the ends of the pairwise bars so that they match the centre of the columns.

Code

ggplot(mtcars, aes(factor(cyl), mpg, fill = factor(vs))) +
  stat_summary(fun = mean, 
               position = position_dodge2(preserve = "single", width = 1), 
               width = 0.75, # 1 -> 0.75
               geom = "col") +
  geom_label_pairwise(data = data.frame(xmin = c(1, 1, 2), 
                                        xmax = c(2, 3, 3), 
                                        label = rep("test1", 3), 
                                        y = 31 + (0:2) * 1.75),
                      mapping = aes(xmin = xmin, xmax = xmax, label = label, y = y),
                      fill = "yellow"
  ) +
  geom_label_pairwise(data = data.frame(xmin = 0.8 + (0:1), # 0.75 -> 0.8
                                        xmax = 1.2 + (0:1), # 1.15 -> 1.2
                                        label = rep("test2", 2), 
                                        y = c(28.5, 22.2)),
                      mapping = aes(xmin = xmin, xmax = xmax, label = label, y = y)
  ) +
  scale_fill_grey()

3.3 Pairwise t-test

To label the bars with P-values we can compute them before plotting and build a suitable data frame.

We run the test and construct a data frame with the results, adding arbitrary values for y to avoid overlaps.

my.htest <- 
  pairwise.t.test(my.cars$mpg, my.cars$cyl, p.adjust.method = "holm")
my.htest


    Pairwise comparisons using t tests with pooled SD 

data:  my.cars$mpg and my.cars$cyl 

  4       6      
6 0.00024 -      
8 2.6e-09 0.00415

P value adjustment method: holm

t.test.df <- data.frame(
  p.value = as.vector(my.htest$p.value),
  right.tip = c(2, 3, NA, 3),
  left.tip = c(1, 1, NA, 2),
  y.bar = c(35.5, 38, NA, 33)
)

Code

 p1 +
   geom_text_pairwise(data = t.test.df,
                      aes(xmin = left.tip, 
                          xmax = right.tip,
                          y = y.bar,
                          label = ifelse(p.value < 1e-3,
                                         "italic(P)~`<`~0.001",
                                         sprintf("italic(P)~`=`~%.3f", p.value))),
                      parse = TRUE,
                      na.rm = TRUE) +
  expand_limits(y = 40)

Code

 p1 +
   geom_text_pairwise(data = t.test.df,
                      aes(xmin = left.tip, 
                          xmax = right.tip,
                          y = y.bar,
                          label = ifelse(p.value < 1e-3,
                                         "italic(P)~`<`~0.001",
                                         sprintf("italic(P)~`=`~%.3f", p.value))),
                      arrow = grid::arrow(angle = 90,
                                          length = unit(1, "mm"),
                                          ends = "both"),
                      parse = TRUE,
                      na.rm = TRUE) +
  expand_limits(y = 40)

Code

 p1 +
   geom_text_pairwise(data = t.test.df,
                      aes(xmin = left.tip, 
                          xmax = right.tip,
                          y = y.bar,
                          colour = p.value < 0.002,
                          label = ifelse(p.value < 1e-3,
                                         "italic(P)~`<`~0.001",
                                         sprintf("italic(P)~`=`~%.3f", p.value))),
                       arrow = grid::arrow(angle = 90,
                                          length = unit(1, "mm"),
                                          ends = "both"),
                      parse = TRUE,
                      na.rm = TRUE) +
  scale_colour_manual(values = c("black", "red"), na.value = NA, guide = "none") +
  expand_limits(y = 40)

Code

 p1 +
   geom_text_pairwise(data = t.test.df,
                      aes(xmin = left.tip, 
                          xmax = right.tip,
                          y = y.bar,
                          colour = p.value < 0.002,
                          label = ifelse(p.value < 1e-3,
                                         "italic(P)~`<`~0.001",
                                         sprintf("italic(P)~`=`~%.3f", p.value))),
                      arrow = grid::arrow(angle = 90,
                                          length = unit(1, "mm"),
                                          ends = "both"),
                      colour.target = "text",
                      parse = TRUE,
                      na.rm = TRUE) +
  scale_colour_manual(values = c("black", "red"), na.value = NA, guide = "none") +
  expand_limits(y = 40)

3.4 Pairwise Wilcox test

The object returned by pairwise.wilcox.text() has the same structure as that returned by pairwise.t.test() so we only need to change the names.

# warnings about ties have been silenced
my.w.test <- 
  pairwise.wilcox.test(my.cars$mpg, my.cars$cyl, p.adjust.method = "holm")
my.w.test


    Pairwise comparisons using Wilcoxon rank sum test with continuity correction 

data:  my.cars$mpg and my.cars$cyl 

  4       6     
6 0.0013  -     
8 8.3e-05 0.0013

P value adjustment method: holm

wilcox.test.df <- data.frame(
  p.value = as.vector(my.w.test$p.value),
  right.tip = c(2, 3, NA, 3),
  left.tip = c(1, 1, NA, 2),
  y.bar = c(35.5, 38, NA, 33)
)

I show only the plot for the first t-test example, using identical code except for the name of the data frame. All other examples will also work without changes.

Code

 p1 +
   geom_text_pairwise(data = wilcox.test.df,
                      aes(xmin = left.tip, 
                          xmax = right.tip,
                          y = y.bar,
                          label = ifelse(p.value < 1e-3,
                                         "italic(P)~`<`~0.001",
                                         sprintf("italic(P)~`=`~%.3f", p.value))),
                      parse = TRUE,
                      na.rm = TRUE) +
  expand_limits(y = 40)

3.5 Fitted contrasts

Package ‘gmodels’ provides methods for fitting arbitrary contrasts.

library(gmodels)

Fitting contrasts individually, we can decide which contrasts are of a priori interest and limit testing to these. As for t-test and Wicox-test, P-values need to be adjusted, but the adjustment in this case is applied separately from the fitting.

To use function fit.contrasts() from package ‘gmodels’ we start by fitting a linear model to the same data as used for the examples above.

mpg.cyl.lm <- lm(mpg ~ cyl, data = my.cars)
anova(mpg.cyl.lm)

Analysis of Variance Table

Response: mpg
          Df Sum Sq Mean Sq F value    Pr(>F)    
cyl        2 824.78  412.39  39.697 4.979e-09 ***
Residuals 29 301.26   10.39                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We next fit de contrasts and extract the P-values from the returned value.

fit.contrast.df <-
  data.frame(left.tip = 1:2,
             right.tip = 2:3,
             y.bar = c(36, 23),
             p.value = p.adjust(
               c(fit.contrast(mpg.cyl.lm, varname = "cyl", coeff = c(-1, 1, 0))[4],
                 fit.contrast(mpg.cyl.lm, varname = "cyl", coeff = c(0, -1, 1))[4])
             ))

Although I could have passed a matrix to parameter coeff to fit both contrasts simultaneously I fitted them one by one for clarity.

I once again reuse unchanged except for the name of the data frame, the code used above for the t-test and Wilcox-tests examples.

Code

 p1 +
   geom_text_pairwise(data = fit.contrast.df,
                      aes(xmin = left.tip, 
                          xmax = right.tip,
                          y = y.bar,
                          label = ifelse(p.value < 1e-3,
                                         "italic(P)~`<`~0.001",
                                         sprintf("italic(P)~`=`~%.3f", p.value))),
                      parse = TRUE,
                      na.rm = TRUE) +
  expand_limits(y = 40)

Function fit.contrast() can be used to fit in addition to pairwise contrasts, more complex contrast involving more than two factor levels. Currently, no geometry can directly add them to a plot as a single layer. One needs to build them layer by layer (not shown).

3.6 Multiple comparisons

Package ‘multicomp’ implements different multiple comparison methods, using a flexible and generally applicable approach.

Tip

Statistic stat_multcomp() from package ggpmisc automates the computation of adjusted P-values and formatted character strings to be parsed into R’s plotmath expressions, for the cases of “Tukey” and “Dunnet” contrasts. See examples in page Multiple comparisons with ‘ggpmisc’.

library(multcomp)

As with package ‘gmodels’ we need to first fit a linear or other model to the data. Here I reuse the fitted model from the previous section. The next example demonstrates the use of function glht() (General Linear Hypotheses Tests) with Tukey’s method for pairwise multiple comparisons.

mpg.cyl.glht <- glht(model = mpg.cyl.lm, linfct = mcp(cyl = "Tukey"), rhs = 0)
summary(mpg.cyl.glht)


     Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts


Fit: lm(formula = mpg ~ cyl, data = my.cars)

Linear Hypotheses:
           Estimate Std. Error t value Pr(>|t|)    
6 - 4 == 0   -6.921      1.558  -4.441   <0.001 ***
8 - 4 == 0  -11.564      1.299  -8.905   <0.001 ***
8 - 6 == 0   -4.643      1.492  -3.112   0.0111 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)

We need as above to build a data frame with the data for the pairwise labels, in this case extracting the P-value estimates from the object returned by summary().

multcomp.df <- 
  data.frame(
    p.value = summary(mpg.cyl.glht)$test$pvalues,
    left.tip = c(1, 1, 2),
    right.tip = c(2, 3, 3),
    y.bar = c(35.5, 38, 33)
  )

I once again reuse unchanged the code used above for the t-test and Wilcox-tests examples except for the name of the data frame passed as argument to data.

Code

 p1 +
   geom_text_pairwise(data = multcomp.df,
                      aes(xmin = left.tip, 
                          xmax = right.tip,
                          y = y.bar,
                          label = ifelse(p.value < 1e-3,
                                         "italic(P)~`<`~0.001",
                                         sprintf("italic(P)~`=`~%.3f", p.value))),
                      parse = TRUE,
                      na.rm = TRUE) +
  expand_limits(y = 40)

As is the case with function fit.contrasts() used in the previous section, function glht() can be used to fit many different contrasts, pairwise or otherwise, with suitable corrections for multiple comparisons. Currently, no geometry can directly add labelled segments for contrasts that are not pairwise. One needs to build them layer by layer, labels and segments (not shown).

3.7 Other uses

Although labelled segments are most commonly used to report the results of pairwise significance tests, several other uses are possible. An obvious one is to highlight a range of values for a continuous variable or a group of succesive factors levels.

Geometries geom_text_pairwise() and geom_label_pairwise(), used in the examples above with factors, can be also used when the variables mapped onto aesthetics x and y are both continuous.

In this section I show examples of their use to highlight ranges of values of a continuos variable from the same data set. I start by constructing a suitable base plot to use for labelling examples.

Code

p2 <-
  ggplot(my.cars, aes(disp, mpg)) +
  geom_point()+
  geom_hline(yintercept = 0, colour = "grey90", linewidth = 1.2) +
  theme_bw()
p2

I construct a data frame with the data for labelling ranges. In this case as x is numeric, and we set the x values for the segment tips using user-data units (i.e., the values as shown in x axis’ tick marks).

my.ranges <-
  data.frame(A = c(50, 200, 400),
             B = c(200, 400, 500),
             bar.height = 0,
             text = c("small", "medium", "large"))

The code is very similar to that used for the pairwise examples in previous sections. The main change is that I used different names for the columns of the data frame.

Code

p2  +
  geom_text_pairwise(data = my.ranges,
                     aes(xmin = A, 
                         xmax = B,
                         y = bar.height, 
                         label = text,
                         colour = text),
                     segment.linewidth = 1) +
  scale_colour_discrete(guide = "none") +
  expand_limits(y = 0) +
  theme_bw()

Code

p2 +
  geom_text_pairwise(data = my.ranges,
                     aes(xmin = A, 
                         xmax = B,
                         y = bar.height, 
                         label = text),
                     arrow = grid::arrow(ends = "both", length = unit(2, "mm"))) +
  expand_limits(y = 0) +
  theme_bw()

Reuse

CC BY-SA 4.0

--- title: "Pairwise labels with 'ggpp'" subtitle: "Contrasts, multiple comparisons and simple labels" author: "Pedro J. Aphalo" date: "2023-08-01" date-modified: "2024-04-22" keywords: [ggplot2 pkg, ggpp pkg, ggpmisc pkg, data labels, plot annotations, pairwise] categories: [plotting, data visualization, ggpp, ggpmisc, ggplot2] editor: markdown: wrap: 72 code-fold: true format: html: code-link: true code-tools: true abstract: | Example R code for plots with pairwise labels created with `geom_text_pairwise()` and `geom_label_pairwise()` from R package 'ggpp' demonstrating their use to display the results from multiple comparison tests, fitted contrasts, and ranges. --- ::: callout-warning **Version 0.5.4 of package 'ggpp' available through CRAN is required.** Version 0.5.4 of package 'ggpp' is the first to include `geom_text_pairwise()` and `geom_label_pairwise()` whose use is exemplified by this page. ::: ::: callout-tip In this page most code chunks are "folded" so as to decrease the clutter when searching for examples. A few code chunks that are reused across several plots are by default unfolded to make them more visible. Above each plot you will find one or more "folded" code chuncks signalled by a small triangle followed by "Code". Clicking on the triangle "unfolds" the code chunk making visible the R code used to produce the plot. The code in the chunks can be copied by clicking on the top right corner, where an icon appears when the mouse cursor hovers over the code listing. The `</> Code` drop down menu to the right of the page title makes it possible to unfold all code chunks and to view the Quarto source of the whole web page. Names of functions and other R objects are linked to the corresponding on-line help pages. The names of R extension packages are linked to their documentation web sites when available. ::: # Introduction Here you will find examples of ggplots with labeled segments as used to highlight pairwise comparisons among factor levels. The labelled segments have been added to the plots with geometries defined in package [ggpp](https://docs.r4photobiology.info/ggpp/) available at CRAN. Package [ggpmisc](https://docs.r4photobiology.info/ggpmisc/) imports and reexports all definitions from [ggpp](https://docs.r4photobiology.info/ggpp/) as well as from [ggplot2](https://ggplot2.tidyverse.org/). ::: callout-important One needs to always check that annotations do not occlude anything significant, such as observations in the base plot. This needs special care when using annotations together with batch plotting. Either ensure that the scale limits of the base plot are expanded to avoid overlap or that the layer with the equations is the lowest one, i.e., added to the plot first. ::: # Data labels and plot annotations Data labels add textual information directly related to individual data points (shown as glyphs). Text position in this case is dependent on the scales used to represent data points. Text is usually displaced so that it does not occlude the glyph representing the data point and when the link to the data point is unclear, this link is signalled with a line segment or arrow. Data labels are distinct from annotations in that they contribute directly to the representation of data on a plot or map. Annotations differ from data labels, in that their position is decoupled from their meaning. Insets can be thought as larger, but still self-contained annotations. In most cases the reading of inset tables and plots depends only weakly on the plot or map in which they are included. In the case of annotations and insets the designer of a data visualization has the freedom to locate them anywhere, as long as they do not occlude features used to describe data. I will use the term annotation irrespective if the "labels" are textual or graphical. The labelled segments used to highlight pairwise comparisons are a special case as they behave as data labels along the axis onto which an explanatory factor has been mapped, usually _x_, but as annotation along the axis onto which a continuous numeric variable has been mapped, usually _y_. # Pairwise comparisons and labels Pairwise comparisons are used with grouped observations, i.e., when a factor is mapped to the _x_ (or _y_) aesthetic. Usually observations are summarized as mean and standard errors, either as points or columns, or using boxplots. There are two approaches in common use to display significance of differences in this type of plot: use of single letters so that columns or points representing groups that are significantly different do not have any in common will those not significantly different from each other share at least one letter. This approach has a long tradition but is being replaced by annotated horizontal bars, showing either _P_-values, asterics, or some other in a label. This second approach is easier to read but can result in crowded figures. Statistically, multiple comparisons are special in that they are not orthogonal and thus _P_-values must be adjusted or computed using special procedures. Uncorrected _t_-tests must be avoided. In case of a factor with more than two levels, multiple comparisons should be applied only if the overall significance of the effect of the factor or its interactions are significant. From version 0.5.4 package 'ggpp' includes geometries `geom_text_pairwise()` and `geom_label_pairwise()`, that simplify the addition of labelled segments. Load and attach the packages. ```{r, message=FALSE} #| code-fold: false library(ggpmisc) library(gginnards) # for debugging ``` ## Simple labelling examples Make a simple plot to label. ```{r} my.cars <- mtcars my.cars$cyl <- factor(my.cars$cyl) p1 <- ggplot(my.cars, aes(cyl, mpg)) + geom_boxplot(width = 0.33) p1 ``` The first, and simplest example with labelled segments, uses labels to identify the contrasts but, does not use estimates from tests of significance. ::: callout-warning In 'ggplot2', when a factor is mapped to _x_ (or _y_) the values used internally are the numerical representation of the levels in the factor, i.e., as `integer` values starting from 1. Thus, the leftmost factor level on the _x_ axis coresponds to `1`, the second factor level to `2`, and so on, irrespective of the labels. **In this page labels are in all examples based on data assembled in user code. It is crucial to make sure that the values mapped to `xmin` and `xmax` and to `label` are correct and consistent.** ::: We construct a data frame with the data for text labels. ```{r} #| code-fold: false labels.df <- data.frame( pair.label = c("4 vs. 6 cyl.", "4 vs. 8 cyl.", "6 vs. 8 cyl."), right.tip = c(2, 3, 3), left.tip = c(1, 1, 2), y.bar = c(35.5, 38, 33) ) ``` ```{r} p1 + geom_text_pairwise(data = labels.df, aes(xmin = left.tip, xmax = right.tip, y = y.bar, label = pair.label)) + expand_limits(y = 40) ``` While I will use `geom_text_pairwise()` for most examples, I here use `geom_label_pairwise()`. Their differences are similar to those between `ggplot2::geom_text()` and `ggplot2::geom_label()`. Other examples later in this page can be easily changed to use `geom_label_pairwise()` instead of `geom_text_pairwise()`. ```{r} p1 + geom_label_pairwise(data = labels.df, aes(xmin = left.tip, xmax = right.tip, y = y.bar, label = pair.label), segment.linewidth = 0.5, linewidth = 0.5) + expand_limits(y = 40) + theme_bw() ``` ```{r} p1 + geom_label_pairwise(data = labels.df, aes(xmin = left.tip, xmax = right.tip, y = y.bar, label = pair.label), fill = "black", colour = "white", colour.target = "text", segment.linewidth = 0.75) + expand_limits(y = 40) ``` ## Labelling with dodged groups For a change from the examples above we use `stat_summary()` with `geom = "col"` to obtain a column plot. In this case, we have two groupings, created by the mappings to the `x` and `fill` aesthetics. We use `position = position_dodge2()` to avoid the overlap of the columns. ::: callout-tip We can use `geom_debug()` from package 'gginnards' to check the values returned by `stat_summary()`. The code below is not run. ```{r, eval=FALSE} ggplot(mtcars, aes(factor(cyl), mpg, fill = factor(vs))) + # adds layer stat_summary(fun = mean, position = position_dodge2(preserve = "single", width = 1), width = 1, geom = "col") + # prints data to the console by setting geom = "debug" stat_summary(fun = mean, position = position_dodge2(preserve = "single", width = 1), width = 1, geom = "debug") ``` ::: Using the pairwise geometries together with `position_dodge()` requires some ingenuity when creating the data frames. When a factor is mapped to _x_, the positions for the levels along the _x_ axis, increase by one, starting at one. So in this case the tick labelled "4" corresponds to an `x` value of 1, "6" to 2, and "8" to 3. Be aware, though, that in this case the data are split into five groups. _Function `position = position_dodge2()` is not passed as argument in the calls to `geom_label_pairwise()`!_ ```{r} ggplot(mtcars, aes(factor(cyl), mpg, fill = factor(vs))) + stat_summary(fun = mean, position = position_dodge2(preserve = "single", width = 1), width = 1, geom = "col") + geom_label_pairwise(data = data.frame(xmin = c(1, 1, 2), xmax = c(2, 3, 3), label = rep("test1", 3), y = 31 + (0:2) * 1.75), # start + shifts mapping = aes(xmin = xmin, xmax = xmax, label = label, y = y), fill = "yellow" ) + geom_label_pairwise(data = data.frame(xmin = 0.75 + (0:1), xmax = 1.25 + (0:1), label = rep("test2", 2), y = c(28.5, 22.2)), # values given directly mapping = aes(xmin = xmin, xmax = xmax, label = label, y = y) ) + scale_fill_grey() ``` If we change the width of the columns, we need to adjust the coordinates of the ends of the pairwise bars so that they match the centre of the columns. ```{r} ggplot(mtcars, aes(factor(cyl), mpg, fill = factor(vs))) + stat_summary(fun = mean, position = position_dodge2(preserve = "single", width = 1), width = 0.75, # 1 -> 0.75 geom = "col") + geom_label_pairwise(data = data.frame(xmin = c(1, 1, 2), xmax = c(2, 3, 3), label = rep("test1", 3), y = 31 + (0:2) * 1.75), mapping = aes(xmin = xmin, xmax = xmax, label = label, y = y), fill = "yellow" ) + geom_label_pairwise(data = data.frame(xmin = 0.8 + (0:1), # 0.75 -> 0.8 xmax = 1.2 + (0:1), # 1.15 -> 1.2 label = rep("test2", 2), y = c(28.5, 22.2)), mapping = aes(xmin = xmin, xmax = xmax, label = label, y = y) ) + scale_fill_grey() ``` ## Pairwise _t_-test To label the bars with _P_-values we can compute them before plotting and build a suitable data frame. We run the test and construct a data frame with the results, adding arbitrary values for _y_ to avoid overlaps. ```{r} #| code-fold: false my.htest <- pairwise.t.test(my.cars$mpg, my.cars$cyl, p.adjust.method = "holm") my.htest t.test.df <- data.frame( p.value = as.vector(my.htest$p.value), right.tip = c(2, 3, NA, 3), left.tip = c(1, 1, NA, 2), y.bar = c(35.5, 38, NA, 33) ) ``` ```{r} p1 + geom_text_pairwise(data = t.test.df, aes(xmin = left.tip, xmax = right.tip, y = y.bar, label = ifelse(p.value < 1e-3, "italic(P)~`<`~0.001", sprintf("italic(P)~`=`~%.3f", p.value))), parse = TRUE, na.rm = TRUE) + expand_limits(y = 40) ``` ```{r} p1 + geom_text_pairwise(data = t.test.df, aes(xmin = left.tip, xmax = right.tip, y = y.bar, label = ifelse(p.value < 1e-3, "italic(P)~`<`~0.001", sprintf("italic(P)~`=`~%.3f", p.value))), arrow = grid::arrow(angle = 90, length = unit(1, "mm"), ends = "both"), parse = TRUE, na.rm = TRUE) + expand_limits(y = 40) ``` ```{r} p1 + geom_text_pairwise(data = t.test.df, aes(xmin = left.tip, xmax = right.tip, y = y.bar, colour = p.value < 0.002, label = ifelse(p.value < 1e-3, "italic(P)~`<`~0.001", sprintf("italic(P)~`=`~%.3f", p.value))), arrow = grid::arrow(angle = 90, length = unit(1, "mm"), ends = "both"), parse = TRUE, na.rm = TRUE) + scale_colour_manual(values = c("black", "red"), na.value = NA, guide = "none") + expand_limits(y = 40) ``` ```{r} p1 + geom_text_pairwise(data = t.test.df, aes(xmin = left.tip, xmax = right.tip, y = y.bar, colour = p.value < 0.002, label = ifelse(p.value < 1e-3, "italic(P)~`<`~0.001", sprintf("italic(P)~`=`~%.3f", p.value))), arrow = grid::arrow(angle = 90, length = unit(1, "mm"), ends = "both"), colour.target = "text", parse = TRUE, na.rm = TRUE) + scale_colour_manual(values = c("black", "red"), na.value = NA, guide = "none") + expand_limits(y = 40) ``` ## Pairwise Wilcox test The object returned by `pairwise.wilcox.text()` has the same structure as that returned by `pairwise.t.test()` so we only need to change the names. ```{r, warning=FALSE} #| code-fold: false # warnings about ties have been silenced my.w.test <- pairwise.wilcox.test(my.cars$mpg, my.cars$cyl, p.adjust.method = "holm") my.w.test wilcox.test.df <- data.frame( p.value = as.vector(my.w.test$p.value), right.tip = c(2, 3, NA, 3), left.tip = c(1, 1, NA, 2), y.bar = c(35.5, 38, NA, 33) ) ``` I show only the plot for the first _t_-test example, using identical code except for the name of the data frame. All other examples will also work without changes. ```{r} p1 + geom_text_pairwise(data = wilcox.test.df, aes(xmin = left.tip, xmax = right.tip, y = y.bar, label = ifelse(p.value < 1e-3, "italic(P)~`<`~0.001", sprintf("italic(P)~`=`~%.3f", p.value))), parse = TRUE, na.rm = TRUE) + expand_limits(y = 40) ``` ## Fitted contrasts Package 'gmodels' provides methods for fitting arbitrary contrasts. ```{r, message=FALSE} #| code-fold: false library(gmodels) ``` Fitting contrasts individually, we can decide which contrasts are of a priori interest and limit testing to these. As for _t_-test and Wicox-test, _P_-values need to be adjusted, but the adjustment in this case is applied separately from the fitting. To use function `fit.contrasts()` from package 'gmodels' we start by fitting a linear model to the same data as used for the examples above. ```{r} #| code-fold: false mpg.cyl.lm <- lm(mpg ~ cyl, data = my.cars) anova(mpg.cyl.lm) ``` We next fit de contrasts and extract the _P_-values from the returned value. ```{r} #| code-fold: false fit.contrast.df <- data.frame(left.tip = 1:2, right.tip = 2:3, y.bar = c(36, 23), p.value = p.adjust( c(fit.contrast(mpg.cyl.lm, varname = "cyl", coeff = c(-1, 1, 0))[4], fit.contrast(mpg.cyl.lm, varname = "cyl", coeff = c(0, -1, 1))[4]) )) ``` Although I could have passed a `matrix` to parameter `coeff` to fit both contrasts simultaneously I fitted them one by one for clarity. I once again reuse unchanged except for the name of the data frame, the code used above for the _t_-test and Wilcox-tests examples. ```{r} p1 + geom_text_pairwise(data = fit.contrast.df, aes(xmin = left.tip, xmax = right.tip, y = y.bar, label = ifelse(p.value < 1e-3, "italic(P)~`<`~0.001", sprintf("italic(P)~`=`~%.3f", p.value))), parse = TRUE, na.rm = TRUE) + expand_limits(y = 40) ``` Function `fit.contrast()` can be used to fit in addition to pairwise contrasts, more complex contrast involving more than two factor levels. Currently, no geometry can directly add them to a plot as a single layer. One needs to build them layer by layer (not shown). ## Multiple comparisons Package 'multicomp' implements different multiple comparison methods, using a flexible and generally applicable approach. ::: callout-tip Statistic `stat_multcomp()` from package [ggpmisc](https://docs.r4photobiology.info/ggpmisc/) automates the computation of adjusted _P_-values and formatted character strings to be parsed into R's plotmath expressions, for the cases of "Tukey" and "Dunnet" contrasts. See examples in page [Multiple comparisons with 'ggpmisc'](multcomp.qmd). ::: ```{r, message=FALSE} #| code-fold: false library(multcomp) ``` As with package 'gmodels' we need to first fit a linear or other model to the data. Here I reuse the fitted model from the previous section. The next example demonstrates the use of function `glht()` (General Linear Hypotheses Tests) with Tukey's method for pairwise multiple comparisons. ```{r} #| code-fold: false mpg.cyl.glht <- glht(model = mpg.cyl.lm, linfct = mcp(cyl = "Tukey"), rhs = 0) summary(mpg.cyl.glht) ``` We need as above to build a data frame with the data for the pairwise labels, in this case extracting the _P_-value estimates from the object returned by `summary()`. ```{r} #| code-fold: false multcomp.df <- data.frame( p.value = summary(mpg.cyl.glht)$test$pvalues, left.tip = c(1, 1, 2), right.tip = c(2, 3, 3), y.bar = c(35.5, 38, 33) ) ``` I once again reuse unchanged the code used above for the _t_-test and Wilcox-tests examples except for the name of the data frame passed as argument to `data`. ```{r} p1 + geom_text_pairwise(data = multcomp.df, aes(xmin = left.tip, xmax = right.tip, y = y.bar, label = ifelse(p.value < 1e-3, "italic(P)~`<`~0.001", sprintf("italic(P)~`=`~%.3f", p.value))), parse = TRUE, na.rm = TRUE) + expand_limits(y = 40) ``` As is the case with function `fit.contrasts()` used in the previous section, function `glht()` can be used to fit many different contrasts, pairwise or otherwise, with suitable corrections for multiple comparisons. Currently, no geometry can directly add labelled segments for contrasts that are not pairwise. One needs to build them layer by layer, labels and segments (not shown). ## Other uses Although labelled segments are most commonly used to report the results of pairwise significance tests, several other uses are possible. An obvious one is to highlight a range of values for a continuous variable or a group of succesive factors levels. Geometries `geom_text_pairwise()` and `geom_label_pairwise()`, used in the examples above with factors, can be also used when the variables mapped onto aesthetics _x_ and _y_ are both continuous. In this section I show examples of their use to highlight ranges of values of a continuos variable from the same data set. I start by constructing a suitable base plot to use for labelling examples. ```{r} p2 <- ggplot(my.cars, aes(disp, mpg)) + geom_point()+ geom_hline(yintercept = 0, colour = "grey90", linewidth = 1.2) + theme_bw() p2 ``` I construct a data frame with the data for labelling ranges. In this case as _x_ is numeric, and we set the _x_ values for the segment tips using user-data units (i.e., the values as shown in _x_ axis' tick marks). ```{r} #| code-fold: false my.ranges <- data.frame(A = c(50, 200, 400), B = c(200, 400, 500), bar.height = 0, text = c("small", "medium", "large")) ``` The code is very similar to that used for the pairwise examples in previous sections. The main change is that I used different names for the columns of the data frame. ```{r} p2 + geom_text_pairwise(data = my.ranges, aes(xmin = A, xmax = B, y = bar.height, label = text, colour = text), segment.linewidth = 1) + scale_colour_discrete(guide = "none") + expand_limits(y = 0) + theme_bw() ``` ```{r} p2 + geom_text_pairwise(data = my.ranges, aes(xmin = A, xmax = B, y = bar.height, label = text), arrow = grid::arrow(ends = "both", length = unit(2, "mm"))) + expand_limits(y = 0) + theme_bw() ```