library(ggpmisc)
As part of moving the site to Quarto I intend to write a single post to give an overview of changes in minor versions of the packages. In this case, all versions in the 0.5.x series. I will update this same post in the case of minor version updates, and start a new post when I release the first version in the 0.6.x series. I hope this will reduce the clutter and still provide a good overview of progress. Differences between versions are listed in detail in the NEWS file.
Overview of changes
Version 0.5.0 brings several enhancements to the annotations based on model fits. The most visible change is the new convenience function use_label(), which greatly simplifies the assembly of labels from components and their mapping to aesthetics. I exemplify its use and some of the other new features below in the section Examples. Several of the model-fit-based statistics now return, on request, additional variables in data, adding flexibility. Function stat_correlation() now computes confidence intervals for correlation estimates. New functions keep_tidy(), keep_glance() and keep_augment() are wrappers on methods tidy(), glance() and augment() from package ‘broom’. These new functions make it possible to keep a trace of the origin of the “broom-tidied” outputs, similarly to what is possible with "lm" objects and other objects returned by R’s model fitting functions.
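As a minimal sketch of the wrappers mentioned above, the code below fits a linear model and tidies it with keep_tidy(). It assumes that package ‘broom’ is installed. The object names fm and td are only illustrative, and I list the attributes of the result rather than assume their names, as these may differ between versions.

```r
library(ggpmisc)

# Fit an ordinary linear model to the 'mpg' data from 'ggplot2'
fm <- lm(hwy ~ displ, data = mpg)

# Tidy the fitted model; the wrapper is expected to return the same
# table as broom::tidy(), plus attributes recording its origin
td <- keep_tidy(fm)

td                     # one row per model coefficient
names(attributes(td))  # inspect which attributes were stored
```

The same pattern should apply to keep_glance() and keep_augment().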
Version 0.5.1 brings additional enhancements to the annotations based on model fits to improve traceability. New scales scale_colour_logFC(), scale_color_logFC() and scale_fill_logFC() and revised scale_colour_outcome() and scale_fill_outcome() add flexibility.
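A small sketch of how the outcome scales can be used follows. The data frame toy.df is made up for illustration, with outcomes coded as -1, 0 and +1, and I assume that outcome2factor() from ‘ggpmisc’ is used to convert these codes into a factor before mapping it to the colour aesthetic.

```r
library(ggpmisc)

set.seed(42)
# Hypothetical toy data: 'outcome' coded as -1 (down), 0 (uncertain), +1 (up)
toy.df <- data.frame(x = rnorm(60),
                     y = rnorm(60),
                     outcome = rep(c(-1, 0, 1), each = 20))

p_outcome <-
  ggplot(toy.df, aes(x, y, colour = outcome2factor(outcome))) +
  geom_point() +
  scale_colour_outcome() +  # default colours for the outcome levels
  theme_bw()
p_outcome
```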
Version 0.5.2 fixes bugs and ensures compatibility with updates to ‘ggplot2’ and ‘lubridate’.
Version 0.5.3 fixes bugs, brings compatibility with package ‘gganimate’ and adds parameter n.min to all statistics that are based on model fitting or correlation testing functions. Arguments passed to n.min make it possible to increase the previously hard-coded limit, which in most cases remains as the default.
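As a sketch of how n.min can be passed, the code below raises the threshold to 10 in both the line and the equation statistics. The value 10 is arbitrary and chosen only for illustration. Whether the threshold counts observations or distinct values of the explanatory variable should be checked in the help page of each statistic; groups below the threshold are expected to be left without a fitted line or label.

```r
library(ggpmisc)

p_nmin <-
  ggplot(subset(mpg, cyl != 5), aes(displ, hwy, colour = factor(cyl))) +
  geom_point() +
  # use the same threshold in both layers so that lines and labels agree
  stat_poly_line(n.min = 10L) +
  stat_poly_eq(use_label(c("eq", "n")),
               n.min = 10L,
               label.x = "right") +
  theme_bw()
p_nmin
```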
Version 0.5.4 brings stat_multcomp() for computing multiple comparisons within the framework of general linear hypotheses, as implemented in R package ‘multcomp’, and for annotating plots with the outcomes.
Version 0.5.5 fixes a bug that prevented the use of model formulas with a transformation in the lhs, such as use of I() or other functions instead of bare y or x.
Examples
In this section I demonstrate the use of some of the new features described above.
In the first plot I add an estimate of the correlation coefficient R, and the corresponding t-value and P-value.
ggplot(subset(mpg, cyl != 5), aes(displ, hwy, colour = factor(cyl))) +
geom_point() +
stat_correlation(use_label(c("R", "t", "P")),
label.x = "right") +
theme_bw()
The displacement volume of car engines is known without error, and it can be thought of as a possible explanation for the fuel economy (MPG, or miles per gallon) in highway driving. I fit a linear regression per group, and annotate the plot with the fitted linear model equations, F-values and P-values.
ggplot(subset(mpg, cyl != 5), aes(displ, hwy, colour = factor(cyl))) +
geom_point() +
stat_poly_line() +
stat_poly_eq(use_label(c("eq", "F", "P")),
label.x = "right") +
theme_bw()
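For comparison, here is a sketch of how a roughly equivalent label can be assembled by hand, without use_label(). It pastes together the variables eq.label, f.value.label and p.value.label returned by stat_poly_eq(), separated by a comma written in plotmath syntax. I have not checked that the output is identical in every detail to that of the plot above.

```r
library(ggpmisc)

p_manual <-
  ggplot(subset(mpg, cyl != 5), aes(displ, hwy, colour = factor(cyl))) +
  geom_point() +
  stat_poly_line() +
  # build the label expression explicitly from its components
  stat_poly_eq(aes(label = paste(after_stat(eq.label),
                                 after_stat(f.value.label),
                                 after_stat(p.value.label),
                                 sep = "*\", \"*")),
               label.x = "right") +
  theme_bw()
p_manual
```

The explicit version is longer, but it remains useful when a label needs components or separators that use_label() does not cover.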
In the case of petrol use (MPG) in city traffic compared to highway travel, the two variables can be expected to be subject to similar error variation, and there is no directional cause-effect relationship between them. So, in this case OLS linear regression is not a suitable approach. We use major axis regression instead, and we add θ to the label: the angle in degrees between the two lines that could have been fitted by linear regression using x or y as the explanatory variable.
ggplot(subset(mpg, cyl != 5), aes(cty, hwy)) +
geom_point(alpha = 0.2) +
stat_ma_line() +
stat_ma_eq(use_label(c("eq", "theta", "R2", "P"))) +
theme_bw()
Multiple comparisons can be used, for example to test for differences among cars with engines with different numbers of cylinders.
ggplot(subset(mpg, cyl != 5), aes(factor(cyl), hwy)) +
geom_violin() +
stat_multcomp() +
theme_bw()
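The outcome of the comparisons can also be shown as letters instead of bars. The sketch below assumes that stat_multcomp() accepts label.type = "letters", as I understand from its help page, and that packages ‘multcomp’ and ‘multcompView’ are installed. Please check the documentation of the installed version before relying on it.

```r
library(ggpmisc)

p_letters <-
  ggplot(subset(mpg, cyl != 5), aes(factor(cyl), hwy)) +
  geom_violin() +
  # groups sharing a letter are not significantly different
  stat_multcomp(label.type = "letters") +
  theme_bw()
p_letters
```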
The examples below depend on packages ‘gganimate’, ‘png’ and ‘gifski’ to create the animations. Please make sure that these packages are installed before running the code chunks below.
Although getting the animations to work at the command prompt was easy, getting the same code to work within Quarto was more difficult. The need to explicitly pass arguments to parameters dev, width and height of function animate() only applies to Quarto (and possibly R Markdown) code chunks, as defaults work well at the command prompt or in R scripts but trigger errors when rendering Quarto documents.
As is usually the case with ‘ggplot2’, the default size used for text and label in layers as well as the theme’s base font size need to be increased when rendering plots as high resolution bitmaps. Animated plots are by default rendered first as one bitmap per frame and later merged into animated bitmaps.
Package ‘gganimate’ provides an effective way of showing temporal trends or differences among groups in talks or web pages where animations are supported. As of version 0.5.3 all statistics exported by package ‘ggpmisc’ can be used in animated plots. Even in animations it is possible to rely on the default automatic positioning of labels based on grouping in the underlying static ggplot. As examples, I animate two of the examples above, the first of them in two different ways.
One use case is cycling over subsets of data displaying annotations based on model fits to each subset. In this case it is preferable to have the equation label at the same location in each scene of the animation.
library(gganimate)
p <- ggplot(subset(mpg, cyl != 5), aes(displ, hwy, colour = factor(cyl))) +
geom_point() +
stat_poly_line() +
stat_poly_eq(use_label(c("eq", "F", "P")),
label.x = "right",
size = 6,
vstep = 0) + # all labels at same location
theme_bw(16) +
transition_states(factor(cyl),
transition_length = 1,
state_length = 3) +
enter_fade() +
exit_fade()
animate(p,
fps=8,
renderer = gifski_renderer(loop = TRUE),
dev = 'png',
width = 800, height = 500, pointsize = 16)
Another use case is animating the building up of the plot by adding in each successive scene a subset of the data. In this case we accept the default position of the equation labels, as they are displayed simultaneously in the last frame. Even though we retain looping, we add pauses at the first and last frames. We use a longer pause at the last frame, as this last plot includes all data.
To prevent endless looping we could add wrap = FALSE in transition_states(); if well timed, this could be good for a talk or as part of a video. However, in a web page like this an end pause is most appropriate, as we have no control over which frame the reader sees first when scrolling up or down the page.
p <- ggplot(subset(mpg, cyl != 5), aes(displ, hwy, colour = factor(cyl))) +
geom_point() +
stat_poly_line() +
stat_poly_eq(use_label(c("eq", "F", "P")),
size = 6,
label.x = "right") +
theme_bw(16) +
transition_states(factor(cyl),
transition_length = 0,
state_length = 1,
wrap = FALSE) +
shadow_mark()
animate(p,
fps=2.5,
nframes = 30,
start_pause = 5, end_pause = 10,
renderer = gifski_renderer(loop = TRUE),
dev = 'png',
width = 800, height = 500, pointsize = 16)
It is also possible to animate the plot so that plot layers are added gradually. In this case I did not use any transition, and having three layers in the plot I reduced the number of frames from the default 100 to only 3, and decreased the frame rate to one frame every four seconds.
p <- ggplot(subset(mpg, cyl != 5), aes(cty, hwy)) +
geom_point(alpha = 0.2) +
stat_ma_line() +
stat_ma_eq(use_label(c("eq", "theta", "R2", "P")),
size = 6) +
theme_bw(16)+
transition_layers(layer_length = 5,
transition_length = 0)
animate(p,
fps=1/4,
nframes = 3,
renderer = gifski_renderer(),
dev = 'png',
width = 800, height = 500, pointsize = 16)
The documentation web site includes all help pages, with output from all examples, and vignettes in HTML format.
Please raise issues concerning bugs or enhancements to this package through GitHub at https://github.com/aphalo/ggpmisc/issues. Pull requests are also welcome.