Is this a polynomial?

Computing on the R langauge

R
plotting
Author

Pedro J. Aphalo

Published

2024-05-16

Modified

2024-05-22

Keywords

ggpmisc pkg, R programming, annotations

The problem

Two statistics from package ‘ggpmisc’ could produce bad return values if the model formula passed by users as argument did not match the expectations. The problem was that any valid model formula accepted by lm() or rlm() would be silently accepted as input but only some formulations would result in valid equation labels added to plots. Although the statistics expect what to me feels like the “normal” way of writing a polynomial and this is clearly described in the documentation, some users had different expectations. So, in spite of the examples in the documentation an issue was raised for a bug. Of course the lack of a proper test was a bug, but not as the user thought, one affecting input that was deemed correct by design. Of course, functions like these should check that user-provided input is valid and will result in correct output.

Why I did not implement a check earlier? It seemed to me at first that implementing code for such a test was an extremely difficult task. However, one of the strong points of the R language is that language objects like model formulas can be manipulated and converted like almost any other object in R. Making the story short, it was fairly easy to write an ad-hoc first version of a function to validate model functions as representing a polynomial. It took a couple of hours of design, coding, testing, revision, and trying to imagine all the different ways in which polynomials can be described in model functions, as well as imagining model functions that are similar to polynomials but not true polynomials. The first version, and the second partial rewrite were not perfect, they produced both false positive and false negative results occasionally. After adding a bunch of unit tests and further changes to the code, the function seems to now work fairly well. It will still mark as bad some possibly borderline cases.

The aim was not simply to detect model formulas describing a polynomial from others. The aim was to detect model formulas describing polynomials that would result in valid equation labels from others. Now using this test, the statistics return NA and issue a warning when they cannot handle the model described by the model formula. In this cases, the user can still assemble the equation label within a call to aes() as the numeric values of the coefficirnt estimates is still returned.

Given how the two statistics work, polynomials have to have the terms in order of increasing powers, and they can have or not a constant (y-intercept) term. In addition terms in a model formula, containing a bare ^ operator are only powers of the explanatory variable if enclosed in an as.is call (I()). Power transformations are currently not accepted if applied to a single term.

Note

Does this function give false positives and false negatives? Likely, but hopefully I haven’t missed any frequently used case. Time and further testing will tell. Please, do raise an issue if you discover a failure.

Meanwhile, the two statistics, stat_poly_eq() and stat_quant_eq(), now issue a warning when passed model formulas that fail this new test and set eq.label to NA in the returned data frame, the data seen by the downstream geometry.

The function, check_poly_formula() has an on-line version of its help page, with some code examples.

And here is the definition of check_poly_formula() as of 2024-05-22:

check_poly_formula <-
  function(formula,
           x.name = "x",
           warning.text = "'formula' not an increasing polynomial: 'eq.label' is NA!") {
  rhs <- as.character(formula)[3]
  rhs.terms <- unlist(strsplit(x = rhs, split = c("+", "*"), fixed = TRUE))
  num.terms <- length(rhs.terms)
  x.terms <- grepl(x.name, rhs.terms)
  poly.in.terms <- grepl("poly *\\(", as.character(formula)[3L])
  power.terms  <- grepl("\\^ *", rhs.terms)
  raw.terms  <- grepl("raw *=", rhs.terms)
  as.is.terms <- grepl("I *\\(", rhs.terms)

  if (num.terms > 1L && poly.in.terms && sum(power.terms) != 0L) {
    stop("Both 'poly()' and power (^) terms in model formula.")
  }
  if (num.terms > 1L && !all(which(power.terms) %in% which(as.is.terms))) {
    warning("Power (^) terms in model formula of a polynomial need to be protected by 'I()'.")
    return(FALSE)
  }
  if (poly.in.terms && !sum(raw.terms)) {
    warning("'poly()' in model formula has to be passed 'raw = TRUE'")
  }
  if (sum(x.terms) == 0L || poly.in.terms && num.terms == 1L) {
    polynomial <- TRUE
    increasing <- TRUE
  } else if (sum(power.terms) < 1L ||
             sum(power.terms) == num.terms - 1L &&
             sum(x.terms) == num.terms ||
             sum(power.terms) == num.terms - 2L &&
             sum(x.terms) == num.terms - 1L) {
    polynomial <- TRUE
    if (sum(x.terms) == 1L || min(which(power.terms)) %in% 2L:3L) {
      powers <- as.numeric(gsub(".*\\^([0-9]+).*", "\\1", rhs.terms[power.terms]))
      increasing <- length(powers) <= 1L ||
        !is.unsorted(powers, strictly = TRUE) &&
        max(powers) == length(powers) + 1 # no missing terms
    } else {
      increasing <- FALSE
    }
  } else {
    polynomial = FALSE
  }
  if (!polynomial || !increasing) {
    if (length(warning.text)) {
      warning(warning.text)
    }
    FALSE
  } else {
    TRUE
  }
}

Am I still missing something? Can the code be really this simple? (well, no longer so simple…)

Some examples follow.

library(ggpmisc)
Loading required package: ggpp
Loading required package: ggplot2
Registered S3 methods overwritten by 'ggpp':
  method                  from   
  heightDetails.titleGrob ggplot2
  widthDetails.titleGrob  ggplot2

Attaching package: 'ggpp'
The following object is masked from 'package:ggplot2':

    annotate
check_poly_formula(y ~ 1)
[1] TRUE
check_poly_formula(y ~ x)
[1] TRUE
check_poly_formula(y ~ x^3)
Warning in check_poly_formula(y ~ x^3): 'formula' not an increasing polynomial:
'eq.label' is NA!
[1] FALSE
check_poly_formula(y ~ poly(x, 2))
Warning in check_poly_formula(y ~ poly(x, 2)): 'poly()' in model formula has to
be passed 'raw = TRUE'
[1] TRUE
check_poly_formula(y ~ x + 0)
[1] TRUE
check_poly_formula(y ~ x - 1)
[1] TRUE
check_poly_formula(y ~ x + 1)
[1] TRUE
check_poly_formula(y ~ x + I(x^2))
[1] TRUE
check_poly_formula(y ~ 1 + x + I(x^2))
[1] TRUE
check_poly_formula(y ~ I(x^2) + x)
Warning in check_poly_formula(y ~ I(x^2) + x): 'formula' not an increasing
polynomial: 'eq.label' is NA!
[1] FALSE
check_poly_formula(y ~ x + I(x^2) + I(x^3))
[1] TRUE
check_poly_formula(y ~ I(x^2) + I(x^3))
Warning in check_poly_formula(y ~ I(x^2) + I(x^3)): 'formula' not an increasing
polynomial: 'eq.label' is NA!
[1] FALSE
check_poly_formula(y ~ x + I(x^3) + I(x^2))
Warning in check_poly_formula(y ~ x + I(x^3) + I(x^2)): 'formula' not an
increasing polynomial: 'eq.label' is NA!
[1] FALSE