function (formula, x.name = "x", warning.text = "'formula' not an increasing polynomial: 'eq.label' is NA!")
{<- as.character(formula)[3]
rhs <- unlist(strsplit(x = rhs, split = c("+", "*"),
rhs.terms fixed = TRUE))
<- length(rhs.terms)
num.terms <- grepl(x.name, rhs.terms)
x.terms <- grepl("poly *\\(", as.character(formula)[3L])
poly.in.terms <- grepl("\\^ *", rhs.terms)
power.terms <- grepl("raw *=", rhs.terms)
raw.terms <- grepl("I *\\(", rhs.terms)
as.is.terms if (num.terms > 1L && poly.in.terms && sum(power.terms) !=
0L) {
stop("Both 'poly()' and power (^) terms in model formula.")
}if (num.terms > 1L && !all(which(power.terms) %in% which(as.is.terms))) {
warning("Power (^) terms in model formula of a polynomial need to be protected by 'I()'.")
return(FALSE)
}if (poly.in.terms && !sum(raw.terms)) {
warning("'poly()' in model formula has to be passed 'raw = TRUE'")
}if (sum(x.terms) == 0L || poly.in.terms && num.terms == 1L) {
<- TRUE
polynomial <- TRUE
increasing
}else if (sum(power.terms) < 1L || sum(power.terms) == num.terms -
1L && sum(x.terms) == num.terms || sum(power.terms) ==
- 2L && sum(x.terms) == num.terms - 1L) {
num.terms <- TRUE
polynomial if (sum(x.terms) == 1L || min(which(power.terms)) %in%
2L:3L) {
<- as.numeric(gsub(".*\\^([0-9]+).*", "\\1",
powers
rhs.terms[power.terms]))<- length(powers) <= 1L || !is.unsorted(powers,
increasing strictly = TRUE) && max(powers) == length(powers) +
1
}else {
<- FALSE
increasing
}
}else {
= FALSE
polynomial
}if (!polynomial || !increasing) {
if (length(warning.text)) {
warning(warning.text)
}FALSE
}else {
TRUE
}
}<bytecode: 0x0000025815155cf0>
<environment: namespace:ggpmisc>
The problem
Two statistics from package ‘ggpmisc’ could produce bad return values if the model formula passed by users as argument did not match the expectations. The problem was that any valid model formula accepted by lm()
or rlm()
would be silently accepted as input but only some formulations would result in valid equation labels added to plots. The statistics expect what to me feels like the “normal” way of writing a polynomial, agreeing with the expectation of polynom::polynomial()
used internally. Although this is clearly described in the documentation, some users had different expectations. So, in spite of the examples in the documentation an issue was raised for a bug. Of course, the lack of a test to validate the argument to ensure the returned value is correct was an important bug, but not as the user who raised the issue thought, one affecting input that was deemed correct by design. To address this I added the previously missing test.
Why did I not implement a such a test earlier? It seemed to me at first that implementing code for this test would be an extremely difficult task. However, one of the strong points of the R language is that language objects like model formulas can be manipulated and converted like almost any other object in R. Making the story short, it was fairly easy to write an ad-hoc first version of a function to validate model functions as representing a polynomial. It took a couple of hours of design, coding, testing, revision, and trying to imagine all the different ways in which polynomials can be described in model functions, as well as imagining model functions that are similar to polynomials but not true polynomials. The first version, and the second partial rewrite were not perfect, they produced both false positive and false negative results occasionally. After adding a bunch of unit tests and further changes to the code, the function seems to now work fairly well. It will still mark as bad some possibly borderline cases.
The aim was not simply to detect model formulas describing a polynomial from others. The aim was to detect model formulas describing polynomials that would result in valid equation labels from others. Now using this test, the statistics return NA
and issue a warning when they cannot handle the model described by the model formula. In this case, the user can still assemble the equation label within a call to aes()
as the numeric values of the coefficient estimates are always extracted from the model object and returned, and available in the data
object received by the downstream geometry.
Given how the two statistics work, polynomials have to have the terms in order of increasing powers, and they can have or not a constant (y-intercept) term. In addition terms in a model formula, containing a bare ^
operator are only powers of the explanatory variable if enclosed in an as.is call (I()
). Power transformations are currently not accepted if applied to a single term.
Does this function give false positives and false negatives? Possibly, but hopefully I haven’t missed any frequently used case. Time and further testing will tell. Please, do raise an issue if you discover a failure.
Meanwhile, the two statistics, stat_poly_eq()
and stat_quant_eq()
, now issue a warning when passed model formulas that fail this new test and set eq.label
to NA
in the returned data frame, the data
seen by the downstream geometry.
The function, check_poly_formula()
has an on-line version of its help page, with some code examples.
The definition of function check_poly_formula()
in version 0.6.1 of package ‘ggpmisc’ is listed below.
Am I still missing something? Can the code be really this simple? (well, no longer so simple…)
Some examples follow.
check_poly_formula(y ~ 1)
[1] TRUE
check_poly_formula(y ~ x)
[1] TRUE
check_poly_formula(y ~ x^3)
Warning in check_poly_formula(y ~ x^3): 'formula' not an increasing polynomial:
'eq.label' is NA!
[1] FALSE
check_poly_formula(y ~ poly(x, 2))
Warning in check_poly_formula(y ~ poly(x, 2)): 'poly()' in model formula has to
be passed 'raw = TRUE'
[1] TRUE
check_poly_formula(y ~ x + 0)
[1] TRUE
check_poly_formula(y ~ x - 1)
[1] TRUE
check_poly_formula(y ~ x + 1)
[1] TRUE
check_poly_formula(y ~ x + I(x^2))
[1] TRUE
check_poly_formula(y ~ 1 + x + I(x^2))
[1] TRUE
check_poly_formula(y ~ I(x^2) + x)
Warning in check_poly_formula(y ~ I(x^2) + x): 'formula' not an increasing
polynomial: 'eq.label' is NA!
[1] FALSE
check_poly_formula(y ~ x + I(x^2) + I(x^3))
[1] TRUE
check_poly_formula(y ~ I(x^2) + I(x^3))
Warning in check_poly_formula(y ~ I(x^2) + I(x^3)): 'formula' not an increasing
polynomial: 'eq.label' is NA!
[1] FALSE
check_poly_formula(y ~ x + I(x^3) + I(x^2))
Warning in check_poly_formula(y ~ x + I(x^3) + I(x^2)): 'formula' not an
increasing polynomial: 'eq.label' is NA!
[1] FALSE