36 / 12 + 1
[1] 4
R vs. calculator vs. Excel
Pedro J. Aphalo
2023-09-19
2023-10-28
introduction
For the simplest computations there is little difference between a calculator and R (e.g., if one uses the number pad of the computer keyboard). Nowadays, advanced calculators show the entered text in full, like R does. One uses enter in R, instead of the equals key in a calculator.
\(36 / 12 + 1\)
Of course, in a computer there are no keys for square root and similar, we type the name of the function instead.
\(\sqrt{36}\)
In a calculator we use “memories” (e.g., M1, M2, etc. ) to store values, with keys frequently labeled “MSTO” and “MRCL” used to store and recal them. In R we use names or variables, that we can chose, and use <name> <-
to store a value and the <name>
to recall the stored value. I use <name>
to signify any valid name.
We have assigned 36
to my_number
and we can now use my_number
in computations.
Why is this useful? Because, we can describe the operations using names, instead of specific numbers or values. This makes it possible to describe the operation to be done, as an abstract rule that we can apply unchanged to different numbers, by assigning them to the name used in the code.
The normal arithmetic precedence rules apply to R expressions, and the order can be altered with parentheses following the normal rules as in arithmetic. While in mathematics it is common to use different brackets depending on the nesting depth \(\{[( )]\}\) in R only parentheses ( )
are used to any depth of nesting. The other brackets are reserved for other uses.
\(36 / (12 + 1)\)
In R one can store values in a data.frame
that is somehow similar to a worksheet, in that each column is a variable and each row corresponds to an observation or measurement event.
We construct a vector by concatenating values with function c()
(concatenate).
We use function data.frame()
to construct a new data frame, that here gets displayed.
data.frame(ID = c(1, 2, 3, 4, 5),
height = c(170, 155, 145, 180, 167),
weight = c(70, 60, 55, 90, 85))
ID height weight
1 1 170 70
2 2 155 60
3 3 145 55
4 4 180 90
5 5 167 85
On the other hand, in R instructions for calculations and data are kept separate. The data frame can contain not only numbers, but also text and other values but not formulas. The calculations are entered separately, and a single “formula” can refer to whole vectors or columns.
We start by giving a name to the data frame, i.e., storing it in a variable, so that it remains available.
my.df <- data.frame(height = c(1.70, 1.55, 1.45, 1.80, 1.67), # meters
weight = c(56, 60, 55, 90, 85)) # kg
print(my.df)
height weight
1 1.70 56
2 1.55 60
3 1.45 55
4 1.80 90
5 1.67 85
To compute the body mass index (BMI) and add it as a new column, we use “instructions” that make reference to whole columns in the data frame. To extract a column, we use here operator $
, so that my.df$weight
is column weight
from data frame my.df
. The same computation is applied to each row.
\[BMI = \frac{m}{h^2}\] where \(m\) is the weight and \(h\) the height of a person.
height weight BMI
1 1.70 56 19.37716
2 1.55 60 24.97399
3 1.45 55 26.15933
4 1.80 90 27.77778
5 1.67 85 30.47797
If my.df
had 1000’s or even 1000000’s of rows, we would have only one copy of the instructions for the operation, or a single code statement. In Excel one copy of the formula in each row of the worksheet would be needed.
Data frames are always rectangular and subject to much more strict rules than worksheets (no empty spaces, no plots, etc.).
Functions are named pieces or chuncks of code, defined using named placeholders or parameters to which we can pass values as arguments.
In an the examples above we called function sqrt()
with a constant value 36
as argument and also with variable my_number
as argument.
There are many different predefined functions in R, and as we will see later, we can also create our own functions.
Start by exploring the help to find the arithmetic operators and functions.
Have a look also at the triginometric functions. Trigonometric functions accept angles in radians, not in degrees!
Now it is time for your to play with numbers. Use R as you would use a (scientific) calculator using both numeric constants like 123
directly and after saving them to a variable.
And a couple of summaries.
Now it is time for your to play with numbers.
\[x = 1, 3, 5, 10, 7, 8\]
You have already used some functions in the exercises above… As mentioned above, a function is a “chunk” of code to which we give a name.
When we compute the mean with function mean()
from base R as
and we say that we call function mean()
with c(1,2,6,10)
as argument.
We can define a very simple and nearly equivalent function using another two base R functions, sum()
and length()
as
In this definition, we say that x
is a formal parameter of function my.mean()
. In the code that forms the body of the function, this formal parameter functions as a placeholder for the argument we pass calling the function.
When we use our function as above,
c(1,2,6,10)
replaces x
and the computation becomes equivalent to
Function my.mean()
lacks error-handling code and for very large arguments it is likely to be slightly slower than mean()
.
We can use these functions repeatedly, calling them with different arguments.
We could say that by defining my.mean()
we have added a new verb to the R language.
Define your own function to compute the variance and compare the results it returns to those returned by base R function var()
Variance can be computed as \(S^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}\)
---
title: "R at its simplest"
subtitle: "R vs. calculator vs. Excel"
author: "Pedro J. Aphalo"
date: 2023-09-19
date-modified: 2023-10-28
categories: [R]
keywords: [introduction]
format:
html:
code-fold: false
code-tools: true
abstract:
A very simple introduction to R, based on a comparison to calculators and worksheets.
draft: false
---
## R compared to a calculator
For the simplest computations there is little difference between a calculator and R (e.g., if one uses the number pad of the computer keyboard). Nowadays, advanced calculators show the entered text in full, like R does. One uses enter in R, instead of the equals key in a calculator.
$36 / 12 + 1$
```{r}
36 / 12 + 1
```
Of course, in a computer there are no keys for square root and similar, we type the name of the function instead.
$\sqrt{36}$
```{r}
sqrt(36)
```
In a calculator we use "memories" (e.g., M1, M2, etc. ) to store values, with keys frequently labeled "MSTO" and "MRCL" used to store and recal them. In R we use names or _variables_, that we can chose, and use `<name> <-` to store a value and the `<name>` to recall the stored value. I use `<name>` to signify any valid name.
```{r}
my_number <- 36
my_number
```
We have assigned `36` to `my_number` and we can now use `my_number` in computations.
```{r}
my_number / 12 + 1
sqrt(my_number)
```
Why is this useful? Because, we can describe the operations using names, instead of specific numbers or values. This makes it possible to describe the operation to be done, as an abstract rule that we can apply unchanged to different numbers, by assigning them to the name used in the code.
```{r}
my_number <- 24
my_number / 12 + 1
sqrt(my_number)
```
### Precedence rules
The normal arithmetic precedence rules apply to R expressions, and the order can be altered with parentheses following the normal rules as in arithmetic. While in mathematics it is common to use different brackets depending on the nesting depth $\{[( )]\}$ in R only parentheses `( )` are used to any depth of nesting. _The other brackets are reserved for other uses._
$36 / (12 + 1)$
```{r}
36 / (12 + 1)
```
## R compared to Excel
In R one can store values in a `data.frame` that is somehow similar to a worksheet, in that each column is a variable and each row corresponds to an observation or measurement event.
We construct a vector by concatenating values with function `c()` (_concatenate_).
```{r}
c(1, 2, 3, 4, 5)
```
We use function `data.frame()` to construct a new data frame, that here gets displayed.
```{r}
data.frame(ID = c(1, 2, 3, 4, 5),
height = c(170, 155, 145, 180, 167),
weight = c(70, 60, 55, 90, 85))
```
On the other hand, in R instructions for calculations and data are kept separate. The data frame can contain not only numbers, but also text and other values but not formulas. The calculations are entered separately, and a single "formula" can refer to whole vectors or columns.
We start by giving a name to the data frame, i.e., storing it in a variable, so that it remains available.
```{r}
my.df <- data.frame(height = c(1.70, 1.55, 1.45, 1.80, 1.67), # meters
weight = c(56, 60, 55, 90, 85)) # kg
print(my.df)
```
To compute the body mass index (BMI) and add it as a new column, we use "instructions" that make reference to whole columns in the data frame. To _extract_ a column, we use here operator `$`, so that `my.df$weight` is column `weight` from data frame `my.df`. The same computation is applied to each row.
$$BMI = \frac{m}{h^2}$$
where $m$ is the weight and $h$ the height of a person.
```{r}
my.df$BMI <- my.df$weight / my.df$height^2
print(my.df)
```
If `my.df` had 1000's or even 1000000's of rows, we would have only one copy of the instructions for the operation, or a single _code statement_. In Excel one copy of the formula in each row of the worksheet would be needed.
Data frames are always rectangular and subject to much more strict rules than worksheets (no empty spaces, no plots, etc.).
## Functions
Functions are named pieces or _chuncks_ of code, defined using named placeholders or _parameters_ to which we can pass values as _arguments_.
In an the examples above we _called_ function `sqrt()` with a _constant value_ `36` as argument and also with variable `my_number` as argument.
There are many different predefined functions in R, and as we will see later, we can also create our own functions.
## Arithmetic operators and math functions
Start by exploring the help to find the arithmetic operators and functions.
```{r, eval=FALSE}
help(Arithmetic)
```
```{r, eval=FALSE}
help(sqrt)
```
```{r, eval=FALSE}
help(log)
```
Have a look also at the triginometric functions. Trigonometric functions accept angles in radians, not in degrees!
```{r, eval=FALSE}
help(Trig)
```
### Time to play
Now it is time for your to play with numbers. Use R as you would use a (scientific) calculator using both numeric constants like `123` directly and after saving them to a variable.
a. $\sqrt{7 + 2}$
a. $\frac{\log_{10}(100)}{3 + 2}$
a. $e^4$
a. $sin(2 \times \pi)$
a. $cos(\pi / 4)$
a. try your own examples, i.e., **play!**
## Simple statistics
```{r, eval=FALSE}
help(mean) # mean or average
help(var) # variance
help(sd) # standard deviation
help(median) # median
help(mad) # median absolute deviation
help(mode) # mode
```
And a couple of summaries.
```{r, eval=FALSE}
help(sum)
help(prod)
```
### Time to play
Now it is time for your to play with numbers.
$$x = 1, 3, 5, 10, 7, 8$$
a. $\bar{x}$ (mean)
a. $s^2(x)$ (variance)
a. $s(x)$ (standard deviation)
a. $\sum_{i=1}^{i=n} x_i$ (sum)
a. $\prod_{i=1}^{i=n} x_i$ (product)
a. $\bar{x} = \sum_{i=1}^{i=n} x_i / n$
a. try your own examples, i.e., **play!**
## Cronstructing a new function
You have already used some functions in the exercises above... As mentioned above, a function is a "chunk" of code to which we give a name.
When we compute the mean with function `mean()` from _base_ R as
```{r}
mean(c(1,2,6,10))
```
and we say that we call function `mean()` with `c(1,2,6,10)`as argument.
We can define a very simple and nearly equivalent function using another two _base_ R functions, `sum()` and `length()` as
```{r}
my.mean <- function(x) {sum(x) / length(x)}
```
In this definition, we say that `x` is a _formal parameter_ of function `my.mean()`. In the code that forms the _body_ of the function, this formal parameter functions as a placeholder for the argument we pass calling the function.
When we use our function as above,
```{r}
my.mean(c(1,2,6,10))
```
`c(1,2,6,10)` replaces `x` and the computation becomes equivalent to
```{r}
sum(c(1,2,6,10)) / length(c(1,2,6,10))
```
::: callout-note
Function `my.mean()` lacks error-handling code and for very large arguments it is likely to be slightly slower than `mean()`.
:::
We can use these functions repeatedly, calling them with different arguments.
We could say that by defining `my.mean()` we have added a new _verb_ to the R language.
### Time to play
Define your own function to compute the variance and compare the results it returns to those returned by _base_ R function `var()`
Variance can be computed as
$S^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$