R at its simplest

R vs. calculator vs. Excel

R
Author

Pedro J. Aphalo

Published

2023-09-19

Modified

2023-10-28

Abstract
A very simple introduction to R, based on a comparison to calculators and worksheets.
Keywords

introduction

1 R compared to a calculator

For the simplest computations there is little difference between a calculator and R (e.g., if one uses the number pad of the computer keyboard). Nowadays, advanced calculators show the entered text in full, as R does. In R one presses the Enter key where on a calculator one would press the equals key.

\(36 / 12 + 1\)

36 / 12 + 1
[1] 4

Of course, a computer keyboard has no keys for square root and similar operations, so we type the name of the function instead.

\(\sqrt{36}\)

sqrt(36)
[1] 6

In a calculator we use “memories” (e.g., M1, M2, etc.) to store values, with keys frequently labeled “MSTO” and “MRCL” used to store and recall them. In R we use names or variables that we can choose: we use <name> <- to store a value and <name> to recall the stored value. I use <name> to signify any valid name.

my_number <- 36
my_number
[1] 36

We have assigned 36 to my_number and we can now use my_number in computations.

my_number / 12 + 1
[1] 4
sqrt(my_number)
[1] 6

Why is this useful? Because we can describe the operations using names instead of specific numbers or values. This makes it possible to write the operation as an abstract rule that we can apply unchanged to different numbers, by assigning them to the name used in the code.

my_number <- 24
my_number / 12 + 1
[1] 3
sqrt(my_number)
[1] 4.898979

1.1 Precedence rules

The normal arithmetic precedence rules apply to R expressions, and, as in arithmetic, the order of evaluation can be altered with parentheses. While in mathematics it is common to use different brackets depending on the nesting depth \(\{[( )]\}\), in R only parentheses ( ) are used, to any depth of nesting. The other brackets are reserved for other uses.

\(36 / (12 + 1)\)

36 / (12 + 1)
[1] 2.769231
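
As a minimal sketch of these rules, the expressions below use numbers chosen only for illustration. Note that exponentiation is evaluated before the unary minus, which can be surprising at first.

```r
2 + 3 * 4     # multiplication before addition: 14
(2 + 3) * 4   # parentheses first: 20
-2^2          # exponentiation before unary minus: -4
(-2)^2        # parentheses first: 4
((36 / (12 + 1)) - 1) * 2  # nested parentheses, all of them round
```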

2 R compared to Excel

In R one can store values in a data.frame, which is somewhat similar to a worksheet in that each column is a variable and each row corresponds to an observation or measurement event.

We construct a vector by concatenating values with function c() (concatenate).

c(1, 2, 3, 4, 5)
[1] 1 2 3 4 5

We use function data.frame() to construct a new data frame, which is displayed here because we do not assign it to a name.

data.frame(ID = c(1, 2, 3, 4, 5), 
           height = c(170, 155, 145, 180, 167), 
           weight = c(70, 60, 55, 90, 85))
  ID height weight
1  1    170     70
2  2    155     60
3  3    145     55
4  4    180     90
5  5    167     85

On the other hand, in R the instructions for calculations and the data are kept separate. A data frame can contain numbers, text and other values, but not formulas. The calculations are entered separately, and a single “formula” can refer to whole vectors or columns.
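
A minimal sketch of both points, using hypothetical example data (the names heights_cm, heights_m and people.df are made up for this illustration): one statement operates on every value of a vector, and a data frame can hold text next to numbers.

```r
# hypothetical example data: heights in centimetres
heights_cm <- c(170, 155, 145, 180, 167)
# a single statement converts every value to metres
heights_m <- heights_cm / 100
heights_m
# a data frame with one column of text and one column of numbers
people.df <- data.frame(name = c("Ann", "Bob", "Eve"),
                        height = c(1.70, 1.55, 1.45))
people.df
```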

We start by giving a name to the data frame, i.e., storing it in a variable, so that it remains available.

my.df <- data.frame(height = c(1.70, 1.55, 1.45, 1.80, 1.67), # meters
                    weight = c(56, 60, 55, 90, 85)) # kg
print(my.df)
  height weight
1   1.70     56
2   1.55     60
3   1.45     55
4   1.80     90
5   1.67     85

To compute the body mass index (BMI) and add it as a new column, we use “instructions” that make reference to whole columns in the data frame. To extract a column, we use here operator $, so that my.df$weight is column weight from data frame my.df. The same computation is applied to each row.

\[\mathrm{BMI} = \frac{m}{h^2}\] where \(m\) is the weight (in kg) and \(h\) the height (in m) of a person.

my.df$BMI <- my.df$weight / my.df$height^2 
print(my.df)
  height weight      BMI
1   1.70     56 19.37716
2   1.55     60 24.97399
3   1.45     55 26.15933
4   1.80     90 27.77778
5   1.67     85 30.47797

If my.df had thousands or even millions of rows, we would still need only one copy of the instructions for the operation, i.e., a single code statement. In Excel, one copy of the formula would be needed in each row of the worksheet.

Data frames are always rectangular and subject to much stricter rules than worksheets (no empty spaces, no plots, etc.).
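
To illustrate that the number of rows does not change the code, here is a sketch with a hypothetical data frame big.df of 1000 rows, built by repeating two made-up measurements. The statement computing the BMI is the same as for the five-row data frame.

```r
# hypothetical larger data set: two height/weight pairs repeated 500 times
big.df <- data.frame(height = rep(c(1.70, 1.55), times = 500), # meters
                     weight = rep(c(56, 60), times = 500))     # kg
nrow(big.df)  # 1000 rows
# the same single statement as used for the small data frame
big.df$BMI <- big.df$weight / big.df$height^2
head(big.df, n = 3)  # display only the first three rows
```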

3 Functions

Functions are named pieces or chunks of code, defined using named placeholders or parameters to which we can pass values as arguments.

In the examples above we called function sqrt() with the constant value 36 as argument and also with variable my_number as argument.

There are many different predefined functions in R, and as we will see later, we can also create our own functions.
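
As a small sketch of passing arguments to named parameters, the calls below use three predefined functions, log(), round() and seq(), with values chosen only for illustration.

```r
log(100, base = 10)          # 10 is passed to parameter base: 2
round(3.14159, digits = 2)   # 2 is passed to parameter digits: 3.14
seq(from = 1, to = 9, by = 2)  # three named arguments: 1 3 5 7 9
```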

4 Arithmetic operators and math functions

Start by exploring the help to find the arithmetic operators and functions.

help(Arithmetic)
help(sqrt)
help(log)

Have a look also at the trigonometric functions. They accept angles in radians, not in degrees!

help(Trig)
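
Because angles must be in radians, an angle known in degrees has to be converted first. A minimal sketch, with angle_deg and angle_rad as made-up variable names and 90 degrees as an example value:

```r
angle_deg <- 90                    # angle in degrees (example value)
angle_rad <- angle_deg * pi / 180  # convert degrees to radians
sin(angle_rad)                     # sine of a right angle: 1
sin(angle_deg)                     # not what we want: 90 is read as radians
```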

4.1 Time to play

Now it is time for you to play with numbers. Use R as you would use a (scientific) calculator, using numeric constants like 123 both directly and after saving them to a variable.

  1. \(\sqrt{7 + 2}\)
  2. \(\frac{\log_{10}(100)}{3 + 2}\)
  3. \(e^4\)
  4. \(\sin(2 \times \pi)\)
  5. \(\cos(\pi / 4)\)
  6. try your own examples, i.e., play!

5 Simple statistics

help(mean) # mean or average
help(var) # variance
help(sd) # standard deviation
help(median) # median
help(mad) # median absolute deviation
help(mode) # storage mode of an object, not the statistical mode

And a couple of summaries.

help(sum)
help(prod)
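
A short sketch of these functions in use, with a hypothetical vector y whose values were picked so that the results are round numbers. Note that mode() in base R reports how an object is stored rather than its most frequent value; the last line shows one possible way of finding the most frequent value with table() and which.max().

```r
y <- c(2, 4, 4, 5, 10)  # hypothetical example values
mean(y)    # 5
median(y)  # 4
var(y)     # 9
sd(y)      # 3
sum(y)     # 25
prod(y)    # 1600
mode(y)    # "numeric": the storage mode, not the most frequent value
names(which.max(table(y)))  # "4": the most frequent value, as text
```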

5.1 Time to play

Now it is time for you to play with numbers.

\[x = 1, 3, 5, 10, 7, 8\]

  1. \(\bar{x}\) (mean)
  2. \(s^2(x)\) (variance)
  3. \(s(x)\) (standard deviation)
  4. \(\sum_{i=1}^{i=n} x_i\) (sum)
  5. \(\prod_{i=1}^{i=n} x_i\) (product)
  6. \(\bar{x} = \sum_{i=1}^{i=n} x_i / n\)
  7. try your own examples, i.e., play!

6 Constructing a new function

You have already used some functions in the exercises above… As mentioned above, a function is a “chunk” of code to which we give a name.

When we compute the mean with function mean() from base R as

mean(c(1,2,6,10))
[1] 4.75

we say that we call function mean() with c(1,2,6,10) as argument.

We can define a very simple and nearly equivalent function using two other base R functions, sum() and length(), as

my.mean <- function(x) {sum(x) / length(x)}

In this definition, we say that x is a formal parameter of function my.mean(). In the code that forms the body of the function, this formal parameter acts as a placeholder for the argument we pass when calling the function.

When we use our function as above,

my.mean(c(1,2,6,10))
[1] 4.75

c(1,2,6,10) replaces x and the computation becomes equivalent to

sum(c(1,2,6,10)) / length(c(1,2,6,10))
[1] 4.75

Note

Function my.mean() lacks error-handling code, and for very large arguments it is likely to be slightly slower than mean().
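
As a sketch of what basic error handling could look like, the hypothetical function my.checked.mean() below (the name is made up for this example) tests its argument with stopifnot() before computing. This is only one possible approach, not how mean() itself is implemented.

```r
# my.checked.mean() is a hypothetical name, not a base R function
my.checked.mean <- function(x) {
  # stop with an error unless x is numeric and contains at least one value
  stopifnot(is.numeric(x), length(x) > 0)
  sum(x) / length(x)
}
my.checked.mean(c(1, 2, 6, 10))  # 4.75
# my.checked.mean("a")           # would stop with an error
```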

We can use these functions repeatedly, calling them with different arguments.

We could say that by defining my.mean() we have added a new verb to the R language.
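
For instance, once defined, my.mean() can be called with any numeric vector, given directly or stored in a variable. In this sketch the variable heights and its values are only example data.

```r
my.mean <- function(x) {sum(x) / length(x)}  # definition repeated from above
my.mean(c(1, 2, 6, 10))                # 4.75
my.mean(c(170, 155, 145, 180, 167))    # 163.4
heights <- c(170, 155, 145, 180, 167)  # the argument can also be a variable
my.mean(heights)                       # 163.4
```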

6.1 Time to play

Define your own function to compute the variance and compare the results it returns to those returned by base R function var().

Variance can be computed as \(s^2(x) = \frac{\sum_{i=1}^{i=n} (x_i - \bar{x})^2}{n - 1}\), where \(\bar{x}\) is the mean and \(n\) the number of values in \(x\).