R at its simplest
Modified | 2023-10-28 |
abstract | A very simple introduction to R, based on a comparison to calculators and worksheets. |
Data Analysis and Visualization
Pedro J. Aphalo
This Contents page lists pages that I have written for course IPS-003 and for the R-peer-support meetings at the Viikki Campus of the University of Helsinki. The level of difficulty varies from introductory to intermediate. I update these pages and add new ones from time to time.
R is a language and an environment for data analysis and visualization. It has become the standard for data analysis and visualization in many fields. In Bioinformatics it “competes” with Python, but R can “talk” with code written in most other programming languages. R can be extended by means of code packages, which are installed locally in a library.
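As a minimal sketch of what “packages installed in a library” means in practice, the base R code below lists the library folders R searches and checks whether a package is installed. The variable names `lib_dirs` and `pkgs` are only illustrative, and the call to `install.packages()` is commented out because it needs an internet connection.

```r
# Folders ("libraries") where R looks for installed packages
lib_dirs <- .libPaths()
print(lib_dirs)

# Names of all packages currently installed in those libraries
pkgs <- rownames(installed.packages())

# Package 'stats' is distributed with R, so it is always installed
stats_installed <- "stats" %in% pkgs
print(stats_installed)

# Installing an extension package from CRAN (not run here)
# install.packages("ggplot2")

# Attaching an installed package makes its functions available
library(stats)
```

Installation is needed only once per library, while `library()` has to be called in every R session in which the package is used.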
Simple computations, introduction to plotting, ANOVA and regression, and some basic computer programming constructs.
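To give a flavour of these topics, here is a short sketch using only base R. The data values are made up for illustration and do not come from the course material.

```r
# R used as a calculator
result <- 2 + 3 * 4    # multiplication is done before addition
print(result)
print(sqrt(16))

# Made-up data: a response y measured at six values of x
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2)
print(mean(y))

# Linear regression of y on x
fit <- lm(y ~ x)
print(coef(fit))       # intercept and slope estimates

# ANOVA table for the fitted model
print(anova(fit))

# Scatter plot of the observations with the fitted line added
plot(x, y)
abline(fit)
```

The same model object `fit` is reused for the coefficients, the ANOVA table and the plot, which is a common pattern in R.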
Moderately advanced R learning material is available as a free on-line course at intro2R.
The second edition of my book Learn R: As a Language will be published on 26 April 2024. A dedicated website provides additional information and some free extra chapters. My book does not assume previous programming experience and focuses on the R language itself rather than on using R for specific purposes.
Creating informative and elegant plots for inclusion in publications, reports and theses requires the same kind of approach as writing text: design, drafting and, frequently, several rounds of revision. Plots are also very important for exploration and quality control of data. The requirements are rather different with respect to the graphical design, but not in relation to highlighting different features of a data set. Pages in this section describe how to create specific types of plots or even how to add specific features to a plot. They assume familiarity with the basics of plotting with package ‘ggplot2’. The ‘ggplot2’ book is available on-line as an open-access web site.
Modified | 2023-02-25 |
abstract | Example R code for plots with labels using position functions from package ggpp that combine the actions of two separate position functions available in package ggplot2, such as simultaneous use of stack and nudge, or of dodge and nudge. The examples show how to easily add labels to stacked and dodged bar plots while retaining full control of the labels’ positioning. |
Modified | 2023-07-16 |
abstract | Example R code for volcano plots and quadrant plots built with packages ggplot2 (>= 3.4.2), ggpp (>= 0.5.3), ggpmisc (>= 0.5.3) and ggrepel (>= 0.9.1). The examples demonstrate the use of different types of annotations and data labels. My packages ggpp and ggpmisc include new geometries, statistics and scale functions that are specific to, or useful for, plotting gene-expression data in volcano and quadrant plots. |
Modified | 2023-02-23 |
abstract | Example R code for plots based on package ggplot2 using geometries defined in package ggpp to add insets. These geometries from package ggpp implement addition of plot layers with plots, tables or other graphical objects as insets to a base plot, through extension of the Grammar of Graphics. |
Modified | 2023-06-26 |
abstract | Example R code for interactive plots based on package ggplot2 using geometries defined in package ggpp and statistics from package ggpmisc together with package plotly. This page is a draft and currently contains a single plot example. |
I have been teaching data analysis and design of experiments, two subjects that depend very tightly on each other. Statistics gives theoretical support to data analysis methods, but efficiently extracting information from observations from experiments and surveys is in many ways like detective work or solving puzzles. Modern data analysis makes heavy use of visual data displays (plots, diagrams, graphs). Much of the material I have used for the course IPS-003 is in the pages listed below.
Interactive dashboards can help understand how the amount of uncontrolled variation in observations and the number of replicates affect both tests of significance and parameter estimates when fitting models.
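The same idea can be explored without a dashboard. The sketch below is not the code behind the interactive page; it is a small base R simulation in which the helper `power_sim()`, its arguments, and the chosen numbers of replicates and standard deviations are all assumptions made for illustration. It estimates how often a t-test detects a true difference between two treatments.

```r
set.seed(42)  # make the simulation reproducible

# Hypothetical helper: simulate 'n_sim' experiments with 'n' replicates per
# treatment and return the fraction in which a t-test gives P < 0.05
power_sim <- function(n, sd, diff = 1, n_sim = 500) {
  p_values <- replicate(n_sim, {
    control <- rnorm(n, mean = 0, sd = sd)     # uncontrolled variation = sd
    treated <- rnorm(n, mean = diff, sd = sd)  # true difference = diff
    t.test(control, treated)$p.value
  })
  mean(p_values < 0.05)
}

few_noisy   <- power_sim(n = 4,  sd = 2)    # few replicates, large variation
many_noisy  <- power_sim(n = 40, sd = 2)    # more replicates, same variation
few_precise <- power_sim(n = 4,  sd = 0.5)  # few replicates, small variation

print(c(few_noisy = few_noisy,
        many_noisy = many_noisy,
        few_precise = few_precise))
```

Under these assumptions, either adding replicates or reducing the uncontrolled variation increases the fraction of simulated experiments in which the difference is detected.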
The interactive web page Design of Experiments: Playing with numbers is hosted at the ShinyApps server.
I have published these pages under a Creative Commons licence that allows reuse and derivative works.
Why have I published these pages on my own server with unrestricted access instead of in a course space on the University of Helsinki Moodle server? Because I want to make sure that they are truly open access, and that they remain available to all for as long as I can maintain them, even after my retirement.