========================================
== The Salopian Scientific Collective ==
========================================
A data blog by Daniel Greenwood

Knitting automated reports with R Markdown

R Markdown Automation

R Markdown is a simple markup language based on Markdown, with added functionality for including R code and its output. This entire blog is written in R Markdown. I write the text and code in R Markdown, then:

  1. Run all the R code and knit the code and its output together into a regular Markdown file.
  2. Convert that Markdown file into a static HTML page.
  3. Run Hugo, which in a fraction of a second turns that folder of HTML pages (together with a few configuration files) into a fully functioning website with a homepage, tags, menus, etc.
  4. Upload the finished site to my web server.

Steps 1 to 3 are all handled by a single function in the blogdown package, which runs in less than a second once the R code has finished. Here’s what the code looks like for this very page:

This shows how R Markdown can be used to create human-friendly output in a linear workflow: one R Markdown document creates one web page. By changing the output field in the YAML header, we could instead create a PowerPoint presentation or a PDF.

From one R Markdown to many outputs

Here I want to show a different kind of use for R Markdown, and one that I find incredibly useful when I need to create automated analysis reports. For example, in a high-content compound screen I typically make the following:

  • QC reports for each assay plate in a high-content imaging screen, highlighting outlier values for later troubleshooting
  • An individual report summarising the results for each compound in the assay, ideal for checking top hits later or sharing with collaborators

Here I will show an example using the diamonds data set that ships with ggplot2. If you want to learn the basics of R Markdown, check out the RStudio R Markdown tutorial.

Getting started

For this tutorial, we will create a PDF report for each diamond colour, summarising price, cut and weight.
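A quick way to see how many reports to expect is to list the colour grades in the data. This is a minimal sketch that assumes only that ggplot2 (which supplies `diamonds` and is loaded by tidyverse) is installed; the variable name `colour_grades` is just illustrative.

```r
library(ggplot2)  # supplies the diamonds data set

# Each distinct value of `color` will get its own PDF report
colour_grades <- levels(diamonds$color)
print(colour_grades)

# Number of diamonds behind each report
print(table(diamonds$color))
```

The grades run from D (best) to J (worst), so the rendering loop should write seven PDFs.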

Instead of putting all the code in a single R Markdown document, we will create two files:

  1. analysis.R (a script that processes the data)
  2. report.Rmd (a template that receives the data from analysis.R and plots it)

To knit to PDF, you also need a LaTeX distribution installed on your computer. If you don't have one, you can install TinyTeX from R with:

install.packages('tinytex')
tinytex::install_tinytex()

Configuring the R Markdown file

The YAML header of our R Markdown file is like any other, but with an added params section. We must explicitly declare every parameter that we want to pass to this template, along with a default value. Embedded R code can then access these values through the params object, which is a named list of all the parameters. We can also use it to give each report a custom title, here by putting the inline R expression `r params$color` inside the title.

---
title: "Diamond colour report: `r params$color`"
author: "Daniel Greenwood"
output: pdf_document
date: "`r Sys.Date()`"
params:
  color: 'test'
  data: NULL
---

```{r setup, include = FALSE}
library(tidyverse)
```

Configuring the output

Next, we use standard R Markdown to specify the output we want to produce. Here I've made three code chunks to display the average price of diamonds of that colour, a table of available cuts, and a plot of price vs weight with diamonds of this colour highlighted in red. Note that we get the value of color with params$color, and the data set with params$data.

Summary data from the `diamonds` dataset.

### Average price

```{r echo=FALSE, paged.print=TRUE}
params$data %>%
  filter(color == params$color) %>%
  pull(price) %>%
  mean(na.rm = TRUE)
```

### Table of cuts frequency

```{r echo=FALSE, paged.print=TRUE}
params$data %>%
  filter(color == params$color) %>%
  pull(cut) %>%
  table()
```

### Plot of price vs weight for this colour vs others

```{r echo=FALSE, fig.height=3, paged.print=TRUE}
params$data %>%
  ggplot(aes(carat, price, colour = color == params$color)) +
  geom_point(size = 0.25) +
  scale_colour_manual(values = c('gray50', 'tomato'))
```
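Because params is just a named list, one way to debug the template is to create a stand-in params object in the console and run the chunk code line by line. This is a sketch, assuming dplyr and ggplot2 are installed; "E" is an arbitrary example colour, and `avg_price`, `cut_table` and `price_plot` are illustrative names, not part of the template.

```r
library(dplyr)    # filter(), pull() and the %>% pipe
library(ggplot2)  # ggplot() and the diamonds data set

# Stand-in for the params object that rendering would normally provide
params <- list(color = "E", data = diamonds)

# Same logic as the "Average price" chunk
avg_price <- params$data %>%
  filter(color == params$color) %>%
  pull(price) %>%
  mean(na.rm = TRUE)

# Same logic as the "Table of cuts frequency" chunk
cut_table <- params$data %>%
  filter(color == params$color) %>%
  pull(cut) %>%
  table()

# Same logic as the plot chunk: the chosen colour in red, the rest in grey
price_plot <- ggplot(params$data, aes(carat, price, colour = color == params$color)) +
  geom_point(size = 0.25) +
  scale_colour_manual(values = c("gray50", "tomato"))

print(avg_price)
print(cut_table)
```

Before running the render loop in the same session, remove the stand-in with rm(params) (or pass envir = new.env() to rmarkdown::render()), as render() may refuse to overwrite an existing params object in the environment it knits in.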

Rendering reports from R

Now we create analysis.R, the script that processes the data and passes it to the template. You could do this from any regular R script as part of a larger analysis, but here we will keep it to a minimal example.

# Load dependencies
library(tidyverse)

# Load the data set
data(diamonds)

Finally, we use the function rmarkdown::render in a loop to render the template file using each unique value of color.

# Create the output folder up front so render() has somewhere to write
dir.create("reports", showWarnings = FALSE)

for (color in unique(diamonds$color)) {
  # Render the R Markdown template for each diamond color
  rmarkdown::render(input = "report.Rmd", # The R Markdown template
                    output_file = glue::glue("reports/diamond_colour_{color}.pdf"), # Dynamic file name
                    params = list(
                      data = diamonds, # Pass the data set
                      color = color # Pass the current color
                    )
  )
}

Here we specify three arguments for the render function:

  1. input: The name of the R Markdown template file
  2. output_file: The name of the output file. Using the glue function, we build the file name dynamically from the current value of color.
  3. params: A named list of the objects passed to the template file. These names must exactly match those declared in the params section of the template's YAML header.
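Since a typo in a parameter name only surfaces when rendering starts, it can help to check the names against the template first. The sketch below is a hypothetical helper, `check_params()`, built on rmarkdown::yaml_front_matter(); to stay self-contained it writes a cut-down copy of the report header to a temporary file instead of reading report.Rmd.

```r
# Hypothetical helper: stop early if any name we intend to pass to render()
# is not declared in the template's YAML params section.
check_params <- function(template, params) {
  declared <- names(rmarkdown::yaml_front_matter(template)$params)
  unknown <- setdiff(names(params), declared)
  if (length(unknown) > 0) {
    stop("Not declared in ", template, ": ", paste(unknown, collapse = ", "))
  }
  invisible(TRUE)
}

# Self-contained demo: a temporary template with the same params as report.Rmd
template <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"Diamond colour report\"",
  "params:",
  "  color: 'test'",
  "  data: NULL",
  "---"
), template)

check_params(template, list(color = "E", data = NULL))  # passes silently
try(check_params(template, list(colour = "E")))         # fails: "colour" is not declared
```

Note that this only catches names that are passed but not declared; a declared parameter that is never passed simply falls back to its default value.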

The result

Let’s open up one of the pdf reports:

As always, I love to hear feedback. If you found this useful or have any comments or suggestions, feel free to get in touch using the links in the footer. I am always on the lookout for ideas for future tutorials or posts.