Knitting automated reports with R Markdown
R Markdown is a simple markup language based on Markdown, with added functionality for including R code and its output. This entire blog is written as R Markdown documents. I write the text and code in R Markdown, then:
1. Run all the R code and knit the input and output together into a regular Markdown file.
2. Knit that Markdown file into a static HTML page.
3. Run the Hugo program, which in a fraction of a second turns that folder full of HTML pages (together with a few configuration files) into a fully functioning website with homepage, tags, menus etc.
4. Upload the finished site to my web server.
Steps 1 to 3 are all handled by a single function in the blogdown package, which runs in less than a second once the R code has finished. Here’s what the code looks like for this very page:
This shows how R Markdown can be used to create human-friendly output in a linear workflow: one R Markdown document creates one webpage. By changing a keyword in the file header, we could instead create a PowerPoint presentation or a pdf.
From one R Markdown to many outputs
Here I want to show a different kind of use for R Markdown, and one that I find incredibly useful when I need to create automated analysis reports. For example, in a high-content compound screen I typically make the following:
- QC reports for each assay plate in a high-content imaging screen, highlighting outlier values for later troubleshooting
- An individual report summarising the results for each compound in the assay, ideal for checking top hits later or sharing with collaborators
Here I will show an example using the diamonds data set. If you want to learn the basics of R Markdown, check out the RStudio tutorial here.
Getting started
For this tutorial, we will create a pdf report for each diamond colour, plotting some summary information on price, clarity etc.
Instead of a single R Markdown file containing all the code, we will create two files:
- analysis.R (which processes the data)
- report.Rmd (a template which will receive and plot the data from analysis.R)
To knit to a pdf, you also need to have a version of LaTeX installed on your computer. If you don't have one, you can install TinyTeX with:
install.packages('tinytex')
tinytex::install_tinytex()
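Before installing anything, it can be worth checking whether a LaTeX engine is already available. The following is a minimal sketch using only base R; it assumes that an installed engine is on the PATH, which may not hold for every setup, and the list of engine names is my own choice.

```r
# Look for common LaTeX engines on the PATH.
# Sys.which() returns an empty string for programs it cannot find.
engines <- Sys.which(c("pdflatex", "xelatex", "lualatex"))
has_latex <- any(nzchar(engines))

if (has_latex) {
  message("LaTeX engine found: ",
          paste(names(engines)[nzchar(engines)], collapse = ", "))
} else {
  message("No LaTeX engine found on the PATH; the tinytex commands above can install one.")
}
```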
Configuring the R Markdown file
The YAML header of our R Markdown file is like any other, but with the addition of a params section. We must explicitly declare every parameter that we want to pass to this template, along with a default value. We can then access these values from within embedded R code using the params object, which is a named list of all of the parameters. We can also use this to set a custom title for our report, here using `r params$color` inside the title.
# ---
# title: "Diamond colour report: `r params$color`"
# author: "Daniel Greenwood"
# output: pdf_document
# date: "`r Sys.Date()`"
# params:
# color: 'test'
# data: NULL
# ---
#
# ```{r setup, include = FALSE}
# library(tidyverse)
# ```
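To see which parameters a template declares without knitting it, the header can be read back from R. This is a small sketch rather than part of the workflow itself: it writes a throwaway copy of the header above to a temporary file (the file and the body text are placeholders) and parses it with rmarkdown::yaml_front_matter().

```r
library(rmarkdown)

# Write a throwaway copy of the header so the example is self-contained
header <- c(
  "---",
  'title: "Diamond colour report: `r params$color`"',
  "output: pdf_document",
  "params:",
  "  color: 'test'",
  "  data: NULL",
  "---",
  "",
  "Placeholder body text."
)
template <- tempfile(fileext = ".Rmd")
writeLines(header, template)

# yaml_front_matter() parses the header only; no R code in the file is run
declared <- yaml_front_matter(template)$params

names(declared)  # the parameter names the template accepts
declared$color   # the default value for color
```

Inside the knitted document, the params object has the same names, with any defaults replaced by the values passed at render time.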
Configuring the output
Next, we use the standard R Markdown format to specify the output we want to produce. Here I’ve made three code chunks to display the average price of that colour of diamond, a table of available cuts and a plot of price vs weight, with diamonds of this colour highlighted in red. Note that we get the value of color using the code params$color.
# Summary data from the `diamonds` dataset.
#
# ### Average price
#
# ```{r echo=FALSE, paged.print=TRUE}
# params$data %>%
#   filter(color == params$color) %>%
#   pull(price) %>%
#   mean(na.rm = TRUE)
# ```
#
# ### Table of cuts frequency
#
# ```{r echo=FALSE, paged.print=TRUE}
# params$data %>%
#   filter(color == params$color) %>%
#   pull(cut) %>%
#   table()
# ```
#
# ### Plot of price vs weight for this colour vs others
#
# ```{r echo=FALSE, fig.height=3, paged.print=TRUE}
# params$data %>%
#   ggplot() +
#   aes(carat, price, colour = color == params$color) +
#   geom_point(size = 0.25) +
#   scale_colour_manual(values = c('gray50', 'tomato'))
# ```
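Because params is just a named list, the chunk code can be tried out in the console before knitting by creating a stand-in list by hand. The sketch below assumes dplyr and ggplot2 are installed and picks the colour "E" arbitrarily; the variable names avg_price, cut_counts and p are only for illustration.

```r
library(dplyr)
library(ggplot2)

# Stand-in for the list that the template receives when it is rendered
params <- list(color = "E", data = ggplot2::diamonds)

# Same pipeline as the "Average price" chunk
avg_price <- params$data %>%
  filter(color == params$color) %>%
  pull(price) %>%
  mean(na.rm = TRUE)

# Same pipeline as the "Table of cuts frequency" chunk
cut_counts <- params$data %>%
  filter(color == params$color) %>%
  pull(cut) %>%
  table()

# Same plot as the final chunk, stored instead of printed
p <- params$data %>%
  ggplot() +
  aes(carat, price, colour = color == params$color) +
  geom_point(size = 0.25) +
  scale_colour_manual(values = c("gray50", "tomato"))
```

Once the values look right for one colour, the same code can go into the template unchanged.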
Rendering reports from R
Now we create the script that processes the data and passes it to the template. You can do this from any regular R script as part of a larger analysis, but here we will just make a minimal example.
# Load dependencies
library(tidyverse)
# Load the data set
data(diamonds)
Finally, we use the function rmarkdown::render in a loop to render the template file using each unique value of color.
for (color in unique(diamonds$color)) {
  # Render the R Markdown template for each diamond color
  rmarkdown::render(
    input = "report.Rmd",  # The R Markdown template
    output_file = glue::glue("reports/diamond_colour_{color}.pdf"),  # Dynamic file name
    params = list(
      data = diamonds,  # Pass the data set
      color = color     # Pass the current color
    )
  )
}
Here we specify three arguments for the render function:
- input: The name of the R Markdown template file
- output_file: The name of the output file. Using the glue function, we dynamically create the file name using the actual value of color
- params: A named list that includes the objects passed to the template file. These names must match exactly those that we use in the template.
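For longer runs, the loop body can be wrapped in a small function. The following is one possible sketch, not the only way to do it: check_params, report_path and render_colour are hypothetical helpers, the declared parameter names are hard-coded to match the template header, and the reports folder is created up front in case it does not exist yet. The envir = new.env() argument asks render to evaluate each report in a fresh environment.

```r
# Stop early if a name passed to render() is not declared in the template
check_params <- function(supplied, declared) {
  unknown <- setdiff(names(supplied), declared)
  if (length(unknown) > 0) {
    stop("Not declared in the template: ", paste(unknown, collapse = ", "))
  }
  invisible(TRUE)
}

# Build the output path for one colour (same naming scheme as the loop)
report_path <- function(color, dir = "reports") {
  file.path(dir, paste0("diamond_colour_", color, ".pdf"))
}

# Render the report for a single colour
render_colour <- function(color, data, template = "report.Rmd") {
  report_params <- list(data = data, color = color)
  check_params(report_params, declared = c("color", "data"))
  dir.create(dirname(report_path(color)), showWarnings = FALSE)
  rmarkdown::render(
    input = template,
    output_file = report_path(color),
    params = report_params,
    envir = new.env(),  # fresh environment for each report
    quiet = TRUE
  )
}

# Only run when the template is present in the working directory
if (file.exists("report.Rmd")) {
  data(diamonds, package = "ggplot2")
  for (color in unique(diamonds$color)) {
    render_colour(color, diamonds)
  }
}
```

Keeping the path and the parameter check in separate functions makes them easy to test without rendering anything.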
The result
Let’s open up one of the pdf reports:
As always, I love to hear feedback. If you found this useful or have any comments or suggestions, feel free to get in touch with the links in the footer. I am always on the lookout for suggestions for future tutorials or posts.