Reproducible reporting with Quarto

Lecture 15

Dr. Benjamin Soltoff

Cornell University
INFO 5001 - Fall 2025

October 21, 2025

Learning objectives

  • Review the importance of reproducibility in scientific research
  • Identify the major components of Quarto
  • Define Quarto code cells
  • Implement cell options to customize output
  • Render different Quarto formats
  • Distinguish between R scripts (.R) and Quarto documents (.qmd)

Application exercise

ae-13

Note

  • Go to the course GitHub org and find your ae-13 (repo name will be suffixed with your GitHub name).
  • Clone the repo in Positron, run renv::restore() to install the required packages, open the Quarto document in the repo, and follow along and complete the exercises.
  • Render, commit, and push your edits by the AE deadline (end of the day).

Quarto

Quarto basics

---
title: Gun deaths
author: Your name
date: today
format: html
---

```{r}
#| label: setup
#| include: false
library(tidyverse)
library(rcis)

youth <- gun_deaths |>
  filter(age <= 65)
```

# Gun deaths by age

We have data about `{r} nrow(gun_deaths)` individuals killed by guns. Only `{r} nrow(gun_deaths) - nrow(youth)` are older than 65. The distribution of the remainder is shown below:

```{r}
#| label: youth-dist
#| echo: false
ggplot(data = youth, mapping = aes(x = age)) + 
  geom_freqpoly(binwidth = 1)
```

# Gun deaths by race

```{r}
#| label: race-dist
youth |>
  mutate(race = fct_infreq(race) |> fct_rev()) |>
  ggplot(mapping = aes(y = race)) +
  geom_bar() +
  labs(y = "Victim race")
```

Major components

  1. A YAML header surrounded by ---s
  2. Cells of code surrounded by ```
  3. Text mixed with simple text formatting using the Markdown syntax

Quarto code cells

Rendering process

Figure: a schematic of the rendering process. The .qmd file is passed to knitr or Jupyter, which produces plain-text Markdown; Pandoc then converts it into any number of output types, including HTML, PDF, and Word documents.

Rendering process

Figure: a schematic of Quarto's versatility, with multi-language input (e.g. Python, R, Observable, Julia) and multi-format output (e.g. PDF, HTML, Word documents, and more).

⌨️ Add Markdown content

Instructions

  • Render gun-deaths.qmd as an HTML document
  • Add text describing the frequency polygon

Code cell options

```{r}
#| label: youth-dist
#| message: false
#| warning: false

# code goes here
```
  • Naming code cells
  • Code cell options
      • eval: true
      • include: true
      • echo: true
      • message: true or warning: true
      • cache: false
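
To make the options above concrete, here is a minimal sketch of a cell that sets several of them at once. The label `options-demo` and the built-in `cars` dataset are assumptions chosen for illustration; they are not part of the gun deaths example. In brief: `eval` controls whether the code runs, `echo` whether the source code appears in the rendered output, `include` whether anything from the cell (code or results) appears at all, `message` and `warning` whether those conditions are printed, and `cache` whether results are stored for reuse on the next render.

```{r}
#| label: options-demo
#| echo: false
#| message: false
#| fig-width: 6
#| fig-height: 4

# The cell still runs at render time (eval defaults to true),
# but only the plot should reach the rendered document:
# the source code is hidden by echo: false and the message by message: false.
message("This message is suppressed by message: false")
plot(
  cars$speed, cars$dist,
  xlab = "Speed (mph)",
  ylab = "Stopping distance (ft)"
)
```

Switching `echo: false` to `include: false` would hide the plot as well, which is the usual choice for setup cells that only load packages or prepare data.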

Caching with dependencies

```{r}
#| cache: true
scdb_case <- read_csv("data/scdb-case.csv") |>
  filter(term >= 1945)
```
```{r}
#| cache: true
scdb_clean <- scdb_case |> 
  mutate(one_vote = majVotes - minVotes == 1)
scdb_clean
```
# A tibble: 9,299 × 53
   caseId   docketId   caseIssuesId dateDecision decisionType usCite sctCite ledCite lexisCite  term
   <chr>    <chr>      <chr>        <chr>               <dbl> <chr>  <chr>   <chr>   <chr>     <dbl>
 1 1945-001 1945-001-… 1945-001-01… 12/10/1945              1 326 U… 66 S. … 90 L. … 1945 U.S…  1945
 2 1945-002 1945-002-… 1945-002-01… 12/3/1945               1 326 U… 66 S. … 90 L. … 1945 U.S…  1945
 3 1945-003 1945-003-… 1945-003-01… 11/13/1945              1 326 U… 66 S. … 90 L. … 1945 U.S…  1945
 4 1945-004 1945-004-… 1945-004-01… 11/13/1945              1 326 U… 66 S. … 90 L. … 1945 U.S…  1945
 5 1945-005 1945-005-… 1945-005-01… 11/5/1945               1 326 U… 66 S. … 90 L. … 1945 U.S…  1945
 6 1945-006 1945-006-… 1945-006-01… 11/5/1945               1 326 U… 66 S. … 90 L. … 1945 U.S…  1945
 7 1945-007 1945-007-… 1945-007-01… 11/5/1945               2 326 U… 66 S. … 90 L. … 1945 U.S…  1945
 8 1945-008 1945-008-… 1945-008-01… 11/5/1945               1 326 U… 66 S. … 90 L. … 1945 U.S…  1945
 9 1945-009 1945-009-… 1945-009-01… 11/5/1945               1 326 U… 66 S. … 90 L. … 1945 U.S…  1945
10 1945-010 1945-010-… 1945-010-01… 12/10/1945              1 326 U… 66 S. … 90 L. … 1945 U.S…  1945
# ℹ 9,289 more rows
# ℹ 43 more variables: naturalCourt <dbl>, chief <chr>, docket <chr>, caseName <chr>,
#   dateArgument <chr>, dateRearg <chr>, petitioner <dbl>, petitionerState <dbl>, respondent <dbl>,
#   respondentState <dbl>, jurisdiction <dbl>, adminAction <dbl>, adminActionState <dbl>,
#   threeJudgeFdc <dbl>, caseOrigin <dbl>, caseOriginState <dbl>, caseSource <dbl>,
#   caseSourceState <dbl>, lcDisagreement <dbl>, certReason <dbl>, lcDisposition <dbl>,
#   lcDispositionDirection <dbl>, declarationUncon <dbl>, caseDisposition <dbl>, …
```{r}
#| cache: true
scdb_case <- read_csv("data/scdb-case.csv")
```
```{r}
#| cache: true
scdb_clean <- scdb_case |> 
  mutate(one_vote = majVotes - minVotes == 1)
scdb_clean
```
# A tibble: 9,299 × 53
   caseId   docketId   caseIssuesId dateDecision decisionType usCite sctCite ledCite lexisCite  term
   <chr>    <chr>      <chr>        <chr>               <dbl> <chr>  <chr>   <chr>   <chr>     <dbl>
 1 1945-001 1945-001-… 1945-001-01… 12/10/1945              1 326 U… 66 S. … 90 L. … 1945 U.S…  1945
 2 1945-002 1945-002-… 1945-002-01… 12/3/1945               1 326 U… 66 S. … 90 L. … 1945 U.S…  1945
 3 1945-003 1945-003-… 1945-003-01… 11/13/1945              1 326 U… 66 S. … 90 L. … 1945 U.S…  1945
 4 1945-004 1945-004-… 1945-004-01… 11/13/1945              1 326 U… 66 S. … 90 L. … 1945 U.S…  1945
 5 1945-005 1945-005-… 1945-005-01… 11/5/1945               1 326 U… 66 S. … 90 L. … 1945 U.S…  1945
 6 1945-006 1945-006-… 1945-006-01… 11/5/1945               1 326 U… 66 S. … 90 L. … 1945 U.S…  1945
 7 1945-007 1945-007-… 1945-007-01… 11/5/1945               2 326 U… 66 S. … 90 L. … 1945 U.S…  1945
 8 1945-008 1945-008-… 1945-008-01… 11/5/1945               1 326 U… 66 S. … 90 L. … 1945 U.S…  1945
 9 1945-009 1945-009-… 1945-009-01… 11/5/1945               1 326 U… 66 S. … 90 L. … 1945 U.S…  1945
10 1945-010 1945-010-… 1945-010-01… 12/10/1945              1 326 U… 66 S. … 90 L. … 1945 U.S…  1945
# ℹ 9,289 more rows
# ℹ 43 more variables: naturalCourt <dbl>, chief <chr>, docket <chr>, caseName <chr>,
#   dateArgument <chr>, dateRearg <chr>, petitioner <dbl>, petitionerState <dbl>, respondent <dbl>,
#   respondentState <dbl>, jurisdiction <dbl>, adminAction <dbl>, adminActionState <dbl>,
#   threeJudgeFdc <dbl>, caseOrigin <dbl>, caseOriginState <dbl>, caseSource <dbl>,
#   caseSourceState <dbl>, lcDisagreement <dbl>, certReason <dbl>, lcDisposition <dbl>,
#   lcDispositionDirection <dbl>, declarationUncon <dbl>, caseDisposition <dbl>, …

Label your cells

```{r}
#| label: raw-data-cache
#| cache: true
scdb_case <- read_csv("data/scdb-case.csv")
```
```{r}
#| label: processed-data-cache
#| cache: true
#| dependson: raw-data-cache
scdb_clean <- scdb_case |> 
  mutate(one_vote = majVotes - minVotes == 1)
scdb_clean
```
# A tibble: 29,021 × 53
   caseId   docketId   caseIssuesId dateDecision decisionType usCite sctCite ledCite lexisCite  term
   <chr>    <chr>      <chr>        <chr>               <dbl> <chr>  <chr>   <chr>   <chr>     <dbl>
 1 1791-001 1791-001-… 1791-001-01… 8/3/1791                6 2 U.S… <NA>    1 L. E… 1791 U.S…  1791
 2 1791-002 1791-002-… 1791-002-01… 8/3/1791                2 2 U.S… <NA>    1 L. E… 1791 U.S…  1791
 3 1792-001 1792-001-… 1792-001-01… 2/14/1792               2 2 U.S… <NA>    1 L. E… 1792 U.S…  1792
 4 1792-002 1792-002-… 1792-002-01… 8/7/1792                2 2 U.S… <NA>    1 L. E… 1792 U.S…  1792
 5 1792-003 1792-003-… 1792-003-01… 8/11/1792               8 2 U.S… <NA>    1 L. E… 1792 U.S…  1792
 6 1792-004 1792-004-… 1792-004-01… 8/11/1792               6 2 U.S… <NA>    1 L. E… 1792 U.S…  1792
 7 1793-001 1793-001-… 1793-001-01… 2/19/1793               8 2 U.S… <NA>    1 L. E… 1793 U.S…  1793
 8 1793-002 1793-002-… 1793-002-01… 2/20/1793               2 2 U.S… <NA>    1 L. E… 1793 U.S…  1793
 9 1793-003 1793-003-… 1793-003-01… 2/20/1793               8 2 U.S… <NA>    1 L. E… 1793 U.S…  1793
10 1794-001 1794-001-… 1794-001-01… 2/7/1794               NA 3 U.S… <NA>    1 L. E… 1794 U.S…  1794
# ℹ 29,011 more rows
# ℹ 43 more variables: naturalCourt <dbl>, chief <chr>, docket <chr>, caseName <chr>,
#   dateArgument <chr>, dateRearg <chr>, petitioner <dbl>, petitionerState <dbl>, respondent <dbl>,
#   respondentState <dbl>, jurisdiction <dbl>, adminAction <dbl>, adminActionState <dbl>,
#   threeJudgeFdc <dbl>, caseOrigin <dbl>, caseOriginState <dbl>, caseSource <dbl>,
#   caseSourceState <dbl>, lcDisagreement <dbl>, certReason <dbl>, lcDisposition <dbl>,
#   lcDispositionDirection <dbl>, declarationUncon <dbl>, caseDisposition <dbl>, …

Caching guidelines

  • Label your code cells
  • Define dependencies
  • Never cache cells that load packages

Inline code

We have data about `{r} nrow(gun_deaths)` individuals killed by guns.

Only `{r} nrow(gun_deaths) - nrow(youth)` are older than 65. The distribution of the remainder is shown below:

We have data about 100798 individuals killed by guns.

Only 15687 are older than 65.
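
Inline code can call any R expression, so the raw counts above could be formatted before they are printed. The following is a sketch only: it hard-codes the two counts shown in the rendered output rather than loading `gun_deaths`, and the variable names are hypothetical.

```r
# Counts copied from the rendered output above (not recomputed from the data)
n_total <- 100798
n_over_65 <- 15687

# format() with big.mark inserts thousands separators
total_label <- format(n_total, big.mark = ",")
over_65_label <- format(n_over_65, big.mark = ",")

# Share of victims older than 65, rounded to one decimal place
share_over_65 <- round(100 * n_over_65 / n_total, 1)

cat("We have data about", total_label, "individuals killed by guns.\n")
cat("Only ", over_65_label, " (", share_over_65, "%) are older than 65.\n", sep = "")
```

In a .qmd file the same idea would be written inline, for example `{r} format(nrow(gun_deaths), big.mark = ",")`, so the number stays in sync with the data.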

⌨️ Customize code cells

Instructions

  • Set echo: false for each code cell
  • Adjust the figure height and width options for the code cells with plots
  • Enable caching for each cell and render the document. Look at the file structure for the cache. What do you see?

YAML header

---
title: Gun deaths
author: Benjamin Soltoff
date: today
format: html
---
  • YAML Ain’t Markup Language
  • Standardized format for storing hierarchical data in a human-readable syntax
  • Defines how Quarto renders your .qmd file

HTML document

---
title: Gun deaths
author: Benjamin Soltoff
date: today
format: html
---

Table of contents

---
title: Gun deaths
author: Benjamin Soltoff
date: today
format:
  html:
    toc: true
    toc-depth: 2
---

Appearance and style

---
title: Gun deaths
author: Benjamin Soltoff
date: today
format:
  html:
    theme: superhero
    highlight-style: github
---

Global options

---
title: "My Document"
format:
  html:
    fig-width: 7
  typst:
    fig-width: 5
execute:
  echo: true
  message: false
knitr:
  opts_chunk: 
    comment: "#>" 
---

⌨️ Revise the YAML header

Instructions

  • Add a table of contents
  • Use themes for light and dark mode
  • Set relevant code cell options globally

Other output formats

PDF document

---
title: Gun deaths
author: Benjamin Soltoff
date: today
format: typst
---

Presentation

---
title: Gun deaths
author: Benjamin Soltoff
date: today
format: revealjs
---

Quarto supports multiple presentation formats

  • revealjs (HTML)
  • pptx (PowerPoint)
  • beamer (\(\LaTeX\)/PDF)
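
Because the output format is just a setting, one source file can be rendered to several formats without editing its YAML header. The sketch below assumes the Quarto CLI and the quarto R package are installed; `render_formats()` is a hypothetical helper, not a function from any package, and the default format list is only an example.

```r
# Hypothetical helper: render one .qmd file to several output formats.
# Assumes the Quarto CLI and the {quarto} R package are available.
render_formats <- function(input, formats = c("html", "typst", "docx")) {
  # Fail early with a clear error if the source file is missing
  stopifnot(file.exists(input))

  for (fmt in formats) {
    # output_format overrides the format declared in the YAML header
    quarto::quarto_render(input = input, output_format = fmt)
  }

  invisible(formats)
}

# Example call (not run here, since it needs the file and Quarto itself):
# render_formats("gun-deaths.qmd")
```

The default list deliberately avoids pairing `html` with `revealjs`: both write an .html file with the same base name, so one would overwrite the other unless an output file name is also supplied.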

Additional Quarto

Use the documentation to learn how to implement these formats

R scripts

# gun-deaths.R
# 2024-10-29
# Examine the distribution of age of victims in gun_deaths

# load packages
library(tidyverse)
library(rcis)

# filter data for victims aged 65 and under
youth <- gun_deaths |>
  filter(age <= 65)

# number of individuals older than 65 killed
nrow(gun_deaths) - nrow(youth)

# graph the distribution of youth
ggplot(data = youth, mapping = aes(x = age)) +
  geom_freqpoly(binwidth = 1)

# graph the distribution of youth, by race
youth |>
  mutate(race = fct_infreq(race) |> fct_rev()) |>
  ggplot(mapping = aes(y = race)) +
  geom_bar() +
  labs(y = "Victim race")

When to use a script

  • For troubleshooting
  • Initial stages of project
  • Building a reproducible pipeline
  • Shared functions
  • It depends

Running scripts

  • Interactively
  • Programmatically using source()
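
As a minimal sketch of the programmatic route, the example below writes a throwaway script to a temporary file and runs it with `source()`. The script contents and object names are invented for illustration; in practice the path would point at a real file such as gun-deaths.R.

```r
# Write a small throwaway script to a temporary file (hypothetical content)
script_path <- tempfile(fileext = ".R")
writeLines(
  c(
    "ages <- c(12, 34, 67, 70, 45)",
    "n_over_65 <- sum(ages > 65)"
  ),
  script_path
)

# Run the script in its own environment so it does not overwrite
# objects in the global workspace
script_env <- new.env()
source(script_path, local = script_env)

# Objects created by the script are now available in script_env
script_env$n_over_65
```

Calling `source()` with no `local` argument evaluates the script in the global environment instead, which is closer to running it interactively. From a terminal, `Rscript gun-deaths.R` runs a script without opening an R session.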

Wrap up

Recap

  • Quarto is an open-source, reproducible document system
  • Compatible with R, Python, Julia, Observable, and more
  • Supports multiple output formats