The grammar of graphics

Lecture 2

Dr. Benjamin Soltoff

Cornell University
INFO 5001 - Fall 2024

August 29, 2024

Announcements

  • Application exercise 00 is ungraded
  • If you cannot access RStudio Workbench yet, let me know
  • Lab on Friday
  • No class on Tuesday

Warm up

Examining data visualization

Discuss the following questions about the visualization.

  • What is the visualization trying to show?

  • What is effective, i.e. what is done well?

  • What is ineffective, i.e. what could be improved?

  • What are you curious about after looking at the visualization?

Source: Twitter

Application exercise

The Bechdel test

A measurement of representation of women in film

In order to pass the test, a movie must have:

  1. At least two named women in it
  2. Who talk to each other
  3. About something besides a man
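The three criteria can be read as an ordered checklist. The sketch below assumes the rating counts how many criteria a film satisfies in sequence (0 to 3), which matches the factor labels used in the chart code later in these slides. The function name `bechdel_rating()` and its arguments are hypothetical, made up for illustration.

```r
# Hypothetical helper (not from bechdeltest.com): score a film on the three
# criteria in order, stopping at the first one it fails.
bechdel_rating <- function(two_named_women, talk_to_each_other, not_about_a_man) {
  if (!two_named_women) return(0L)     # fewer than two named women
  if (!talk_to_each_other) return(1L)  # women don't talk to each other
  if (!not_about_a_man) return(2L)     # women only talk about a man
  3L                                   # passes the Bechdel test
}

# A film with two named women who talk to each other, but only about a man
bechdel_rating(TRUE, TRUE, FALSE)
```

Under this assumed scoring, only a rating of 3 counts as passing the test.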

ae-00

ae-00-bechdel

  • Go to ae-00-bechdel and clone the repo in RStudio.
  • Open and render the Quarto document ae-00-bechdel-viz.qmd, review the document, and fill in the blanks.

Warning

ae-00-bechdel is hosted on GitHub.com because we have not configured your authentication method for Cornell’s GitHub. We will do this tomorrow in lab.

Recap of AE

  • Construct plots with ggplot().
  • Components of a ggplot are added together with +.
  • The formula is (almost) always as follows:
ggplot(DATA, aes(x = X_VAR, y = Y_VAR, ...)) +
  geom_XXX()
  • Aesthetic attributes of geometries (color, size, transparency, etc.) can be mapped to variables in the data or set by the user, e.g. color = binary vs. color = "pink".
  • Use facet_wrap() when faceting (creating small multiples) by one variable and facet_grid() when faceting by two variables.
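As a minimal sketch of the points above, the following uses the `mpg` dataset that ships with ggplot2 rather than the Bechdel data from the application exercise. The choice of dataset and variables is an assumption made for illustration only.

```r
library(ggplot2)

# Mapped: color varies with a variable, so it goes inside aes()
p_mapped <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point()

# Set: a fixed color goes outside aes(), as an argument to the geom
p_set <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(color = "pink")

# Faceting by one variable: panels wrap into a grid as space allows
p_wrap <- p_set + facet_wrap(vars(class))

# Faceting by two variables: rows by drv, columns by cyl
p_grid <- p_set + facet_grid(rows = vars(drv), cols = vars(cyl))
```

Printing any of these objects (for example `p_wrap`) draws the plot. Mapping `color` inside `aes()` produces a legend, while setting it on the geom does not.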

Recreate the Bechdel Test chart

# load packages
library(tidyverse)
library(scales)
library(httr2)
library(colorspace)
# generate request
bechdel_req <- request("http://bechdeltest.com/api/v1") |>
  req_url_path_append("/getAllMovies")

# perform request
bechdel_resp <- bechdel_req |>
  req_perform()
# extract response and convert to tibble
bechdel_df <- bechdel_resp |>
  resp_body_json() |>
  enframe(name = ".id", value = "result") |>
  # tidy into one row per film
  unnest_wider(col = result) |>
  # convert rating to factor column
  mutate(rating = factor(
    x = rating,
    levels = 0:3,
    labels = c(
      "Fewer than two women",
      "Women don't talk to each other",
      "Women only talk about men",
      "Passes Bechdel test"
    ) |>
      # ensure labels are wrapped for the plot
      str_wrap(width = 18)
  ))

# summarized values for chart
bechdel_pct <- bechdel_df |>
  count(year, rating) |>
  complete(year, rating, fill = list(n = 0)) |>
  mutate(n_pct = n / sum(n), .by = year) |>
  filter(between(year, 1970, 2023))

# labels for plot
bechdel_labels <- bechdel_pct |>
  filter(year == max(year)) |>
  # calculate midpoint for each category to center on last year
  mutate(midpoint = rev(cumsum(rev(n_pct))) - n_pct + (n_pct / 2))

# initiate ggplot object
ggplot(data = bechdel_pct, mapping = aes(x = year, y = n_pct)) +
  # area chart for change in percentages over time
  geom_area(mapping = aes(fill = rating), color = "white") +
  # add text labels directly to the chart
  geom_text(
    data = bechdel_labels,
    mapping = aes(y = midpoint, label = rating, color = rating),
    family = "Atkinson Hyperlegible",
    size = 2.5,
    hjust = 0,
    nudge_x = 1
  ) +
  # add vertical lines to visually separate decades
  geom_vline(
    xintercept = seq(from = 1970, to = 2020, by = 10), color = "white",
    linewidth = 0.25
  ) +
  # percentage labels for y axis
  scale_y_continuous(labels = label_percent()) +
  # manual color palette
  scale_fill_manual(
    values = c(
      darken("#21918c", amount = 0.3),
      "#21918c",
      lighten("#21918c", amount = 0.3),
      "#440154"
    ),
    aesthetics = c("fill", "color"),
    guide = "none"
  ) +
  # label the chart
  labs(
    title = "The Bechdel Test over time",
    subtitle = "How women are represented in movies",
    x = NULL,
    y = NULL,
    caption = "Source: bechdeltest.com; FiveThirtyEight"
  ) +
  # allow chart contents outside of panel
  coord_cartesian(clip = "off") +
  # different base theme and font
  theme_minimal(base_family = "Atkinson Hyperlegible") +
  # customize theme elements
  theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    axis.ticks.length.y = unit(1, units = "cm"),
    plot.title.position = "plot",
    plot.margin = margin(r = 50, l = 5, t = 5, b = 5)
  )
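The midpoint calculation for the text labels is the least obvious step in the code above. It relies on ggplot2 stacking areas so that the first factor level sits at the top of the stack, so each band's top edge is the cumulative sum taken from the last level upward. A small worked example, using made-up shares for a single year (the numbers are an assumption, not real data):

```r
# Made-up shares for one year, in factor-level order (rating 0 to 3)
n_pct <- c(0.1, 0.1, 0.2, 0.6)

# Top edge of each band: cumulative sum taken from the last level upward
top <- rev(cumsum(rev(n_pct)))

# Bottom edge of each band, then the point halfway up it
bottom <- top - n_pct
midpoint <- bottom + n_pct / 2

midpoint
#> [1] 0.95 0.85 0.70 0.30
```

This is the same arithmetic as `rev(cumsum(rev(n_pct))) - n_pct + (n_pct / 2)`, split into named steps. It assumes the rows are sorted by rating, which `count()` followed by `complete()` provides in the chart code.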