AE 13: Debugging R code

Packages

We will use the following packages in this application exercise.

tidyverse: For data import, wrangling, and visualization.
babynames: For working with the Social Security Administration’s baby names data.

library(tidyverse)
library(babynames)

Popular baby names

The Social Security Administration maintains detailed historic records on every child born in the United States dating back to 1880.¹ The data is published annually at the national and state levels, and includes every name except those with fewer than 5 occurrences. The babynames package provides a convenient interface to this data.²

¹ Source: SSA

² The package is no longer actively maintained. We will use a forked version of the package which has been submitted as a pending pull request.

The instructor has written code to analyze the popularity of baby names in the United States. Alas, the code is not working correctly. Your task is to debug the code and fix any issues. Pay attention to any errors, warnings, or messages generated by the code. It is possible for the code to run without any notification from R, but still produce incorrect output.

Capture and solve a reprex

Your turn: Copy the reprex from the course discussion board. Turn this into clean code using reprex::reprex_clean() and add it to the code chunk below. Then debug the code, fix any issues, and post the corrected code on the discussion board using a reprex.

library(tidyverse)
library(babynames)

name_trend <- function(person_name) {
  babynames |>
    filter(name == person_name) |>
    ggplot(mapping = aes(x = year, y = n, color = sex)) +
    geom_line() +
    scale_color_brewer(type = "qual") +
    labs(
      title = str_glue("Name: {person_name}"),
      x = "Year",
      y = "Number of births",
      color = NULL
    ) +
    theme_minimal()
}

name_trend("Benjamin")

1: Put Name: {person_name} in parentheses
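As a minimal sketch of how the title in name_trend() is built: str_glue() takes its template as a quoted string inside the call's parentheses and fills each {placeholder} with the value of the matching variable. The variable title_text below is a hypothetical name used only for illustration.

```r
library(stringr)

# The value that would be passed to name_trend()
person_name <- "Benjamin"

# The template is a quoted string; {person_name} is replaced by its value
title_text <- str_glue("Name: {person_name}")
title_text
```

If the template is not passed as a quoted string, R tries to evaluate it as code rather than treating it as text.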

Compare naming trends to Disney princesses around film release years

Your turn: Fix the code below to show the popularity over time of baby names that are shared with Disney princesses.

# create data frame of Disney princess films
disney <- tribble(
  ~princess, ~film, ~release_year,
  "Snow White", "Snow White and the Seven Dwarfs", 1937,
  "Cinderella", "Cinderella", 1950,
  "Aurora", "Sleeping Beauty", 1959,
  "Ariel", "The Little Mermaid", 1989,
  "Belle", "Beauty and the Beast", 1991,
  "Jasmine", "Aladdin", 1992,
  "Pocahontas", "Pocahontas", 1995,
  "Mulan", "Mulan", 1998,
  "Tiana", "The Princess and the Frog", 2009,
  "Rapunzel", "Tangled", 2010,
  "Merida", "Brave", 2012,
  "Elsa", "Frozen", 2013,
  "Moana", "Moana", 2016,
  "Raya", "Raya and the Last Dragon", 2021
)

# join together the data frames
babynames |>
  # ignore men named after princesses - is this fair?
  filter(sex == "F") |>
  inner_join(disney, by = c("name" = "princess")) |>
  mutate(name = fct_reorder(.f = name, .x = release_year)) |>
  # plot the trends over time, indicating release year
  ggplot(mapping = aes(x = year, y = n)) +
  facet_wrap(facets = vars(name, film), scales = "free_y", labeller = "label_both") +
  geom_line() +
  geom_vline(mapping = aes(xintercept = release_year), linetype = 2, alpha = .5) +
  scale_x_continuous(breaks = c(1880, 1940, 2000)) +
  theme_minimal() +
  labs(
    title = "Popularity of Disney princess names",
    x = "Year",
    y = "Number of births"
  )

1: Fix column name syntax in tribble()
2: Fix filter(sex == F) to filter(sex == "F")
3: Fix label_both() to "label_both" or label_both
4: Probably not. Some of these names are unisex.
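The second fix above is a useful example of code that can run without complaint yet give the wrong answer. The sketch below uses a small made-up vector (not the babynames data) to show why: unquoted F is R's shorthand for FALSE, so the comparison is against the string "FALSE" rather than the letter "F". The names wrong_match and right_match are hypothetical.

```r
# A toy stand-in for the sex column
sex <- c("F", "M", "F")

# F means FALSE, which is coerced to the string "FALSE" for the comparison,
# so nothing matches and no error or warning is raised
wrong_match <- sex == F
wrong_match

# Quoting the value compares against the letter "F" as intended
right_match <- sex == "F"
right_match
```

Used inside filter(), the first comparison would keep no rows at all, which is why checking intermediate results is worthwhile even when R reports nothing.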

Write a function to show trends over time for the top N names in a specific year

Your turn: Modify the function so that if n_rank is greater than 6, it is capped at 6 and a message is displayed alerting the user to this change.

top_n_trend <- function(n_year, n_rank = 5) {
  # check n_rank to verify it is small enough
  # if not small enough, cap at 6 and notify the user with a message
  if (n_rank > 6) {
    n_rank <- 6
    message("Maximum palette size is 12 (6 per gender). Reducing n_rank to 6.")
  }

  # create lookup table
  top_names <- babynames |>
    summarize(count = as.numeric(sum(n)), .by = c(name, sex)) |>
    filter(count > 1000) |>
    select(name, sex)

  # filter babynames for top_names
  filtered_names <- babynames |>
    inner_join(top_names, by = join_by(sex, name))

  # get the top N names from n_year
  top_names <- filtered_names |>
    filter(year == n_year) |>
    summarize(count = sum(n), .by = c(name, sex)) |>
    group_by(sex) |>
    mutate(rank = min_rank(desc(count))) |>
    filter(rank <= n_rank) |>
    arrange(sex, rank) |>
    select(name, sex, rank)

  # keep just the top N names over time and plot
  filtered_names |>
    inner_join(select(top_names, sex, name), by = join_by(sex, name)) |>
    ggplot(mapping = aes(x = year, y = n, color = name)) +
    facet_wrap(facets = vars(sex), ncol = 1) +
    geom_line() +
    scale_color_brewer(type = "qual", palette = "Set3") +
    labs(
      title = str_glue("Most Popular Names of {n_year}"),
      x = "Year",
      y = "Number of births",
      color = "Name"
    ) +
    theme_minimal()
}

top_n_trend(n_year = 1986)

1: Cap n_rank at 6 due to maximum palette size
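One detail worth checking while debugging a function like this is how ties are ranked. The function uses min_rank(desc(count)), which gives tied values the same rank, so filtering on rank <= n_rank can return more than n_rank names. The sketch below illustrates this with made-up counts; the tibble counts and the result ranked are hypothetical and not drawn from babynames.

```r
library(dplyr)

# Made-up counts with a tie for first place
counts <- tibble(
  name = c("A", "B", "C", "D"),
  count = c(10, 30, 30, 5)
)

# min_rank() assigns both tied names rank 1
ranked <- counts |>
  mutate(rank = min_rank(desc(count))) |>
  filter(rank <= 1)

ranked
```

Here asking for the top 1 name returns two rows. Whether this matters for the palette limit depends on whether ties actually occur in the year being plotted.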

top_n_trend(n_year = 2014)

top_n_trend(n_year = 1986, n_rank = 10)

Maximum palette size is 12 (6 per gender). Reducing n_rank to 6.
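The capping step can also be tried on its own, outside the plotting code. The sketch below pulls it into a small helper; cap_rank() and its max_rank argument are hypothetical names introduced here for illustration and are not part of the exercise code.

```r
# Cap a requested rank at a maximum and tell the user when it changes
cap_rank <- function(n_rank, max_rank = 6) {
  if (n_rank > max_rank) {
    message("Reducing n_rank to ", max_rank, ".")
    n_rank <- max_rank
  }
  n_rank
}

# Within the limit: returned unchanged, no message
cap_rank(3)

# Above the limit: a message is shown and the capped value is returned
cap_rank(10)

# Messages can be silenced by the caller when they are not needed
capped <- suppressMessages(cap_rank(10))
```

Using message() rather than warning() signals that the adjustment is informational: the function still produces a valid result, and callers can suppress the note with suppressMessages().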

Session information

sessioninfo::session_info()

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.4.1 (2024-06-14)
 os       macOS Sonoma 14.6.1
 system   aarch64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/New_York
 date     2024-10-23
 pandoc   3.4 @ /usr/local/bin/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 ! package      * version    date (UTC) lib source
   babynames    * 1.1.1      2024-10-21 [1] Github (frank113/babynames@c56c8f9)
   cli            3.6.3      2024-06-21 [1] RSPM (R 4.4.0)
 P colorspace     2.1-0      2023-01-23 [?] CRAN (R 4.3.0)
 P digest         0.6.35     2024-03-11 [?] CRAN (R 4.3.1)
 P dplyr        * 1.1.4      2023-11-17 [?] CRAN (R 4.3.1)
 P evaluate       0.24.0     2024-06-10 [?] CRAN (R 4.4.0)
 P fansi          1.0.6      2023-12-08 [?] CRAN (R 4.3.1)
 P farver         2.1.2      2024-05-13 [?] CRAN (R 4.3.3)
 P fastmap        1.2.0      2024-05-15 [?] CRAN (R 4.4.0)
 P forcats      * 1.0.0      2023-01-29 [?] CRAN (R 4.3.0)
 P generics       0.1.3      2022-07-05 [?] CRAN (R 4.3.0)
 P ggplot2      * 3.5.1      2024-04-23 [?] CRAN (R 4.3.1)
   glue           1.8.0      2024-09-30 [1] RSPM (R 4.4.0)
 P gtable         0.3.5      2024-04-22 [?] CRAN (R 4.3.1)
 P here           1.0.1      2020-12-13 [?] CRAN (R 4.3.0)
 P hms            1.1.3      2023-03-21 [?] CRAN (R 4.3.0)
 P htmltools      0.5.8.1    2024-04-04 [?] CRAN (R 4.3.1)
 P htmlwidgets    1.6.4      2023-12-06 [?] CRAN (R 4.3.1)
 P jsonlite       1.8.8      2023-12-04 [?] CRAN (R 4.3.1)
 P knitr          1.47       2024-05-29 [?] CRAN (R 4.4.0)
 P labeling       0.4.3      2023-08-29 [?] CRAN (R 4.3.0)
 P lifecycle      1.0.4      2023-11-07 [?] CRAN (R 4.3.1)
 P lubridate    * 1.9.3      2023-09-27 [?] CRAN (R 4.3.1)
 P magrittr       2.0.3      2022-03-30 [?] CRAN (R 4.3.0)
 P munsell        0.5.1      2024-04-01 [?] CRAN (R 4.3.1)
 P pillar         1.9.0      2023-03-22 [?] CRAN (R 4.3.0)
 P pkgconfig      2.0.3      2019-09-22 [?] CRAN (R 4.3.0)
 P purrr        * 1.0.2      2023-08-10 [?] CRAN (R 4.3.0)
 P R6             2.5.1      2021-08-19 [?] CRAN (R 4.3.0)
 P RColorBrewer   1.1-3      2022-04-03 [?] CRAN (R 4.3.0)
 P readr        * 2.1.5      2024-01-10 [?] CRAN (R 4.3.1)
   renv           1.0.7      2024-04-11 [1] CRAN (R 4.4.0)
 P rlang          1.1.4      2024-06-04 [?] CRAN (R 4.3.3)
 P rmarkdown      2.27       2024-05-17 [?] CRAN (R 4.4.0)
 P rprojroot      2.0.4      2023-11-05 [?] CRAN (R 4.3.1)
 P rstudioapi     0.16.0     2024-03-24 [?] CRAN (R 4.3.1)
 P scales         1.3.0.9000 2024-05-07 [?] Github (r-lib/scales@c0f79d3)
 P sessioninfo    1.2.2      2021-12-06 [?] CRAN (R 4.3.0)
 P stringi        1.8.4      2024-05-06 [?] CRAN (R 4.3.1)
 P stringr      * 1.5.1      2023-11-14 [?] CRAN (R 4.3.1)
 P tibble       * 3.2.1      2023-03-20 [?] CRAN (R 4.3.0)
 P tidyr        * 1.3.1      2024-01-24 [?] CRAN (R 4.3.1)
 P tidyselect     1.2.1      2024-03-11 [?] CRAN (R 4.3.1)
 P tidyverse    * 2.0.0      2023-02-22 [?] CRAN (R 4.3.0)
 P timechange     0.3.0      2024-01-18 [?] CRAN (R 4.3.1)
 P tzdb           0.4.0      2023-05-12 [?] CRAN (R 4.3.0)
 P utf8           1.2.4      2023-10-22 [?] CRAN (R 4.3.1)
 P vctrs          0.6.5      2023-12-01 [?] CRAN (R 4.3.1)
   withr          3.0.1      2024-07-31 [1] RSPM (R 4.4.0)
 P xfun           0.45       2024-06-16 [?] CRAN (R 4.4.0)
 P yaml           2.3.8      2023-12-11 [?] CRAN (R 4.3.1)

 [1] /Users/soltoffbc/Projects/info-5001/course-site/renv/library/macos/R-4.4/aarch64-apple-darwin20
 [2] /Users/soltoffbc/Library/Caches/org.R-project.R/R/renv/sandbox/macos/R-4.4/aarch64-apple-darwin20/f7156815

 P ── Loaded and on-disk path mismatch.

──────────────────────────────────────────────────────────────────────────────