library(tidyverse)
library(babynames)
AE 14: Debugging R code
Suggested answers
Packages
We will use the following packages in this application exercise.
- tidyverse: For data import, wrangling, and visualization.
- babynames: For working with the Social Security Administration’s baby names data.
Popular baby names
The Social Security Administration maintains detailed historic records on every child born in the United States dating back to 1880.[1] The data is published annually at the national and state levels, and includes every name except those with fewer than 5 occurrences. The babynames package provides a convenient interface to this data.[2]
[1] Source: SSN
[2] The package is no longer actively maintained. We will use a forked version of the package which has been submitted as a pending pull request.
The instructor has written code to analyze the popularity of baby names in the United States. Alas, the code is not working correctly. Your task is to debug the code and fix any issues. Pay attention to any errors, warnings, or messages generated by the code. It is possible for the code to run without any notification from R, but still produce incorrect output.
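As a rough illustration of the three kinds of notification mentioned above, the following base R sketch records which conditions an expression signals. The helper name signal_types() is hypothetical and is not part of the exercise; it only shows that messages, warnings, and errors are distinct, and that code can also run with no notification at all.

```r
# Hypothetical helper: evaluate an expression and record which kinds of
# conditions (message, warning, error) it signals along the way.
signal_types <- function(expr) {
  seen <- character()
  tryCatch(
    withCallingHandlers(
      expr,
      message = function(m) {
        seen <<- c(seen, "message")
        invokeRestart("muffleMessage")
      },
      warning = function(w) {
        seen <<- c(seen, "warning")
        invokeRestart("muffleWarning")
      }
    ),
    error = function(e) {
      seen <<- c(seen, "error")
      NULL
    }
  )
  seen
}

msg_only  <- signal_types(message("just so you know")) # informational only
warn_only <- signal_types(as.numeric("abc"))           # coercion produces NA
err_only  <- signal_types(log("a"))                    # non-numeric argument
silent    <- signal_types(mean(c(1, 2, 30)))           # no notification at all
```

The last case is the one to watch for: an empty result from this helper says nothing about whether the output is correct.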
Write a function to show trends over time for a specific name
name_trend <- function(person_name) {
  babynames |>
    filter(name == person_name) |>
    ggplot(mapping = aes(x = year, y = n, color = sex)) +
    geom_line() +
    scale_color_brewer(type = "qual") +
    labs(
      title = str_glue("Name: {person_name}"),
      x = "Year",
      y = "Number of births",
      color = NULL
    ) +
    theme_minimal()
}
name_trend("Benjamin")
1. Wrap Name: {person_name} in quotation marks so that str_glue() receives a string.
Plot the total number of U.S. births over time using a stacked area chart
applicants |>
  mutate(
    sex = if_else(sex == "F", "Female", "Male"),
    n_all = n_all / 1e06
  ) |>
  ggplot(mapping = aes(x = year, y = n_all, fill = sex)) +
  geom_area() +
  scale_fill_brewer(type = "qual") +
  labs(
    title = "Total US births",
    x = "Year",
    y = "Millions",
    fill = NULL,
    caption = "Source: Social Security Administration"
  ) +
  theme_minimal()
1. Use geom_area()
Compare naming trends to Disney princesses around film release years
# create data frame of disney princess films
disney <- tribble(
  ~"princess", ~"film", ~"release_year",
  "Snow White", "Snow White and the Seven Dwarfs", 1937,
  "Cinderella", "Cinderella", 1950,
  "Aurora", "Sleeping Beauty", 1959,
  "Ariel", "The Little Mermaid", 1989,
  "Belle", "Beauty and the Beast", 1991,
  "Jasmine", "Aladdin", 1992,
  "Pocahontas", "Pocahontas", 1995,
  "Mulan", "Mulan", 1998,
  "Tiana", "The Princess and the Frog", 2009,
  "Rapunzel", "Tangled", 2010,
  "Merida", "Brave", 2012,
  "Elsa", "Frozen", 2013,
  "Moana", "Moana", 2016
)

# join together the data frames
babynames |>
  # ignore men named after princesses - is this fair?
  filter(sex == "F") |>
  inner_join(disney, by = c("name" = "princess")) |>
  mutate(name = fct_reorder(.f = name, .x = release_year)) |>
  # plot the trends over time, indicating release year
  ggplot(mapping = aes(x = year, y = n)) +
  facet_wrap(facets = vars(name, film), scales = "free_y", labeller = "label_both") +
  geom_line() +
  geom_vline(mapping = aes(xintercept = release_year), linetype = 2, alpha = .5) +
  scale_x_continuous(breaks = c(1880, 1940, 2000)) +
  theme_minimal() +
  labs(
    title = "Popularity of Disney princess names",
    x = "Year",
    y = "Number of births"
  )
1. Fix column name syntax in tribble()
2. Fix filter(sex == F) to filter(sex == "F")
3. Fix label_both() to "label_both" or label_both
4. Probably not. Some of these names are unisex.
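The filter(sex == F) bug is a good example of code that runs without any error or warning yet gives the wrong answer. A minimal base R sketch, using a made-up vector rather than the babynames data, shows why: an unquoted F is shorthand for FALSE, which R coerces to the string "FALSE" before comparing.

```r
# Made-up values standing in for the sex column
sex <- c("F", "M", "F")

# Unquoted F means FALSE; it is coerced to the string "FALSE",
# so nothing matches and R raises no error or warning.
wrong <- sex == F

# Quoted "F" compares against the intended character value.
right <- sex == "F"

sum(wrong) # number of matches with the bug
sum(right) # number of matches with the fix
```

With the buggy comparison, a filter() on this condition would silently keep zero rows.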
Write a function to show trends over time for the top N names in a specific year
top_n_trend <- function(n_year, n_rank = 5) {
  # check n_rank to verify it is small enough
  # if not small enough, cap at 6 and warn user
  if (n_rank > 6) {
    n_rank <- 6
    message("Maximum palette size is 12 (6 per gender). Reducing n_rank to 6.")
  }

  # create lookup table
  top_names <- babynames |>
    group_by(name, sex) |>
    summarize(count = as.numeric(sum(n))) |>
    filter(count > 1000) |>
    select(name, sex)

  # filter babynames for top_names
  filtered_names <- babynames |>
    inner_join(top_names)

  # get the top N names from n_year
  top_names <- filtered_names |>
    filter(year == n_year) |>
    group_by(name, sex) |>
    summarize(count = sum(n)) |>
    group_by(sex) |>
    mutate(rank = min_rank(desc(count))) |>
    filter(rank <= n_rank) |>
    arrange(sex, rank) |>
    select(name, sex, rank)

  # keep just the top N names over time and plot
  filtered_names |>
    inner_join(select(top_names, sex, name)) |>
    ggplot(mapping = aes(x = year, y = n, color = name)) +
    facet_wrap(facets = vars(sex), ncol = 1) +
    geom_line() +
    scale_color_brewer(type = "qual", palette = "Set3") +
    labs(
      title = str_glue("Most Popular Names of {n_year}"),
      x = "Year",
      y = "Number of births",
      color = "Name"
    ) +
    theme_minimal()
}
top_n_trend(n_year = 1986)
1. Need to replace count with n when it is used as a column name
2. Cap n_rank at 6 due to maximum palette size
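The capping step can be tried on its own, outside the plotting function. The sketch below pulls it into a hypothetical helper, cap_rank(), whose name and max_rank argument are illustrative assumptions rather than part of the suggested answer. It uses message() so that the user is told about the change while the function still returns a usable value.

```r
# Hypothetical standalone version of the capping step in top_n_trend().
# max_rank = 6 reflects the 12-color palette split across two sexes.
cap_rank <- function(n_rank, max_rank = 6) {
  if (n_rank > max_rank) {
    message(
      "Maximum palette size is 12 (6 per gender). Reducing n_rank to ",
      max_rank, "."
    )
    n_rank <- max_rank
  }
  n_rank
}

capped    <- suppressMessages(cap_rank(10)) # too large, reduced to the cap
unchanged <- cap_rank(5)                    # small enough, returned as-is
```

A message, rather than an error, seems a reasonable choice here because the function can still produce a sensible plot after adjusting the input.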
`summarise()` has grouped output by 'name'. You can override using the
`.groups` argument.
Joining with `by = join_by(sex, name)`
`summarise()` has grouped output by 'name'. You can override using the
`.groups` argument.
Joining with `by = join_by(sex, name)`
top_n_trend(n_year = 2014)
`summarise()` has grouped output by 'name'. You can override using the
`.groups` argument.
Joining with `by = join_by(sex, name)`
`summarise()` has grouped output by 'name'. You can override using the
`.groups` argument.
Joining with `by = join_by(sex, name)`
top_n_trend(n_year = 1986, n_rank = 10)
Maximum palette size is 12 (6 per gender). Reducing n_rank to 6.
`summarise()` has grouped output by 'name'. You can override using the `.groups` argument.
Joining with `by = join_by(sex, name)`
`summarise()` has grouped output by 'name'. You can override using the `.groups` argument.
Joining with `by = join_by(sex, name)`
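The messages above are informational, not errors: summarize() reports how its output is grouped, and inner_join() reports which columns it joined on. One way to silence them is to state both choices explicitly. The sketch below does this on a small made-up tibble (the births data here is invented for illustration), assuming dplyr 1.1.0 or later for join_by().

```r
library(dplyr)

# Made-up data standing in for babynames
births <- tibble(
  name = c("Ann", "Ann", "Bo", "Bo"),
  sex  = c("F", "F", "M", "M"),
  n    = c(10, 20, 5, 15)
)

# .groups = "drop" returns an ungrouped tibble and suppresses the
# "has grouped output by" message
totals <- births |>
  group_by(name, sex) |>
  summarize(count = sum(n), .groups = "drop")

# Naming the join columns suppresses the "Joining with" message
joined <- births |>
  inner_join(totals, by = join_by(name, sex))
```

In top_n_trend() the leftover grouping is either harmless or immediately replaced by group_by(sex), so making these arguments explicit should only change the console output, not the plot.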
sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.3.1 (2023-06-16)
os macOS Ventura 13.5.2
system aarch64, darwin20
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/New_York
date 2023-10-25
pandoc 3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
babynames * 1.1.1 2023-06-19 [1] Github (frank113/babynames@c56c8f9)
cli 3.6.1 2023-03-23 [1] CRAN (R 4.3.0)
colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.0)
digest 0.6.33 2023-07-07 [1] CRAN (R 4.3.0)
dplyr * 1.1.3 2023-09-03 [1] CRAN (R 4.3.0)
evaluate 0.22 2023-09-29 [1] CRAN (R 4.3.1)
fansi 1.0.5 2023-10-08 [1] CRAN (R 4.3.1)
farver 2.1.1 2022-07-06 [1] CRAN (R 4.3.0)
fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.0)
forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.3.0)
generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.0)
ggplot2 * 3.4.2 2023-04-03 [1] CRAN (R 4.3.0)
glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.0)
gtable 0.3.3 2023-03-21 [1] CRAN (R 4.3.0)
here 1.0.1 2020-12-13 [1] CRAN (R 4.3.0)
hms 1.1.3 2023-03-21 [1] CRAN (R 4.3.0)
htmltools 0.5.6.1 2023-10-06 [1] CRAN (R 4.3.1)
htmlwidgets 1.6.2 2023-03-17 [1] CRAN (R 4.3.0)
jsonlite 1.8.7 2023-06-29 [1] CRAN (R 4.3.0)
knitr 1.44 2023-09-11 [1] CRAN (R 4.3.0)
labeling 0.4.2 2020-10-20 [1] CRAN (R 4.3.0)
lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.3.0)
lubridate * 1.9.2 2023-02-10 [1] CRAN (R 4.3.0)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.0)
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.3.0)
pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.0)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.0)
purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.3.0)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.0)
RColorBrewer 1.1-3 2022-04-03 [1] CRAN (R 4.3.0)
readr * 2.1.4 2023-02-10 [1] CRAN (R 4.3.0)
rlang 1.1.1 2023-04-28 [1] CRAN (R 4.3.0)
rmarkdown 2.25 2023-09-18 [1] CRAN (R 4.3.1)
rprojroot 2.0.3 2022-04-02 [1] CRAN (R 4.3.0)
rstudioapi 0.14 2022-08-22 [1] CRAN (R 4.3.0)
scales 1.2.1 2022-08-20 [1] CRAN (R 4.3.0)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.0)
stringi 1.7.12 2023-01-11 [1] CRAN (R 4.3.0)
stringr * 1.5.0 2022-12-02 [1] CRAN (R 4.3.0)
tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.3.0)
tidyr * 1.3.0 2023-01-24 [1] CRAN (R 4.3.0)
tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.3.0)
tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.3.0)
timechange 0.2.0 2023-01-11 [1] CRAN (R 4.3.0)
tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.0)
utf8 1.2.3 2023-01-31 [1] CRAN (R 4.3.0)
vctrs 0.6.4 2023-10-12 [1] CRAN (R 4.3.1)
withr 2.5.1 2023-09-26 [1] CRAN (R 4.3.1)
xfun 0.40 2023-08-09 [1] CRAN (R 4.3.0)
yaml 2.3.7 2023-01-23 [1] CRAN (R 4.3.0)
[1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
──────────────────────────────────────────────────────────────────────────────