Accessible data visualizations

Suggested answers

Application exercise answers

Modified: November 21, 2024

Import nursing data

```{r}
#| label: import-data

library(tidyverse) # readr, dplyr, tidyr, ggplot2, and friends
library(readxl)
library(scales)

# import nursing data (1,242 rows, 22 columns: State plus 21 numeric columns)
# and convert the column names to snake_case
nurses <- read_csv("data/nurses.csv", show_col_types = FALSE) |>
  janitor::clean_names()

# subset to three states
nurses_subset <- nurses |>
  filter(state %in% c("California", "New York", "North Carolina"))

# unemployment data
unemp_state <- read_excel(
  path = "data/emp-unemployment.xls",
  sheet = "States",
  skip = 5
) |>
  # reshape from one column per year to one row per state and year
  pivot_longer(
    cols = -c(Fips, Area),
    names_to = "Year",
    values_to = "unemp"
  ) |>
  rename(state = Area, year = Year) |>
  mutate(year = parse_number(year)) |>
  filter(state != "United States") |>
  # calculate mean unemp rate per state and year, returning ungrouped data
  group_by(state, year) |>
  summarize(unemp_rate = mean(unemp, na.rm = TRUE), .groups = "drop")
```

Developing alternative text

Bar chart

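Demonstration: The following code chunk demonstrates how to add alternative text to a bar chart. The alternative text is supplied through the fig-alt chunk option, written as a #| comment at the top of the chunk. The text is a quoted plain-text string that can span several #| lines and can be as long as needed. Note that fig-cap is not the same as fig-alt: fig-cap sets the visible caption shown with the figure, while fig-alt sets the alternative text that screen readers announce in place of the image and that is not displayed on the page.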

```{r}
#| label: nurses-bar
#| fig-cap: "Total employed Registered Nurses"
#| fig-alt: "The figure is a bar chart titled 'Total employed Registered
#| Nurses' that displays the number of registered nurses in three states
#| (California, New York, and North Carolina) over a 20-year period, with data
#| recorded at three time points (2000, 2010, and 2020). In each state, the
#| number of registered nurses increases over time. The following numbers are
#| all approximate. California started off with 200K registered nurses in 2000,
#| 240K in 2010, and 300K in 2020. New York had 150K in 2000, 160K in 2010, and
#| 170K in 2020. Finally, North Carolina had 60K in 2000, 90K in 2010, and 100K
#| in 2020."

nurses_subset |>
  filter(year %in% c(2000, 2010, 2020)) |>
  ggplot(aes(x = state, y = total_employed_rn, fill = factor(year))) +
  geom_col(position = "dodge") +
  scale_fill_viridis_d(option = "E", guide = guide_legend(position = "inside")) +
  scale_y_continuous(labels = label_number(scale = 1 / 1000, suffix = "K")) +
  labs(
    x = "State", y = "Number of Registered Nurses", fill = "Year",
    title = "Total employed Registered Nurses"
  ) +
  theme(
    legend.background = element_rect(fill = "white", color = "white"),
    legend.position.inside = c(0.85, 0.75)
  )
```

Figure: Total employed Registered Nurses
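
This exercise sets alt text with the fig-alt chunk option. As a side note, recent versions of ggplot2 can also store alt text on the plot object itself through the alt argument of labs(), and get_alt_text() returns it. The sketch below uses the built-in mtcars data instead of the nursing data so that it runs on its own; whether a particular output format picks up this stored text automatically is an assumption to check, so fig-alt remains the approach used here.

```r
library(ggplot2)

# Alt text built from the same components as the exercise:
# chart type, data shown, and the takeaway.
alt_text <- paste(
  "Scatter plot of fuel efficiency (miles per gallon) against car weight",
  "(thousands of pounds) for the 32 cars in the built-in mtcars data.",
  "Heavier cars tend to get fewer miles per gallon."
)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(
    x = "Weight (1,000 lbs)", y = "Miles per gallon",
    title = "Fuel efficiency by car weight",
    alt = alt_text # stored on the plot object, not drawn on the plot
  )

# Retrieve the stored alt text
get_alt_text(p)
```

Keeping the alt text next to the plot code makes it harder for the description to drift out of sync when the plot changes.
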

Line chart

Your turn: Add alternative text to the following line chart.

Tip

Remember the major components of alt text:

  • CHART TYPE: Knowing the chart type helps people with partial sight and gives context for understanding the rest of the visual.
  • TYPE OF DATA: What data does the chart show? The x- and y-axis labels can help you figure this out.
  • REASON FOR INCLUDING CHART: Think about why you are including this visual. What does it show that is meaningful? Every visual should have a point, and you should tell people what to look for.
  • LINK TO DATA SOURCE: Do not include this in your alt text, but do include it somewhere in the surrounding text.

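
One way to keep the components of the tip in mind is to assemble alt text from them explicitly. The function below, compose_alt_text(), is a hypothetical helper written for illustration (it is not from any package), and the example sentence is one possible description of the line chart, not the official answer. The data source is deliberately left out because it belongs in the surrounding text.

```r
# Hypothetical helper (not from any package): join the alt text components
# from the tip (chart type, data shown, takeaway) into one string.
compose_alt_text <- function(chart_type, data_shown, takeaway) {
  components <- list(chart_type, data_shown, takeaway)
  # each component must be a single, non-empty string
  is_valid <- vapply(components, function(x) {
    is.character(x) && length(x) == 1 && !is.na(x) && nzchar(trimws(x))
  }, logical(1))
  if (!all(is_valid)) {
    stop("each component must be a single, non-empty string")
  }
  # CHART TYPE and TYPE OF DATA first, then the REASON FOR INCLUDING CHART
  paste0(trimws(chart_type), " of ", trimws(data_shown), ". ", trimws(takeaway))
}

alt <- compose_alt_text(
  chart_type = "Line chart",
  data_shown = paste(
    "the annual median salary of Registered Nurses in California,",
    "New York, and North Carolina"
  ),
  takeaway = "Salaries rise in all three states, and California is highest in 2020."
)
alt
```

A template like this only covers the skeleton; approximate values and notable features (such as the stall in California salaries) still need to be added by hand.
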
```{r}
#| label: nurses-line
#| fig-alt: 'The figure is a line chart titled "Annual median salary of
#| Registered Nurses". There are three lines on the plot: the top labelled
#| California, the middle New York, and the bottom North Carolina. The vertical
#| axis is labelled "Annual median salary" and runs from $40K to $120K. The
#| horizontal axis is labelled "Year" and runs from a couple of years before
#| 2000 to 2020. The following numbers are all approximate. The California line
#| begins around $50K in 1998 and rises to $120K in 2020. The increase is
#| steady, except for a stall of about two years between 2015 and 2017. The
#| New York line also starts around $50K, just below the California line, and
#| rises steadily to $90K. The North Carolina line starts around $40K and rises
#| steadily to $70K.'

nurses_subset |>
  ggplot(aes(x = year, y = annual_salary_median, color = state)) +
  geom_line(show.legend = FALSE) +
  geom_text(
    data = nurses_subset |> filter(year == max(year)),
    aes(label = state), hjust = 0, nudge_x = 1,
    show.legend = FALSE
  ) +
  scale_color_viridis_d(option = "C", end = 0.5) +
  scale_y_continuous(labels = label_currency(scale = 1 / 1000, suffix = "K")) +
  labs(
    x = "Year", y = "Annual median salary", color = "State",
    title = "Annual median salary of Registered Nurses"
  ) +
  coord_cartesian(clip = "off") +
  theme(
    plot.margin = margin(0.1, 0.9, 0.1, 0.1, "in")
  )
```
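
The alt text above reports values that are "all approximate" because they were read off the plot. A possible alternative is to compute the start and end of each line from the data and round them. The helper describe_line_endpoints() below is hypothetical (written for illustration, not from any package), and the example uses the built-in Orange data (trunk circumference in mm by tree age in days) so that it runs on its own.

```r
# Hypothetical helper (not from any package): describe where each line in a
# line chart starts and ends, rounding y values to the nearest `accuracy`
# so the alt text reports approximate numbers.
describe_line_endpoints <- function(data, group, x, y, accuracy = 1) {
  data <- as.data.frame(data)
  # drop rows that would not be drawn
  data <- data[!is.na(data[[x]]) & !is.na(data[[y]]), ]
  pieces <- split(data, data[[group]], drop = TRUE)
  vapply(names(pieces), function(g) {
    d <- pieces[[g]]
    d <- d[order(d[[x]]), ] # sort each line by its x value
    approx_y <- round(d[[y]] / accuracy) * accuracy
    n <- nrow(d)
    sprintf(
      "The %s line begins around %s at %s and ends around %s at %s.",
      g,
      format(approx_y[1], big.mark = ",", scientific = FALSE), d[[x]][1],
      format(approx_y[n], big.mark = ",", scientific = FALSE), d[[x]][n]
    )
  }, character(1))
}

describe_line_endpoints(
  Orange, group = "Tree", x = "age", y = "circumference", accuracy = 10
)
```

With the exercise data, a call such as describe_line_endpoints(nurses_subset, "state", "year", "annual_salary_median", accuracy = 1000) would return one sentence per state. That call has not been run here, so compare its output against the plot before pasting any of it into fig-alt.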

Scatterplot

Your turn: Add alternative text to the following scatterplot.

```{r}
#| label: nurses-scatter
#| fig-alt: 'The figure is a scatter plot titled "Median hourly wage of
#| Registered Nurses (1998-2018)", with the subtitle "By state". Each point
#| represents one state in one year, covering the 50 U.S. states from 1998 to
#| 2018. The horizontal axis is labelled "Unemployment rate" and runs from
#| about 2% to 14%. The vertical axis is labelled "Median hourly wage" and runs
#| from under $20 to about $50. The pattern is hard to discern but appears to
#| show a positive correlation between the variables: as the unemployment rate
#| increases, the median hourly wage also increases slightly. There is more
#| variability in median hourly wage for unemployment rates below 7%.'

nurses |>
  # attach the mean unemployment rate for each state and year
  left_join(unemp_state, by = join_by(state, year)) |>
  drop_na(unemp_rate) |>
  ggplot(aes(x = unemp_rate, y = hourly_wage_median)) +
  geom_point(size = 2, alpha = 0.5) +
  scale_x_continuous(labels = label_percent(scale = 1)) +
  scale_y_continuous(labels = label_currency()) +
  labs(
    x = "Unemployment rate", y = "Median hourly wage",
    title = "Median hourly wage of Registered Nurses (1998-2018)",
    subtitle = "By state"
  )
```
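
The scatterplot alt text says the pattern "appears to show a positive correlation". A correlation coefficient can back up that kind of claim. The helper describe_correlation() below is hypothetical (not from any package), and the 0.3 and 0.7 cutoffs for "weak", "moderate", and "strong" are a common rule of thumb rather than a standard. The example uses the built-in mtcars data so it runs on its own.

```r
# Hypothetical helper (not from any package): summarize the linear
# association between two variables as a short phrase for alt text.
# The 0.3 / 0.7 cutoffs are a rule of thumb, not a standard.
describe_correlation <- function(x, y) {
  r <- cor(x, y, use = "complete.obs") # Pearson correlation, dropping NA pairs
  if (is.na(r)) stop("correlation is undefined for these inputs")
  strength <- if (abs(r) < 0.3) {
    "weak"
  } else if (abs(r) < 0.7) {
    "moderate"
  } else {
    "strong"
  }
  direction <- if (r >= 0) "positive" else "negative"
  sprintf("a %s %s correlation (r = %.2f)", strength, direction, r)
}

describe_correlation(mtcars$wt, mtcars$mpg)
# returns "a strong negative correlation (r = -0.87)"
```

With the exercise data, you could pass the unemp_rate and hourly_wage_median columns of the joined nurses and unemp_state data; that has not been run here, so no claim is made about the result. A correlation only summarizes linear association, so the alt text should still describe features such as the extra spread below 7% unemployment.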

Acknowledgments

```{r}
sessioninfo::session_info()
```
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.4.1 (2024-06-14)
 os       macOS Sonoma 14.6.1
 system   aarch64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/New_York
 date     2024-11-25
 pandoc   3.4 @ /usr/local/bin/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 ! package     * version    date (UTC) lib source
 P bit           4.0.5      2022-11-15 [?] CRAN (R 4.3.0)
 P bit64         4.0.5      2020-08-30 [?] CRAN (R 4.3.0)
 P cellranger    1.1.0      2016-07-27 [?] CRAN (R 4.3.0)
   cli           3.6.3      2024-06-21 [1] RSPM (R 4.4.0)
 P colorblindr * 0.1.0      2023-06-19 [?] Github (clauswilke/colorblindr@e6730be)
 P colorspace  * 2.1-0      2023-01-23 [?] CRAN (R 4.3.0)
 P crayon        1.5.3      2024-06-20 [?] CRAN (R 4.4.0)
 P digest        0.6.35     2024-03-11 [?] CRAN (R 4.3.1)
 P dplyr       * 1.1.4      2023-11-17 [?] CRAN (R 4.3.1)
 P evaluate      0.24.0     2024-06-10 [?] CRAN (R 4.4.0)
 P fansi         1.0.6      2023-12-08 [?] CRAN (R 4.3.1)
 P farver        2.1.2      2024-05-13 [?] CRAN (R 4.3.3)
 P fastmap       1.2.0      2024-05-15 [?] CRAN (R 4.4.0)
 P forcats     * 1.0.0      2023-01-29 [?] CRAN (R 4.3.0)
 P generics      0.1.3      2022-07-05 [?] CRAN (R 4.3.0)
 P ggplot2     * 3.5.1      2024-04-23 [?] CRAN (R 4.3.1)
   glue          1.8.0      2024-09-30 [1] RSPM (R 4.4.0)
 P gtable        0.3.5      2024-04-22 [?] CRAN (R 4.3.1)
 P here          1.0.1      2020-12-13 [?] CRAN (R 4.3.0)
 P hms           1.1.3      2023-03-21 [?] CRAN (R 4.3.0)
 P htmltools     0.5.8.1    2024-04-04 [?] CRAN (R 4.3.1)
 P htmlwidgets   1.6.4      2023-12-06 [?] CRAN (R 4.3.1)
 P janitor       2.2.0      2023-02-02 [?] CRAN (R 4.3.0)
 P jsonlite      1.8.8      2023-12-04 [?] CRAN (R 4.3.1)
 P knitr         1.47       2024-05-29 [?] CRAN (R 4.4.0)
 P labeling      0.4.3      2023-08-29 [?] CRAN (R 4.3.0)
 P lifecycle     1.0.4      2023-11-07 [?] CRAN (R 4.3.1)
 P lubridate   * 1.9.3      2023-09-27 [?] CRAN (R 4.3.1)
 P magrittr      2.0.3      2022-03-30 [?] CRAN (R 4.3.0)
 P munsell       0.5.1      2024-04-01 [?] CRAN (R 4.3.1)
 P pillar        1.9.0      2023-03-22 [?] CRAN (R 4.3.0)
 P pkgconfig     2.0.3      2019-09-22 [?] CRAN (R 4.3.0)
 P purrr       * 1.0.2      2023-08-10 [?] CRAN (R 4.3.0)
 P R6            2.5.1      2021-08-19 [?] CRAN (R 4.3.0)
 P readr       * 2.1.5      2024-01-10 [?] CRAN (R 4.3.1)
 P readxl      * 1.4.3      2023-07-06 [?] CRAN (R 4.3.0)
   renv          1.0.7      2024-04-11 [1] CRAN (R 4.4.0)
 P rlang         1.1.4      2024-06-04 [?] CRAN (R 4.3.3)
 P rmarkdown     2.27       2024-05-17 [?] CRAN (R 4.4.0)
 P rprojroot     2.0.4      2023-11-05 [?] CRAN (R 4.3.1)
 P rstudioapi    0.16.0     2024-03-24 [?] CRAN (R 4.3.1)
 P scales      * 1.3.0.9000 2024-05-07 [?] Github (r-lib/scales@c0f79d3)
 P sessioninfo   1.2.2      2021-12-06 [?] CRAN (R 4.3.0)
 P snakecase     0.11.1     2023-08-27 [?] CRAN (R 4.3.0)
 P stringi       1.8.4      2024-05-06 [?] CRAN (R 4.3.1)
 P stringr     * 1.5.1      2023-11-14 [?] CRAN (R 4.3.1)
 P tibble      * 3.2.1      2023-03-20 [?] CRAN (R 4.3.0)
 P tidyr       * 1.3.1      2024-01-24 [?] CRAN (R 4.3.1)
 P tidyselect    1.2.1      2024-03-11 [?] CRAN (R 4.3.1)
 P tidyverse   * 2.0.0      2023-02-22 [?] CRAN (R 4.3.0)
 P timechange    0.3.0      2024-01-18 [?] CRAN (R 4.3.1)
 P tzdb          0.4.0      2023-05-12 [?] CRAN (R 4.3.0)
 P utf8          1.2.4      2023-10-22 [?] CRAN (R 4.3.1)
 P vctrs         0.6.5      2023-12-01 [?] CRAN (R 4.3.1)
 P viridisLite   0.4.2      2023-05-02 [?] CRAN (R 4.3.0)
 P vroom         1.6.5      2023-12-05 [?] CRAN (R 4.3.1)
   withr         3.0.1      2024-07-31 [1] RSPM (R 4.4.0)
 P xfun          0.45       2024-06-16 [?] CRAN (R 4.4.0)
 P yaml          2.3.8      2023-12-11 [?] CRAN (R 4.3.1)

 [1] /Users/soltoffbc/Projects/info-5001/course-site/renv/library/macos/R-4.4/aarch64-apple-darwin20
 [2] /Users/soltoffbc/Library/Caches/org.R-project.R/R/renv/sandbox/macos/R-4.4/aarch64-apple-darwin20/f7156815

 P ── Loaded and on-disk path mismatch.

──────────────────────────────────────────────────────────────────────────────