Improving data communication

Lecture 24

Dr. Benjamin Soltoff

Cornell University
INFO 5001 - Fall 2024

November 21, 2024

Announcements

Project drafts
Peer reviews in lab

Communicating risk with the general public

Flatten the curve

Why outbreaks like coronavirus spread exponentially, and how to “flatten the curve”

COVID-19 Dashboard

What do they all have in common?

They’re all graphics!

Accessible COVID-19 statistics tracker

Accessibility and screen readers

Alternative text

It is read by screen readers in place of images, allowing the content and function of the image to be accessible to people with visual or certain cognitive disabilities.

It is displayed in place of the image in browsers if the image file is not loaded or when the user has chosen not to view images.

It provides semantic meaning and a description for images, which search engines can read and which can later be used to determine the image's content from page context alone.

Alt and surrounding text

"CHART TYPE of TYPE OF DATA where REASON FOR INCLUDING CHART"

+ Link to data source somewhere in the text

CHART TYPE: Knowing the chart type helps people with partial sight and gives context for understanding the rest of the visual.
TYPE OF DATA: What data is included in the chart? The x- and y-axis labels may help you figure this out.
REASON FOR INCLUDING CHART: Think about why you’re including this visual. What does it show that’s meaningful? Every visual should have a point, and you should tell people what to look for.
Link to data source: Don’t include this in your alt text, but it should be included somewhere in the surrounding text.
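The template can be filled in mechanically. A minimal sketch in R, assuming a hypothetical helper `build_alt_text()` (not part of any package) that pastes the three parts together; the example values describe the nurses bar chart developed later in this lecture.

```r
# Hypothetical helper (not from any package): fill in the
# "CHART TYPE of TYPE OF DATA where REASON FOR INCLUDING CHART" template
build_alt_text <- function(chart_type, data_type, reason) {
  sprintf("%s of %s where %s.", chart_type, data_type, reason)
}

alt <- build_alt_text(
  chart_type = "Bar chart",
  data_type  = paste(
    "total employed registered nurses in California, New York,",
    "and North Carolina in 2000, 2010, and 2020"
  ),
  reason     = "the number of nurses increases over time in every state"
)
alt
```

The link to the data source still belongs in the surrounding text, not in `alt`.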

Data

Registered nurses by state and year
Number of nurses, salaries, employment
Source: TidyTuesday

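library(tidyverse) # read_csv(), glimpse(), and the dplyr/tidyr verbs used below

# clean_names() converts column names to snake_case
nurses <- read_csv("data/nurses.csv") |> janitor::clean_names()
glimpse(nurses)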

Rows: 1,242
Columns: 22
$ state                                        <chr> "Alabama", "Alaska", "Ari…
$ year                                         <dbl> 2020, 2020, 2020, 2020, 2…
$ total_employed_rn                            <dbl> 48850, 6240, 55520, 25300…
$ employed_standard_error_percent              <dbl> 2.9, 13.0, 3.7, 4.2, 2.0,…
$ hourly_wage_avg                              <dbl> 28.96, 45.81, 38.64, 30.6…
$ hourly_wage_median                           <dbl> 28.19, 45.23, 37.98, 29.9…
$ annual_salary_avg                            <dbl> 60230, 95270, 80380, 6364…
$ annual_salary_median                         <dbl> 58630, 94070, 79010, 6233…
$ wage_salary_standard_error_percent           <dbl> 0.8, 1.4, 0.9, 1.4, 1.0, …
$ hourly_10th_percentile                       <dbl> 20.75, 31.50, 27.66, 21.4…
$ hourly_25th_percentile                       <dbl> 23.73, 36.94, 32.58, 25.7…
$ hourly_75th_percentile                       <dbl> 33.15, 53.31, 44.67, 35.4…
$ hourly_90th_percentile                       <dbl> 38.67, 60.70, 50.14, 39.6…
$ annual_10th_percentile                       <dbl> 43150, 65530, 57530, 4466…
$ annual_25th_percentile                       <dbl> 49360, 76830, 67760, 5349…
$ annual_75th_percentile                       <dbl> 68960, 110890, 92920, 736…
$ annual_90th_percentile                       <dbl> 80420, 126260, 104290, 82…
$ location_quotient                            <dbl> 1.20, 0.98, 0.91, 1.00, 0…
$ total_employed_national_aggregate            <dbl> 140019790, 140019790, 140…
$ total_employed_healthcare_national_aggregate <dbl> 8632190, 8632190, 8632190…
$ total_employed_healthcare_state_aggregate    <dbl> 128600, 17730, 171010, 80…
$ yearly_total_employed_state_aggregate        <dbl> 1903210, 296300, 2835110,…

Bar chart

Provide the title and axis labels
Briefly describe the chart and give a summary of any trends it displays
Convert bar charts to accessible tables or lists
Avoid describing visual attributes of the bars (e.g., dark blue, gray, yellow) unless there’s an explicit need to do so

Developing the alt text

Total employed registered nurses in three states over time.
Total employed registered nurses in California, New York, and North Carolina, in 2000, 2010, and 2020.
A bar chart of total employed registered nurses in California, New York, and North Carolina, in 2000, 2010, and 2020, showing increasing numbers of nurses over time.
The figure is a bar chart titled ‘Total employed Registered Nurses’ that displays the numbers of registered nurses in three states (California, New York, and North Carolina) over a 20-year period, with data recorded at three time points (2000, 2010, and 2020). In each state, the number of registered nurses increases over time. The following numbers are all approximate. California started off with 200K registered nurses in 2000, 240K in 2010, and 300K in 2020. New York had 150K in 2000, 160K in 2010, and 170K in 2020. Finally, North Carolina had 60K in 2000, 90K in 2010, and 100K in 2020.
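Rather than reading approximate values off the chart, the longest description can be generated from the data itself. A sketch in base R, using the totals from the alt table shown later in this lecture; `describe_state()` and `fmt_k()` are hypothetical helpers, and rounding to the nearest thousand is an assumption, so the numbers differ slightly from the eyeballed values above.

```r
# Totals from the alt table (total employed registered nurses)
rn <- data.frame(
  state = rep(c("California", "New York", "North Carolina"), each = 3),
  year  = rep(c(2000, 2010, 2020), times = 3),
  total = c(203390, 240030, 307060, 159670, 169710, 178550, 60940, 90730, 99110)
)

# Hypothetical helper: round to the nearest thousand, e.g. 203390 -> "203K"
fmt_k <- function(x) paste0(round(x / 1000), "K")

# Hypothetical helper: one sentence per state, years in increasing order
describe_state <- function(d) {
  d <- d[order(d$year), ]
  counts <- paste(fmt_k(d$total), "in", d$year)
  sprintf("%s had %s, and %s.", d$state[1],
          paste(head(counts, -1), collapse = ", "), tail(counts, 1))
}

# split() yields one data frame per state; vapply() returns one sentence each
state_sentences <- vapply(split(rn, rn$state), describe_state, character(1))

alt_text <- paste(
  "A bar chart of total employed registered nurses in California, New York,",
  "and North Carolina in 2000, 2010, and 2020. In each state, the number of",
  "registered nurses increases over time.",
  paste(state_sentences, collapse = " ")
)
cat(strwrap(alt_text, width = 80), sep = "\n")
```

`strwrap()` splits the result into lines of at most 80 characters, which is convenient when pasting it into a multi-line `fig-alt` option.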

Adding alt text to plots

Short:

```{r}
#| fig-alt: Alt text goes here.

# code for plot goes here
```

Longer:

```{r}
#| fig-alt: |
#|   Longer alt text goes here. Add line breaks roughly every
#|   80 characters.

# code for plot goes here
```

Developing the alt table

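library(gt)

# assumes nurses_subset holds the nurses rows for California, New York, and North Carolina
nurses_subset |>
  # keep the three years described in the alt text
  filter(year %in% c(2000, 2010, 2020)) |>
  arrange(year) |>
  select(state, year, total_employed_rn) |>
  # one row per state, one column per year
  pivot_wider(names_from = year, values_from = total_employed_rn) |>
  gt() |>
  # whole numbers with thousands separators
  fmt_number(
    columns = -state,
    decimals = 0
  ) |>
  cols_label(state = "State") |>
  tab_spanner(
    label = "Total employed registered nurses",
    columns = everything()
  ) |>
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_column_spanners()
  )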

Total employed registered nurses
State	2000	2010	2020
California	203,390	240,030	307,060
New York	159,670	169,710	178,550
North Carolina	60,940	90,730	99,110

Application exercise

`ae-21`

Go to the course GitHub org and find your ae-21 (repo name will be suffixed with your GitHub name).
Clone the repo in RStudio, run renv::restore() to install the required packages, open the Quarto document in the repo, and follow along and complete the exercises.
Render, commit, and push your edits by the AE deadline – end of the day

Accessibility and colors

Color scales

Use colorblind-friendly color scales (e.g., Okabe-Ito, viridis)

library(scales) # label_dollar()
# colorblindr is installed from GitHub: remotes::install_github("clauswilke/colorblindr")

nurses_subset |>
  ggplot(aes(x = year, y = hourly_wage_median, color = state)) +
  geom_line(linewidth = 2) +
  # Okabe-Ito colors, with the legend drawn inside the plot panel
  colorblindr::scale_color_OkabeIto(guide = guide_legend(position = "inside")) +
  scale_y_continuous(labels = label_dollar()) +
  labs(
    x = "Year", y = "Median hourly wage", color = "State",
    title = "Median hourly wage of Registered Nurses"
  ) +
  theme(
    legend.position.inside = c(0.15, 0.75),
    legend.background = element_rect(fill = "white", color = "white")
  )
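`colorblindr` is only available from GitHub. If installing it is a hurdle, base R (4.0.0 or later) ships the Okabe-Ito palette through `grDevices::palette.colors()`. A minimal sketch; skipping the first (black) entry and assigning the next three hues to the three states is a choice made here, not a rule.

```r
# Okabe-Ito palette from base R (grDevices, R >= 4.0.0); the first entry is black
okabe_ito <- grDevices::palette.colors(palette = "Okabe-Ito")

# Skip black (easily confused with text and axes) and map the next three hues to states
state_colors <- setNames(
  unname(okabe_ito[2:4]),
  c("California", "New York", "North Carolina")
)
state_colors
```

The named vector can then stand in for the colorblindr scale, e.g. `scale_color_manual(values = state_colors)`.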

Testing for colorblind friendliness

The best way to test is with users (or collaborators) who have color vision deficiencies
colorblindr::cvd_grid() previews a plot under several simulated color vision deficiencies
Simulation software also helps, e.g., Sim Daltonism for Mac

Color contrast

Text and its background should have enough contrast to be distinguishable by users with different levels of vision
App for checking color contrast: Color Contrast Analyser
A work-in-progress R package for checking color contrast: coloratio

Color contrast

cr_get_ratio("black", "white")

[1] 21

cr_get_ratio("black", "gray10")

Warning in cr_get_ratio("black", "gray10"): Aim for a value of 4.5 or higher.

[1] 1.206596

cr_get_ratio("red", "yellow")

Warning in cr_get_ratio("red", "yellow"): Aim for a value of 4.5 or higher.

[1] 3.723534
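These values follow the WCAG 2 contrast ratio, (L1 + 0.05) / (L2 + 0.05), where L1 and L2 are the relative luminances of the lighter and darker colors. The 4.5 threshold in the warnings matches the WCAG AA level for normal-size text. A base-R sketch of the formula; it is an illustration, not coloratio's actual implementation, though it should reproduce the numbers above.

```r
# WCAG 2 relative luminance of one or more R colors
relative_luminance <- function(col) {
  srgb <- grDevices::col2rgb(col) / 255  # 3 x n matrix, channels scaled to [0, 1]
  # undo the sRGB gamma curve to get linear channel values
  lin <- ifelse(srgb <= 0.03928, srgb / 12.92, ((srgb + 0.055) / 1.055)^2.4)
  # weighted sum of the linear R, G, B channels
  as.vector(c(0.2126, 0.7152, 0.0722) %*% lin)
}

# Contrast ratio, lighter over darker, so it ranges from 1 to 21
contrast_ratio <- function(col1, col2) {
  l1 <- relative_luminance(col1)
  l2 <- relative_luminance(col2)
  (pmax(l1, l2) + 0.05) / (pmin(l1, l2) + 0.05)
}

contrast_ratio("black", "white")
contrast_ratio("red", "yellow")
contrast_ratio("red", "yellow") >= 4.5  # FALSE: below the AA threshold for normal text
```

Because the lighter color always goes on top, the argument order does not matter.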

Double encoding

Use shape and color where possible
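A minimal ggplot2 sketch of double encoding: state is mapped to both `color` and `shape`, and giving both scales the same title merges them into a single legend. The small data frame reuses the totals from the alt table earlier in this lecture; the specific colors are Okabe-Ito hues chosen here for illustration.

```r
library(ggplot2)

# Totals from the alt table (total employed registered nurses)
rn <- data.frame(
  state = rep(c("California", "New York", "North Carolina"), each = 3),
  year  = rep(c(2000, 2010, 2020), times = 3),
  total = c(203390, 240030, 307060, 159670, 169710, 178550, 60940, 90730, 99110)
)

# Map state to BOTH color and shape so each series is identifiable without color
p <- ggplot(rn, aes(x = year, y = total, color = state, shape = state)) +
  geom_line(linewidth = 1) +
  geom_point(size = 4) +
  scale_color_manual(values = c("#E69F00", "#56B4E9", "#009E73")) +
  # identical titles merge the color and shape legends into one
  labs(
    x = "Year", y = "Total employed registered nurses",
    color = "State", shape = "State"
  )
p
```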

Use direct labeling

Where color encodes information, prefer direct labels over a legend
Quicker to read
Ensures graph can be understood without reliance on color
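One way to label lines directly in ggplot2 is to add text at each line's last point and drop the color legend. A sketch using the same alt-table totals; the widened x-axis limits leave room for the labels and are an arbitrary choice you may need to adjust.

```r
library(ggplot2)

# Totals from the alt table (total employed registered nurses)
rn <- data.frame(
  state = rep(c("California", "New York", "North Carolina"), each = 3),
  year  = rep(c(2000, 2010, 2020), times = 3),
  total = c(203390, 240030, 307060, 159670, 169710, 178550, 60940, 90730, 99110)
)

# The last point of each line is where its label goes
last_points <- rn[rn$year == max(rn$year), ]

p <- ggplot(rn, aes(x = year, y = total, color = state)) +
  geom_line(linewidth = 1) +
  # left-aligned labels, nudged just to the right of each line's end
  geom_text(data = last_points, aes(label = state), hjust = 0, nudge_x = 0.5) +
  scale_x_continuous(limits = c(2000, 2030), breaks = c(2000, 2010, 2020)) +
  scale_color_manual(values = c("#E69F00", "#56B4E9", "#009E73")) +
  # the labels replace the legend
  guides(color = "none") +
  labs(x = "Year", y = "Total employed registered nurses")
p
```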

Without direct labeling

With direct labeling

Use whitespace or pattern to separate elements

Separate elements with whitespace or pattern
Lets readers distinguish data without relying entirely on contrast between colors
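One way to add whitespace in ggplot2 is to outline bars or stacked segments in the background color, so adjacent fills are separated by a visible gap even when their colors are hard to tell apart. A sketch using the same alt-table totals; the outline width is an arbitrary choice.

```r
library(ggplot2)

# Totals from the alt table (total employed registered nurses)
rn <- data.frame(
  state = rep(c("California", "New York", "North Carolina"), each = 3),
  year  = rep(c(2000, 2010, 2020), times = 3),
  total = c(203390, 240030, 307060, 159670, 169710, 178550, 60940, 90730, 99110)
)

# White outlines draw a thin gap between the stacked segments
p <- ggplot(rn, aes(x = factor(year), y = total, fill = state)) +
  geom_col(color = "white", linewidth = 1) +
  scale_fill_manual(values = c("#E69F00", "#56B4E9", "#009E73")) +
  labs(x = "Year", y = "Total employed registered nurses", fill = "State")
p
```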

Without whitespace

With whitespace

Accessibility and fonts

Use a font that has been tested for accessibility (e.g., Atkinson Hyperlegible)
Keep plot labels and annotations about the same size as the rest of your text (e.g., ggplot2::theme_set(ggplot2::theme_minimal(base_size = 16)))

Accessibility and fonts

nurses_subset |>
  ggplot(aes(x = year, y = hourly_wage_median, color = state)) +
  geom_line(linewidth = 2) +
  colorblindr::scale_color_OkabeIto() +
  scale_y_continuous(labels = label_dollar()) +
  labs(
    x = "Year", y = "Median hourly wage", color = "State",
    title = "Median hourly wage of Registered Nurses"
  ) +
  # larger base text; assumes Atkinson Hyperlegible is installed on your system
  theme_minimal(
    base_size = 16,
    base_family = "Atkinson Hyperlegible"
  )

Wrap-up

When you design for accessibility, you benefit everyone
Use alternative text for images and figures
Use colorblind-friendly palettes
Use whitespace or pattern to separate elements
Use a font that has been tested for accessibility

Acknowledgements

COVID visualization examples:
- The New York Times. Flattening the Coronavirus Curve
- The Washington Post. Why outbreaks like coronavirus spread exponentially, and how to “flatten the curve”
- COVID-19 Dashboard by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU)
- T. Littlefield (2020) COVID-19 Statistics Tracker
Advanced Data Visualization