AE 02: Visualizing the prognosticators

Application exercise
Important

Go to the course GitHub organization and locate the repo titled ae-02-YOUR_GITHUB_USERNAME to get started.

This AE is due August 31 at 11:59pm.

For all analyses, we’ll use the tidyverse packages.

library(tidyverse)

The dataset we will visualize is called seers.¹ It contains summary statistics for all known Groundhog Day forecasters.² Let’s glimpse() at it.

1 I would prefer prognosticators, but I had way too many typos preparing these materials to make you all use it.

2 Source: Countdown to Groundhog Day. Application exercise inspired by Groundhogs Do Not Make Good Meteorologists originally published on FiveThirtyEight.

# import data using readr::read_csv()
seers <- read_csv("data/prognosticators-sum-stats.csv")
Rows: 138 Columns: 18
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (6): name, forecaster_type, forecaster_simple, climate_region, town, state
dbl (11): preds_n, preds_long_winter, preds_long_winter_pct, preds_correct, ...
lgl  (1): alive

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# add code here
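
To show what glimpse() prints, here is a minimal sketch on a made-up tibble. The name toy_seers and every value in it are hypothetical stand-ins, not rows from the course data; only the column names name, preds_rate, and alive are taken from this exercise.

```r
library(tidyverse)

# Hypothetical three-row stand-in for seers (made-up names and values)
toy_seers <- tibble(
  name = c("Forecaster A", "Forecaster B", "Forecaster C"),
  preds_rate = c(0.40, 0.55, 0.62),
  alive = c(TRUE, FALSE, TRUE)
)

# glimpse() prints one line per column: its name, its type, and the first few values
glimpse(toy_seers)
```

Running the same call on seers lists all 18 columns this way, which is a quick check that each one was read in with the type you expect.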

The variables are:

3 Prognosticators labeled as Animatronic/Puppet/Statue/Stuffed/Taxidermied are classified as not alive.

4 We adopt the same definition as FiveThirtyEight. An “Early Spring” is any year in which the average temperature in either February or March was higher than the historical average. A “Late Winter” is any year in which the average temperature in both months was lower than or the same as the historical average.

Visualizing prediction success rate - Demo

Single variable

Note

Analyzing a single variable is called univariate analysis.

Create visualizations of the distribution of preds_rate for the prognosticators.

  1. Make a histogram. Set an appropriate binwidth.
# add code here
  2. Make a boxplot.
# add code here
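
A rough sketch of the two univariate plots, using a small made-up tibble called toy_seers (hypothetical values, not the course data). It assumes preds_rate is a proportion between 0 and 1; the binwidth of 0.05 is only a starting guess and should be tuned against the real distribution.

```r
library(tidyverse)

# Hypothetical stand-in for seers: made-up success rates, assumed to lie in [0, 1]
toy_seers <- tibble(
  preds_rate = c(0.31, 0.42, 0.45, 0.48, 0.50, 0.52, 0.55, 0.61, 0.67, 0.80)
)

# Histogram: binwidth is expressed in the units of preds_rate
hist_plot <- ggplot(toy_seers, aes(x = preds_rate)) +
  geom_histogram(binwidth = 0.05)

# Boxplot of the same variable, drawn horizontally because it is mapped to x
box_plot <- ggplot(toy_seers, aes(x = preds_rate)) +
  geom_boxplot()

hist_plot
box_plot
```

Too small a binwidth produces a spiky histogram dominated by noise, while too large a value hides the shape, so it is worth trying a few values before settling on one.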

Two variables - Your turn

Note

Analyzing the relationship between two variables is called bivariate analysis.

Create visualizations of the distribution of preds_rate by alive (whether or not the prognosticator is alive).

  1. Make a single histogram. Set an appropriate binwidth.
# add code here
  2. Use multiple histograms via faceting, one for each value of alive. Set an appropriate binwidth, add color as you see fit, and turn off legends if not needed.
# add code here
  3. Use side-by-side box plots. Add color as you see fit and turn off legends if not needed.
# add code here
  4. Use density plots. Add color as you see fit.
# add code here
  5. Use violin plots. Add color as you see fit and turn off legends if not needed.
# add code here
  6. Make a jittered scatter plot. Add color as you see fit and turn off legends if not needed.
# add code here
  7. Use beeswarm plots. Add color as you see fit and turn off legends if not needed.
library(ggbeeswarm)

# add code here
  8. Demonstration: Use multiple geoms on a single plot. Be deliberate about the order of plotting. Change the theme and the color scale of the plot. Finally, add informative labels.
# add code here
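
One possible shape for the layered demonstration plot, sketched on a made-up tibble called toy_seers (hypothetical values, not the course data). The choice of a boxplot under jittered points, theme_minimal(), and the viridis color scale are illustrative assumptions; other geoms, themes, and scales would work just as well.

```r
library(tidyverse)

# Hypothetical stand-in for seers (made-up values)
toy_seers <- tibble(
  alive = rep(c(TRUE, FALSE), each = 6),
  preds_rate = c(0.35, 0.44, 0.50, 0.53, 0.58, 0.70,
                 0.30, 0.41, 0.47, 0.52, 0.60, 0.66)
)

layered_plot <- ggplot(toy_seers, aes(x = alive, y = preds_rate, color = alive)) +
  # Added first, so the boxes sit underneath; outliers are hidden because
  # the jittered points below already show every observation
  geom_boxplot(outlier.shape = NA) +
  # Added second, so the individual points are drawn on top of the boxes
  geom_jitter(width = 0.1, height = 0) +
  # The x-axis already identifies the groups, so the color legend is dropped
  scale_color_viridis_d(end = 0.8, guide = "none") +
  theme_minimal() +
  labs(
    title = "Prediction success rate",
    subtitle = "By whether the prognosticator is alive",
    x = "Alive",
    y = "Prediction success rate"
  )

layered_plot
```

Layers are drawn in the order they are added, so swapping the two geoms would let the boxes cover some of the points.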

Multiple variables - Demo

Note

Analyzing the relationship between three or more variables is called multivariate analysis.

  1. Facet the plot you created in the previous exercise by forecaster_simple. Adjust labels accordingly.
# add code here
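
A minimal sketch of adding a third variable through faceting, again on a made-up tibble called toy_seers. The forecaster_simple labels "Groundhog" and "Other" are hypothetical placeholders, as are all of the values; check the real categories in seers before writing labels.

```r
library(tidyverse)

# Hypothetical stand-in for seers (made-up values and category labels)
toy_seers <- tibble(
  forecaster_simple = rep(c("Groundhog", "Other"), times = 4),
  alive = rep(c(TRUE, FALSE), each = 4),
  preds_rate = c(0.38, 0.52, 0.47, 0.61, 0.33, 0.49, 0.56, 0.44)
)

faceted_plot <- ggplot(toy_seers, aes(x = alive, y = preds_rate, color = alive)) +
  geom_jitter(width = 0.1, height = 0, show.legend = FALSE) +
  # One panel per forecaster_simple category
  facet_wrap(vars(forecaster_simple)) +
  labs(
    title = "Prediction success rate",
    subtitle = "By forecaster type and whether the prognosticator is alive",
    x = "Alive",
    y = "Prediction success rate"
  )

faceted_plot
```

Each panel repeats the same bivariate comparison for one forecaster category, which is what makes the plot multivariate.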

Before you continue, let’s turn off all warnings the code chunks generate and resize all figures. We’ll do this by editing the YAML.
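
As a sketch of what that YAML edit could look like, assuming the exercise file is a Quarto (.qmd) document rendered to HTML: the figure dimensions below are placeholder values in inches, not settings prescribed by the course.

```yaml
---
# keep the existing fields (title, author, and so on) as they are
execute:
  warning: false      # hide warnings from every code chunk
format:
  html:
    fig-width: 7      # placeholder width, in inches
    fig-height: 5     # placeholder height, in inches
---
```

Options set here apply to the whole document; an individual chunk can still override them with its own chunk options.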

Visualizing other variables - Your turn!

  1. Pick a single categorical variable from the data set and make a bar plot of its distribution.
# add code here
  2. Pick two categorical variables and make a visualization of the relationship between them. Along with your code and output, provide an interpretation of the visualization.
# add code here

Interpretation goes here…

  3. Make another plot that uses at least three variables. At least one should be numeric and at least one categorical. In 1-2 sentences, describe what the plot shows about the relationships between the variables you plotted. Don’t forget to label your code chunk.
# add code here

Interpretation goes here…