AE 06: Data types and classes

Application exercise
Important

Go to the course GitHub organization and locate the repo titled ae-06-YOUR_GITHUB_USERNAME to get started.

This AE is due September 19 at 11:59pm.

Packages

We will use the following three packages in this application exercise.

  • tidyverse: For data import, wrangling, and visualization.
  • skimr: For summarizing the entire data frame at once.
  • scales: For better axis labels.
library(tidyverse)
library(skimr)
library(scales)

Type coercion

  • Demo: Determine the type of the following vector, then change its type to numeric.
x <- c("1", "2", "3")
typeof(x)
[1] "character"
as.numeric(x)
[1] 1 2 3
  • Demo: Once again, determine the type of the following vector, then change its type to numeric. What is different from the previous exercise?
y <- c("a", "b", "c")

# add code here
  • Demo: Once again, determine the type of the following vector, then change its type to numeric. What is different from the previous exercise?
z <- c("1", "2", "three")

# add code here
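
One way to approach the last two vectors, offered as a sketch rather than the official solution: as.numeric() turns any string it cannot parse as a number into NA and raises a warning. The names y_num and z_num are illustrative, and suppressWarnings() is used here only to keep the output quiet.

```r
y <- c("a", "b", "c")
typeof(y)  # a character vector, like x

# None of these strings look like numbers, so every element becomes NA
y_num <- suppressWarnings(as.numeric(y))
y_num

z <- c("1", "2", "three")
typeof(z)  # also character

# "1" and "2" parse as numbers; "three" does not, so only it becomes NA
z_num <- suppressWarnings(as.numeric(z))
z_num
```

Running as.numeric() without suppressWarnings() shows the "NAs introduced by coercion" warning, which is the signal that some values were lost.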

Recoding survey results

Demo: Suppose you conducted a survey where you asked people how many cars their household owns collectively. And the answers are as follows:

survey_results <- tibble(cars = c(1, 2, "three"))
survey_results
# A tibble: 3 × 1
  cars 
  <chr>
1 1    
2 2    
3 three

This is annoying because of that third survey taker who just had to go and type out the number instead of providing it as a numeric value. So now you need to update the cars variable to be numeric. You do the following:

survey_results |>
  mutate(cars = as.numeric(cars))
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `cars = as.numeric(cars)`.
Caused by warning:
! NAs introduced by coercion
# A tibble: 3 × 1
   cars
  <dbl>
1     1
2     2
3    NA

And now things are even more annoying: you get the warning NAs introduced by coercion, raised while computing cars = as.numeric(cars), and the response from the third survey taker is now an NA (you lost their data). Fix your mutate() call to avoid this warning.

# add code here
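
A minimal sketch of one possible fix, assuming "three" is the only spelled-out answer in the data: recode it to "3" while the column is still a character vector, and only then convert to numeric. The name survey_fixed is illustrative, and if_else() is just one of several recoding options.

```r
library(dplyr)  # also provides tibble() and mutate()

survey_results <- tibble(cars = c(1, 2, "three"))

survey_fixed <- survey_results |>
  mutate(
    # Replace the spelled-out answer first, so nothing is lost in coercion
    cars = if_else(cars == "three", "3", cars),
    # Every value is now a digit string, so this converts without a warning
    cars = as.numeric(cars)
  )

survey_fixed
```

With more than one spelled-out answer, the same idea extends to case_when() or a lookup vector, at the cost of listing each case explicitly.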

Hotel bookings

# From TidyTuesday: https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-02-11/readme.md

hotels <- read_csv("data/hotels-tt.csv")

Question: Take a look at the following visualization. How are the months ordered? What would be a better order?

Add your response here.

Solve using factors

Demo: Reorder the months on the x-axis (levels of arrival_date_month) in a way that makes more sense. You will want to use a function from the forcats package; see https://forcats.tidyverse.org/reference/index.html for inspiration and help.

# add code here
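
As a hedged illustration of the factor approach, the sketch below uses a small made-up tibble in place of hotels (toy_bookings and its values are hypothetical). It relies on fct_relevel() from forcats and on R's built-in month.name vector, which lists the English month names in calendar order.

```r
library(dplyr)
library(forcats)

# Toy stand-in for the hotels data: all twelve months, in alphabetical order
toy_bookings <- tibble(arrival_date_month = sort(month.name))

toy_ordered <- toy_bookings |>
  mutate(
    # Move the levels into calendar order instead of alphabetical order
    arrival_date_month = fct_relevel(arrival_date_month, month.name)
  )

levels(toy_ordered$arrival_date_month)
```

Because ggplot2 orders a discrete axis by factor levels, applying the same mutate() step before plotting should put the months in calendar order on the x-axis.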

Solve using lubridate

Demo: Reorder the months on the x-axis (levels of arrival_date_month) in a way that makes more sense. You will want to use functions from the lubridate package; see https://lubridate.tidyverse.org/reference/index.html for inspiration and help.

# add code here
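
A sketch of the lubridate route, again on made-up data rather than hotels: build a date from the month name, then extract the month as a labelled, ordered factor. The day and year pasted in below are placeholders chosen only to make a parseable string, the names toy_bookings, arrival_date, and arrival_month are hypothetical, and parsing month names assumes an English locale.

```r
library(dplyr)
library(lubridate)

# Toy stand-in for the hotels data
toy_bookings <- tibble(arrival_date_month = c("March", "January", "February"))

toy_by_date <- toy_bookings |>
  mutate(
    # Placeholder day and year, used only to build a date that mdy() can parse
    arrival_date = mdy(paste(arrival_date_month, "1 2020")),
    # label = TRUE returns an ordered factor whose levels follow the calendar
    arrival_month = month(arrival_date, label = TRUE, abbr = FALSE)
  )

toy_by_date
```

Compared with the factor approach, this keeps a real date column around, which can be handy if later steps need date arithmetic.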

Stretch goal: If you finish the above task before time is up, change the y-axis label so the values are shown with dollar signs, e.g. $80 instead of 80. You will want to use a function from the scales package; see https://scales.r-lib.org/reference/index.html for inspiration and help.

Additionally, adjust the fig-width code chunk option so that the entire title fits on the plot.

# add code here
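
For the stretch goal, one option is label_dollar() from scales, passed to the labels argument of scale_y_continuous(). The sketch below uses invented rates in a hypothetical toy_rates data frame, and the fig-width value of 10 is an arbitrary guess to adjust until the title fits.

```r
#| fig-width: 10

library(ggplot2)
library(scales)

# Made-up monthly rates, for illustration only
toy_rates <- data.frame(
  month = factor(c("January", "February", "March"), levels = month.name),
  mean_rate = c(80, 95, 110)
)

p <- ggplot(toy_rates, aes(x = month, y = mean_rate, group = 1)) +
  geom_line() +
  # Format the y-axis tick labels as dollar amounts
  scale_y_continuous(labels = label_dollar())

# label_dollar() returns a function, so it can also be tried on its own
dollar_labels <- label_dollar()(c(80, 110))
dollar_labels
```

The #| fig-width line only has an effect as the first line of a Quarto code chunk; in a plain R script it is treated as an ordinary comment.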