library(tidyverse)
library(skimr)
library(scales)
AE 06: Data types and classes
Go to the course GitHub organization and locate the repo titled ae-06-YOUR_GITHUB_USERNAME
to get started.
This AE is due September 19 at 11:59pm.
Packages
We will use the following three packages in this application exercise.
- tidyverse: For data import, wrangling, and visualization.
- skimr: For summarizing the entire data frame at once.
- scales: For better axis labels.
Type coercion
- Demo: Determine the type of the following vector. And then, change the type to numeric.
x <- c("1", "2", "3")
typeof(x)
[1] "character"
as.numeric(x)
[1] 1 2 3
- Demo: Once again, determine the type of the following vector. And then, change the type to numeric. What’s different than the previous exercise?
y <- c("a", "b", "c")
# add code here
- Demo: Once again, determine the type of the following vector. And then, change the type to numeric. What’s different than the previous exercise?
z <- c("1", "2", "three")
# add code here
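One way to see what coercion does in each case is sketched below. It reuses the vectors x and z from the demos; the names x_num and z_num are made up for illustration, and suppressWarnings() is only there so the sketch runs quietly.

```r
# Vectors copied from the demos above
x <- c("1", "2", "3")
z <- c("1", "2", "three")

# Both start out as character vectors
typeof(x)
typeof(z)

# Every element of x looks like a number, so the conversion is clean
x_num <- as.numeric(x)

# "three" cannot be parsed as a number, so it becomes NA (with a warning,
# silenced here only to keep the sketch quiet)
z_num <- suppressWarnings(as.numeric(z))

typeof(x_num)
is.na(z_num)
```

The point of the comparison: as.numeric() only converts strings that already look like numbers, and everything else turns into NA.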
Recoding survey results
Demo: Suppose you conducted a survey where you asked people how many cars their household owns collectively. And the answers are as follows:
survey_results <- tibble(cars = c(1, 2, "three"))
survey_results
# A tibble: 3 × 1
cars
<chr>
1 1
2 2
3 three
This is annoying because of that third survey taker who just had to go and type out the number instead of providing it as a numeric value. So now you need to update the cars variable to be numeric. You do the following:
survey_results |>
  mutate(cars = as.numeric(cars))
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `cars = as.numeric(cars)`.
Caused by warning:
! NAs introduced by coercion
# A tibble: 3 × 1
cars
<dbl>
1 1
2 2
3 NA
And now things are even more annoying because you get a warning, NAs introduced by coercion, that happened while computing cars = as.numeric(cars), and the response from the third survey taker is now an NA (you lost their data). Fix your mutate() call to avoid this warning.
# add code here
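One possible fix, sketched under the assumption that "three" is the only spelled-out answer: recode it to "3" first, then convert. The name survey_fixed is made up for illustration, and if_else() is just one of several ways to do the recoding.

```r
library(dplyr)

# Same data as above; mixing numbers and text makes cars a character column
survey_results <- tibble(cars = c(1, 2, "three"))

survey_fixed <- survey_results |>
  mutate(
    # Replace the spelled-out answer with its digit, still as text
    cars = if_else(cars == "three", "3", cars),
    # Now every value looks like a number, so no NA is introduced
    cars = as.numeric(cars)
  )

survey_fixed
```

With more than one spelled-out answer, a lookup with case_when() or a named vector would likely scale better than a single if_else().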
Hotel bookings
# From TidyTuesday: https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-02-11/readme.md
hotels <- read_csv("data/hotels-tt.csv")
Question: Take a look at the following visualization. How are the months ordered? What would be a better order?
Add your response here.
Solve using factors
Demo: Reorder the months on the x-axis (levels of arrival_date_month) in a way that makes more sense. You will want to use a function from the forcats package, see https://forcats.tidyverse.org/reference/index.html for inspiration and help.
# add code here
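A minimal sketch of the factor approach, using a small made-up tibble called bookings in place of the hotels data so it runs on its own. It assumes fct_relevel() is a reasonable choice here; other forcats functions could work too. month.name is a built-in R vector of the English month names in calendar order.

```r
library(dplyr)
library(forcats)

# Made-up stand-in for the hotels data: months in alphabetical order,
# which is how a character column would be ordered on a plot axis
bookings <- tibble(arrival_date_month = sort(month.name))

bookings_ordered <- bookings |>
  # Turn the column into a factor whose levels follow the calendar
  mutate(arrival_date_month = fct_relevel(arrival_date_month, month.name))

levels(bookings_ordered$arrival_date_month)
```

Once the column is a factor with calendar-ordered levels, ggplot2 uses that level order for the x-axis.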
Solve using lubridate
Demo: Reorder the months on the x-axis (levels of arrival_date_month) in a way that makes more sense. You will want to use functions from the lubridate package, see https://lubridate.tidyverse.org/reference/index.html for inspiration and help.
# add code here
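A sketch of one lubridate route, again on a made-up bookings tibble: look up each month's number, then let month() build an ordered factor from it. The helper column month_number is hypothetical, and the labels month() produces depend on your locale.

```r
library(dplyr)
library(lubridate)

# Made-up stand-in for the hotels data, months in alphabetical order
bookings <- tibble(arrival_date_month = sort(month.name))

bookings_ordered <- bookings |>
  mutate(
    # Position of each month name in the calendar (1 = January, ...)
    month_number = match(arrival_date_month, month.name),
    # month() with label = TRUE returns an ordered factor in calendar order
    arrival_date_month = month(month_number, label = TRUE, abbr = FALSE)
  )

bookings_ordered
```

Parsing a full date from the year, month, and day columns and then calling month() on it would be another option, if those columns are available in the data.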
Stretch goal: If you finish the above task before time is up, change the y-axis label so the values are shown with dollar signs, e.g. $80 instead of 80. You will want to use a function from the scales package, see https://scales.r-lib.org/reference/index.html for inspiration and help.
Additionally, adjust the fig-width code chunk option so that the entire title fits on the plot.
# add code here
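For the stretch goal, a sketch using label_dollar() from scales. The data frame rates and its columns month and mean_rate are made up for illustration; swap in whatever your plot of the hotels data actually uses.

```r
library(ggplot2)
library(scales)

# Made-up summary data standing in for the hotels plot
rates <- data.frame(
  month = factor(c("January", "February", "March"),
                 levels = c("January", "February", "March")),
  mean_rate = c(80, 95, 110)
)

p <- ggplot(rates, aes(x = month, y = mean_rate, group = 1)) +
  geom_line() +
  # label_dollar() returns a function that formats numbers as dollar amounts
  scale_y_continuous(labels = label_dollar())

# The labeller can also be called directly to preview the format
dollar_label <- label_dollar()(80)
dollar_label
```

In a Quarto code chunk, the figure width is set with a `#| fig-width:` line at the top of the chunk; the value that makes the title fit is a matter of trial and error.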