Suggested answers
Run the chunk below and look at the output. Then, copy/paste the code and edit it to create a decision tree model for classification that uses the C5.0 engine.
Save it as tree_mod and look at the object. What is different about the output?
Hint: look up the model function and engine at https://www.tidymodels.org/find/parsnip/
lr_mod <- logistic_reg() |>
  set_engine(engine = "glm") |>
  set_mode("classification")
lr_mod
Logistic Regression Model Specification (classification)
Computational engine: glm
tree_mod <- decision_tree() |>
  set_engine(engine = "C5.0") |>
  set_mode("classification")
tree_mod
Decision Tree Model Specification (classification)
Computational engine: C5.0
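A model specification only records the model type, mode, and engine; nothing has been fitted yet. That is why the two printouts differ only in the model name and the engine line. The sketch below, assuming the parsnip package is installed, inspects those pieces directly on the spec objects, and translate() shows the engine-specific fitting call parsnip would use.

```r
library(parsnip)

# Two specifications that differ only in model type and engine
lr_mod <- logistic_reg() |>
  set_engine(engine = "glm") |>
  set_mode("classification")

tree_mod <- decision_tree() |>
  set_engine(engine = "C5.0") |>
  set_mode("classification")

# The model type is the first class of each spec object
class(lr_mod)[1]
class(tree_mod)[1]

# Mode and engine are stored as elements of the spec
tree_mod$mode
tree_mod$engine

# Print the underlying fitting call for the C5.0 engine (no data needed)
translate(tree_mod)
```

Because no data are involved, you can build and compare specifications freely before committing to a fit.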
Fill in the blanks. Use initial_split(), training(), and testing() to:
Split hotels into training and test sets. Save the rsplit!
Extract the training data and fit your classification tree model.
Check the proportions of the children variable in each set.
Keep set.seed(100) at the start of your code.
Hint: Be sure to remove every _ before running the code!
set.seed(100) # Important!
hotels_split <- initial_split(data = hotels, prop = 3 / 4)
hotels_train <- training(hotels_split)
hotels_test <- testing(hotels_split)
# check the distribution of the outcome in each set
count(x = hotels_train, children) |>
  mutate(prop = n / sum(n))
# A tibble: 2 × 3
children n prop
<fct> <int> <dbl>
1 children 1503 0.501
2 none 1497 0.499
count(x = hotels_test, children) |>
  mutate(prop = n / sum(n))
# A tibble: 2 × 3
children n prop
<fct> <int> <dbl>
1 children 497 0.497
2 none 503 0.503
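The task also asks you to fit the classification tree to the training data, which the answer above stops short of. The sketch below shows one way to do that end to end. It assumes parsnip, rsample, dplyr, and the C50 package are installed, and it uses a hypothetical simulated stand-in for hotels (sim_hotels) rather than the real data. It also swaps in initial_split(..., strata = children), an optional rsample argument that keeps the class proportions nearly identical in both sets.

```r
library(parsnip)
library(rsample)
library(dplyr)

set.seed(100)

# HYPOTHETICAL stand-in for `hotels`: 4,000 simulated rows with a balanced
# two-level outcome and the two predictors used later in this exercise
n <- 4000
sim_hotels <- tibble(
  children = factor(rep(c("children", "none"), each = n / 2),
                    levels = c("children", "none")),
  average_daily_rate = c(rnorm(n / 2, mean = 140, sd = 40),
                         rnorm(n / 2, mean = 110, sd = 40)),
  stays_in_weekend_nights = rpois(n, lambda = 1)
)

# Stratifying on the outcome keeps class proportions similar across sets
sim_split <- initial_split(data = sim_hotels, prop = 3 / 4, strata = children)
sim_train <- training(sim_split)
sim_test  <- testing(sim_split)

# Same C5.0 classification tree specification as tree_mod above
tree_mod <- decision_tree() |>
  set_engine(engine = "C5.0") |>
  set_mode("classification")

# Fit on the training data only
tree_fit <- tree_mod |>
  fit(children ~ average_daily_rate + stays_in_weekend_nights, data = sim_train)

# Class probabilities for the held-out test rows
test_probs <- predict(tree_fit, new_data = sim_test, type = "prob")
head(test_probs)
```

The test set is only used for prediction here; keeping it out of fit() is what makes it a fair check of performance later.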
Run the code below. What does it return?
set.seed(100)
hotels_folds <- vfold_cv(data = hotels_train, v = 10)
hotels_folds
# 10-fold cross-validation
# A tibble: 10 × 2
splits id
<list> <chr>
1 <split [2700/300]> Fold01
2 <split [2700/300]> Fold02
3 <split [2700/300]> Fold03
4 <split [2700/300]> Fold04
5 <split [2700/300]> Fold05
6 <split [2700/300]> Fold06
7 <split [2700/300]> Fold07
8 <split [2700/300]> Fold08
9 <split [2700/300]> Fold09
10 <split [2700/300]> Fold10
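vfold_cv() returns a tibble with one rsplit per fold. In each split, the analysis set (2700 rows here) is used to fit the model and the assessment set (300 rows) to evaluate it, so every training row is held out exactly once across the 10 folds. The sketch below uses rsample's analysis() and assessment() accessors on a hypothetical simulated 3,000-row stand-in for hotels_train.

```r
library(rsample)

set.seed(100)

# HYPOTHETICAL stand-in for hotels_train: 3,000 rows, matching the
# training-set size shown above
sim_train <- data.frame(
  children = factor(sample(c("children", "none"), 3000, replace = TRUE)),
  average_daily_rate = runif(3000, min = 50, max = 300)
)

sim_folds <- vfold_cv(data = sim_train, v = 10)

# Each row of sim_folds holds one rsplit; pull out its two pieces
first_split <- sim_folds$splits[[1]]
fold_fit_data  <- analysis(first_split)    # rows used to fit the model
fold_eval_data <- assessment(first_split)  # rows used to compute metrics

nrow(fold_fit_data)   # 2700
nrow(fold_eval_data)  # 300

# Assessment sets are disjoint, so together they cover all 3,000 rows once
assess_sizes <- vapply(sim_folds$splits,
                       function(s) nrow(assessment(s)), integer(1))
sum(assess_sizes)     # 3000
```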
Add an autoplot() call to visualize the ROC curve. How well does the model perform?
tree_preds <- tree_mod |>
  fit_resamples(
    children ~ average_daily_rate + stays_in_weekend_nights,
    resamples = hotels_folds,
    control = control_resamples(save_pred = TRUE)
  )
tree_preds |>
  collect_predictions() |>
  roc_auc(truth = children, .pred_children)
# A tibble: 1 × 3
.metric .estimator .estimate
<chr> <chr> <dbl>
1 roc_auc binary 0.670
tree_preds |>
  collect_predictions() |>
  roc_curve(truth = children, .pred_children) |>
  autoplot()
It's moderately successful. An ROC AUC of 0.670 is better than the 0.5 expected from random guessing, but there is still a lot of room for improvement.
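One way to read ROC AUC: it is the probability that a randomly chosen booking with children receives a higher predicted probability of children than a randomly chosen booking without, which is why 0.5 corresponds to random guessing. The sketch below checks that reading with yardstick's roc_auc() on four made-up predictions (hypothetical toy data, not from hotels).

```r
library(yardstick)
library(tibble)

# Toy predictions: "children" is the first factor level, so yardstick
# treats it as the event by default
toy <- tibble(
  children = factor(c("children", "children", "none", "none"),
                    levels = c("children", "none")),
  .pred_children = c(0.9, 0.4, 0.6, 0.1)
)

toy_auc <- roc_auc(toy, truth = children, .pred_children)
toy_auc$.estimate  # 0.75

# Same number by hand: the share of (event, non-event) pairs in which the
# event row gets the higher predicted probability
pos <- toy$.pred_children[toy$children == "children"]
neg <- toy$.pred_children[toy$children == "none"]
mean(outer(pos, neg, ">"))  # 0.75
```

Note that the answer above pools the out-of-fold predictions from all 10 folds before computing a single AUC; collect_metrics(tree_preds) instead averages the per-fold estimates, so the two numbers can differ slightly.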
sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.4.1 (2024-06-14)
os macOS Sonoma 14.6.1
system aarch64, darwin20
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/New_York
date 2024-11-04
pandoc 3.4 @ /usr/local/bin/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────
! package * version date (UTC) lib source
P backports 1.5.0 2024-05-23 [?] CRAN (R 4.4.0)
P bit 4.0.5 2022-11-15 [?] CRAN (R 4.3.0)
P bit64 4.0.5 2020-08-30 [?] CRAN (R 4.3.0)
P broom * 1.0.6 2024-05-17 [?] CRAN (R 4.4.0)
P C50 * 0.1.8 2023-02-08 [?] RSPM
P class 7.3-22 2023-05-03 [?] CRAN (R 4.4.0)
cli 3.6.3 2024-06-21 [1] RSPM (R 4.4.0)
P codetools 0.2-20 2024-03-31 [?] CRAN (R 4.4.1)
P colorspace 2.1-0 2023-01-23 [?] CRAN (R 4.3.0)
P crayon 1.5.3 2024-06-20 [?] CRAN (R 4.4.0)
P Cubist 0.4.4 2024-07-02 [?] RSPM
P data.table 1.15.4 2024-03-30 [?] CRAN (R 4.3.1)
P dials * 1.2.1 2024-02-22 [?] CRAN (R 4.3.1)
P DiceDesign 1.10 2023-12-07 [?] CRAN (R 4.3.1)
P digest 0.6.35 2024-03-11 [?] CRAN (R 4.3.1)
P dplyr * 1.1.4 2023-11-17 [?] CRAN (R 4.3.1)
P evaluate 0.24.0 2024-06-10 [?] CRAN (R 4.4.0)
P fansi 1.0.6 2023-12-08 [?] CRAN (R 4.3.1)
P farver 2.1.2 2024-05-13 [?] CRAN (R 4.3.3)
P fastmap 1.2.0 2024-05-15 [?] CRAN (R 4.4.0)
P forcats * 1.0.0 2023-01-29 [?] CRAN (R 4.3.0)
P foreach 1.5.2 2022-02-02 [?] CRAN (R 4.3.0)
P Formula 1.2-5 2023-02-24 [?] CRAN (R 4.3.0)
P furrr 0.3.1 2022-08-15 [?] CRAN (R 4.3.0)
P future 1.33.2 2024-03-26 [?] CRAN (R 4.3.1)
P future.apply 1.11.2 2024-03-28 [?] CRAN (R 4.3.1)
P generics 0.1.3 2022-07-05 [?] CRAN (R 4.3.0)
P ggplot2 * 3.5.1 2024-04-23 [?] CRAN (R 4.3.1)
P globals 0.16.3 2024-03-08 [?] CRAN (R 4.3.1)
glue 1.8.0 2024-09-30 [1] RSPM (R 4.4.0)
P gower 1.0.1 2022-12-22 [?] CRAN (R 4.3.0)
P GPfit 1.0-8 2019-02-08 [?] CRAN (R 4.3.0)
P gtable 0.3.5 2024-04-22 [?] CRAN (R 4.3.1)
P hardhat 1.4.0 2024-06-02 [?] CRAN (R 4.4.0)
P here 1.0.1 2020-12-13 [?] CRAN (R 4.3.0)
P hms 1.1.3 2023-03-21 [?] CRAN (R 4.3.0)
P htmltools 0.5.8.1 2024-04-04 [?] CRAN (R 4.3.1)
P htmlwidgets 1.6.4 2023-12-06 [?] CRAN (R 4.3.1)
P infer * 1.0.7 2024-03-25 [?] CRAN (R 4.3.1)
P inum 1.0-5 2023-03-09 [?] CRAN (R 4.3.0)
P ipred 0.9-14 2023-03-09 [?] CRAN (R 4.3.0)
P iterators 1.0.14 2022-02-05 [?] CRAN (R 4.3.0)
P jsonlite 1.8.8 2023-12-04 [?] CRAN (R 4.3.1)
P knitr 1.47 2024-05-29 [?] CRAN (R 4.4.0)
P labeling 0.4.3 2023-08-29 [?] CRAN (R 4.3.0)
P lattice 0.22-6 2024-03-20 [?] CRAN (R 4.4.0)
P lava 1.8.0 2024-03-05 [?] CRAN (R 4.3.1)
P lhs 1.1.6 2022-12-17 [?] CRAN (R 4.3.0)
P libcoin 1.0-10 2023-09-27 [?] CRAN (R 4.3.1)
P lifecycle 1.0.4 2023-11-07 [?] CRAN (R 4.3.1)
P listenv 0.9.1 2024-01-29 [?] CRAN (R 4.3.1)
P lubridate * 1.9.3 2023-09-27 [?] CRAN (R 4.3.1)
P magrittr 2.0.3 2022-03-30 [?] CRAN (R 4.3.0)
P MASS 7.3-61 2024-06-13 [?] CRAN (R 4.4.0)
P Matrix 1.7-0 2024-03-22 [?] CRAN (R 4.4.0)
P modeldata * 1.4.0 2024-06-19 [?] CRAN (R 4.4.0)
P modelenv 0.1.1 2023-03-08 [?] CRAN (R 4.3.0)
P munsell 0.5.1 2024-04-01 [?] CRAN (R 4.3.1)
P mvtnorm 1.2-5 2024-05-21 [?] CRAN (R 4.4.0)
P nnet 7.3-19 2023-05-03 [?] CRAN (R 4.4.0)
P parallelly 1.37.1 2024-02-29 [?] CRAN (R 4.3.1)
P parsnip * 1.2.1 2024-03-22 [?] CRAN (R 4.3.1)
P partykit 1.2-20 2023-04-14 [?] CRAN (R 4.3.0)
P pillar 1.9.0 2023-03-22 [?] CRAN (R 4.3.0)
P pkgconfig 2.0.3 2019-09-22 [?] CRAN (R 4.3.0)
P plyr 1.8.9 2023-10-02 [?] CRAN (R 4.3.1)
P prodlim 2023.08.28 2023-08-28 [?] CRAN (R 4.3.0)
P purrr * 1.0.2 2023-08-10 [?] CRAN (R 4.3.0)
P R6 2.5.1 2021-08-19 [?] CRAN (R 4.3.0)
P Rcpp 1.0.12 2024-01-09 [?] CRAN (R 4.3.1)
P readr * 2.1.5 2024-01-10 [?] CRAN (R 4.3.1)
P recipes * 1.0.10 2024-02-18 [?] CRAN (R 4.3.1)
renv 1.0.7 2024-04-11 [1] CRAN (R 4.4.0)
P reshape2 1.4.4 2020-04-09 [?] CRAN (R 4.3.0)
P rlang 1.1.4 2024-06-04 [?] CRAN (R 4.3.3)
P rmarkdown 2.27 2024-05-17 [?] CRAN (R 4.4.0)
P rpart 4.1.23 2023-12-05 [?] CRAN (R 4.4.0)
P rprojroot 2.0.4 2023-11-05 [?] CRAN (R 4.3.1)
P rsample * 1.2.1 2024-03-25 [?] CRAN (R 4.3.1)
P rstudioapi 0.16.0 2024-03-24 [?] CRAN (R 4.3.1)
P scales * 1.3.0.9000 2024-05-07 [?] Github (r-lib/scales@c0f79d3)
P sessioninfo 1.2.2 2021-12-06 [?] CRAN (R 4.3.0)
P stringi 1.8.4 2024-05-06 [?] CRAN (R 4.3.1)
P stringr * 1.5.1 2023-11-14 [?] CRAN (R 4.3.1)
P survival 3.7-0 2024-06-05 [?] CRAN (R 4.4.0)
P tibble * 3.2.1 2023-03-20 [?] CRAN (R 4.3.0)
P tidymodels * 1.2.0 2024-03-25 [?] CRAN (R 4.3.1)
P tidyr * 1.3.1 2024-01-24 [?] CRAN (R 4.3.1)
P tidyselect 1.2.1 2024-03-11 [?] CRAN (R 4.3.1)
P tidyverse * 2.0.0 2023-02-22 [?] CRAN (R 4.3.0)
P timechange 0.3.0 2024-01-18 [?] CRAN (R 4.3.1)
P timeDate 4032.109 2023-12-14 [?] CRAN (R 4.3.1)
P tune * 1.2.1 2024-04-18 [?] CRAN (R 4.3.1)
P tzdb 0.4.0 2023-05-12 [?] CRAN (R 4.3.0)
P utf8 1.2.4 2023-10-22 [?] CRAN (R 4.3.1)
P vctrs 0.6.5 2023-12-01 [?] CRAN (R 4.3.1)
P vroom 1.6.5 2023-12-05 [?] CRAN (R 4.3.1)
withr 3.0.1 2024-07-31 [1] RSPM (R 4.4.0)
P workflows * 1.1.4 2024-02-19 [?] CRAN (R 4.3.1)
P workflowsets * 1.1.0 2024-03-21 [?] CRAN (R 4.3.1)
P xfun 0.45 2024-06-16 [?] CRAN (R 4.4.0)
P yaml 2.3.8 2023-12-11 [?] CRAN (R 4.3.1)
P yardstick * 1.3.1 2024-03-21 [?] CRAN (R 4.3.1)
[1] /Users/soltoffbc/Projects/info-5001/course-site/renv/library/macos/R-4.4/aarch64-apple-darwin20
[2] /Users/soltoffbc/Library/Caches/org.R-project.R/R/renv/sandbox/macos/R-4.4/aarch64-apple-darwin20/f7156815
P ── Loaded and on-disk path mismatch.
──────────────────────────────────────────────────────────────────────────────