AE 16: Tune better models to predict children in hotel bookings
Suggested answers
tree_mod <- decision_tree(engine = "rpart") |>
  set_mode("classification")
tree_wf <- workflow() |>
  add_formula(children ~ .) |>
  add_model(tree_mod)
Your Turn 1
Fill in the blanks to return the accuracy and ROC AUC for this model using 10-fold cross-validation.
set.seed(100)
______ |>
  ______(resamples = hotels_folds) |>
  ______
Answer:
set.seed(100)
tree_wf |>
  fit_resamples(resamples = hotels_folds) |>
  collect_metrics()
# A tibble: 3 × 6
  .metric     .estimator  mean     n std_err .config
  <chr>       <chr>      <dbl> <int>   <dbl> <chr>
1 accuracy    binary     0.773    10 0.00567 pre0_mod0_post0
2 brier_class binary     0.158    10 0.00322 pre0_mod0_post0
3 roc_auc     binary     0.832    10 0.00672 pre0_mod0_post0
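The mean, n, and std_err columns summarize one estimate per fold. As a minimal base R sketch of that summary, assuming the standard error is the standard deviation of the fold estimates divided by the square root of the number of folds, and using made-up fold values (fold_auc is hypothetical, not output from this exercise):

```r
# Hypothetical per-fold ROC AUC estimates, one value per fold (made-up numbers)
fold_auc <- c(0.81, 0.85, 0.83, 0.84, 0.80, 0.86, 0.82, 0.83, 0.85, 0.81)

n <- length(fold_auc)                  # number of folds
cv_mean <- mean(fold_auc)              # average performance across folds
cv_std_err <- sd(fold_auc) / sqrt(n)   # assumed formula for the standard error

# Arrange the summary like one row of the metrics table
data.frame(.metric = "roc_auc", mean = cv_mean, n = n, std_err = cv_std_err)
```

A small std_err relative to the mean suggests the estimate is stable across folds.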
Your Turn 2
Create a new parsnip model called rf_mod, which will learn an ensemble of classification trees from our training data using the ranger package. Update your tree_wf with this new model.
Fit your workflow with 10-fold cross-validation and compare the ROC AUC of the random forest to that of your single decision tree model. Which would you expect to predict the test set better?
Hint: you’ll need https://www.tidymodels.org/find/parsnip/
# model
rf_mod <- _____ |>
  _____("ranger") |>
  _____("classification")
# workflow
rf_wf <- tree_wf |>
  update_model(_____)
# fit with cross-validation
set.seed(100)
_____ |>
  fit_resamples(resamples = hotels_folds) |>
  collect_metrics()
Answer:
# model
rf_mod <- rand_forest(engine = "ranger") |>
  set_mode("classification")
# workflow
rf_wf <- tree_wf |>
  update_model(rf_mod)
# fit with cross-validation
set.seed(100)
rf_wf |>
  fit_resamples(resamples = hotels_folds) |>
  collect_metrics()
# A tibble: 3 × 6
  .metric     .estimator  mean     n std_err .config
  <chr>       <chr>      <dbl> <int>   <dbl> <chr>
1 accuracy    binary     0.829    10 0.00382 pre0_mod0_post0
2 brier_class binary     0.123    10 0.00176 pre0_mod0_post0
3 roc_auc     binary     0.912    10 0.00319 pre0_mod0_post0
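To make the ROC AUC comparison concrete, it may help to recall one common reading of the metric: the probability that a randomly chosen positive case receives a higher predicted probability than a randomly chosen negative case. The base R sketch below computes that quantity directly on a tiny set of made-up labels and scores (truth and score are hypothetical, not the hotels data). It illustrates the idea and is not how yardstick computes the metric internally.

```r
# Hypothetical labels and predicted probabilities of "children" (made-up values)
truth <- c("children", "children", "children", "none", "none", "none", "none")
score <- c(0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1)

pos <- score[truth == "children"]   # scores for the positive class
neg <- score[truth == "none"]       # scores for the negative class

# Compare every positive score with every negative score
wins <- outer(pos, neg, ">")        # positive ranked above negative
ties <- outer(pos, neg, "==")       # ties count as half a win

auc <- mean(wins + 0.5 * ties)
auc
```

Here 11 of the 12 positive-negative pairs are ordered correctly, so the value is 11/12. A model that ranks at random would sit near 0.5, and a perfect ranking gives 1.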
Your Turn 3
Edit the random forest model to tune the mtry and min_n hyper-parameters; call the new model spec rf_tuner.
Update your workflow to use the tuned model.
Then use tune_grid() to find the best combination of hyper-parameters to maximize roc_auc; let tune set up the grid for you.
How does it compare to the average ROC AUC across folds from fit_resamples()?
rf_mod <- rand_forest(engine = "ranger") |>
  set_mode("classification")
rf_wf <- workflow() |>
  add_formula(children ~ .) |>
  add_model(rf_mod)
set.seed(100) # Important!
rf_results <- rf_wf |>
  fit_resamples(
    resamples = hotels_folds,
    metrics = metric_set(roc_auc),
    # change me to control_grid() with tune_grid
    control = control_resamples(save_workflow = TRUE)
  )
rf_results |>
  collect_metrics()
# A tibble: 1 × 6
  .metric .estimator  mean     n std_err .config
  <chr>   <chr>      <dbl> <int>   <dbl> <chr>
1 roc_auc binary     0.912    10 0.00319 pre0_mod0_post0
Answer:
rf_tuner <- rand_forest(
  engine = "ranger",
  mtry = tune(),
  min_n = tune()
) |>
  set_mode("classification")
rf_wf <- rf_wf |>
  update_model(rf_tuner)
set.seed(100) # Important!
rf_results <- rf_wf |>
  tune_grid(
    resamples = hotels_folds,
    control = control_grid(save_workflow = TRUE)
  )
i Creating pre-processing data to finalize 1 unknown parameter: "mtry"
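The message about finalizing mtry appears because the upper bound of mtry depends on how many predictors the data contain, so it cannot be known until the data are seen. A minimal sketch of that step with the dials package follows. It uses mtcars as a hypothetical stand-in for the hotels predictors (car_predictors is an illustrative name), and it shows the idea, not necessarily the exact internals of tune_grid().

```r
library(dials)

# mtry() starts with an unknown upper bound
mtry_param <- mtry()
mtry_param

# Hypothetical stand-in predictors: mtcars without its outcome column
car_predictors <- mtcars[, names(mtcars) != "mpg"]

# finalize() fills in the upper bound from the number of predictor columns
mtry_final <- finalize(mtry_param, x = car_predictors)
range_get(mtry_final)
```

With 10 predictor columns in the stand-in data, the finalized range runs from 1 to 10. For the hotels workflow the bound would instead come from the predictors produced by the preprocessing step.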
Your Turn 4
Use fit_best() to take the best combination of hyper-parameters from rf_results and use them to predict the test set.
How does our actual test ROC AUC compare to our cross-validated estimate?
hotels_best <- fit_best(rf_results)
# cross validated ROC AUC
rf_results |>
  show_best(metric = "roc_auc", n = 5)
# A tibble: 5 × 8
   mtry min_n .metric .estimator  mean     n std_err .config
  <int> <int> <chr>   <chr>      <dbl> <int>   <dbl> <chr>
1     5     2 roc_auc binary     0.912    10 0.00331 pre0_mod03_post0
2     7    18 roc_auc binary     0.911    10 0.00358 pre0_mod04_post0
3     3    31 roc_auc binary     0.908    10 0.00304 pre0_mod02_post0
4    12     6 roc_auc binary     0.908    10 0.00425 pre0_mod06_post0
5     9    35 roc_auc binary     0.907    10 0.00386 pre0_mod05_post0
# test set ROC AUC
augment(hotels_best, new_data = hotels_test) |>
  roc_auc(truth = children, .pred_children)
# A tibble: 1 × 3
  .metric .estimator .estimate
  <chr>   <chr>          <dbl>
1 roc_auc binary         0.912
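fit_best() is a shortcut for three steps: select the best parameters, finalize the workflow with them, and fit it on the training set. The sketch below spells those steps out on a small stand-in problem so that it runs quickly. Everything prefixed with toy_ is hypothetical, and the example uses the two_class_dat data from modeldata with a tuned decision tree in place of the hotels data and random forest.

```r
library(tidymodels)

set.seed(100)
# Hypothetical stand-in data and resamples (not the hotels data)
toy_split <- initial_split(two_class_dat, strata = Class)
toy_train <- training(toy_split)
toy_test <- testing(toy_split)
toy_folds <- vfold_cv(toy_train, v = 5, strata = Class)

# A workflow with one tunable hyper-parameter
toy_wf <- workflow() |>
  add_formula(Class ~ .) |>
  add_model(
    decision_tree(engine = "rpart", cost_complexity = tune()) |>
      set_mode("classification")
  )

toy_results <- toy_wf |>
  tune_grid(
    resamples = toy_folds,
    grid = 5,
    metrics = metric_set(roc_auc),
    control = control_grid(save_workflow = TRUE)
  )

# Manual route: pick the best parameters, plug them in, fit on the training set
best_params <- select_best(toy_results, metric = "roc_auc")
manual_fit <- toy_wf |>
  finalize_workflow(best_params) |>
  fit(data = toy_train)

# Shortcut: the same steps in one call
auto_fit <- fit_best(toy_results, metric = "roc_auc")

# Test set ROC AUC from the manually finalized workflow
manual_auc <- augment(manual_fit, new_data = toy_test) |>
  roc_auc(truth = Class, .pred_Class1)
manual_auc
```

The manual route is useful when you want a different selection rule, such as select_by_one_std_err(), before the final fit.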
# test set ROC curve
augment(hotels_best, new_data = hotels_test) |>
  roc_curve(truth = children, .pred_children) |>
  autoplot()
Acknowledgments
- Materials derived from Tidymodels, Virtually: An Introduction to Machine Learning with Tidymodels by Alison Hill.
- Dataset and some modeling steps derived from A predictive modeling case study and licensed under a Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA) License.
sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.5.1 (2025-06-13)
os macOS Tahoe 26.0.1
system aarch64, darwin20
ui X11
language (EN)
collate C.UTF-8
ctype C.UTF-8
tz America/New_York
date 2025-10-31
pandoc 3.6.3 @ /Applications/Positron.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)
quarto 1.8.24 @ /Applications/quarto/bin/quarto
─ Packages ───────────────────────────────────────────────────────────────────
! package * version date (UTC) lib source
P backports 1.5.0 2024-05-23 [?] RSPM (R 4.5.0)
P bit 4.6.0 2025-03-06 [?] RSPM (R 4.5.0)
P bit64 4.6.0-1 2025-01-16 [?] RSPM (R 4.5.0)
P broom * 1.0.9 2025-07-28 [?] RSPM (R 4.5.0)
P class 7.3-23 2025-01-01 [?] RSPM (R 4.5.0)
P cli 3.6.5 2025-04-23 [?] RSPM (R 4.5.0)
P codetools 0.2-20 2024-03-31 [?] RSPM (R 4.5.0)
P crayon 1.5.3 2024-06-20 [?] RSPM (R 4.5.0)
P data.table 1.17.8 2025-07-10 [?] RSPM (R 4.5.0)
P dials * 1.4.2 2025-09-04 [?] RSPM
P DiceDesign 1.10 2023-12-07 [?] RSPM (R 4.5.0)
P digest 0.6.37 2024-08-19 [?] RSPM (R 4.5.0)
P dplyr * 1.1.4 2023-11-17 [?] RSPM (R 4.5.0)
P evaluate 1.0.4 2025-06-18 [?] RSPM (R 4.5.1)
P farver 2.1.2 2024-05-13 [?] RSPM (R 4.5.0)
P fastmap 1.2.0 2024-05-15 [?] RSPM (R 4.5.0)
P forcats * 1.0.0 2023-01-29 [?] RSPM (R 4.5.0)
P furrr 0.3.1 2022-08-15 [?] RSPM
P future 1.67.0 2025-07-29 [?] RSPM
P future.apply 1.20.0 2025-06-06 [?] RSPM
P generics 0.1.4 2025-05-09 [?] RSPM (R 4.5.0)
P ggplot2 * 3.5.2 2025-04-09 [?] RSPM (R 4.5.0)
P globals 0.18.0 2025-05-08 [?] RSPM
P glue 1.8.0 2024-09-30 [?] RSPM (R 4.5.0)
P gower 1.0.2 2024-12-17 [?] RSPM
P GPfit 1.0-9 2025-04-12 [?] RSPM (R 4.5.0)
P gtable 0.3.6 2024-10-25 [?] RSPM (R 4.5.0)
P hardhat 1.4.2 2025-08-20 [?] RSPM
P here 1.0.1 2020-12-13 [?] RSPM (R 4.5.0)
P hms 1.1.3 2023-03-21 [?] RSPM (R 4.5.0)
P htmltools 0.5.8.1 2024-04-04 [?] RSPM (R 4.5.0)
P htmlwidgets 1.6.4 2023-12-06 [?] RSPM (R 4.5.0)
P infer * 1.0.9 2025-06-26 [?] RSPM
P ipred 0.9-15 2024-07-18 [?] RSPM
P jsonlite 2.0.0 2025-03-27 [?] RSPM (R 4.5.0)
P knitr 1.50 2025-03-16 [?] RSPM (R 4.5.0)
P labeling 0.4.3 2023-08-29 [?] RSPM (R 4.5.0)
P lattice 0.22-7 2025-04-02 [?] RSPM (R 4.5.0)
P lava 1.8.1 2025-01-12 [?] RSPM
P lhs 1.2.0 2024-06-30 [?] RSPM (R 4.5.0)
P lifecycle 1.0.4 2023-11-07 [?] RSPM (R 4.5.0)
P listenv 0.9.1 2024-01-29 [?] RSPM
P lubridate * 1.9.4 2024-12-08 [?] RSPM (R 4.5.0)
P magrittr 2.0.3 2022-03-30 [?] RSPM (R 4.5.1)
P MASS 7.3-65 2025-02-28 [?] RSPM (R 4.5.0)
P Matrix 1.7-3 2025-03-11 [?] RSPM (R 4.5.0)
P modeldata * 1.5.1 2025-08-22 [?] RSPM
P modelenv 0.2.0 2024-10-14 [?] RSPM
P nnet 7.3-20 2025-01-01 [?] RSPM (R 4.5.0)
P parallelly 1.45.1 2025-07-24 [?] RSPM
P parsnip * 1.3.3 2025-08-31 [?] RSPM
P pillar 1.11.0 2025-07-04 [?] RSPM (R 4.5.1)
P pkgconfig 2.0.3 2019-09-22 [?] RSPM (R 4.5.0)
P prodlim 2025.04.28 2025-04-28 [?] RSPM
P purrr * 1.1.0 2025-07-10 [?] RSPM (R 4.5.0)
P R6 2.6.1 2025-02-15 [?] RSPM (R 4.5.0)
P ranger 0.17.0 2024-11-08 [?] RSPM
P RColorBrewer 1.1-3 2022-04-03 [?] RSPM (R 4.5.0)
P Rcpp 1.1.0 2025-07-02 [?] RSPM (R 4.5.0)
P readr * 2.1.5 2024-01-10 [?] RSPM (R 4.5.0)
P recipes * 1.3.1 2025-05-21 [?] RSPM
P renv 1.1.5 2025-07-24 [?] RSPM
P rlang 1.1.6 2025-04-11 [?] RSPM (R 4.5.0)
P rmarkdown 2.29 2024-11-04 [?] RSPM
P rpart 4.1.24 2025-01-07 [?] RSPM (R 4.5.0)
P rprojroot 2.1.0 2025-07-12 [?] RSPM (R 4.5.0)
P rsample * 1.3.1 2025-07-29 [?] RSPM
P rstudioapi 0.17.1 2024-10-22 [?] RSPM (R 4.5.0)
P scales * 1.4.0 2025-04-24 [?] RSPM (R 4.5.0)
P sessioninfo 1.2.3 2025-02-05 [?] RSPM (R 4.5.0)
P sfd 0.1.0 2024-01-08 [?] RSPM
P sparsevctrs 0.3.4 2025-05-25 [?] RSPM
P stringi 1.8.7 2025-03-27 [?] RSPM (R 4.5.0)
P stringr * 1.5.1 2023-11-14 [?] RSPM (R 4.5.1)
P survival 3.8-3 2024-12-17 [?] RSPM (R 4.5.0)
P tailor * 0.1.0 2025-08-25 [?] RSPM
P tibble * 3.3.0 2025-06-08 [?] RSPM (R 4.5.0)
P tidymodels * 1.4.1 2025-09-08 [?] RSPM
P tidyr * 1.3.1 2024-01-24 [?] RSPM (R 4.5.0)
P tidyselect 1.2.1 2024-03-11 [?] RSPM (R 4.5.0)
P tidyverse * 2.0.0 2023-02-22 [?] RSPM (R 4.5.0)
P timechange 0.3.0 2024-01-18 [?] RSPM (R 4.5.0)
P timeDate 4041.110 2024-09-22 [?] RSPM
P tune * 2.0.0 2025-09-01 [?] RSPM
P tzdb 0.5.0 2025-03-15 [?] RSPM (R 4.5.0)
P utf8 1.2.6 2025-06-08 [?] RSPM (R 4.5.0)
P vctrs 0.6.5 2023-12-01 [?] RSPM (R 4.5.0)
P vroom 1.6.5 2023-12-05 [?] RSPM (R 4.5.1)
P withr 3.0.2 2024-10-28 [?] RSPM (R 4.5.0)
P workflows * 1.3.0 2025-08-27 [?] RSPM
P workflowsets * 1.1.1 2025-05-27 [?] RSPM
P xfun 0.52 2025-04-02 [?] RSPM (R 4.5.1)
P yaml 2.3.10 2024-07-26 [?] RSPM (R 4.5.0)
P yardstick * 1.3.2 2025-01-22 [?] RSPM
[1] /Users/bcs88/Projects/info-5001/course-site/renv/library/macos/R-4.5/aarch64-apple-darwin20
[2] /Users/bcs88/Library/Caches/org.R-project.R/R/renv/sandbox/macos/R-4.5/aarch64-apple-darwin20/4cd76b74
* ── Packages attached to the search path.
P ── Loaded and on-disk path mismatch.
──────────────────────────────────────────────────────────────────────────────
