AE 15: Building better training data to predict children in hotel bookings
Suggested answers
Application exercise
Answers
Your Turn 1
Unscramble! You have all the steps from our knn_rec. Your challenge is to unscramble them into the right order!
step_normalize(all_numeric())
recipe(children ~ ., data = hotels)
step_rm(arrival_date)
step_date(arrival_date)
step_downsample(children)
step_holiday(arrival_date, holidays = holidays)
step_dummy(all_nominal_predictors())
step_zv(all_predictors())
Save the result as knn_rec
Answer:
knn_rec <- recipe(children ~ ., data = hotels) |>
  step_date(arrival_date) |>
  step_holiday(arrival_date, holidays = holidays) |>
  step_rm(arrival_date) |>
  step_dummy(all_nominal_predictors()) |>
  step_zv(all_predictors()) |>
  step_normalize(all_numeric()) |>
  step_downsample(children)
knn_rec
── Recipe ──────────────────────────────────────────────────────────────────────
── Inputs
Number of variables by role
outcome: 1
predictor: 21
── Operations
• Date features from: arrival_date
• Holiday features from: arrival_date
• Variables removed: arrival_date
• Dummy variables from: all_nominal_predictors()
• Zero variance filter on: all_predictors()
• Centering and scaling for: all_numeric()
• Down-sampling based on: children
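To see why this order works, it can help to run a scaled-down recipe on a toy data frame and inspect the result with prep() and bake(). The sketch below is illustrative only: the toy data and its columns (hotel, adr) are hypothetical stand-ins for hotels, and step_holiday() and step_downsample() are left out so that only the recipes package is needed.

```r
library(recipes)

# Hypothetical toy data standing in for `hotels`
toy <- data.frame(
  children = factor(c("none", "children", "none", "none", "children", "none")),
  arrival_date = as.Date(c(
    "2017-01-01", "2017-02-14", "2017-03-05",
    "2017-07-04", "2017-12-25", "2017-09-10"
  )),
  hotel = factor(c("City", "Resort", "City", "City", "Resort", "Resort")),
  adr = c(80, 120, 95, 150, 200, 110)
)

toy_rec <- recipe(children ~ ., data = toy) |>
  # derive day-of-week, month, and year while arrival_date still exists
  step_date(arrival_date) |>
  # only now is it safe to drop the raw date column
  step_rm(arrival_date) |>
  # turn factors (including the new date features) into 0/1 columns
  step_dummy(all_nominal_predictors()) |>
  # drop constant columns so normalization never divides by zero
  step_zv(all_predictors()) |>
  # center and scale; the outcome is a factor, so it is left untouched
  step_normalize(all_numeric())

# prep() estimates each step from the data, bake() applies the result
baked <- toy_rec |>
  prep() |>
  bake(new_data = NULL)

names(baked)
```

In this sketch, moving step_rm() ahead of step_date() would be expected to fail when the recipe is prepped, because arrival_date would no longer exist when the date features are requested.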
Your Turn 2
Fill in the blanks to make a workflow that combines knn_rec with knn_mod.
knn_wf <- ______ |>
  ______(knn_rec) |>
  ______(knn_mod)
knn_wf
Answer:
knn_wf <- workflow() |>
  add_recipe(knn_rec) |>
  add_model(knn_mod)
knn_wf
══ Workflow ════════════════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: nearest_neighbor()
── Preprocessor ────────────────────────────────────────────────────────────────
7 Recipe Steps
• step_date()
• step_holiday()
• step_rm()
• step_dummy()
• step_zv()
• step_normalize()
• step_downsample()
── Model ───────────────────────────────────────────────────────────────────────
K-Nearest Neighbor Model Specification (classification)
Computational engine: kknn
Your Turn 3
Edit the code chunk below to fit the entire knn_wf workflow instead of just knn_mod.
set.seed(100)
knn_mod |>
  fit_resamples(
    children ~ .,
    resamples = hotels_folds,
    # print progress of model fitting
    control = control_resamples(verbose = TRUE)
  ) |>
  collect_metrics()
Answer:
set.seed(100)
knn_wf |>
  fit_resamples(
    resamples = hotels_folds,
    control = control_resamples(verbose = TRUE)
  ) |>
  collect_metrics()
i Fold01: preprocessor 1/1
i Fold01: preprocessor 1/1, model 1/1
i Fold01: preprocessor 1/1, model 1/1 (predictions)
i Fold02: preprocessor 1/1
i Fold02: preprocessor 1/1, model 1/1
i Fold02: preprocessor 1/1, model 1/1 (predictions)
i Fold03: preprocessor 1/1
i Fold03: preprocessor 1/1, model 1/1
i Fold03: preprocessor 1/1, model 1/1 (predictions)
i Fold04: preprocessor 1/1
i Fold04: preprocessor 1/1, model 1/1
i Fold04: preprocessor 1/1, model 1/1 (predictions)
i Fold05: preprocessor 1/1
i Fold05: preprocessor 1/1, model 1/1
i Fold05: preprocessor 1/1, model 1/1 (predictions)
i Fold06: preprocessor 1/1
i Fold06: preprocessor 1/1, model 1/1
i Fold06: preprocessor 1/1, model 1/1 (predictions)
i Fold07: preprocessor 1/1
i Fold07: preprocessor 1/1, model 1/1
i Fold07: preprocessor 1/1, model 1/1 (predictions)
i Fold08: preprocessor 1/1
i Fold08: preprocessor 1/1, model 1/1
i Fold08: preprocessor 1/1, model 1/1 (predictions)
i Fold09: preprocessor 1/1
i Fold09: preprocessor 1/1, model 1/1
i Fold09: preprocessor 1/1, model 1/1 (predictions)
i Fold10: preprocessor 1/1
i Fold10: preprocessor 1/1, model 1/1
i Fold10: preprocessor 1/1, model 1/1 (predictions)
# A tibble: 3 × 6
  .metric     .estimator  mean     n std_err .config
  <chr>       <chr>      <dbl> <int>   <dbl> <chr>
1 accuracy    binary     0.744    10 0.00186 pre0_mod0_post0
2 brier_class binary     0.171    10 0.00134 pre0_mod0_post0
3 roc_auc     binary     0.837    10 0.00370 pre0_mod0_post0
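The "preprocessor 1/1" lines in the log reflect that the recipe is re-estimated inside every fold rather than once on the full data. The base-R sketch below mimics that idea by hand. It is a conceptual illustration, not how tune implements fit_resamples(): the toy data are simulated, all object names are hypothetical, and a plain logistic regression is swapped in for the nearest-neighbor model so that no extra packages are required.

```r
set.seed(100)

# Hypothetical toy data: two numeric predictors on very different scales
n <- 200
toy <- data.frame(x1 = rnorm(n, mean = 50, sd = 10), x2 = rnorm(n))
toy$children <- factor(
  ifelse(toy$x1 / 10 + toy$x2 + rnorm(n) > 5, "children", "none")
)

# Randomly assign each row to one of v folds
v <- 5
fold_id <- sample(rep(seq_len(v), length.out = n))
num_cols <- c("x1", "x2")

fold_accuracy <- vapply(seq_len(v), function(k) {
  analysis   <- toy[fold_id != k, ]
  assessment <- toy[fold_id == k, ]

  # "Preprocessor": estimate centering and scaling on the analysis set only
  centers <- colMeans(analysis[num_cols])
  scales  <- vapply(analysis[num_cols], sd, numeric(1))
  for (col in num_cols) {
    analysis[[col]]   <- (analysis[[col]] - centers[[col]]) / scales[[col]]
    assessment[[col]] <- (assessment[[col]] - centers[[col]]) / scales[[col]]
  }

  # "Model": fit on the analysis set, predict on the held-out assessment set.
  # With a factor outcome, glm() models the probability of the second level.
  fit <- glm(children ~ x1 + x2, data = analysis, family = binomial)
  prob_none <- predict(fit, newdata = assessment, type = "response")
  pred <- ifelse(prob_none > 0.5, "none", "children")

  mean(pred == assessment$children)
}, numeric(1))

# Average the per-fold estimates, as collect_metrics() does for each metric
mean(fold_accuracy)
```

Estimating the centers and scales from the analysis rows alone keeps information from the held-out rows out of the preprocessing, which is the main reason to resample the whole workflow instead of the bare model.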
Your Turn 4
Turns out, the same knn_rec recipe can also be used to fit a penalized logistic regression model using the lasso. Let’s try it out!
plr_mod <- logistic_reg(penalty = .01, mixture = 1) |>
  set_engine("glmnet") |>
  set_mode("classification")
plr_mod |>
  translate()
Logistic Regression Model Specification (classification)
Main Arguments:
penalty = 0.01
mixture = 1
Computational engine: glmnet
Model fit template:
glmnet::glmnet(x = missing_arg(), y = missing_arg(), weights = missing_arg(),
alpha = 1, family = "binomial")
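For reference, the objective that glmnet minimizes for a binomial model can be written as follows. This is the standard elastic-net form from the glmnet documentation, with the notation chosen here for illustration: y_i is the 0/1 outcome, x_i the predictor vector, lambda corresponds to penalty, and alpha corresponds to mixture.

```latex
\min_{\beta_0,\,\beta}\;
  -\frac{1}{N}\sum_{i=1}^{N}
    \left[ y_i\,(\beta_0 + x_i^{\top}\beta)
           - \log\!\left(1 + e^{\beta_0 + x_i^{\top}\beta}\right) \right]
  + \lambda\left[ \frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2
                  + \alpha\,\lVert\beta\rVert_1 \right]
```

With mixture = 1 the ridge term drops out and only the lasso penalty remains, here with lambda = 0.01. The fit template above passes alpha but not the penalty value, which is consistent with glmnet fitting a whole path of lambda values and the requested penalty being applied when predictions are made.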
Answer:
glmnet_wf <- knn_wf |>
  update_model(plr_mod)
glmnet_wf |>
  fit_resamples(resamples = hotels_folds) |>
  collect_metrics()
# A tibble: 3 × 6
  .metric     .estimator  mean     n  std_err .config
  <chr>       <chr>      <dbl> <int>    <dbl> <chr>
1 accuracy    binary     0.826    10 0.00213  pre0_mod0_post0
2 brier_class binary     0.139    10 0.000925 pre0_mod0_post0
3 roc_auc     binary     0.874    10 0.00213  pre0_mod0_post0
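To compare the two models, the resampled means reported above can be placed side by side. The snippet below simply re-enters those printed values by hand (the data frame and its column names are made up for this illustration) and flags which model does better on each metric, keeping in mind that a lower Brier score is better while higher accuracy and ROC AUC are better.

```r
# Means copied from the two collect_metrics() outputs above
metrics <- data.frame(
  metric = c("accuracy", "brier_class", "roc_auc"),
  knn    = c(0.744, 0.171, 0.837),
  lasso  = c(0.826, 0.139, 0.874)
)

# Positive difference = lasso higher than kNN
metrics$difference <- metrics$lasso - metrics$knn

# For brier_class smaller is better; for the other two larger is better
metrics$lasso_better <- ifelse(
  metrics$metric == "brier_class",
  metrics$difference < 0,
  metrics$difference > 0
)

print(metrics)
```

On these numbers the penalized logistic regression comes out ahead on all three metrics, and each gap is large relative to the reported standard errors.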
Acknowledgments
- Materials derived from Tidymodels, Virtually: An Introduction to Machine Learning with Tidymodels by Alison Hill.
- Dataset and some modeling steps derived from A predictive modeling case study and licensed under a Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA) License.
Session information
sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.5.1 (2025-06-13)
os macOS Tahoe 26.0.1
system aarch64, darwin20
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/New_York
date 2025-10-29
pandoc 3.4 @ /usr/local/bin/ (via rmarkdown)
quarto 1.8.24 @ /usr/local/bin/quarto
─ Packages ───────────────────────────────────────────────────────────────────
! package * version date (UTC) lib source
P backports 1.5.0 2024-05-23 [?] RSPM (R 4.5.0)
P bit 4.6.0 2025-03-06 [?] RSPM (R 4.5.0)
P bit64 4.6.0-1 2025-01-16 [?] RSPM (R 4.5.0)
P broom * 1.0.9 2025-07-28 [?] RSPM (R 4.5.0)
P class 7.3-23 2025-01-01 [?] RSPM (R 4.5.0)
P cli 3.6.5 2025-04-23 [?] RSPM (R 4.5.0)
P clock 0.7.3 2025-03-21 [?] RSPM
P codetools 0.2-20 2024-03-31 [?] RSPM (R 4.5.0)
P crayon 1.5.3 2024-06-20 [?] RSPM (R 4.5.0)
P data.table 1.17.8 2025-07-10 [?] RSPM (R 4.5.0)
P dials * 1.4.2 2025-09-04 [?] RSPM
P DiceDesign 1.10 2023-12-07 [?] RSPM (R 4.5.0)
P digest 0.6.37 2024-08-19 [?] RSPM (R 4.5.0)
P dplyr * 1.1.4 2023-11-17 [?] RSPM (R 4.5.0)
P evaluate 1.0.4 2025-06-18 [?] RSPM (R 4.5.1)
P farver 2.1.2 2024-05-13 [?] RSPM (R 4.5.0)
P fastmap 1.2.0 2024-05-15 [?] RSPM (R 4.5.0)
P forcats * 1.0.0 2023-01-29 [?] RSPM (R 4.5.0)
P foreach 1.5.2 2022-02-02 [?] RSPM
P furrr 0.3.1 2022-08-15 [?] RSPM
P future 1.67.0 2025-07-29 [?] RSPM
P future.apply 1.20.0 2025-06-06 [?] RSPM
P generics 0.1.4 2025-05-09 [?] RSPM (R 4.5.0)
P ggplot2 * 3.5.2 2025-04-09 [?] RSPM (R 4.5.0)
P glmnet 4.1-10 2025-07-17 [?] RSPM
P globals 0.18.0 2025-05-08 [?] RSPM
P glue 1.8.0 2024-09-30 [?] RSPM (R 4.5.0)
P gower 1.0.2 2024-12-17 [?] RSPM
P GPfit 1.0-9 2025-04-12 [?] RSPM (R 4.5.0)
P gtable 0.3.6 2024-10-25 [?] RSPM (R 4.5.0)
P hardhat 1.4.2 2025-08-20 [?] RSPM
P here 1.0.1 2020-12-13 [?] RSPM (R 4.5.0)
P hms 1.1.3 2023-03-21 [?] RSPM (R 4.5.0)
P htmltools 0.5.8.1 2024-04-04 [?] RSPM (R 4.5.0)
P htmlwidgets 1.6.4 2023-12-06 [?] RSPM (R 4.5.0)
P igraph 2.1.4 2025-01-23 [?] RSPM (R 4.5.0)
P infer * 1.0.9 2025-06-26 [?] RSPM
P ipred 0.9-15 2024-07-18 [?] RSPM
P iterators 1.0.14 2022-02-05 [?] RSPM
P jsonlite 2.0.0 2025-03-27 [?] RSPM (R 4.5.0)
P kknn * 1.4.1 2025-05-19 [?] RSPM
P knitr 1.50 2025-03-16 [?] RSPM (R 4.5.0)
P lattice 0.22-7 2025-04-02 [?] RSPM (R 4.5.0)
P lava 1.8.1 2025-01-12 [?] RSPM
P lhs 1.2.0 2024-06-30 [?] RSPM (R 4.5.0)
P lifecycle 1.0.4 2023-11-07 [?] RSPM (R 4.5.0)
P listenv 0.9.1 2024-01-29 [?] RSPM
P lubridate * 1.9.4 2024-12-08 [?] RSPM (R 4.5.0)
P magrittr 2.0.3 2022-03-30 [?] RSPM (R 4.5.1)
P MASS 7.3-65 2025-02-28 [?] RSPM (R 4.5.0)
P Matrix 1.7-3 2025-03-11 [?] RSPM (R 4.5.0)
P modeldata * 1.5.1 2025-08-22 [?] RSPM
P nnet 7.3-20 2025-01-01 [?] RSPM (R 4.5.0)
P parallelly 1.45.1 2025-07-24 [?] RSPM
P parsnip * 1.3.3 2025-08-31 [?] RSPM
P pillar 1.11.0 2025-07-04 [?] RSPM (R 4.5.1)
P pkgconfig 2.0.3 2019-09-22 [?] RSPM (R 4.5.0)
P prodlim 2025.04.28 2025-04-28 [?] RSPM
P purrr * 1.1.0 2025-07-10 [?] RSPM (R 4.5.0)
P R6 2.6.1 2025-02-15 [?] RSPM (R 4.5.0)
P RColorBrewer 1.1-3 2022-04-03 [?] RSPM (R 4.5.0)
P Rcpp 1.1.0 2025-07-02 [?] RSPM (R 4.5.0)
P readr * 2.1.5 2024-01-10 [?] RSPM (R 4.5.0)
P recipes * 1.3.1 2025-05-21 [?] RSPM
P renv 1.1.5 2025-07-24 [?] RSPM
P rlang 1.1.6 2025-04-11 [?] RSPM (R 4.5.0)
P rmarkdown 2.29 2024-11-04 [?] RSPM
P ROSE 0.0-4 2021-06-14 [?] RSPM
P rpart 4.1.24 2025-01-07 [?] RSPM (R 4.5.0)
P rprojroot 2.1.0 2025-07-12 [?] RSPM (R 4.5.0)
P rsample * 1.3.1 2025-07-29 [?] RSPM
P rstudioapi 0.17.1 2024-10-22 [?] RSPM (R 4.5.0)
P scales * 1.4.0 2025-04-24 [?] RSPM (R 4.5.0)
P sessioninfo 1.2.3 2025-02-05 [?] RSPM (R 4.5.0)
P shape 1.4.6.1 2024-02-23 [?] RSPM
P sparsevctrs 0.3.4 2025-05-25 [?] RSPM
P stringi 1.8.7 2025-03-27 [?] RSPM (R 4.5.0)
P stringr * 1.5.1 2023-11-14 [?] RSPM (R 4.5.1)
P survival 3.8-3 2024-12-17 [?] RSPM (R 4.5.0)
P tailor * 0.1.0 2025-08-25 [?] RSPM
P themis * 1.0.3 2025-01-23 [?] RSPM
P tibble * 3.3.0 2025-06-08 [?] RSPM (R 4.5.0)
P tidymodels * 1.4.1 2025-09-08 [?] RSPM
P tidyr * 1.3.1 2024-01-24 [?] RSPM (R 4.5.0)
P tidyselect 1.2.1 2024-03-11 [?] RSPM (R 4.5.0)
P tidyverse * 2.0.0 2023-02-22 [?] RSPM (R 4.5.0)
P timechange 0.3.0 2024-01-18 [?] RSPM (R 4.5.0)
P timeDate 4041.110 2024-09-22 [?] RSPM
P tune * 2.0.0 2025-09-01 [?] RSPM
P tzdb 0.5.0 2025-03-15 [?] RSPM (R 4.5.0)
P utf8 1.2.6 2025-06-08 [?] RSPM (R 4.5.0)
P vctrs 0.6.5 2023-12-01 [?] RSPM (R 4.5.0)
P vroom 1.6.5 2023-12-05 [?] RSPM (R 4.5.1)
P withr 3.0.2 2024-10-28 [?] RSPM (R 4.5.0)
P workflows * 1.3.0 2025-08-27 [?] RSPM
P workflowsets * 1.1.1 2025-05-27 [?] RSPM
P xfun 0.52 2025-04-02 [?] RSPM (R 4.5.1)
P yaml 2.3.10 2024-07-26 [?] RSPM (R 4.5.0)
P yardstick * 1.3.2 2025-01-22 [?] RSPM
[1] /Users/bcs88/Projects/info-5001/course-site/renv/library/macos/R-4.5/aarch64-apple-darwin20
[2] /Users/bcs88/Library/Caches/org.R-project.R/R/renv/sandbox/macos/R-4.5/aarch64-apple-darwin20/4cd76b74
* ── Packages attached to the search path.
P ── Loaded and on-disk path mismatch.
──────────────────────────────────────────────────────────────────────────────