AE 15: Building better training data to predict children in hotel bookings

Suggested answers

Application exercise
Answers
Modified: October 22, 2025

Your Turn 1

Unscramble! You have all the steps from our knn_rec. Your challenge is to unscramble them into the right order!

Save the result as knn_rec

step_normalize(all_numeric())

recipe(children ~ ., data = hotels)

step_rm(arrival_date)

step_date(arrival_date)

step_downsample(children)

step_holiday(arrival_date, holidays = holidays)

step_dummy(all_nominal_predictors())

step_zv(all_predictors())

Answer:

knn_rec <- recipe(children ~ ., data = hotels) |>
  step_date(arrival_date) |>
  step_holiday(arrival_date, holidays = holidays) |>
  step_rm(arrival_date) |>
  step_dummy(all_nominal_predictors()) |>
  step_zv(all_predictors()) |>
  step_normalize(all_numeric()) |>
  step_downsample(children)
knn_rec
── Recipe ──────────────────────────────────────────────────────────────────────
── Inputs 
Number of variables by role
outcome:    1
predictor: 21
── Operations 
• Date features from: arrival_date
• Holiday features from: arrival_date
• Variables removed: arrival_date
• Dummy variables from: all_nominal_predictors()
• Zero variance filter on: all_predictors()
• Centering and scaling for: all_numeric()
• Down-sampling based on: children
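
The order matters because each step works on the output of the step before it: the date features have to exist before arrival_date is removed, and the dummy variables have to exist before the zero-variance filter and the normalization can act on them. A minimal sketch of how one might check a recipe like this, assuming a made-up data set (toy and its columns are hypothetical, not the hotels data) and leaving out step_holiday() because the holidays vector is defined elsewhere in the exercise:

```r
library(recipes)
library(themis)

set.seed(1)
# Hypothetical stand-in for the hotels data: an imbalanced two-class outcome,
# a date column, a nominal predictor, and a numeric predictor.
toy <- tibble::tibble(
  children = factor(
    c(rep("none", 40), rep("children", 10)),
    levels = c("children", "none")
  ),
  arrival_date = as.Date("2017-01-01") + sample(0:364, 50),
  hotel = factor(sample(c("City_Hotel", "Resort_Hotel"), 50, replace = TRUE)),
  lead_time = rpois(50, 30)
)

toy_rec <- recipe(children ~ ., data = toy) |>
  step_date(arrival_date) |>               # derive day of week, month, year
  step_rm(arrival_date) |>                 # drop the raw date once features exist
  step_dummy(all_nominal_predictors()) |>  # factors -> indicator columns
  step_zv(all_predictors()) |>             # drop constant columns (e.g. year)
  step_normalize(all_numeric()) |>         # center and scale what is left
  step_downsample(children)                # balance the outcome classes

# prep() estimates every step on the training data;
# bake(new_data = NULL) returns the processed training set.
baked <- toy_rec |>
  prep() |>
  bake(new_data = NULL)

table(baked$children)
names(baked)
```

Because every date in the toy data falls in 2017, the year feature is constant and step_zv() should drop it, which is one way to see why the filter sits after the feature-creation steps.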

Your Turn 2

Fill in the blanks to make a workflow that combines knn_rec with knn_mod.

knn_wf <- ______ |>
  ______(knn_rec) |>
  ______(knn_mod)
knn_wf

Answer:

knn_wf <- workflow() |>
  add_recipe(knn_rec) |>
  add_model(knn_mod)
knn_wf
══ Workflow ════════════════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: nearest_neighbor()

── Preprocessor ────────────────────────────────────────────────────────────────
7 Recipe Steps

• step_date()
• step_holiday()
• step_rm()
• step_dummy()
• step_zv()
• step_normalize()
• step_downsample()

── Model ───────────────────────────────────────────────────────────────────────
K-Nearest Neighbor Model Specification (classification)

Computational engine: kknn 
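
The printed workflow shows that a workflow is a container holding one preprocessor and one model specification. knn_mod itself is defined earlier in the exercise and is not shown here; judging from the printed specification, it is presumably similar to the toy_mod below, but that is an assumption. A small sketch, using a hypothetical toy data set, of building such a workflow and pulling its two parts back out:

```r
library(tidymodels)

# Hypothetical two-class data, only for illustration.
toy <- tibble(
  y  = factor(rep(c("yes", "no"), each = 10)),
  x1 = seq(-1, 1, length.out = 20),
  x2 = rep(c(0, 1), times = 10)
)

toy_rec <- recipe(y ~ ., data = toy) |>
  step_normalize(all_numeric_predictors())

# Assumed to resemble knn_mod: default neighbors, kknn engine, classification.
toy_mod <- nearest_neighbor() |>
  set_engine("kknn") |>
  set_mode("classification")

toy_wf <- workflow() |>
  add_recipe(toy_rec) |>
  add_model(toy_mod)

# Nothing has been fit yet; the workflow only stores the two specifications.
extract_preprocessor(toy_wf)
extract_spec_parsnip(toy_wf)
```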

Your Turn 3

Edit the code chunk below to fit the entire knn_wf instead of just knn_mod.

set.seed(100)
knn_mod |>
  fit_resamples(
    children ~ .,
    resamples = hotels_folds,
    # print progress of model fitting
    control = control_resamples(verbose = TRUE)
  ) |>
  collect_metrics()

Answer:

set.seed(100)
knn_wf |>
  fit_resamples(
    resamples = hotels_folds,
    control = control_resamples(verbose = TRUE)
  ) |>
  collect_metrics()
i Fold01: preprocessor 1/1
i Fold01: preprocessor 1/1, model 1/1
i Fold01: preprocessor 1/1, model 1/1 (predictions)
i Fold02: preprocessor 1/1
i Fold02: preprocessor 1/1, model 1/1
i Fold02: preprocessor 1/1, model 1/1 (predictions)
i Fold03: preprocessor 1/1
i Fold03: preprocessor 1/1, model 1/1
i Fold03: preprocessor 1/1, model 1/1 (predictions)
i Fold04: preprocessor 1/1
i Fold04: preprocessor 1/1, model 1/1
i Fold04: preprocessor 1/1, model 1/1 (predictions)
i Fold05: preprocessor 1/1
i Fold05: preprocessor 1/1, model 1/1
i Fold05: preprocessor 1/1, model 1/1 (predictions)
i Fold06: preprocessor 1/1
i Fold06: preprocessor 1/1, model 1/1
i Fold06: preprocessor 1/1, model 1/1 (predictions)
i Fold07: preprocessor 1/1
i Fold07: preprocessor 1/1, model 1/1
i Fold07: preprocessor 1/1, model 1/1 (predictions)
i Fold08: preprocessor 1/1
i Fold08: preprocessor 1/1, model 1/1
i Fold08: preprocessor 1/1, model 1/1 (predictions)
i Fold09: preprocessor 1/1
i Fold09: preprocessor 1/1, model 1/1
i Fold09: preprocessor 1/1, model 1/1 (predictions)
i Fold10: preprocessor 1/1
i Fold10: preprocessor 1/1, model 1/1
i Fold10: preprocessor 1/1, model 1/1 (predictions)
# A tibble: 3 × 6
  .metric     .estimator  mean     n std_err .config        
  <chr>       <chr>      <dbl> <int>   <dbl> <chr>          
1 accuracy    binary     0.744    10 0.00186 pre0_mod0_post0
2 brier_class binary     0.171    10 0.00134 pre0_mod0_post0
3 roc_auc     binary     0.837    10 0.00370 pre0_mod0_post0
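
The progress log shows the preprocessor being estimated once per fold, so the recipe is re-learned on each analysis set rather than on the full data. The n = 10 column suggests hotels_folds holds 10 folds, presumably created with vfold_cv(), though that call is not shown here. The mean and std_err columns summarize the per-fold estimates. A sketch of that summary, assuming hypothetical toy data and a plain glm logistic regression in place of the KNN model to keep the example light:

```r
library(tidymodels)

set.seed(100)
# Hypothetical data with no real signal; only the mechanics matter here.
toy <- tibble(
  y  = factor(sample(c("yes", "no"), 200, replace = TRUE)),
  x1 = rnorm(200),
  x2 = rnorm(200)
)
toy_folds <- vfold_cv(toy, v = 5)

toy_wf <- workflow() |>
  add_recipe(
    recipe(y ~ ., data = toy) |>
      step_normalize(all_numeric_predictors())
  ) |>
  add_model(logistic_reg() |> set_engine("glm"))

toy_res <- fit_resamples(toy_wf, resamples = toy_folds)

# One row per fold and metric instead of the summarized table.
per_fold <- collect_metrics(toy_res, summarize = FALSE)

# Recompute the summary by hand: mean across folds, and the
# standard error as sd / sqrt(number of folds).
by_hand <- per_fold |>
  group_by(.metric) |>
  summarize(
    mean    = mean(.estimate),
    n       = n(),
    std_err = sd(.estimate) / sqrt(n())
  )

# Side-by-side comparison with the default summary.
check <- left_join(
  by_hand,
  collect_metrics(toy_res),
  by = ".metric",
  suffix = c("_hand", "_tune")
)
check
```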

Your Turn 4

Turns out, the same knn_rec recipe can also be used to fit a penalized logistic regression model using the lasso. Let’s try it out!

plr_mod <- logistic_reg(penalty = .01, mixture = 1) |>
  set_engine("glmnet") |>
  set_mode("classification")

plr_mod |>
  translate()
Logistic Regression Model Specification (classification)

Main Arguments:
  penalty = 0.01
  mixture = 1

Computational engine: glmnet 

Model fit template:
glmnet::glmnet(x = missing_arg(), y = missing_arg(), weights = missing_arg(), 
    alpha = 1, family = "binomial")

Answer:

glmnet_wf <- knn_wf |>
  update_model(plr_mod)

glmnet_wf |>
  fit_resamples(resamples = hotels_folds) |>
  collect_metrics()
# A tibble: 3 × 6
  .metric     .estimator  mean     n  std_err .config        
  <chr>       <chr>      <dbl> <int>    <dbl> <chr>          
1 accuracy    binary     0.826    10 0.00213  pre0_mod0_post0
2 brier_class binary     0.139    10 0.000925 pre0_mod0_post0
3 roc_auc     binary     0.874    10 0.00213  pre0_mod0_post0
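
update_model() swaps only the model specification and leaves the recipe in place, which is why the answer can reuse knn_wf without rebuilding it. A minimal sketch of that behavior, assuming a hypothetical toy data set and workflow (toy, toy_rec, knn_like_wf, and lasso_wf are illustrative names, not objects from the exercise):

```r
library(tidymodels)

# Hypothetical two-class data, only for illustration.
toy <- tibble(
  y  = factor(rep(c("yes", "no"), each = 10)),
  x1 = seq(-1, 1, length.out = 20),
  x2 = rep(c(0, 1), times = 10)
)

toy_rec <- recipe(y ~ ., data = toy) |>
  step_normalize(all_numeric_predictors())

knn_like_wf <- workflow() |>
  add_recipe(toy_rec) |>
  add_model(
    nearest_neighbor() |>
      set_engine("kknn") |>
      set_mode("classification")
  )

# Same lasso specification as plr_mod above.
lasso_mod <- logistic_reg(penalty = 0.01, mixture = 1) |>
  set_engine("glmnet") |>
  set_mode("classification")

lasso_wf <- knn_like_wf |>
  update_model(lasso_mod)

# The model changed...
extract_spec_parsnip(knn_like_wf)$engine
extract_spec_parsnip(lasso_wf)$engine

# ...but the recipe still has the same steps.
length(extract_preprocessor(knn_like_wf)$steps)
length(extract_preprocessor(lasso_wf)$steps)
```

Keeping the recipe fixed while changing the model makes the two sets of resampling metrics comparable, since both models see identically preprocessed data.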

Acknowledgments

sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.5.1 (2025-06-13)
 os       macOS Tahoe 26.0.1
 system   aarch64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/New_York
 date     2025-10-29
 pandoc   3.4 @ /usr/local/bin/ (via rmarkdown)
 quarto   1.8.24 @ /usr/local/bin/quarto

─ Packages ───────────────────────────────────────────────────────────────────
 ! package      * version    date (UTC) lib source
 P backports      1.5.0      2024-05-23 [?] RSPM (R 4.5.0)
 P bit            4.6.0      2025-03-06 [?] RSPM (R 4.5.0)
 P bit64          4.6.0-1    2025-01-16 [?] RSPM (R 4.5.0)
 P broom        * 1.0.9      2025-07-28 [?] RSPM (R 4.5.0)
 P class          7.3-23     2025-01-01 [?] RSPM (R 4.5.0)
 P cli            3.6.5      2025-04-23 [?] RSPM (R 4.5.0)
 P clock          0.7.3      2025-03-21 [?] RSPM
 P codetools      0.2-20     2024-03-31 [?] RSPM (R 4.5.0)
 P crayon         1.5.3      2024-06-20 [?] RSPM (R 4.5.0)
 P data.table     1.17.8     2025-07-10 [?] RSPM (R 4.5.0)
 P dials        * 1.4.2      2025-09-04 [?] RSPM
 P DiceDesign     1.10       2023-12-07 [?] RSPM (R 4.5.0)
 P digest         0.6.37     2024-08-19 [?] RSPM (R 4.5.0)
 P dplyr        * 1.1.4      2023-11-17 [?] RSPM (R 4.5.0)
 P evaluate       1.0.4      2025-06-18 [?] RSPM (R 4.5.1)
 P farver         2.1.2      2024-05-13 [?] RSPM (R 4.5.0)
 P fastmap        1.2.0      2024-05-15 [?] RSPM (R 4.5.0)
 P forcats      * 1.0.0      2023-01-29 [?] RSPM (R 4.5.0)
 P foreach        1.5.2      2022-02-02 [?] RSPM
 P furrr          0.3.1      2022-08-15 [?] RSPM
 P future         1.67.0     2025-07-29 [?] RSPM
 P future.apply   1.20.0     2025-06-06 [?] RSPM
 P generics       0.1.4      2025-05-09 [?] RSPM (R 4.5.0)
 P ggplot2      * 3.5.2      2025-04-09 [?] RSPM (R 4.5.0)
 P glmnet         4.1-10     2025-07-17 [?] RSPM
 P globals        0.18.0     2025-05-08 [?] RSPM
 P glue           1.8.0      2024-09-30 [?] RSPM (R 4.5.0)
 P gower          1.0.2      2024-12-17 [?] RSPM
 P GPfit          1.0-9      2025-04-12 [?] RSPM (R 4.5.0)
 P gtable         0.3.6      2024-10-25 [?] RSPM (R 4.5.0)
 P hardhat        1.4.2      2025-08-20 [?] RSPM
 P here           1.0.1      2020-12-13 [?] RSPM (R 4.5.0)
 P hms            1.1.3      2023-03-21 [?] RSPM (R 4.5.0)
 P htmltools      0.5.8.1    2024-04-04 [?] RSPM (R 4.5.0)
 P htmlwidgets    1.6.4      2023-12-06 [?] RSPM (R 4.5.0)
 P igraph         2.1.4      2025-01-23 [?] RSPM (R 4.5.0)
 P infer        * 1.0.9      2025-06-26 [?] RSPM
 P ipred          0.9-15     2024-07-18 [?] RSPM
 P iterators      1.0.14     2022-02-05 [?] RSPM
 P jsonlite       2.0.0      2025-03-27 [?] RSPM (R 4.5.0)
 P kknn         * 1.4.1      2025-05-19 [?] RSPM
 P knitr          1.50       2025-03-16 [?] RSPM (R 4.5.0)
 P lattice        0.22-7     2025-04-02 [?] RSPM (R 4.5.0)
 P lava           1.8.1      2025-01-12 [?] RSPM
 P lhs            1.2.0      2024-06-30 [?] RSPM (R 4.5.0)
 P lifecycle      1.0.4      2023-11-07 [?] RSPM (R 4.5.0)
 P listenv        0.9.1      2024-01-29 [?] RSPM
 P lubridate    * 1.9.4      2024-12-08 [?] RSPM (R 4.5.0)
 P magrittr       2.0.3      2022-03-30 [?] RSPM (R 4.5.1)
 P MASS           7.3-65     2025-02-28 [?] RSPM (R 4.5.0)
 P Matrix         1.7-3      2025-03-11 [?] RSPM (R 4.5.0)
 P modeldata    * 1.5.1      2025-08-22 [?] RSPM
 P nnet           7.3-20     2025-01-01 [?] RSPM (R 4.5.0)
 P parallelly     1.45.1     2025-07-24 [?] RSPM
 P parsnip      * 1.3.3      2025-08-31 [?] RSPM
 P pillar         1.11.0     2025-07-04 [?] RSPM (R 4.5.1)
 P pkgconfig      2.0.3      2019-09-22 [?] RSPM (R 4.5.0)
 P prodlim        2025.04.28 2025-04-28 [?] RSPM
 P purrr        * 1.1.0      2025-07-10 [?] RSPM (R 4.5.0)
 P R6             2.6.1      2025-02-15 [?] RSPM (R 4.5.0)
 P RColorBrewer   1.1-3      2022-04-03 [?] RSPM (R 4.5.0)
 P Rcpp           1.1.0      2025-07-02 [?] RSPM (R 4.5.0)
 P readr        * 2.1.5      2024-01-10 [?] RSPM (R 4.5.0)
 P recipes      * 1.3.1      2025-05-21 [?] RSPM
 P renv           1.1.5      2025-07-24 [?] RSPM
 P rlang          1.1.6      2025-04-11 [?] RSPM (R 4.5.0)
 P rmarkdown      2.29       2024-11-04 [?] RSPM
 P ROSE           0.0-4      2021-06-14 [?] RSPM
 P rpart          4.1.24     2025-01-07 [?] RSPM (R 4.5.0)
 P rprojroot      2.1.0      2025-07-12 [?] RSPM (R 4.5.0)
 P rsample      * 1.3.1      2025-07-29 [?] RSPM
 P rstudioapi     0.17.1     2024-10-22 [?] RSPM (R 4.5.0)
 P scales       * 1.4.0      2025-04-24 [?] RSPM (R 4.5.0)
 P sessioninfo    1.2.3      2025-02-05 [?] RSPM (R 4.5.0)
 P shape          1.4.6.1    2024-02-23 [?] RSPM
 P sparsevctrs    0.3.4      2025-05-25 [?] RSPM
 P stringi        1.8.7      2025-03-27 [?] RSPM (R 4.5.0)
 P stringr      * 1.5.1      2023-11-14 [?] RSPM (R 4.5.1)
 P survival       3.8-3      2024-12-17 [?] RSPM (R 4.5.0)
 P tailor       * 0.1.0      2025-08-25 [?] RSPM
 P themis       * 1.0.3      2025-01-23 [?] RSPM
 P tibble       * 3.3.0      2025-06-08 [?] RSPM (R 4.5.0)
 P tidymodels   * 1.4.1      2025-09-08 [?] RSPM
 P tidyr        * 1.3.1      2024-01-24 [?] RSPM (R 4.5.0)
 P tidyselect     1.2.1      2024-03-11 [?] RSPM (R 4.5.0)
 P tidyverse    * 2.0.0      2023-02-22 [?] RSPM (R 4.5.0)
 P timechange     0.3.0      2024-01-18 [?] RSPM (R 4.5.0)
 P timeDate       4041.110   2024-09-22 [?] RSPM
 P tune         * 2.0.0      2025-09-01 [?] RSPM
 P tzdb           0.5.0      2025-03-15 [?] RSPM (R 4.5.0)
 P utf8           1.2.6      2025-06-08 [?] RSPM (R 4.5.0)
 P vctrs          0.6.5      2023-12-01 [?] RSPM (R 4.5.0)
 P vroom          1.6.5      2023-12-05 [?] RSPM (R 4.5.1)
 P withr          3.0.2      2024-10-28 [?] RSPM (R 4.5.0)
 P workflows    * 1.3.0      2025-08-27 [?] RSPM
 P workflowsets * 1.1.1      2025-05-27 [?] RSPM
 P xfun           0.52       2025-04-02 [?] RSPM (R 4.5.1)
 P yaml           2.3.10     2024-07-26 [?] RSPM (R 4.5.0)
 P yardstick    * 1.3.2      2025-01-22 [?] RSPM

 [1] /Users/bcs88/Projects/info-5001/course-site/renv/library/macos/R-4.5/aarch64-apple-darwin20
 [2] /Users/bcs88/Library/Caches/org.R-project.R/R/renv/sandbox/macos/R-4.5/aarch64-apple-darwin20/4cd76b74

 * ── Packages attached to the search path.
 P ── Loaded and on-disk path mismatch.

──────────────────────────────────────────────────────────────────────────────