tree_mod <- decision_tree(engine = "rpart") |>
  set_mode("classification")
tree_wf <- workflow() |>
  add_formula(children ~ .) |>
  add_model(tree_mod)
Tune better models to predict children in hotel bookings
Your Turn 1
Fill in the blanks to return the accuracy and ROC AUC for this model using 10-fold cross-validation.
set.seed(100)
______ |>
  ______(resamples = hotels_folds) |>
  ______
Your Turn 2
Create a new parsnip model called rf_mod, which will learn an ensemble of classification trees from our training data using the ranger package. Update your tree_wf with this new model.
Fit your workflow with 10-fold cross-validation and compare the ROC AUC of the random forest to that of your single decision tree model. Which one predicts the test set better?
Hint: you’ll need https://www.tidymodels.org/find/parsnip/
# model
rf_mod <- _____ |>
  _____("ranger") |>
  _____("classification")
# workflow
rf_wf <- tree_wf |>
  update_model(_____)
# fit with cross-validation
set.seed(100)
_____ |>
  fit_resamples(resamples = hotels_folds) |>
  collect_metrics()
Your Turn 3
Challenge: Fit 3 more random forest models, using 5, 12, and 21 variables at each split, respectively. Update your rf_wf with each new model. Which value maximizes the area under the ROC curve?
rf5_mod <- rf_mod |>
  set_args(mtry = 5)
rf12_mod <- rf_mod |>
  set_args(mtry = 12)
rf21_mod <- rf_mod |>
  set_args(mtry = 21)
Do this for each model above:
_____ <- rf_wf |>
  update_model(_____)
set.seed(100)
_____ |>
  fit_resamples(resamples = hotels_folds) |>
  collect_metrics()
Your Turn 4
Edit the random forest model to tune the mtry and min_n hyper-parameters; call the new model spec rf_tuner.
Update your workflow to use the tuned model.
Then use tune_grid() to find the best combination of hyper-parameters to maximize roc_auc; let tune set up the grid for you.
How does it compare to the average ROC AUC across folds from fit_resamples()?
rf_mod <- rand_forest(engine = "ranger") |>
  set_mode("classification")
rf_wf <- workflow() |>
  add_formula(children ~ .) |>
  add_model(rf_mod)
set.seed(100) # Important!
rf_results <- rf_wf |>
  fit_resamples(resamples = hotels_folds,
                metrics = metric_set(roc_auc),
                # change me to control_grid() with tune_grid
                control = control_resamples(verbose = TRUE,
                                            save_workflow = TRUE))
rf_results |>
  collect_metrics()
# your code here
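If the mechanics of tune() and tune_grid() are unfamiliar, the sketch below may help. It is not the solution to this exercise: it tunes a single rpart decision tree on the built-in mtcars data, and every toy_* name is a placeholder invented for illustration. The pattern of marking arguments with tune() and passing an integer grid so that tune builds the candidate values is the same one the exercise asks for.

```r
library(tidymodels)

# Toy regression problem on the built-in mtcars data (not the hotels data)
set.seed(100)
toy_folds <- vfold_cv(mtcars, v = 5)

# tune() marks the arguments whose values should be searched over
toy_tuner <- decision_tree(min_n = tune(), tree_depth = tune(), engine = "rpart") |>
  set_mode("regression")

toy_wf <- workflow() |>
  add_formula(mpg ~ .) |>
  add_model(toy_tuner)

set.seed(100)
toy_results <- toy_wf |>
  tune_grid(
    resamples = toy_folds,
    grid = 5,                                     # an integer lets tune build the grid itself
    metrics = metric_set(rmse),
    control = control_grid(save_workflow = TRUE)  # keeps the workflow for later refitting
  )

# Candidates ranked by cross-validated RMSE (lower is better)
toy_results |>
  show_best(metric = "rmse", n = 3)
```

For the hotels exercise, the metric would be roc_auc and the ranking would favor higher values rather than lower ones.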
Your Turn 5
Use fit_best() to take the best combination of hyper-parameters from rf_results and use them to predict the test set.
How does our actual test ROC AUC compare to our cross-validated estimate?
# your code here
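As a rough illustration of the fit_best() pattern, here is a sketch on toy data rather than the hotels data. The mtcars split, the decision tree model, and all toy_* names are assumptions made for the example. It assumes a version of tune that provides fit_best(), and it relies on save_workflow = TRUE having been set when tuning.

```r
library(tidymodels)

# Toy data: built-in mtcars, split into training and test sets
set.seed(100)
toy_split <- initial_split(mtcars, prop = 0.75)
toy_train <- training(toy_split)
toy_test  <- testing(toy_split)
toy_folds <- vfold_cv(toy_train, v = 5)

toy_wf <- workflow() |>
  add_formula(mpg ~ .) |>
  add_model(decision_tree(min_n = tune(), engine = "rpart") |> set_mode("regression"))

set.seed(100)
toy_results <- toy_wf |>
  tune_grid(
    resamples = toy_folds,
    grid = 4,
    metrics = metric_set(rmse),
    control = control_grid(save_workflow = TRUE)  # fit_best() needs the saved workflow
  )

# Refit the best candidate on the full training set
toy_best_fit <- fit_best(toy_results)

# Score the held-out test set, which was never used during tuning
toy_best_fit |>
  augment(new_data = toy_test) |>
  rmse(truth = mpg, estimate = .pred)
```

Comparing this test-set value with the cross-validated estimate from collect_metrics() is the same comparison the exercise asks for, with roc_auc in place of rmse.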
Acknowledgments
- Materials derived from Tidymodels, Virtually: An Introduction to Machine Learning with Tidymodels by Alison Hill.
- Dataset and some modeling steps derived from A predictive modeling case study and licensed under a Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA) License.