AE 09: Writing functions

Suggested answers

Application exercise, Answers

Modified: September 24, 2025

Packages

We will use the following packages in this application exercise.

  • {tidyverse}: For data import, wrangling, and visualization.
library(tidyverse)

# create synthetic dataset for vector function exercises
set.seed(123)

vals <- tibble(
  # generate 10,000 observations drawn from an exponential distribution
  # with rate of 10
  x = rexp(10000, 10)
)
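The rate argument controls the scale of the exponential draws: a rate of 10 implies a mean of 1/10. The draws are also strictly positive, which matters later because the Box-Cox transformation is defined for \(x > 0\). A quick check, reusing the same seed so it reproduces the `x` column above:

```r
# regenerate the same draws used for `vals`
set.seed(123)
x <- rexp(10000, rate = 10)

# for an exponential distribution, the mean is 1 / rate = 0.1
mean(x)

# every draw is strictly positive, as the Box-Cox transformation requires
all(x > 0)
```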

Write a vector function

Your turn: Write a function that performs the Box-Cox power transformation for a supplied, non-zero value of lambda (\(\lambda\)).

\[ bc = \frac{x^{\lambda} - 1}{\lambda} \text{ for }\lambda \ne 0 \]

Set the default \(\lambda = 1\).

# write function
to_box_cox <- function(x, lambda = 1) {
  (x^lambda - 1) / lambda
}
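Before applying the function to the full data set, you can sanity-check it against values that are easy to compute by hand. With \(\lambda = 0.5\), \(x = 4\) gives \((2 - 1) / 0.5 = 2\), and with the default \(\lambda = 1\) the transformation reduces to \(x - 1\):

```r
to_box_cox <- function(x, lambda = 1) {
  (x^lambda - 1) / lambda
}

# lambda = 0.5: (sqrt(4) - 1) / 0.5 = 2 and (sqrt(9) - 1) / 0.5 = 4
to_box_cox(c(1, 4, 9), lambda = 0.5)

# default lambda = 1: the transformation is just x - 1
to_box_cox(c(1, 4, 9))
```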

# test on data values
vals |>
  mutate(x_bc = to_box_cox(x, lambda = 0.3))
# A tibble: 10,000 × 2
         x  x_bc
     <dbl> <dbl>
 1 0.0843  -1.75
 2 0.0577  -1.92
 3 0.133   -1.51
 4 0.00316 -2.74
 5 0.00562 -2.63
 6 0.0317  -2.15
 7 0.0314  -2.15
 8 0.0145  -2.40
 9 0.273   -1.08
10 0.00292 -2.75
# ℹ 9,990 more rows
vals |>
  mutate(x_bc = to_box_cox(x, lambda = 0.3)) |>
  ggplot(mapping = aes(x = x_bc)) +
  geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
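The point of the transformation is to reduce the strong right skew of the exponential data. One way to confirm this numerically, alongside the histogram, is to compare sample skewness before and after transforming. The `skewness()` helper below is hypothetical (a simple moment-based estimator with no small-sample correction), not a function from the tidyverse:

```r
set.seed(123)
x <- rexp(10000, rate = 10)

# hypothetical helper: moment-based sample skewness
skewness <- function(v) {
  m <- mean(v)
  mean((v - m)^3) / mean((v - m)^2)^1.5
}

to_box_cox <- function(x, lambda = 1) {
  (x^lambda - 1) / lambda
}

skew_raw <- skewness(x)                           # near 2 for exponential data
skew_bc <- skewness(to_box_cox(x, lambda = 0.3))  # much closer to 0
c(raw = skew_raw, box_cox = skew_bc)
```

Because the transformation is a power followed by a shift and rescale, only the power step changes the shape; the shift and division by \(\lambda\) leave the skewness unchanged.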

Your turn: Revise your function to check if \(\lambda \ne 0\). If \(\lambda = 0\), generate an error with an informative message.

Tip: Generating conditions in R

Base R provides three main functions for raising conditions:

  • stop() - Stops execution of the current expression and executes an error action.
  • warning() - Generates a warning message that corresponds to its argument(s) and (optionally) the expression or function from which it was called.
  • message() - Generates a diagnostic message from its arguments.

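To see how the three condition types differ in practice, here is a sketch with a hypothetical `check_lambda()` helper (not part of the exercise). An error halts the function, while warnings and messages let it continue; `tryCatch()` can intercept any of them, and `suppressWarnings()`/`suppressMessages()` silence the non-fatal ones:

```r
# hypothetical helper that raises each kind of condition
check_lambda <- function(lambda) {
  if (!is.numeric(lambda)) {
    stop("`lambda` must be numeric.")                    # error: halts execution
  }
  if (length(lambda) > 1) {
    warning("Only the first value of `lambda` is used.") # warning: continues
  }
  message("Using lambda = ", lambda[1])                  # message: diagnostic only
  invisible(lambda[1])
}

# tryCatch() intercepts the error and returns its text instead
err_msg <- tryCatch(check_lambda("a"), error = function(e) conditionMessage(e))

# a warning can be intercepted the same way
warn_msg <- tryCatch(check_lambda(c(0.5, 2)), warning = function(w) conditionMessage(w))

# silence the warning and message and keep the returned value
val <- suppressWarnings(suppressMessages(check_lambda(c(0.5, 2))))
```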
to_box_cox <- function(x, lambda = 1) {
  if (lambda == 0) {
    stop("Lambda set to 0. Re-run with a non-zero value for lambda.")
  }

  (x^lambda - 1) / lambda
}

vals |>
  mutate(x_bc = to_box_cox(x, lambda = 0))
Error in `mutate()`:
ℹ In argument: `x_bc = to_box_cox(x, lambda = 0)`.
Caused by error in `to_box_cox()`:
! Lambda set to 0. Re-run with a non-zero value for lambda.
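The check above assumes `lambda` is a single number. If a user passes a vector such as `c(0.5, 1)`, `if (lambda == 0)` itself fails (since R 4.2.0, an `if()` condition of length greater than one is an error), producing a less helpful message. A slightly stricter sketch validates the input first; the exact wording of the new message is an illustrative choice:

```r
to_box_cox <- function(x, lambda = 1) {
  # reject non-numeric, multi-valued, or missing lambda before testing it;
  # `||` short-circuits, so is.na() only runs on a length-one numeric
  if (!is.numeric(lambda) || length(lambda) != 1 || is.na(lambda)) {
    stop("`lambda` must be a single, non-missing number.")
  }
  if (lambda == 0) {
    stop("Lambda set to 0. Re-run with a non-zero value for lambda.")
  }

  (x^lambda - 1) / lambda
}

# a vector lambda now triggers the informative error
msg <- tryCatch(
  to_box_cox(4, lambda = c(0.5, 1)),
  error = function(e) conditionMessage(e)
)
```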

Demonstration: Revise your function to implement the full Box-Cox transformation, which uses the natural log when \(\lambda = 0\):

\[ bc = \begin{cases} \frac{x^{\lambda} - 1}{\lambda} & \text{for }\lambda \ne 0\\ \ln(x) & \text{for }\lambda = 0 \end{cases} \]
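The \(\ln(x)\) case is not arbitrary: for \(x > 0\), it is the limit of the \(\lambda \ne 0\) formula as \(\lambda \to 0\). Writing \(x^{\lambda} = e^{\lambda \ln x}\) and expanding the exponential for small \(\lambda\):

\[ \lim_{\lambda \to 0} \frac{x^{\lambda} - 1}{\lambda} = \lim_{\lambda \to 0} \frac{e^{\lambda \ln x} - 1}{\lambda} = \lim_{\lambda \to 0} \frac{\lambda \ln x + O(\lambda^{2})}{\lambda} = \ln(x) \]

So the two branches join smoothly, and small changes in \(\lambda\) near zero produce small changes in the transformed values.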

to_box_cox <- function(x, lambda = 1) {
  if (lambda == 0) {
    return(log(x))
  } else {
    return((x^lambda - 1) / lambda)
  }
}

vals |>
  mutate(x_bc = to_box_cox(x, lambda = 0))
# A tibble: 10,000 × 2
         x  x_bc
     <dbl> <dbl>
 1 0.0843  -2.47
 2 0.0577  -2.85
 3 0.133   -2.02
 4 0.00316 -5.76
 5 0.00562 -5.18
 6 0.0317  -3.45
 7 0.0314  -3.46
 8 0.0145  -4.23
 9 0.273   -1.30
10 0.00292 -5.84
# ℹ 9,990 more rows
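You can confirm numerically that the log branch is the natural continuation of the power branch: a very small non-zero \(\lambda\) gives almost the same values as \(\lambda = 0\). The test values of `x` below are arbitrary positive numbers chosen for illustration:

```r
to_box_cox <- function(x, lambda = 1) {
  if (lambda == 0) {
    return(log(x))
  } else {
    return((x^lambda - 1) / lambda)
  }
}

x <- c(0.01, 0.5, 2, 10)

# a tiny non-zero lambda uses the power formula; lambda = 0 uses log()
near_zero <- to_box_cox(x, lambda = 1e-6)
exact_log <- to_box_cox(x, lambda = 0)

# the largest difference is small and shrinks as lambda approaches 0
max(abs(near_zero - exact_log))
```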

Write a data frame function

Your turn: Write a function to calculate the median, minimum, and maximum values of a variable, grouped by another variable. Test it using the penguins data set (included in base R's {datasets} package as of R 4.5.0).

# basic summary function
my_summary <- function(df, summary_var, group_var) {
  df |>
    group_by({{ group_var }}) |>
    summarize(
      median = median({{ summary_var }}, na.rm = TRUE),
      minimum = min({{ summary_var }}, na.rm = TRUE),
      maximum = max({{ summary_var }}, na.rm = TRUE),
      .groups = "drop"
    )
}
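The `{{ }}` ("embracing") operator is what makes this work. Inside a data-masking verb such as `group_by()`, a bare argument name like `group_var` is not replaced by the column the caller named; embracing forwards the caller's expression instead (in rlang terms, `{{ x }}` is shorthand for `!!enquo(x)`). The small `toy` tibble and the two helper functions below are hypothetical, used only to contrast the two versions without relying on the penguins data:

```r
library(dplyr)

# hypothetical toy data
toy <- tibble(grp = c("a", "a", "b"), val = c(1, 3, 10))

# without embracing, dplyr evaluates `group_var` literally and never
# resolves it to the column the caller meant
naive_summary <- function(df, summary_var, group_var) {
  df |>
    group_by(group_var) |>
    summarize(median = median(summary_var))
}

# embracing with {{ }} passes the caller's column names through
embraced_summary <- function(df, summary_var, group_var) {
  df |>
    group_by({{ group_var }}) |>
    summarize(median = median({{ summary_var }}), .groups = "drop")
}

# the naive version errors; the embraced version groups by `grp`
naive_failed <- inherits(
  tryCatch(naive_summary(toy, val, grp), error = function(e) e),
  "error"
)
res <- embraced_summary(toy, val, grp)
```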

my_summary(df = penguins, summary_var = bill_len, group_var = species)
# A tibble: 3 × 4
  species   median minimum maximum
  <fct>      <dbl>   <dbl>   <dbl>
1 Adelie      38.8    32.1    46  
2 Chinstrap   49.6    40.9    58  
3 Gentoo      47.3    40.9    59.6
# default group_var to NULL so the grouping variable is optional
my_summary <- function(df, summary_var, group_var = NULL) {
  df |>
    group_by({{ group_var }}) |>
    summarize(
      median = median({{ summary_var }}, na.rm = TRUE),
      minimum = min({{ summary_var }}, na.rm = TRUE),
      maximum = max({{ summary_var }}, na.rm = TRUE),
      .groups = "drop"
    )
}

my_summary(df = penguins, summary_var = bill_len)
# A tibble: 1 × 3
  median minimum maximum
   <dbl>   <dbl>   <dbl>
1   44.4    32.1    59.6
my_summary(df = penguins, summary_var = bill_len, group_var = species)
# A tibble: 3 × 4
  species   median minimum maximum
  <fct>      <dbl>   <dbl>   <dbl>
1 Adelie      38.8    32.1    46  
2 Chinstrap   49.6    40.9    58  
3 Gentoo      47.3    40.9    59.6
# use pick() to allow for multiple grouping variables
my_summary <- function(df, summary_var, group_var = NULL) {
  df |>
    group_by(pick({{ group_var }})) |>
    summarize(
      median = median({{ summary_var }}, na.rm = TRUE),
      minimum = min({{ summary_var }}, na.rm = TRUE),
      maximum = max({{ summary_var }}, na.rm = TRUE),
      .groups = "drop"
    )
}

my_summary(penguins, bill_len, c(species, island))
# A tibble: 5 × 5
  species   island    median minimum maximum
  <fct>     <fct>      <dbl>   <dbl>   <dbl>
1 Adelie    Biscoe      38.7    34.5    45.6
2 Adelie    Dream       38.6    32.1    44.1
3 Adelie    Torgersen   38.9    33.5    46  
4 Chinstrap Dream       49.6    40.9    58  
5 Gentoo    Biscoe      47.3    40.9    59.6
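Because `pick()` accepts any tidy-selection, the same function also works with selection helpers such as `where()`, not just a `c()` of column names. A sketch on a hypothetical `toy` tibble (assuming dplyr 1.1.0 or later, where `pick()` was introduced), using the same `my_summary()` definition as above:

```r
library(dplyr)

my_summary <- function(df, summary_var, group_var = NULL) {
  df |>
    group_by(pick({{ group_var }})) |>
    summarize(
      median = median({{ summary_var }}, na.rm = TRUE),
      minimum = min({{ summary_var }}, na.rm = TRUE),
      maximum = max({{ summary_var }}, na.rm = TRUE),
      .groups = "drop"
    )
}

# hypothetical toy data with one character and one numeric grouping column
toy <- tibble(
  site = c("a", "a", "b", "b"),
  year = c(1, 2, 1, 1),
  val = c(1, 5, 2, 4)
)

# group by every character column (here, only `site`)
by_chr <- my_summary(toy, val, where(is.character))

# group by two explicitly named columns
by_both <- my_summary(toy, val, c(site, year))
```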

Acknowledgments

sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.5.1 (2025-06-13)
 os       macOS Tahoe 26.0.1
 system   aarch64, darwin20
 ui       X11
 language (EN)
 collate  C.UTF-8
 ctype    C.UTF-8
 tz       America/New_York
 date     2025-10-01
 pandoc   3.6.3 @ /Applications/Positron.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)
 quarto   1.8.24 @ /Applications/quarto/bin/quarto

─ Packages ───────────────────────────────────────────────────────────────────
 ! package      * version date (UTC) lib source
 P cli            3.6.5   2025-04-23 [?] RSPM (R 4.5.0)
 P digest         0.6.37  2024-08-19 [?] RSPM (R 4.5.0)
 P dplyr        * 1.1.4   2023-11-17 [?] RSPM (R 4.5.0)
 P evaluate       1.0.4   2025-06-18 [?] RSPM (R 4.5.1)
 P farver         2.1.2   2024-05-13 [?] RSPM (R 4.5.0)
 P fastmap        1.2.0   2024-05-15 [?] RSPM (R 4.5.0)
 P forcats      * 1.0.0   2023-01-29 [?] RSPM (R 4.5.0)
 P generics       0.1.4   2025-05-09 [?] RSPM (R 4.5.0)
 P ggplot2      * 3.5.2   2025-04-09 [?] RSPM (R 4.5.0)
 P glue           1.8.0   2024-09-30 [?] RSPM (R 4.5.0)
 P gtable         0.3.6   2024-10-25 [?] RSPM (R 4.5.0)
 P here           1.0.1   2020-12-13 [?] RSPM (R 4.5.0)
 P hms            1.1.3   2023-03-21 [?] RSPM (R 4.5.0)
 P htmltools      0.5.8.1 2024-04-04 [?] RSPM (R 4.5.0)
 P htmlwidgets    1.6.4   2023-12-06 [?] RSPM (R 4.5.0)
 P jsonlite       2.0.0   2025-03-27 [?] RSPM (R 4.5.0)
 P knitr          1.50    2025-03-16 [?] RSPM (R 4.5.0)
 P labeling       0.4.3   2023-08-29 [?] RSPM (R 4.5.0)
 P lifecycle      1.0.4   2023-11-07 [?] RSPM (R 4.5.0)
 P lubridate    * 1.9.4   2024-12-08 [?] RSPM (R 4.5.0)
 P magrittr       2.0.3   2022-03-30 [?] RSPM (R 4.5.1)
 P pillar         1.11.0  2025-07-04 [?] RSPM (R 4.5.1)
 P pkgconfig      2.0.3   2019-09-22 [?] RSPM (R 4.5.0)
 P purrr        * 1.1.0   2025-07-10 [?] RSPM (R 4.5.0)
 P R6             2.6.1   2025-02-15 [?] RSPM (R 4.5.0)
 P RColorBrewer   1.1-3   2022-04-03 [?] RSPM (R 4.5.0)
 P readr        * 2.1.5   2024-01-10 [?] RSPM (R 4.5.0)
 P renv           1.1.5   2025-07-24 [?] RSPM
 P rlang          1.1.6   2025-04-11 [?] RSPM (R 4.5.0)
 P rmarkdown      2.29    2024-11-04 [?] RSPM
 P rprojroot      2.1.0   2025-07-12 [?] RSPM (R 4.5.0)
 P scales         1.4.0   2025-04-24 [?] RSPM (R 4.5.0)
 P sessioninfo    1.2.3   2025-02-05 [?] RSPM (R 4.5.0)
 P stringi        1.8.7   2025-03-27 [?] RSPM (R 4.5.0)
 P stringr      * 1.5.1   2023-11-14 [?] RSPM (R 4.5.1)
 P tibble       * 3.3.0   2025-06-08 [?] RSPM (R 4.5.0)
 P tidyr        * 1.3.1   2024-01-24 [?] RSPM (R 4.5.0)
 P tidyselect     1.2.1   2024-03-11 [?] RSPM (R 4.5.0)
 P tidyverse    * 2.0.0   2023-02-22 [?] RSPM (R 4.5.0)
 P timechange     0.3.0   2024-01-18 [?] RSPM (R 4.5.0)
 P tzdb           0.5.0   2025-03-15 [?] RSPM (R 4.5.0)
 P utf8           1.2.6   2025-06-08 [?] RSPM (R 4.5.0)
 P vctrs          0.6.5   2023-12-01 [?] RSPM (R 4.5.0)
 P withr          3.0.2   2024-10-28 [?] RSPM (R 4.5.0)
 P xfun           0.52    2025-04-02 [?] RSPM (R 4.5.1)
 P yaml           2.3.10  2024-07-26 [?] RSPM (R 4.5.0)

 [1] /Users/bcs88/Projects/info-5001/course-site/renv/library/macos/R-4.5/aarch64-apple-darwin20
 [2] /Users/bcs88/Library/Caches/org.R-project.R/R/renv/sandbox/macos/R-4.5/aarch64-apple-darwin20/4cd76b74

 * ── Packages attached to the search path.
 P ── Loaded and on-disk path mismatch.

──────────────────────────────────────────────────────────────────────────────