AE 09: Writing functions

Suggested answers

Categories: Application exercise, Answers
Modified: October 2, 2024

Packages

We will use the following packages in this application exercise.

  • tidyverse: For data import, wrangling, and visualization.
  • nycflights13: For data sets.
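A setup chunk along the following lines would load both packages. This is a minimal sketch that assumes they are already installed (for example via install.packages()); it is not necessarily the exact chunk used to render this page.

```r
# Load the packages used throughout this exercise.
# Assumption: both packages are already installed.
library(tidyverse)    # attaches dplyr, tibble, ggplot2, and related packages
library(nycflights13) # provides the flights and weather data frames
```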

Vector function: fizzbuzz

Fizz buzz is a children’s game that teaches division. Players take turns counting upward, replacing any number divisible by three with the word “fizz”, any number divisible by five with the word “buzz”, and any number divisible by both with “fizzbuzz”.

We will write a vector function that helps the user play fizzbuzz by calculating the correct response for any possible combination of divisors.

Function requirements

The function you write should adhere to the following requirements:

  • Three arguments/inputs
    • nums: A vector of “integers” (see footnote 1)
    • div1: An integer value (default value is 3)
    • div2: An integer value (default value is 5)
  • Output: A vector of characters
    • If the number is divisible by div1, return "Fizz".
    • If the number is divisible by div2, return "Buzz".
    • If the number is divisible by div1 and div2, return "FizzBuzz".
    • Otherwise, return the number as a character.

Footnote 1: You can interpret this literally as the integer type in R, but here I use “integer” in the mathematical sense. The function should work for any numeric vector, whether stored as integer or double, as long as each value represents a whole number. Complex vectors are the exception: %% is not defined for complex numbers in R and raises an error.
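As a small illustration of the distinction drawn in the footnote, the sketch below stores the same whole numbers once as integers and once as doubles. The vector names are hypothetical and chosen only for this example.

```r
# The same whole numbers, stored in two different numeric types
int_vals <- c(3L, 5L, 15L) # integer type (note the L suffix)
dbl_vals <- c(3, 5, 15)    # double type, same whole-number values

typeof(int_vals) # "integer"
typeof(dbl_vals) # "double"

# The remainder-based divisibility test gives the same answer for both
int_vals %% 3 == 0 # TRUE FALSE TRUE
dbl_vals %% 3 == 0 # TRUE FALSE TRUE
```

This is why the default divisors can be written as 3L and 5L while the function still accepts plain doubles such as 3 and 5.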

Tip

We have not yet covered explicit iterative operations. There is no need to use a for loop, apply() or map() functions, or any other explicit iteration. Instead, use your existing knowledge of vectorized operations and functions to write the function.

A helpful hint about modular division

%% is the modulo operator. It returns the remainder left over after division, rather than the quotient. A remainder of 0 means the first number is evenly divisible by the second.

5 / 3
[1] 1.666667
5 %% 3
[1] 2
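Because %% is vectorized, the same idea extends to a whole vector without any explicit iteration. The following is a minimal sketch with a hypothetical vector called nums.

```r
nums <- 1:10

# %% works element by element: one remainder per value
remainders <- nums %% 3
remainders # 1 2 0 1 2 0 1 2 0 1

# Comparing against 0 gives a logical vector marking the multiples of 3
is_multiple <- nums %% 3 == 0

# The logical vector can be used directly, e.g. to pick out the multiples
nums[is_multiple] # 3 6 9
```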

A helpful hint about case_when()

case_when() generalizes if_else() to any number of conditions. For each element, it checks the conditions in order and returns the value paired with the first one that is TRUE. For example,

x <- c("apple", "orange", "cherry", "onion", "broccoli", "cucumber")

# convert to fruit or vegetable
case_when(
  x %in% c("apple", "orange", "cherry") ~ "fruit",
  x %in% c("onion", "broccoli", "cucumber") ~ "vegetable"
)
[1] "fruit"     "fruit"     "fruit"     "vegetable" "vegetable" "vegetable"

fizzbuzz <- function(nums, div1 = 3L, div2 = 5L) {
  # case_when() evaluates the conditions in order for each element of nums
  case_when(
    # test the most specific condition first; otherwise a multiple of both
    # divisors would match "Fizz" and never reach "FizzBuzz"
    nums %% div1 == 0 & nums %% div2 == 0 ~ "FizzBuzz",
    nums %% div1 == 0 ~ "Fizz",
    nums %% div2 == 0 ~ "Buzz",
    # everything else is returned as text so the output is a character vector
    .default = as.character(nums)
  )
}
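
The ordering of the conditions in fizzbuzz() matters because case_when() stops at the first match for each element. The sketch below contrasts the two orderings on a hypothetical three-element vector; the object names are illustrative only.

```r
library(dplyr)

x <- c(3, 5, 15)

# Less specific tests first: 15 matches the "Fizz" condition and stops there
wrong_order <- case_when(
  x %% 3 == 0 ~ "Fizz",
  x %% 5 == 0 ~ "Buzz",
  x %% 3 == 0 & x %% 5 == 0 ~ "FizzBuzz"
)

# Most specific test first: 15 reaches the combined condition before the others
right_order <- case_when(
  x %% 3 == 0 & x %% 5 == 0 ~ "FizzBuzz",
  x %% 3 == 0 ~ "Fizz",
  x %% 5 == 0 ~ "Buzz"
)

wrong_order # "Fizz" "Buzz" "Fizz"
right_order # "Fizz" "Buzz" "FizzBuzz"
```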

Test the function

Test your function to ensure it produces the correct results.

  • Create a sequence of integers between 1 and 30
  • Use the function to calculate the fizzbuzz response for each number and the default rules (divisors 3 and 5). Store the output as a vector object.
  • Use the function to calculate the fizzbuzz response for each number and the divisors 3 and 4. Store the output as a vector object.
  • Repeat your tests, but this time store the results as columns in a data frame along with the original value. Write the operation using mutate().

# test on a vector of numbers
test_nums <- 1:30

# output is a character vector
fizzbuzz(nums = test_nums)
 [1] "1"        "2"        "Fizz"     "4"        "Buzz"     "Fizz"    
 [7] "7"        "8"        "Fizz"     "Buzz"     "11"       "Fizz"    
[13] "13"       "14"       "FizzBuzz" "16"       "17"       "Fizz"    
[19] "19"       "Buzz"     "Fizz"     "22"       "23"       "Fizz"    
[25] "Buzz"     "26"       "Fizz"     "28"       "29"       "FizzBuzz"

fizzbuzz(nums = test_nums, div2 = 4L)
 [1] "1"        "2"        "Fizz"     "Buzz"     "5"        "Fizz"    
 [7] "7"        "Buzz"     "Fizz"     "10"       "11"       "FizzBuzz"
[13] "13"       "14"       "Fizz"     "Buzz"     "17"       "Fizz"    
[19] "19"       "Buzz"     "Fizz"     "22"       "23"       "FizzBuzz"
[25] "25"       "26"       "Fizz"     "Buzz"     "29"       "Fizz"    

# implement function within a data frame using mutate()
tibble(nums = test_nums) |>
  mutate(
    fizzbuzz_3_5 = fizzbuzz(nums = nums),
    fizzbuzz_3_4 = fizzbuzz(nums = nums, div2 = 4L)
  )
# A tibble: 30 × 3
    nums fizzbuzz_3_5 fizzbuzz_3_4
   <int> <chr>        <chr>       
 1     1 1            1           
 2     2 2            2           
 3     3 Fizz         Fizz        
 4     4 4            Buzz        
 5     5 Buzz         5           
 6     6 Fizz         Fizz        
 7     7 7            7           
 8     8 8            Buzz        
 9     9 Fizz         Fizz        
10    10 Buzz         10          
# ℹ 20 more rows

Data frame functions

nycflights13 is an R package containing several data tables with information about all flights that departed from the three NYC airports (EWR, JFK, and LGA) to destinations in the United States, Puerto Rico, and the American Virgin Islands in 2013. In total it includes 336,776 flights.

Use the datasets from nycflights13 to write the following functions.

Find all flights that were cancelled (i.e. is.na(arr_time)) or delayed by more than an hour

filter_severe <- function(df = NULL) {
  df |>
    filter(is.na(arr_time) | arr_delay > 60)
}

flights |> filter_severe()
# A tibble: 36,502 × 19
    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
   <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
 1  2013     1     1      811            630       101     1047            830
 2  2013     1     1      848           1835       853     1001           1950
 3  2013     1     1      957            733       144     1056            853
 4  2013     1     1     1114            900       134     1447           1222
 5  2013     1     1     1120            944        96     1331           1213
 6  2013     1     1     1255           1200        55     1451           1330
 7  2013     1     1     1301           1150        71     1518           1345
 8  2013     1     1     1337           1220        77     1649           1531
 9  2013     1     1     1342           1320        22     1617           1504
10  2013     1     1     1400           1250        70     1645           1502
# ℹ 36,492 more rows
# ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
#   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
#   hour <dbl>, minute <dbl>, time_hour <dttm>
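
The is.na(arr_time) test in filter_severe() does real work: filter() drops rows where the condition evaluates to NA, so cancelled flights would otherwise disappear. The sketch below shows this on a hypothetical toy table (toy_flights and its values are made up for illustration, not taken from nycflights13).

```r
library(dplyr)

# Hypothetical toy data: flight B was cancelled, so its arrival fields are NA
toy_flights <- tibble(
  flight    = c("A", "B", "C", "D"),
  arr_time  = c(1047L, NA, 1331L, 1200L),
  arr_delay = c(137, NA, 78, 12)
)

# Without the is.na() test, flight B is silently dropped because
# NA > 60 is NA and filter() keeps only rows where the condition is TRUE
delayed_only <- toy_flights |> filter(arr_delay > 60)
delayed_only$flight # "A" "C"

# With it, TRUE | NA evaluates to TRUE, so flight B is kept
severe <- toy_flights |> filter(is.na(arr_time) | arr_delay > 60)
severe$flight # "A" "B" "C"
```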

Count the number of cancelled flights and the number of flights delayed by more than an hour for each destination

summarize_severe <- function(df = NULL) {
  df |>
    summarize(
      n_cancelled = sum(is.na(arr_time)),
      n_delayed = sum(arr_delay > 60, na.rm = TRUE)
    )
}

flights |> group_by(dest) |> summarize_severe()
# A tibble: 105 × 3
   dest  n_cancelled n_delayed
   <chr>       <int>     <int>
 1 ABQ             0        25
 2 ACK             0        12
 3 ALB            21        59
 4 ANC             0         0
 5 ATL           342      1433
 6 AUS            22       219
 7 AVL            12        21
 8 BDL            31        41
 9 BGR            17        44
10 BHM            28        46
# ℹ 95 more rows
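
Both counts in summarize_severe() rely on sum() treating TRUE as 1 and FALSE as 0. The na.rm = TRUE argument is needed for the delay count because comparisons involving NA return NA. A minimal sketch with hypothetical values:

```r
# Hypothetical arrival data; NA marks a cancelled flight
arr_time  <- c(1047L, NA, 1331L, 1200L, NA)
arr_delay <- c(137, NA, 78, 12, NA)

# is.na() returns a logical vector, and sum() counts its TRUE values
sum(is.na(arr_time)) # 2

# The comparison propagates NA, so the sum is NA unless NAs are dropped
sum(arr_delay > 60)               # NA
sum(arr_delay > 60, na.rm = TRUE) # 2
```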

Find all flights that were cancelled or delayed by more than a user supplied number of hours

filter_severe <- function(df = NULL, hours = 1) {
  df |>
    filter(is.na(arr_time) | arr_delay > hours * 60)
}

flights |> filter_severe(hours = 2)
# A tibble: 18,747 × 19
    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
   <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
 1  2013     1     1      811            630       101     1047            830
 2  2013     1     1      848           1835       853     1001           1950
 3  2013     1     1      957            733       144     1056            853
 4  2013     1     1     1114            900       134     1447           1222
 5  2013     1     1     1505           1310       115     1638           1431
 6  2013     1     1     1525           1340       105     1831           1626
 7  2013     1     1     1549           1445        64     1912           1656
 8  2013     1     1     1558           1359       119     1718           1515
 9  2013     1     1     1732           1630        62     2028           1825
10  2013     1     1     1803           1620       103     2008           1750
# ℹ 18,737 more rows
# ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
#   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
#   hour <dbl>, minute <dbl>, time_hour <dttm>

Summarize the weather to compute the minimum, mean, and maximum of a user-supplied variable

summarize_weather <- function(df = NULL, var = NULL) {
  df |>
    summarize(
      min = min({{ var }}, na.rm = TRUE),
      mean = mean({{ var }}, na.rm = TRUE),
      max = max({{ var }}, na.rm = TRUE)
    )
}

weather |> summarize_weather(temp)
# A tibble: 1 × 3
    min  mean   max
  <dbl> <dbl> <dbl>
1  10.9  55.3  100.
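
The {{ var }} syntax (often called embracing) is what lets the caller pass a bare column name such as temp. The sketch below repeats the pattern on a hypothetical toy table; summarize_range() and toy_weather are illustrative names, not part of the exercise.

```r
library(dplyr)

# Hypothetical toy data standing in for the weather table
toy_weather <- tibble(
  temp  = c(30, 50, NA, 70),
  humid = c(40, 60, 80, NA)
)

# {{ var }} forwards the column name the caller typed, so it is looked up
# inside df rather than as an ordinary object in the calling environment
summarize_range <- function(df, var) {
  df |>
    summarize(
      min = min({{ var }}, na.rm = TRUE),
      max = max({{ var }}, na.rm = TRUE)
    )
}

temp_range  <- toy_weather |> summarize_range(temp)  # min 30, max 70
humid_range <- toy_weather |> summarize_range(humid) # min 40, max 80
```

Because the function wraps summarize(), it also respects grouping: piping a grouped data frame into it returns one row per group.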

Convert a user-supplied variable that uses clock time (e.g., dep_time or arr_time) into decimal time (i.e., hours + (minutes / 60))

Finding the quotient

%/% is integer division. It returns the whole-number part of the quotient and discards the remainder.

5 / 3
[1] 1.666667
5 %/% 3
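[1] 1

standardize_time <- function(df = NULL, var = NULL) {
  df |>
    mutate(
      # whole hours (%/% 100) plus leftover minutes (%% 100) as a fraction of an hour;
      # %/% and %% are evaluated before / and +
      {{ var }} := {{ var }} %/% 100 + {{ var }} %% 100 / 60
    )
}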

flights |> standardize_time(sched_dep_time)
# A tibble: 336,776 × 19
    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
   <int> <int> <int>    <int>          <dbl>     <dbl>    <int>          <int>
 1  2013     1     1      517           5.25         2      830            819
 2  2013     1     1      533           5.48         4      850            830
 3  2013     1     1      542           5.67         2      923            850
 4  2013     1     1      544           5.75        -1     1004           1022
 5  2013     1     1      554           6           -6      812            837
 6  2013     1     1      554           5.97        -4      740            728
 7  2013     1     1      555           6           -5      913            854
 8  2013     1     1      557           6           -3      709            723
 9  2013     1     1      557           6           -3      838            846
10  2013     1     1      558           6           -2      753            745
# ℹ 336,766 more rows
# ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
#   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
#   hour <dbl>, minute <dbl>, time_hour <dttm>
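
The arithmetic inside standardize_time() can be checked by hand on a few values. The sketch below uses a hypothetical vector of HHMM clock times and splits the one-line expression into its two parts.

```r
# Hypothetical clock times in HHMM format
clock <- c(515, 558, 600, 1835)

hours   <- clock %/% 100 # 5 5 6 18
minutes <- clock %% 100  # 15 58 0 35

# %/% and %% bind more tightly than / and +, so the one-line version
# needs no parentheses and matches the step-by-step calculation
decimal <- clock %/% 100 + clock %% 100 / 60
all.equal(decimal, hours + minutes / 60) # TRUE

decimal[1] # 5.25, i.e. 5 hours and 15 minutes
```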
