AE 09: Writing functions
Suggested answers
Packages
We will use the following packages in this application exercise.
- tidyverse: For data import, wrangling, and visualization.
- nycflights13: For data sets.
Vector function: fizzbuzz
Fizz buzz is a children’s game that teaches about division. Players take turns counting incrementally, replacing any number divisible by three with the word “fizz”, any number divisible by five with the word “buzz”, and any number divisible by both with “fizzbuzz”.
We will write a vector function that helps the user play fizzbuzz by calculating the correct response for any possible combination of divisors.
Function requirements
The function you write should adhere to the following requirements:
- Three arguments/inputs
  - nums: A vector of “integers” [1]
  - div1: An integer value (default value is 3)
  - div2: An integer value (default value is 5)
- Output: A vector of characters
  - If the number is divisible by div1, return "Fizz".
  - If the number is divisible by div2, return "Buzz".
  - If the number is divisible by div1 and div2, return "FizzBuzz".
  - Otherwise, return the number as a character.
[1] You can interpret this literally as an integer type in R, but here I use “integer” in a mathematical sense. The function should work for any numeric type that supports %%, including integers and doubles, as long as the value represents a whole number.
%% is the modulo operator. It returns the remainder left over after division, rather than the quotient. A remainder of 0 means the number is evenly divisible.
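As a quick illustration of the remainder test, here is a minimal base R sketch. The vector x and the variable names are made up for this example and are not part of the exercise.

```r
# Made-up input for illustration
x <- 1:6

# Remainder of each value after dividing by 3
remainders <- x %% 3

# A remainder of 0 marks the multiples of 3
divisible <- remainders == 0
divisible
# FALSE FALSE TRUE FALSE FALSE TRUE

# %% also works on doubles that hold whole numbers
6.0 %% 3 == 0
# TRUE
```

Because %% is vectorized, the same comparison can be used directly inside a vectorized function.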
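case_when() generalizes if_else() to multiple conditions, allowing you to perform several vectorized tests at once. The conditions are evaluated in order, and the first one that is TRUE determines the result for that element.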
fizzbuzz <- function(nums, div1 = 3L, div2 = 5L) {
  # case_when() makes a lot of sense here
  case_when(
    # perform most stringent test first, otherwise will get incorrect result
    nums %% div1 == 0 & nums %% div2 == 0 ~ "FizzBuzz",
    nums %% div1 == 0 ~ "Fizz",
    nums %% div2 == 0 ~ "Buzz",
    .default = as.character(nums)
  )
}
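To see why the most stringent test has to come first, consider a deliberately broken variant. The function name fizzbuzz_wrong_order is hypothetical and exists only for this sketch. It assumes dplyr 1.1.0 or later for the .default argument.

```r
library(dplyr)

# Hypothetical variant for illustration only: the tests are in the wrong order
fizzbuzz_wrong_order <- function(nums, div1 = 3L, div2 = 5L) {
  case_when(
    nums %% div1 == 0 ~ "Fizz",
    nums %% div2 == 0 ~ "Buzz",
    # never reached: a multiple of both divisors already matched the first test
    nums %% div1 == 0 & nums %% div2 == 0 ~ "FizzBuzz",
    .default = as.character(nums)
  )
}

fizzbuzz_wrong_order(c(3, 5, 15))
# "Fizz" "Buzz" "Fizz"
```

Since case_when() uses the first matching condition, 15 is labelled "Fizz" and the "FizzBuzz" branch can never fire.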
Test the function
Test your function to ensure it produces the correct results.
- Create a sequence of integers between 1 and 30
- Use the function to calculate the fizzbuzz response for each number and the default rules (divisors 3 and 5). Store the output as a vector object.
- Use the function to calculate the fizzbuzz response for each number and the divisors 3 and 4. Store the output as a vector object.
- Repeat your tests, but this time store the results as columns in a data frame along with the original value. Write the operation using mutate().
# test on a vector of numbers
test_nums <- 1:30
# output is a character vector
fizzbuzz(nums = test_nums)
[1] "1" "2" "Fizz" "4" "Buzz" "Fizz"
[7] "7" "8" "Fizz" "Buzz" "11" "Fizz"
[13] "13" "14" "FizzBuzz" "16" "17" "Fizz"
[19] "19" "Buzz" "Fizz" "22" "23" "Fizz"
[25] "Buzz" "26" "Fizz" "28" "29" "FizzBuzz"
fizzbuzz(nums = test_nums, div2 = 4L)
[1] "1" "2" "Fizz" "Buzz" "5" "Fizz"
[7] "7" "Buzz" "Fizz" "10" "11" "FizzBuzz"
[13] "13" "14" "Fizz" "Buzz" "17" "Fizz"
[19] "19" "Buzz" "Fizz" "22" "23" "FizzBuzz"
[25] "25" "26" "Fizz" "Buzz" "29" "Fizz"
# implement function within a data frame using mutate()
tibble(nums = test_nums) |>
  mutate(
    fizzbuzz_3_5 = fizzbuzz(nums = nums),
    fizzbuzz_3_4 = fizzbuzz(nums = nums, div2 = 4L)
  )
# A tibble: 30 × 3
nums fizzbuzz_3_5 fizzbuzz_3_4
<int> <chr> <chr>
1 1 1 1
2 2 2 2
3 3 Fizz Fizz
4 4 4 Buzz
5 5 Buzz 5
6 6 Fizz Fizz
7 7 7 7
8 8 8 Buzz
9 9 Fizz Fizz
10 10 Buzz 10
# ℹ 20 more rows
Data frame functions
nycflights13 is an R package containing several data tables with information about all flights that departed from NYC (i.e. EWR, JFK, and LGA) to destinations in the United States, Puerto Rico, and the American Virgin Islands in 2013. In total it includes 336,776 flights.
Use the datasets from nycflights13 to write the following functions.
Find all flights that were cancelled (i.e. is.na(arr_time)) or delayed by more than an hour
filter_severe <- function(df = NULL) {
  df |>
    filter(is.na(arr_time) | arr_delay > 60)
}
flights |> filter_severe()
# A tibble: 36,502 × 19
year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
<int> <int> <int> <int> <int> <dbl> <int> <int>
1 2013 1 1 811 630 101 1047 830
2 2013 1 1 848 1835 853 1001 1950
3 2013 1 1 957 733 144 1056 853
4 2013 1 1 1114 900 134 1447 1222
5 2013 1 1 1120 944 96 1331 1213
6 2013 1 1 1255 1200 55 1451 1330
7 2013 1 1 1301 1150 71 1518 1345
8 2013 1 1 1337 1220 77 1649 1531
9 2013 1 1 1342 1320 22 1617 1504
10 2013 1 1 1400 1250 70 1645 1502
# ℹ 36,492 more rows
# ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
# tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
# hour <dbl>, minute <dbl>, time_hour <dttm>
Count the number of cancelled flights and the number of flights delayed by more than an hour for each destination
summarize_severe <- function(df = NULL) {
  df |>
    summarize(
      n_cancelled = sum(is.na(arr_time)),
      n_delayed = sum(arr_delay > 60, na.rm = TRUE)
    )
}
flights |> group_by(dest) |> summarize_severe()
# A tibble: 105 × 3
dest n_cancelled n_delayed
<chr> <int> <int>
1 ABQ 0 25
2 ACK 0 12
3 ALB 21 59
4 ANC 0 0
5 ATL 342 1433
6 AUS 22 219
7 AVL 12 21
8 BDL 31 41
9 BGR 17 44
10 BHM 28 46
# ℹ 95 more rows
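The na.rm = TRUE in n_delayed matters because a missing arrival delay makes the comparison itself missing. A minimal base R sketch, using made-up delay values rather than the real flights data:

```r
# Made-up arrival delays in minutes; NA stands in for a missing value
arr_delay <- c(75, NA, 10, 130)

# The comparison propagates the missing value
arr_delay > 60
# TRUE NA FALSE TRUE

n_delayed_raw <- sum(arr_delay > 60)             # NA: one unknown poisons the sum
n_delayed <- sum(arr_delay > 60, na.rm = TRUE)   # 2: missing comparisons are dropped
n_missing <- sum(is.na(arr_delay))               # 1: counts the missing values
```

Counting the missing values separately with is.na(), as summarize_severe() does for arr_time, keeps the two tallies from interfering with each other.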
Find all flights that were cancelled or delayed by more than a user supplied number of hours
filter_severe <- function(df = NULL, hours = 1) {
  df |>
    filter(is.na(arr_time) | arr_delay > hours * 60)
}
flights |> filter_severe(hours = 2)
# A tibble: 18,747 × 19
year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
<int> <int> <int> <int> <int> <dbl> <int> <int>
1 2013 1 1 811 630 101 1047 830
2 2013 1 1 848 1835 853 1001 1950
3 2013 1 1 957 733 144 1056 853
4 2013 1 1 1114 900 134 1447 1222
5 2013 1 1 1505 1310 115 1638 1431
6 2013 1 1 1525 1340 105 1831 1626
7 2013 1 1 1549 1445 64 1912 1656
8 2013 1 1 1558 1359 119 1718 1515
9 2013 1 1 1732 1630 62 2028 1825
10 2013 1 1 1803 1620 103 2008 1750
# ℹ 18,737 more rows
# ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
# tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
# hour <dbl>, minute <dbl>, time_hour <dttm>
Summarize the weather to compute the minimum, mean, and maximum, of a user supplied variable
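One possible approach is sketched here. The function name summarize_weather and the output column names are assumptions, and a small made-up tibble stands in for the weather table so the example runs on its own.

```r
library(dplyr)

# Hypothetical helper: embrace the user-supplied column with {{ }} so it is
# looked up inside the data frame rather than in the calling environment
summarize_weather <- function(df, var) {
  df |>
    summarize(
      min = min({{ var }}, na.rm = TRUE),
      mean = mean({{ var }}, na.rm = TRUE),
      max = max({{ var }}, na.rm = TRUE)
    )
}

# Made-up temperature readings, including one missing value
toy_weather <- tibble(temp = c(39.0, 41.5, NA, 44.5))

weather_summary <- toy_weather |> summarize_weather(temp)
weather_summary
```

With nycflights13 loaded, the equivalent call on the real data would be weather |> summarize_weather(temp).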
Convert the user supplied variable that uses clock time (e.g., dep_time, arr_time, etc.) into a decimal time (i.e. hours + (minutes / 60))
%/% is integer division. It returns the whole-number part of the quotient and discards the remainder, rather than returning a fractional result.
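The conversion can be worked through step by step in base R. The clock values below are made up for illustration; a value such as 517 encodes 5:17.

```r
# Made-up clock times in HHMM form
clock <- c(517, 1340, 600)

# Integer division by 100 strips the minutes and leaves the hour
hours <- clock %/% 100     # 5 13 6

# The remainder after dividing by 100 is the minutes
minutes <- clock %% 100    # 17 40 0

# Decimal time: hours plus the fraction of an hour
decimal <- hours + minutes / 60
```

In R, %/% and %% bind more tightly than / and +, so the one-line form clock %/% 100 + clock %% 100 / 60 gives the same result without extra parentheses.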
standardize_time <- function(df = NULL, var = NULL) {
  df |>
    mutate(
      {{ var }} := {{ var }} %/% 100 + {{ var }} %% 100 / 60
    )
}
flights |> standardize_time(sched_dep_time)
# A tibble: 336,776 × 19
year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
<int> <int> <int> <int> <dbl> <dbl> <int> <int>
1 2013 1 1 517 5.25 2 830 819
2 2013 1 1 533 5.48 4 850 830
3 2013 1 1 542 5.67 2 923 850
4 2013 1 1 544 5.75 -1 1004 1022
5 2013 1 1 554 6 -6 812 837
6 2013 1 1 554 5.97 -4 740 728
7 2013 1 1 555 6 -5 913 854
8 2013 1 1 557 6 -3 709 723
9 2013 1 1 557 6 -3 838 846
10 2013 1 1 558 6 -2 753 745
# ℹ 336,766 more rows
# ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
# tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
# hour <dbl>, minute <dbl>, time_hour <dttm>
Acknowledgments
- Data frame function exercises are drawn from R for Data Science
sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.4.1 (2024-06-14)
os macOS Sonoma 14.6.1
system aarch64, darwin20
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/New_York
date 2024-10-08
pandoc 3.4 @ /usr/local/bin/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────
! package * version date (UTC) lib source
cli 3.6.3 2024-06-21 [1] RSPM (R 4.4.0)
P colorspace 2.1-0 2023-01-23 [?] CRAN (R 4.3.0)
P digest 0.6.35 2024-03-11 [?] CRAN (R 4.3.1)
P dplyr * 1.1.4 2023-11-17 [?] CRAN (R 4.3.1)
P evaluate 0.24.0 2024-06-10 [?] CRAN (R 4.4.0)
P fansi 1.0.6 2023-12-08 [?] CRAN (R 4.3.1)
P fastmap 1.2.0 2024-05-15 [?] CRAN (R 4.4.0)
P forcats * 1.0.0 2023-01-29 [?] CRAN (R 4.3.0)
P generics 0.1.3 2022-07-05 [?] CRAN (R 4.3.0)
P ggplot2 * 3.5.1 2024-04-23 [?] CRAN (R 4.3.1)
P glue 1.7.0 2024-01-09 [?] CRAN (R 4.3.1)
P gtable 0.3.5 2024-04-22 [?] CRAN (R 4.3.1)
P here 1.0.1 2020-12-13 [?] CRAN (R 4.3.0)
P hms 1.1.3 2023-03-21 [?] CRAN (R 4.3.0)
P htmltools 0.5.8.1 2024-04-04 [?] CRAN (R 4.3.1)
P htmlwidgets 1.6.4 2023-12-06 [?] CRAN (R 4.3.1)
P jsonlite 1.8.8 2023-12-04 [?] CRAN (R 4.3.1)
P knitr 1.47 2024-05-29 [?] CRAN (R 4.4.0)
P lifecycle 1.0.4 2023-11-07 [?] CRAN (R 4.3.1)
P lubridate * 1.9.3 2023-09-27 [?] CRAN (R 4.3.1)
P magrittr 2.0.3 2022-03-30 [?] CRAN (R 4.3.0)
P munsell 0.5.1 2024-04-01 [?] CRAN (R 4.3.1)
P nycflights13 * 1.0.2 2021-04-12 [?] CRAN (R 4.3.0)
P pillar 1.9.0 2023-03-22 [?] CRAN (R 4.3.0)
P pkgconfig 2.0.3 2019-09-22 [?] CRAN (R 4.3.0)
P purrr * 1.0.2 2023-08-10 [?] CRAN (R 4.3.0)
P R6 2.5.1 2021-08-19 [?] CRAN (R 4.3.0)
P readr * 2.1.5 2024-01-10 [?] CRAN (R 4.3.1)
renv 1.0.7 2024-04-11 [1] CRAN (R 4.4.0)
P rlang 1.1.4 2024-06-04 [?] CRAN (R 4.3.3)
P rmarkdown 2.27 2024-05-17 [?] CRAN (R 4.4.0)
P rprojroot 2.0.4 2023-11-05 [?] CRAN (R 4.3.1)
P scales 1.3.0.9000 2024-05-07 [?] Github (r-lib/scales@c0f79d3)
P sessioninfo 1.2.2 2021-12-06 [?] CRAN (R 4.3.0)
P stringi 1.8.4 2024-05-06 [?] CRAN (R 4.3.1)
P stringr * 1.5.1 2023-11-14 [?] CRAN (R 4.3.1)
P tibble * 3.2.1 2023-03-20 [?] CRAN (R 4.3.0)
P tidyr * 1.3.1 2024-01-24 [?] CRAN (R 4.3.1)
P tidyselect 1.2.1 2024-03-11 [?] CRAN (R 4.3.1)
P tidyverse * 2.0.0 2023-02-22 [?] CRAN (R 4.3.0)
P timechange 0.3.0 2024-01-18 [?] CRAN (R 4.3.1)
P tzdb 0.4.0 2023-05-12 [?] CRAN (R 4.3.0)
P utf8 1.2.4 2023-10-22 [?] CRAN (R 4.3.1)
P vctrs 0.6.5 2023-12-01 [?] CRAN (R 4.3.1)
withr 3.0.1 2024-07-31 [1] RSPM (R 4.4.0)
P xfun 0.45 2024-06-16 [?] CRAN (R 4.4.0)
P yaml 2.3.8 2023-12-11 [?] CRAN (R 4.3.1)
[1] /Users/soltoffbc/Projects/info-5001/course-site/renv/library/macos/R-4.4/aarch64-apple-darwin20
[2] /Users/soltoffbc/Library/Caches/org.R-project.R/R/renv/sandbox/macos/R-4.4/aarch64-apple-darwin20/f7156815
P ── Loaded and on-disk path mismatch.
──────────────────────────────────────────────────────────────────────────────