Iteration

Lecture 13

Dr. Benjamin Soltoff

Cornell University
INFO 5001 - Fall 2023

2023-10-11

Announcements


  • Extra credit assignment
  • Project proposals

Atomic vectors

Subsetting vectors with [] and [[]]

x <- c("one", "two", "three", "four", "five")
  • With positive integers
x[c(3, 2, 5)]
[1] "three" "two"   "five"
  • With negative integers
x[c(-1, -3, -5)]
[1] "two"  "four"
  • Don’t mix positive and negative
x[c(-1, 1)]
Error in x[c(-1, 1)]: only 0's may be mixed with negative subscripts
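The examples above only use `[]`. Here is a minimal sketch of how `[[]]` differs on an atomic vector. It assumes a small named vector `v` that is introduced only for illustration. `[[]]` returns exactly one element and drops its name, while `[]` keeps names and can return several elements at once.

```r
# `v` is a hypothetical named vector used only for this illustration
v <- c(a = 1, b = 2, c = 3)

v["b"]          # single brackets keep the name: b = 2
v[["b"]]        # double brackets drop the name: 2
v[c("a", "c")]  # single brackets can return several elements
```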

Subset with a logical vector

(x <- c(10, 3, NA, 5, 8, 1, NA))
[1] 10  3 NA  5  8  1 NA
# All non-missing values of x
!is.na(x)
[1]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE FALSE
x[!is.na(x)]
[1] 10  3  5  8  1
# All even (or missing!) values of x
x[x %% 2 == 0]
[1] 10 NA  8 NA
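The missing values appear in the last result because `NA %% 2 == 0` is itself `NA`, and indexing with `NA` returns `NA`. The following sketch shows two ways to keep only the even, non-missing values. You can combine the conditions explicitly, or you can wrap the condition in `which()`, which returns only the positions where the condition is `TRUE`.

```r
x <- c(10, 3, NA, 5, 8, 1, NA)

# combine conditions: FALSE & NA is FALSE, so missing values are dropped
x[!is.na(x) & x %% 2 == 0]

# which() keeps only the positions where the condition is TRUE
x[which(x %% 2 == 0)]
```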

Lists


x <- list(1, 2, 3)
x
[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

Lists: str()

str(x)
List of 3
 $ : num 1
 $ : num 2
 $ : num 3
x_named <- list(a = 1, b = 2, c = 3)
str(x_named)
List of 3
 $ a: num 1
 $ b: num 2
 $ c: num 3

Store a mix of objects

y <- list("a", 1L, 1.5, TRUE)
str(y)
List of 4
 $ : chr "a"
 $ : int 1
 $ : num 1.5
 $ : logi TRUE

Nested lists

z <- list(list(1, 2), list(3, 4))
str(z)
List of 2
 $ :List of 2
  ..$ : num 1
  ..$ : num 2
 $ :List of 2
  ..$ : num 3
  ..$ : num 4
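The sketch below extracts elements from the lists above. It recreates `x_named` and `z` so that it runs on its own. For a list, `[]` returns a smaller list, while `[[]]` and `$` return the element itself. To reach the inside of a nested list, chain `[[]]` calls.

```r
x_named <- list(a = 1, b = 2, c = 3)

x_named["a"]    # single brackets: a list of length 1
x_named[["a"]]  # double brackets: the element itself, 1
x_named$a       # $ extracts a named element, like [[ ]]

# nested lists: chain [[ ]] to reach inner elements
z <- list(list(1, 2), list(3, 4))
z[[2]][[1]]     # first element of the second inner list: 3
```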

Secret lists

str(mass_shootings)
tibble [125 × 14] (S3: tbl_df/tbl/data.frame)
 $ case                : chr [1:125] "Oxford High School shooting" "San Jose VTA shooting" "FedEx warehouse shooting" "Orange office complex shooting" ...
 $ year                : num [1:125] 2021 2021 2021 2021 2021 ...
 $ month               : chr [1:125] "Nov" "May" "Apr" "Mar" ...
 $ day                 : int [1:125] 30 26 15 31 22 16 16 26 10 6 ...
 $ location            : chr [1:125] "Oxford, Michigan" "San Jose, California" "Indianapolis, Indiana" "Orange, California" ...
 $ summary             : chr [1:125] "Ethan Crumbley, a 15-year-old student at Oxford High School, opened fire with a Sig Sauer 9mm pistol purchased "| __truncated__ "Samuel Cassidy, 57, a Valley Transportation Authorty employee, opened fire at a union meeting at the light rail"| __truncated__ "Brandon Scott Hole, 19, opened fire around 11 p.m. in the parking lot and inside the warehouse, and then shot h"| __truncated__ "Aminadab Gaxiola Gonzalez, 44, allegedly opened fire inside a small business at an office complex, killing at l"| __truncated__ ...
 $ fatalities          : num [1:125] 4 9 8 4 10 8 4 5 4 3 ...
 $ injured             : num [1:125] 7 0 7 1 0 1 0 0 3 8 ...
 $ total_victims       : num [1:125] 11 9 15 5 10 9 4 5 7 11 ...
 $ location_type       : chr [1:125] "School" "Workplace" "Workplace" "Workplace" ...
 $ male                : logi [1:125] TRUE TRUE TRUE TRUE TRUE TRUE ...
 $ age_of_shooter      : num [1:125] 15 57 19 NA 21 21 31 51 NA NA ...
 $ race                : chr [1:125] NA NA "White" NA ...
 $ prior_mental_illness: chr [1:125] NA "Yes" "Yes" NA ...
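`str()` reports `mass_shootings` as a tibble, but underneath, a data frame is a list whose elements are equal-length column vectors. The base R sketch below demonstrates this with a small data frame `df_small`. That data frame is hypothetical and is not part of the lecture data.

```r
# `df_small` is a hypothetical data frame used only for illustration
df_small <- data.frame(a = c(1, 2, 3), b = c("x", "y", "z"))

typeof(df_small)   # "list"
is.list(df_small)  # TRUE
length(df_small)   # number of columns: 2
df_small[["a"]]    # a column, extracted like a list element
```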

Iteration


df <- tibble(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)
median(df$a)
[1] -0.07983455
median(df$b)
[1] 0.3802926
median(df$c)
[1] -0.6769652
median(df$d)
[1] 0.4901909

Iteration three ways

  1. for loops
  2. map_*() functions
  3. across()

Iteration with for loops

Iteration with for loop

output <- vector(mode = "double", length = ncol(df))
for (i in seq_along(df)) {
  output[[i]] <- median(df[[i]])
}
output
[1] -0.07983455  0.38029264 -0.67696525  0.49019094

Output

output <- vector(mode = "double", length = ncol(df))
vector(mode = "double", length = ncol(df))
[1] 0 0 0 0
vector(mode = "logical", length = ncol(df))
[1] FALSE FALSE FALSE FALSE
vector(mode = "character", length = ncol(df))
[1] "" "" "" ""
vector(mode = "list", length = ncol(df))
[[1]]
NULL

[[2]]
NULL

[[3]]
NULL

[[4]]
NULL

Sequence

i in seq_along(df)
seq_along(df)
[1] 1 2 3 4
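`seq_along()` is safer than the common alternative `1:length(x)` when the input might be empty. The sketch below uses a hypothetical zero-length vector `empty` to show the difference. `seq_along()` returns an empty sequence, so the loop body never runs. `1:length()` counts down from 1 to 0, so the loop would run twice.

```r
empty <- double(0)

seq_along(empty)   # integer(0): the loop body never runs
1:length(empty)    # 1 0: the loop would run twice on an empty input

# seq_along() on a data frame returns one index per column
seq_along(data.frame(a = 1, b = 2))
```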

Body

output[[i]] <- median(df[[i]])
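A list output created with `vector(mode = "list", ...)` can hold results that contain more than one value per iteration. The sketch below uses a hypothetical data frame `df_small` and `range()`, which returns the minimum and maximum of each column.

```r
# `df_small` is a hypothetical data frame used only for illustration
df_small <- data.frame(a = c(3, 1, 2), b = c(10, 30, 20))

output <- vector(mode = "list", length = ncol(df_small))
for (i in seq_along(df_small)) {
  # each list element holds the minimum and maximum of column i
  output[[i]] <- range(df_small[[i]])
}
output
```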

Preallocation

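library(tidyverse) # provides mpg, tibble(), bind_rows(), and list_rbind()

# no preallocation: bind_rows() copies the growing tibble on every iteration
mpg_no_preall <- tibble()

for (i in 1:100) {
  mpg_no_preall <- bind_rows(mpg_no_preall, mpg)
}

# with preallocation: fill a list of known length, then combine once
mpg_preall <- vector(mode = "list", length = 100)

for (i in 1:100) {
  mpg_preall[[i]] <- mpg
}

mpg_preall <- list_rbind(mpg_preall)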
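The same idea applies to plain vectors. The base R sketch below uses a hypothetical task, squaring 1 through 10,000. Growing the vector with `c()` allocates a new, longer vector on every iteration. Preallocating allocates once and then fills each slot. Both loops produce the same result, and you can wrap each loop in `system.time()` to compare their speed on your own machine.

```r
n <- 10000

# growing: each c() call allocates a new, longer vector
grown <- c()
for (i in seq_len(n)) {
  grown <- c(grown, i^2)
}

# preallocating: allocate once, then fill each slot
prealloc <- vector(mode = "double", length = n)
for (i in seq_len(n)) {
  prealloc[[i]] <- i^2
}

identical(grown, prealloc)
```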

Iteration with map_*() functions

Map functions

  • Why for loops are good
  • Why map() functions may be better
  • Types of map() functions
    • map() makes a list
    • map_lgl() makes a logical vector
    • map_int() makes an integer vector
    • map_dbl() makes a double vector
    • map_chr() makes a character vector
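The suffix on each map function sets the type of the output. The sketch below applies several variants to a hypothetical named list `nums`. It assumes that purrr is installed and that you are using R 4.1 or later, which is required for the `\(v)` shorthand. Each variant keeps the names of the input.

```r
library(purrr)

# `nums` is a hypothetical list used only for illustration
nums <- list(a = 1:3, b = 4:6)

map(nums, sum)                                # list: a = 6, b = 15
map_int(nums, sum)                            # integer vector: a = 6, b = 15
map_dbl(nums, mean)                           # double vector: a = 2, b = 5
map_lgl(nums, \(v) all(v > 2))                # logical vector: a = FALSE, b = TRUE
map_chr(nums, \(v) paste(v, collapse = "-"))  # character vector: "1-2-3", "4-5-6"
```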

Map functions

map_dbl(.x = df, .f = mean)
          a           b           c           d 
 0.07462564  0.20862196 -0.42455887  0.32204455 
map_dbl(.x = df, .f = median)
          a           b           c           d 
-0.07983455  0.38029264 -0.67696525  0.49019094 
map_dbl(.x = df, .f = sd)
        a         b         c         d 
0.9537841 1.0380734 0.9308092 0.5273024 

Map functions

map_dbl(.x = df, .f = \(x) mean(x, na.rm = TRUE))
          a           b           c           d 
 0.07462564  0.20862196 -0.42455887  0.32204455 
df |>
  map_dbl(.f = \(x) mean(x, na.rm = TRUE))
          a           b           c           d 
 0.07462564  0.20862196 -0.42455887  0.32204455 
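The anonymous function `\(x) mean(x, na.rm = TRUE)` makes a difference once the data contain missing values. The sketch below uses a hypothetical data frame `df_na` with one `NA` in each column. It assumes purrr is installed and R 4.1 or later for the `\(x)` shorthand.

```r
library(purrr)

# `df_na` is a hypothetical data frame with missing values
df_na <- data.frame(a = c(1, NA, 3), b = c(2, 4, NA))

map_dbl(df_na, mean)                        # a = NA, b = NA
map_dbl(df_na, \(x) mean(x, na.rm = TRUE))  # a = 2, b = 3
```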

Application exercise

ae-11

  • Go to the course GitHub org and find your ae-11 (repo name will be suffixed with your GitHub name).
  • Clone the repo in RStudio Workbench, open the R script in the repo, and follow along and complete the exercises.
  • Render, commit, and push your edits by the AE deadline – end of tomorrow

Recap

  • Use [] and [[]] to extract elements from atomic vectors and lists, and $ to extract named elements from a list (or data frame)
  • for loops + map() functions are common methods for iteration in R
  • When using for loops, always preallocate the output vector
  • map() functions are a family of functions that apply a function to each element of a vector or list

Will you get lucky?

A screenshot of the Powerball website showing the jackpot at $1.73 billion.