Rectangling data

Lecture 14

Dr. Benjamin Soltoff

Cornell University
INFO 5001 - Fall 2025

October 16, 2025

Announcements

Project proposal feedback coming soon
Project EDA due October 30th
Homework 05 this week
Quiz 02 next week

Learning objectives

Define the JSON data structure
Distinguish hierarchical data structures from tabular data objects
Identify techniques for rectangling hierarchical structures into data frames
Practice tidying messy JSON data objects using {tidyr}

Lists and list-columns

Lists

x1 <- list(1:4, "a", TRUE)
x1

[[1]]
[1] 1 2 3 4

[[2]]
[1] "a"

[[3]]
[1] TRUE

Named lists

x2 <- list(a = 1:2, b = 1:3, c = 1:4)
x2

$a
[1] 1 2

$b
[1] 1 2 3

$c
[1] 1 2 3 4

Structure of lists

str(x1)

List of 3
 $ : int [1:4] 1 2 3 4
 $ : chr "a"
 $ : logi TRUE

str(x2)

List of 3
 $ a: int [1:2] 1 2
 $ b: int [1:3] 1 2 3
 $ c: int [1:4] 1 2 3 4

Nested lists

x3 <- list(list(1, 2), list(3, 4))
str(x3)

List of 2
 $ :List of 2
  ..$ : num 1
  ..$ : num 2
 $ :List of 2
  ..$ : num 3
  ..$ : num 4

List-columns

df <- tibble(
  x = 1:2,
  y = c("a", "b"),
  z = list(list(1, 2), list(3, 4, 5))
)
df

# A tibble: 2 × 3
      x y     z         
  <int> <chr> <list>    
1     1 a     <list [2]>
2     2 b     <list [3]>

What’s up with the Sharknado results?

Sharknado results from OMDB API

# A tibble: 10 × 25
   Title     Year  Rated Released Runtime Genre Director Writer Actors Plot  Language Country Awards
   <chr>     <chr> <chr> <chr>    <chr>   <chr> <chr>    <chr>  <chr>  <chr> <chr>    <chr>   <chr> 
 1 Sharknado 2013  Not … 11 Jul … 86 min  Acti… Anthony… Thund… Ian Z… When… English  United… 1 win…
 2 Sharknado 2013  Not … 11 Jul … 86 min  Acti… Anthony… Thund… Ian Z… When… English  United… 1 win…
 3 Sharknad… 2014  TV-14 30 Jul … 95 min  Acti… Anthony… Thund… Ian Z… Fin … English  United… N/A   
 4 Sharknad… 2014  TV-14 30 Jul … 95 min  Acti… Anthony… Thund… Ian Z… Fin … English  United… N/A   
 5 Sharknad… 2015  TV-14 22 Jul … 93 min  Acti… Anthony… Thund… Ian Z… A mo… English  United… 2 nom…
 6 Sharknad… 2015  TV-14 22 Jul … 93 min  Acti… Anthony… Thund… Ian Z… A mo… English  United… 2 nom…
 7 Sharknad… 2016  TV-14 31 Jul … 95 min  Acti… Anthony… Thund… Ian Z… Fin,… English  United… 1 nom…
 8 Sharknad… 2016  TV-14 31 Jul … 95 min  Acti… Anthony… Thund… Ian Z… Fin,… English  United… 1 nom…
 9 Sharknad… 2017  TV-14 06 Aug … 93 min  Acti… Anthony… Thund… Ian Z… With… English  United… 1 nom…
10 Sharknad… 2017  TV-14 06 Aug … 93 min  Acti… Anthony… Thund… Ian Z… With… English  United… 1 nom…
# ℹ 12 more variables: Poster <chr>, Ratings <list>, Metascore <chr>, imdbRating <chr>,
#   imdbVotes <chr>, imdbID <chr>, Type <chr>, DVD <chr>, BoxOffice <chr>, Production <chr>,
#   Website <chr>, Response <chr>

Sharknado

List of 25
 $ Title     : chr "Sharknado"
 $ Year      : chr "2013"
 $ Rated     : chr "Not Rated"
 $ Released  : chr "11 Jul 2013"
 $ Runtime   : chr "86 min"
 $ Genre     : chr "Action, Adventure, Comedy"
 $ Director  : chr "Anthony C. Ferrante"
 $ Writer    : chr "Thunder Levin"
 $ Actors    : chr "Ian Ziering, Tara Reid, John Heard"
 $ Plot      : chr "When a freak hurricane swamps Los Angeles, nature's deadliest killer rules sea, land, and air as thousands of s"| __truncated__
 $ Language  : chr "English"
 $ Country   : chr "United States"
 $ Awards    : chr "1 win & 2 nominations total"
 $ Poster    : chr "https://m.media-amazon.com/images/M/MV5BNTNkOTA0NjYtM2VjOC00ZmYzLWI1NzQtMjY4NTU2MGNkZmU0XkEyXkFqcGc@._V1_SX300.jpg"
 $ Ratings   :List of 2
  ..$ :List of 2
  .. ..$ Source: chr "Internet Movie Database"
  .. ..$ Value : chr "3.3/10"
  ..$ :List of 2
  .. ..$ Source: chr "Rotten Tomatoes"
  .. ..$ Value : chr "77%"
 $ Metascore : chr "N/A"
 $ imdbRating: chr "3.3"
 $ imdbVotes : chr "55,330"
 $ imdbID    : chr "tt2724064"
 $ Type      : chr "movie"
 $ DVD       : chr "N/A"
 $ BoxOffice : chr "N/A"
 $ Production: chr "N/A"
 $ Website   : chr "N/A"
 $ Response  : chr "True"

Rectangling messy data

Rectangling and {tidyr}

Art and craft of taking a deeply nested list and taming it into a tidy data set of rows and columns

unnest_longer() - each row contains multiple observations
unnest_wider() - each row contains a single observation
unnest_auto() - make an educated guess
hoist() - extract a specific element

Unnesting very wide data

`gh_repos`

library(repurrrsive)
repos <- tibble(json = gh_repos)
repos

# A tibble: 6 × 1
  json       
  <list>     
1 <list [30]>
2 <list [30]>
3 <list [30]>
4 <list [26]>
5 <list [30]>
6 <list [30]>

`unnest_longer()`

repos |>
  unnest_longer(col = json)

# A tibble: 176 × 1
   json             
   <list>           
 1 <named list [68]>
 2 <named list [68]>
 3 <named list [68]>
 4 <named list [68]>
 5 <named list [68]>
 6 <named list [68]>
 7 <named list [68]>
 8 <named list [68]>
 9 <named list [68]>
10 <named list [68]>
# ℹ 166 more rows

One row per repo

`unnest_wider()`

repos |>
  unnest_longer(col = json) |>
  unnest_wider(col = json)

# A tibble: 176 × 68
         id name  full_name owner        private html_url description fork  url   forks_url keys_url
      <int> <chr> <chr>     <list>       <lgl>   <chr>    <chr>       <lgl> <chr> <chr>     <chr>   
 1 61160198 after gaborcsa… <named list> FALSE   https:/… Run Code i… FALSE http… https://… https:/…
 2 40500181 argu… gaborcsa… <named list> FALSE   https:/… Declarativ… FALSE http… https://… https:/…
 3 36442442 ask   gaborcsa… <named list> FALSE   https:/… Friendly C… FALSE http… https://… https:/…
 4 34924886 base… gaborcsa… <named list> FALSE   https:/… Do we get … FALSE http… https://… https:/…
 5 61620661 cite… gaborcsa… <named list> FALSE   https:/… Test R pac… TRUE  http… https://… https:/…
 6 33907457 clis… gaborcsa… <named list> FALSE   https:/… Unicode sy… FALSE http… https://… https:/…
 7 37236467 cmak… gaborcsa… <named list> FALSE   https:/… port of cm… TRUE  http… https://… https:/…
 8 67959624 cmark gaborcsa… <named list> FALSE   https:/… CommonMark… TRUE  http… https://… https:/…
 9 63152619 cond… gaborcsa… <named list> FALSE   https:/… <NA>        TRUE  http… https://… https:/…
10 24343686 cray… gaborcsa… <named list> FALSE   https:/… R package … FALSE http… https://… https:/…
# ℹ 166 more rows
# ℹ 57 more variables: collaborators_url <chr>, teams_url <chr>, hooks_url <chr>,
#   issue_events_url <chr>, events_url <chr>, assignees_url <chr>, branches_url <chr>,
#   tags_url <chr>, blobs_url <chr>, git_tags_url <chr>, git_refs_url <chr>, trees_url <chr>,
#   statuses_url <chr>, languages_url <chr>, stargazers_url <chr>, contributors_url <chr>,
#   subscribers_url <chr>, subscription_url <chr>, commits_url <chr>, git_commits_url <chr>,
#   comments_url <chr>, issue_comment_url <chr>, contents_url <chr>, compare_url <chr>, …

One column per list element

What are all these columns?

repos |>
  unnest_longer(col = json) |>
  unnest_wider(col = json) |>
  names()

 [1] "id"                "name"              "full_name"         "owner"            
 [5] "private"           "html_url"          "description"       "fork"             
 [9] "url"               "forks_url"         "keys_url"          "collaborators_url"
[13] "teams_url"         "hooks_url"         "issue_events_url"  "events_url"       
[17] "assignees_url"     "branches_url"      "tags_url"          "blobs_url"        
[21] "git_tags_url"      "git_refs_url"      "trees_url"         "statuses_url"     
[25] "languages_url"     "stargazers_url"    "contributors_url"  "subscribers_url"  
[29] "subscription_url"  "commits_url"       "git_commits_url"   "comments_url"     
[33] "issue_comment_url" "contents_url"      "compare_url"       "merges_url"       
[37] "archive_url"       "downloads_url"     "issues_url"        "pulls_url"        
[41] "milestones_url"    "notifications_url" "labels_url"        "releases_url"     
[45] "deployments_url"   "created_at"        "updated_at"        "pushed_at"        
[49] "git_url"           "ssh_url"           "clone_url"         "svn_url"          
[53] "homepage"          "size"              "stargazers_count"  "watchers_count"   
[57] "language"          "has_issues"        "has_downloads"     "has_wiki"         
[61] "has_pages"         "forks_count"       "mirror_url"        "open_issues_count"
[65] "forks"             "open_issues"       "watchers"          "default_branch"

Pick a few columns

repos |>
  unnest_longer(col = json) |>
  unnest_wider(col = json) |>
  select(id, full_name, owner, description)

# A tibble: 176 × 4
         id full_name               owner             description                                   
      <int> <chr>                   <list>            <chr>                                         
 1 61160198 gaborcsardi/after       <named list [17]> Run Code in the Background                    
 2 40500181 gaborcsardi/argufy      <named list [17]> Declarative function argument checks          
 3 36442442 gaborcsardi/ask         <named list [17]> Friendly CLI interaction in R                 
 4 34924886 gaborcsardi/baseimports <named list [17]> Do we get warnings for undeclared imports fro…
 5 61620661 gaborcsardi/citest      <named list [17]> Test R package and repo for the r-hub CI      
 6 33907457 gaborcsardi/clisymbols  <named list [17]> Unicode symbols for CLI applications, with fa…
 7 37236467 gaborcsardi/cmaker      <named list [17]> port of cmake to r                            
 8 67959624 gaborcsardi/cmark       <named list [17]> CommonMark parsing and rendering library and …
 9 63152619 gaborcsardi/conditions  <named list [17]> <NA>                                          
10 24343686 gaborcsardi/crayon      <named list [17]> R package for colored terminal output         
# ℹ 166 more rows

Unnest another column

repos |>
  unnest_longer(col = json) |>
  unnest_wider(col = json) |>
  select(id, full_name, owner, description) |>
  unnest_wider(owner)

Error in `unnest_wider()`:
! Can't duplicate names between the affected columns and the original data.
✖ These names are duplicated:
  ℹ `id`, from `owner`.
ℹ Use `names_sep` to disambiguate using the column name.
ℹ Or use `names_repair` to specify a repair strategy.

Resolve duplicate names

repos |>
  unnest_longer(col = json) |>
  unnest_wider(col = json) |>
  select(id, full_name, owner, description) |>
  unnest_wider(owner, names_sep = "_")

# A tibble: 176 × 20
       id full_name owner_login owner_id owner_avatar_url owner_gravatar_id owner_url owner_html_url
    <int> <chr>     <chr>          <int> <chr>            <chr>             <chr>     <chr>         
 1 6.12e7 gaborcsa… gaborcsardi   660288 https://avatars… ""                https://… https://githu…
 2 4.05e7 gaborcsa… gaborcsardi   660288 https://avatars… ""                https://… https://githu…
 3 3.64e7 gaborcsa… gaborcsardi   660288 https://avatars… ""                https://… https://githu…
 4 3.49e7 gaborcsa… gaborcsardi   660288 https://avatars… ""                https://… https://githu…
 5 6.16e7 gaborcsa… gaborcsardi   660288 https://avatars… ""                https://… https://githu…
 6 3.39e7 gaborcsa… gaborcsardi   660288 https://avatars… ""                https://… https://githu…
 7 3.72e7 gaborcsa… gaborcsardi   660288 https://avatars… ""                https://… https://githu…
 8 6.80e7 gaborcsa… gaborcsardi   660288 https://avatars… ""                https://… https://githu…
 9 6.32e7 gaborcsa… gaborcsardi   660288 https://avatars… ""                https://… https://githu…
10 2.43e7 gaborcsa… gaborcsardi   660288 https://avatars… ""                https://… https://githu…
# ℹ 166 more rows
# ℹ 12 more variables: owner_followers_url <chr>, owner_following_url <chr>, owner_gists_url <chr>,
#   owner_starred_url <chr>, owner_subscriptions_url <chr>, owner_organizations_url <chr>,
#   owner_repos_url <chr>, owner_events_url <chr>, owner_received_events_url <chr>,
#   owner_type <chr>, owner_site_admin <lgl>, description <chr>

Relational data stored as nested lists

An API of Ice and Fire

chars <- tibble(json = got_chars)
chars

# A tibble: 30 × 1
   json             
   <list>           
 1 <named list [18]>
 2 <named list [18]>
 3 <named list [18]>
 4 <named list [18]>
 5 <named list [18]>
 6 <named list [18]>
 7 <named list [18]>
 8 <named list [18]>
 9 <named list [18]>
10 <named list [18]>
# ℹ 20 more rows

One row per POV character

Widen the data frame

chars |>
  unnest_wider(col = json)

# A tibble: 30 × 18
   url                 id name  gender culture born  died  alive titles aliases father mother spouse
   <chr>            <int> <chr> <chr>  <chr>   <chr> <chr> <lgl> <list> <list>  <chr>  <chr>  <chr> 
 1 https://www.ana…  1022 Theo… Male   "Ironb… "In … ""    TRUE  <chr>  <chr>   ""     ""     ""    
 2 https://www.ana…  1052 Tyri… Male   ""      "In … ""    TRUE  <chr>  <chr>   ""     ""     "http…
 3 https://www.ana…  1074 Vict… Male   "Ironb… "In … ""    TRUE  <chr>  <chr>   ""     ""     ""    
 4 https://www.ana…  1109 Will  Male   ""      ""    "In … FALSE <chr>  <chr>   ""     ""     ""    
 5 https://www.ana…  1166 Areo… Male   "Norvo… "In … ""    TRUE  <chr>  <chr>   ""     ""     ""    
 6 https://www.ana…  1267 Chett Male   ""      "At … "In … FALSE <chr>  <chr>   ""     ""     ""    
 7 https://www.ana…  1295 Cres… Male   ""      "In … "In … FALSE <chr>  <chr>   ""     ""     ""    
 8 https://www.ana…   130 Aria… Female "Dorni… "In … ""    TRUE  <chr>  <chr>   ""     ""     ""    
 9 https://www.ana…  1303 Daen… Female "Valyr… "In … ""    TRUE  <chr>  <chr>   ""     ""     "http…
10 https://www.ana…  1319 Davo… Male   "Weste… "In … ""    TRUE  <chr>  <chr>   ""     ""     "http…
# ℹ 20 more rows
# ℹ 5 more variables: allegiances <list>, books <list>, povBooks <list>, tvSeries <list>,
#   playedBy <list>

Some of the columns

characters <- chars |>
  unnest_wider(col = json)
characters |>
  select(id, name, gender, culture, born, died, alive)

# A tibble: 30 × 7
      id name               gender culture    born                                     died    alive
   <int> <chr>              <chr>  <chr>      <chr>                                    <chr>   <lgl>
 1  1022 Theon Greyjoy      Male   "Ironborn" "In 278 AC or 279 AC, at Pyke"           ""      TRUE 
 2  1052 Tyrion Lannister   Male   ""         "In 273 AC, at Casterly Rock"            ""      TRUE 
 3  1074 Victarion Greyjoy  Male   "Ironborn" "In 268 AC or before, at Pyke"           ""      TRUE 
 4  1109 Will               Male   ""         ""                                       "In 29… FALSE
 5  1166 Areo Hotah         Male   "Norvoshi" "In 257 AC or before, at Norvos"         ""      TRUE 
 6  1267 Chett              Male   ""         "At Hag's Mire"                          "In 29… FALSE
 7  1295 Cressen            Male   ""         "In 219 AC or 220 AC"                    "In 29… FALSE
 8   130 Arianne Martell    Female "Dornish"  "In 276 AC, at Sunspear"                 ""      TRUE 
 9  1303 Daenerys Targaryen Female "Valyrian" "In 284 AC, at Dragonstone"              ""      TRUE 
10  1319 Davos Seaworth     Male   "Westeros" "In 260 AC or before, at King's Landing" ""      TRUE 
# ℹ 20 more rows

List-columns in ASOIAF

chars |>
  unnest_wider(col = json) |>
  select(id, where(is.list))

# A tibble: 30 × 8
      id titles    aliases    allegiances books     povBooks  tvSeries  playedBy 
   <int> <list>    <list>     <list>      <list>    <list>    <list>    <list>   
 1  1022 <chr [2]> <chr [4]>  <chr [1]>   <chr [3]> <chr [2]> <chr [6]> <chr [1]>
 2  1052 <chr [2]> <chr [11]> <chr [1]>   <chr [2]> <chr [4]> <chr [6]> <chr [1]>
 3  1074 <chr [2]> <chr [1]>  <chr [1]>   <chr [3]> <chr [2]> <chr [1]> <chr [1]>
 4  1109 <chr [1]> <chr [1]>  <NULL>      <chr [1]> <chr [1]> <chr [1]> <chr [1]>
 5  1166 <chr [1]> <chr [1]>  <chr [1]>   <chr [3]> <chr [2]> <chr [2]> <chr [1]>
 6  1267 <chr [1]> <chr [1]>  <NULL>      <chr [2]> <chr [1]> <chr [1]> <chr [1]>
 7  1295 <chr [1]> <chr [1]>  <NULL>      <chr [2]> <chr [1]> <chr [1]> <chr [1]>
 8   130 <chr [1]> <chr [1]>  <chr [1]>   <chr [4]> <chr [1]> <chr [1]> <chr [1]>
 9  1303 <chr [5]> <chr [11]> <chr [1]>   <chr [1]> <chr [4]> <chr [6]> <chr [1]>
10  1319 <chr [4]> <chr [5]>  <chr [2]>   <chr [1]> <chr [3]> <chr [5]> <chr [1]>
# ℹ 20 more rows

Choose your own adventure

Match character’s title to their name

select(
  .data = characters,
  name,
  title = titles
)

# A tibble: 30 × 2
   name               title    
   <chr>              <list>   
 1 Theon Greyjoy      <chr [2]>
 2 Tyrion Lannister   <chr [2]>
 3 Victarion Greyjoy  <chr [2]>
 4 Will               <chr [1]>
 5 Areo Hotah         <chr [1]>
 6 Chett              <chr [1]>
 7 Cressen            <chr [1]>
 8 Arianne Martell    <chr [1]>
 9 Daenerys Targaryen <chr [5]>
10 Davos Seaworth     <chr [4]>
# ℹ 20 more rows

Match character’s title to their name

select(
  .data = characters,
  name,
  title = titles
) |>
  unnest_longer(col = title)

# A tibble: 59 × 2
   name              title                                                 
   <chr>             <chr>                                                 
 1 Theon Greyjoy     "Prince of Winterfell"                                
 2 Theon Greyjoy     "Lord of the Iron Islands (by law of the green lands)"
 3 Tyrion Lannister  "Acting Hand of the King (former)"                    
 4 Tyrion Lannister  "Master of Coin (former)"                             
 5 Victarion Greyjoy "Lord Captain of the Iron Fleet"                      
 6 Victarion Greyjoy "Master of the Iron Victory"                          
 7 Will              ""                                                    
 8 Areo Hotah        "Captain of the Guard at Sunspear"                    
 9 Chett             ""                                                    
10 Cressen           "Maester"                                             
# ℹ 49 more rows

Every appearance per book/season

select(
  .data = characters,
  name,
  books,
  tvSeries
)

# A tibble: 30 × 3
   name               books     tvSeries 
   <chr>              <list>    <list>   
 1 Theon Greyjoy      <chr [3]> <chr [6]>
 2 Tyrion Lannister   <chr [2]> <chr [6]>
 3 Victarion Greyjoy  <chr [3]> <chr [1]>
 4 Will               <chr [1]> <chr [1]>
 5 Areo Hotah         <chr [3]> <chr [2]>
 6 Chett              <chr [2]> <chr [1]>
 7 Cressen            <chr [2]> <chr [1]>
 8 Arianne Martell    <chr [4]> <chr [1]>
 9 Daenerys Targaryen <chr [1]> <chr [6]>
10 Davos Seaworth     <chr [1]> <chr [5]>
# ℹ 20 more rows

Every appearance per book/season

select(
  .data = characters,
  name,
  books,
  tvSeries
) |>
  pivot_longer(
    cols = c(books, tvSeries),
    names_to = "media",
    values_to = "value"
  )

# A tibble: 60 × 3
   name              media    value    
   <chr>             <chr>    <list>   
 1 Theon Greyjoy     books    <chr [3]>
 2 Theon Greyjoy     tvSeries <chr [6]>
 3 Tyrion Lannister  books    <chr [2]>
 4 Tyrion Lannister  tvSeries <chr [6]>
 5 Victarion Greyjoy books    <chr [3]>
 6 Victarion Greyjoy tvSeries <chr [1]>
 7 Will              books    <chr [1]>
 8 Will              tvSeries <chr [1]>
 9 Areo Hotah        books    <chr [3]>
10 Areo Hotah        tvSeries <chr [2]>
# ℹ 50 more rows

Every appearance per book/season

select(
  .data = characters,
  name,
  books,
  tvSeries
) |>
  pivot_longer(
    cols = c(books, tvSeries),
    names_to = "media",
    values_to = "value"
  ) |>
  unnest_longer(col = value)

# A tibble: 179 × 3
   name             media    value            
   <chr>            <chr>    <chr>            
 1 Theon Greyjoy    books    A Game of Thrones
 2 Theon Greyjoy    books    A Storm of Swords
 3 Theon Greyjoy    books    A Feast for Crows
 4 Theon Greyjoy    tvSeries Season 1         
 5 Theon Greyjoy    tvSeries Season 2         
 6 Theon Greyjoy    tvSeries Season 3         
 7 Theon Greyjoy    tvSeries Season 4         
 8 Theon Greyjoy    tvSeries Season 5         
 9 Theon Greyjoy    tvSeries Season 6         
10 Tyrion Lannister books    A Feast for Crows
# ℹ 169 more rows

Fix the OMDB API function

Rectangled version

omdb_api <- function(title) {
  # print a message to track progress
  message(str_glue("Scraping {title}..."))

  # define request
  req <- request(base_url = "http://www.omdbapi.com/") |>
    # throttle to avoid overloading server
    req_throttle(rate = 15 / 60) |>
    # create query
    req_url_query(
      apikey = Sys.getenv("omdb_key"),
      t = title
    )

  # perform request
  req_results <- req |>
    req_perform()

  # extract results
  req_json <- req_results |>
    resp_body_json()

  # convert to data frame
  req_df <- tibble(result = list(req_json)) |>
    unnest_wider(col = result)

  return(req_df)
}

Test that it works

# titles of films
sharknados <- c(
  "Sharknado",
  "Sharknado 2",
  "Sharknado 3",
  "Sharknado 4",
  "Sharknado 5"
)

# iterate over titles and query API
sharknados_df <- map(.x = sharknados, .f = omdb_api) |>
  list_rbind()
sharknados_df

# A tibble: 5 × 25
  Title      Year  Rated Released Runtime Genre Director Writer Actors Plot  Language Country Awards
  <chr>      <chr> <chr> <chr>    <chr>   <chr> <chr>    <chr>  <chr>  <chr> <chr>    <chr>   <chr> 
1 Sharknado  2013  Not … 11 Jul … 86 min  Acti… Anthony… Thund… Ian Z… When… English  United… 1 win…
2 Sharknado… 2014  TV-14 30 Jul … 95 min  Acti… Anthony… Thund… Ian Z… Fin … English  United… N/A   
3 Sharknado… 2015  TV-14 22 Jul … 93 min  Acti… Anthony… Thund… Ian Z… A mo… English  United… 2 nom…
4 Sharknado… 2016  TV-14 31 Jul … 95 min  Acti… Anthony… Thund… Ian Z… Fin,… English  United… 1 nom…
5 Sharknado… 2017  TV-14 06 Aug … 93 min  Acti… Anthony… Thund… Ian Z… With… English  United… 1 nom…
# ℹ 12 more variables: Poster <chr>, Ratings <list>, Metascore <chr>, imdbRating <chr>,
#   imdbVotes <chr>, imdbID <chr>, Type <chr>, DVD <chr>, BoxOffice <chr>, Production <chr>,
#   Website <chr>, Response <chr>

Application exercise

`ae-12`

Instructions

Go to the course GitHub org and find your ae-12 (repo name will be suffixed with your GitHub name).
Clone the repo in Positron, run renv::restore() to install the required packages, open the Quarto document in the repo, and follow along and complete the exercises.
Render, commit, and push your edits by the AE deadline – end of the day

Wrap up

Recap

Lists are an efficient data structure in R for storing hierarchical data
unnest_longer() and unnest_wider() are useful for working with list-columns
Examine the list structure with str() or View() to understand how elements are stored within it

Rectangling data

Announcements

Announcements

Learning objectives

Lists and list-columns

Lists

Named lists

Structure of lists

Nested lists

List-columns

What’s up with the Sharknado results?

Sharknado results from OMDB API

Sharknado

Rectangling messy data

Rectangling and {tidyr}

Unnesting very wide data

gh_repos

unnest_longer()

unnest_wider()

What are all these columns?

Pick a few columns

Unnest another column

Resolve duplicate names

Relational data stored as nested lists

An API of Ice and Fire

Widen the data frame

Some of the columns

List-columns in ASOIAF

Choose your own adventure

Match character’s title to their name

Match character’s title to their name

Every appearance per book/season

Every appearance per book/season

Every appearance per book/season

Fix the OMDB API function

Rectangled version

Test that it works

Application exercise

ae-12

Wrap up

Recap

Zapdos

`gh_repos`

`unnest_longer()`

`unnest_wider()`

`ae-12`