AE 11: Querying the OMDB API with httr2
Suggested answers
Packages
We will use the following packages in this application exercise.
- tidyverse: For data import, wrangling, and visualization.
- httr2: For querying APIs.
- jsonlite: For pretty-printing JSON
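A minimal setup sketch, assuming all three packages are already installed. Load them at the start of your session before running the code in this exercise:

```r
# Load the packages used in this exercise (install them first if needed)
library(tidyverse) # dplyr, purrr, stringr, tibble, readr, ...
library(httr2)     # request(), req_*() and resp_*() functions
library(jsonlite)  # prettify() for readable JSON
```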
Writing an API function
If an R package has not already been written for an application programming interface (API), you can write your own function to query the API. In this application exercise, we will write a function to query the Open Movie Database.
Create a request
The first step in querying an API is to create a request, an object that contains the information needed to query the API. The request() function creates a request object, and its base_url argument specifies the base URL of the API. The omdb_req <- syntax assigns the request object to the variable omdb_req.
omdb_req <- request(base_url = "http://www.omdbapi.com/")
omdb_req
<httr2_request>
GET http://www.omdbapi.com/
Body: empty
Perform a dry run
The req_dry_run() function performs a dry run of the request: it shows exactly what httr2 would send to the server without actually querying the API. This is useful for checking a request before you send it.
omdb_req |>
req_dry_run()
GET / HTTP/1.1
Host: www.omdbapi.com
User-Agent: httr2/1.0.1 r-curl/5.2.2 libcurl/8.7.1
Accept: */*
Accept-Encoding: gzip
Determine the shape of an API request
To submit a request, we need to define its shape: the exact URL used to submit the request. That URL combines the base URL of the API, the resource path, and any query parameters.
APIs typically have three major components to the request:
- The base URL for the web service (here it is http://www.omdbapi.com/).
- The resource path, which is the complete destination of the web service endpoint (the OMDB API does not have a resource path).
- The query parameters, which are the parameters passed to the web service endpoint.
To create your request, read the API documentation to see exactly how each of these components needs to be specified.
Your turn: Use the OMDB documentation to determine the shape of the request for information on Sharknado.
http://www.omdbapi.com/?apikey=your-key&t=Sharknado
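To see how this URL breaks down into the components listed above, httr2's url_parse() can split it apart. A small sketch, using the placeholder key your-key rather than a real one:

```r
library(httr2)

# Split the example request URL into its components
parts <- url_parse("http://www.omdbapi.com/?apikey=your-key&t=Sharknado")

parts$scheme    # "http"
parts$hostname  # "www.omdbapi.com" (the base URL's host)
parts$path      # "/" (no resource path beyond the root)
parts$query     # named list: apikey = "your-key", t = "Sharknado"
```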
Generate the query
Store your API key
To access the OMDB API, you need an API key. If you do not have one, use the example key provided on Canvas.
Your turn: Store your API key in .Renviron so you can access it in your code. Once you have saved the file, restart your R session to ensure the new environment variable is loaded.
# from the console run:
usethis::edit_r_environ()
# in .Renviron add:
omdb_key="your-key"
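If the key has not been loaded (for example, because R was not restarted), Sys.getenv("omdb_key") silently returns an empty string and the API rejects the request. One way to catch this early is a small guard. get_omdb_key() is a hypothetical helper, not part of httr2:

```r
# Hypothetical helper: return the OMDB key, or fail with a clear message
# if the omdb_key environment variable is missing or empty.
get_omdb_key <- function() {
  key <- Sys.getenv("omdb_key")
  if (identical(key, "")) {
    stop("`omdb_key` is not set. Add it to .Renviron and restart R.", call. = FALSE)
  }
  key
}
```

You could then write apikey = get_omdb_key() wherever the code uses apikey = Sys.getenv("omdb_key").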
omdb_req |>
  req_url_query(
    apikey = Sys.getenv("omdb_key"),
    t = "Sharknado"
  ) |>
  req_dry_run()
GET /?apikey=your-key&t=Sharknado HTTP/1.1
Host: www.omdbapi.com
User-Agent: httr2/1.0.1 r-curl/5.2.2 libcurl/8.7.1
Accept: */*
Accept-Encoding: gzip
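Because a request is an ordinary R object, you can also inspect the URL it will use without sending anything. A minimal sketch, using a placeholder key and the hypothetical object name sharknado_2_req. Query values are percent-encoded, so the space in the title becomes %20:

```r
library(httr2)

# Same request shape as above, with a placeholder key and a title containing a space
sharknado_2_req <- request("http://www.omdbapi.com/") |>
  req_url_query(apikey = "your-key", t = "Sharknado 2")

# The fully assembled URL is stored in the request's `url` element;
# it contains t=Sharknado%202
sharknado_2_req$url
```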
Fetch the response
The req_perform() function sends the request to the API and returns the result as a response object. By default, it also turns an HTTP error response (a status code in the 400s or 500s) into an R error.
sharknado <- omdb_req |>
  req_url_query(
    apikey = Sys.getenv("omdb_key"),
    t = "Sharknado"
  ) |>
  req_perform()
sharknado
<httr2_response>
GET http://www.omdbapi.com/?apikey=your-key&t=Sharknado
Status: 200 OK
Content-Type: application/json
Body: In memory (893 bytes)
What did we get?
The HTTP response contains several useful pieces of information, most importantly the status code and the body.
Status code
The status code is a number that indicates whether the request was successful. A status code of 200 indicates success, while codes in the 400s or 500s indicate an error. resp_status() retrieves the numeric HTTP status code, whereas resp_status_desc() retrieves a brief textual description of it.
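You can explore these helpers without calling the API by building fake responses with httr2's response() function, which exists for testing. A short sketch:

```r
library(httr2)

# Fake responses built locally; nothing is sent over the network
ok <- response(status_code = 200)
not_found <- response(status_code = 404)

resp_status(ok)             # 200
resp_status_desc(ok)        # "OK"
resp_status(not_found)      # 404
resp_status_desc(not_found) # "Not Found"
resp_is_error(not_found)    # TRUE: 4xx and 5xx codes count as errors
```

On the real sharknado response, resp_status() returns 200 and resp_status_desc() returns "OK", matching the Status: 200 OK line printed earlier.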
HTTP status codes
Hopefully, all you receive is a 200 code indicating the query was successful. If you get something different, the status code is useful for debugging your code and determining what (if anything) you can do to fix it.
| Code | Status |
|---|---|
| 1xx | Informational |
| 2xx | Success |
| 3xx | Redirection |
| 4xx | Client error (you did something wrong) |
| 5xx | Server error (server did something wrong) |
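The class of a status code is determined by its first digit, so integer division by 100 is enough to label any code with the categories in this table. status_class() is a hypothetical helper for illustration, not part of httr2:

```r
# Hypothetical helper: map numeric HTTP status codes to their class
# by looking at the first digit (code %/% 100).
status_class <- function(code) {
  classes <- c("Informational", "Success", "Redirection", "Client error", "Server error")
  classes[code %/% 100]
}

status_class(200)              # "Success"
status_class(c(404, 503))      # "Client error" "Server error"
```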
Body
The body of the response contains the actual data returned by the API. httr2 stores it as raw bytes, and you extract it in the form you need using the resp_body_*() family of functions. The resp_body_string() function retrieves the body as a character string, which we pass to jsonlite's prettify() to make it easier to read.
sharknado |>
resp_body_string() |>
prettify()
{
"Title": "Sharknado",
"Year": "2013",
"Rated": "Not Rated",
"Released": "11 Jul 2013",
"Runtime": "86 min",
"Genre": "Action, Adventure, Comedy",
"Director": "Anthony C. Ferrante",
"Writer": "Thunder Levin",
"Actors": "Ian Ziering, Tara Reid, John Heard",
"Plot": "When a freak hurricane swamps Los Angeles, nature's deadliest killer rules sea, land, and air as thousands of sharks terrorize the waterlogged populace.",
"Language": "English",
"Country": "United States",
"Awards": "1 win & 2 nominations",
"Poster": "https://m.media-amazon.com/images/M/MV5BNTNkOTA0NjYtM2VjOC00ZmYzLWI1NzQtMjY4NTU2MGNkZmU0XkEyXkFqcGc@._V1_SX300.jpg",
"Ratings": [
{
"Source": "Internet Movie Database",
"Value": "3.3/10"
},
{
"Source": "Rotten Tomatoes",
"Value": "77%"
}
],
"Metascore": "N/A",
"imdbRating": "3.3",
"imdbVotes": "53,662",
"imdbID": "tt2724064",
"Type": "movie",
"DVD": "N/A",
"BoxOffice": "N/A",
"Production": "N/A",
"Website": "N/A",
"Response": "True"
}
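The same steps can be tried offline on a fake response. In this sketch the body is a compact JSON string holding two fields copied from the Sharknado output above, and prettify() re-indents it:

```r
library(httr2)
library(jsonlite)

# Fake JSON response built locally with response()
resp <- response(
  status_code = 200,
  headers = "Content-Type: application/json",
  body = charToRaw('{"Title":"Sharknado","Year":"2013"}')
)

body <- resp_body_string(resp)
body             # '{"Title":"Sharknado","Year":"2013"}'
prettify(body)   # the same JSON, one field per line
```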
JSON
Here the body is formatted as JavaScript Object Notation (JSON), so we can use resp_body_json() to parse it and store it as a list object in R.
sharknado |>
resp_body_json()
$Title
[1] "Sharknado"
$Year
[1] "2013"
$Rated
[1] "Not Rated"
$Released
[1] "11 Jul 2013"
$Runtime
[1] "86 min"
$Genre
[1] "Action, Adventure, Comedy"
$Director
[1] "Anthony C. Ferrante"
$Writer
[1] "Thunder Levin"
$Actors
[1] "Ian Ziering, Tara Reid, John Heard"
$Plot
[1] "When a freak hurricane swamps Los Angeles, nature's deadliest killer rules sea, land, and air as thousands of sharks terrorize the waterlogged populace."
$Language
[1] "English"
$Country
[1] "United States"
$Awards
[1] "1 win & 2 nominations"
$Poster
[1] "https://m.media-amazon.com/images/M/MV5BNTNkOTA0NjYtM2VjOC00ZmYzLWI1NzQtMjY4NTU2MGNkZmU0XkEyXkFqcGc@._V1_SX300.jpg"
$Ratings
$Ratings[[1]]
$Ratings[[1]]$Source
[1] "Internet Movie Database"
$Ratings[[1]]$Value
[1] "3.3/10"
$Ratings[[2]]
$Ratings[[2]]$Source
[1] "Rotten Tomatoes"
$Ratings[[2]]$Value
[1] "77%"
$Metascore
[1] "N/A"
$imdbRating
[1] "3.3"
$imdbVotes
[1] "53,662"
$imdbID
[1] "tt2724064"
$Type
[1] "movie"
$DVD
[1] "N/A"
$BoxOffice
[1] "N/A"
$Production
[1] "N/A"
$Website
[1] "N/A"
$Response
[1] "True"
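Nested fields such as Ratings come back as lists of lists, so you reach into them with [[ ]] and $. A sketch on a fake response holding part of the Sharknado body; passing simplifyVector = TRUE (forwarded to jsonlite) instead converts the Ratings array into a data frame:

```r
library(httr2)

# Fake response mirroring part of the Sharknado JSON body
json <- '{
  "Title": "Sharknado",
  "Ratings": [
    {"Source": "Internet Movie Database", "Value": "3.3/10"},
    {"Source": "Rotten Tomatoes", "Value": "77%"}
  ]
}'
resp <- response(headers = "Content-Type: application/json", body = charToRaw(json))

movie <- resp_body_json(resp)
movie$Title               # "Sharknado"
movie$Ratings[[2]]$Value  # "77%"

# Let jsonlite simplify the array of objects into a data frame
ratings <- resp_body_json(resp, simplifyVector = TRUE)$Ratings
ratings                   # 2 rows, columns Source and Value
```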
Convert to data frame
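For data analysis purposes, we prefer that the data be stored as a data frame. The as_tibble() function converts the list object to a tibble. Notice that the result has two rows that differ only in the Ratings column: Ratings is a list of two elements (one per rating source), so as_tibble() turns it into a two-row list-column and recycles every other field to match.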
sharknado |>
resp_body_json() |>
as_tibble()
# A tibble: 2 × 25
Title Year Rated Released Runtime Genre Director Writer Actors Plot Language
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Shar… 2013 Not … 11 Jul … 86 min Acti… Anthony… Thund… Ian Z… When… English
2 Shar… 2013 Not … 11 Jul … 86 min Acti… Anthony… Thund… Ian Z… When… English
# ℹ 14 more variables: Country <chr>, Awards <chr>, Poster <chr>,
# Ratings <list>, Metascore <chr>, imdbRating <chr>, imdbVotes <chr>,
# imdbID <chr>, Type <chr>, DVD <chr>, BoxOffice <chr>, Production <chr>,
# Website <chr>, Response <chr>
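If you want a different shape, here is a minimal sketch of two options, using a cut-down copy of the Sharknado list. Option 1 keeps one row per film; option 2 uses tidyr's unnest_wider() to give one row per rating with ordinary Source and Value columns. Which is better depends on the analysis you have in mind:

```r
library(tibble)
library(tidyr)

# Cut-down version of the list returned by resp_body_json()
movie <- list(
  Title = "Sharknado",
  Year = "2013",
  Ratings = list(
    list(Source = "Internet Movie Database", Value = "3.3/10"),
    list(Source = "Rotten Tomatoes", Value = "77%")
  )
)

# Ratings has length 2, so as_tibble() builds 2 rows and recycles the rest
nrow(as_tibble(movie)) # 2

# Option 1: wrap Ratings so it has length 1 -> one row per film
movie_one_row <- movie
movie_one_row$Ratings <- list(movie_one_row$Ratings)
nrow(as_tibble(movie_one_row)) # 1

# Option 2: one row per rating, spreading each rating into columns
ratings_long <- as_tibble(movie) |>
  unnest_wider(Ratings)  # columns Title, Year, Source, Value
```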
Write a function to query the OMDB API
Sharknado proved so popular that it spawned several sequels. Let's write a function to query the OMDB API for information on any of the Sharknado films.
Your turn: Your function should:
- Take a single argument (the title of the film)
- Print a message using the message() function to track progress
- Use throttling to ensure we do not overload the server and exceed any rate limits. Add req_throttle() to the request pipeline to limit the rate to 15 requests per minute.
- Return a tibble with the information from the API
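In httr2's req_throttle(), the rate argument is expressed in requests per second, so the limit of 15 requests per minute works out to one request every four seconds. That is why the output of the iteration reports waits of 4s:

```latex
\text{rate} = \frac{15\ \text{requests}}{60\ \text{s}} = 0.25\ \text{requests/s}
\quad\Longrightarrow\quad
\frac{1}{0.25\ \text{requests/s}} = 4\ \text{s between requests}
```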
omdb_api <- function(title) {
  # print a message to track progress
  message(str_glue("Scraping {title}..."))
  # define request
  req <- request(base_url = "http://www.omdbapi.com/") |>
    # throttle to avoid overloading server (15 requests per minute)
    req_throttle(rate = 15 / 60) |>
    # create query
    req_url_query(
      apikey = Sys.getenv("omdb_key"),
      t = title
    )
  # perform request
  req_results <- req |>
    req_perform()
  # extract results
  req_df <- req_results |>
    resp_body_json() |>
    as_tibble()
  return(req_df)
}
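One possible extension, sketched under an assumption you should verify against the OMDB documentation: when a title is not found, OMDB still returns JSON, with Response set to "False" and an Error message. The hypothetical helper parse_omdb() checks for that case and also wraps Ratings so each film yields a single row:

```r
library(httr2)
library(tibble)

# Hypothetical helper: turn one OMDB response into a one-row tibble.
# ASSUMPTION: a failed lookup is signalled by "Response": "False" plus an "Error" field.
parse_omdb <- function(resp) {
  movie <- resp_body_json(resp)
  if (identical(movie$Response, "False")) {
    warning("OMDB lookup failed: ", movie$Error, call. = FALSE)
    return(tibble())
  }
  # keep all ratings for the film in a single list-column cell -> one row
  movie$Ratings <- list(movie$Ratings)
  as_tibble(movie)
}

# Usage on a fake response (made-up body, not real API output)
fake <- response(
  headers = "Content-Type: application/json",
  body = charToRaw('{"Title":"Sharknado","Ratings":[{"Source":"Rotten Tomatoes","Value":"77%"}],"Response":"True"}')
)
parse_omdb(fake) # 1 row
```

To use it, you could replace the extract step in omdb_api() with req_df <- parse_omdb(req_results).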
Once you have written your function, test it by querying the API for information on "Sharknado". Then use an iterative operation to query the API for information on the five numbered Sharknado films and store the results in a single data frame.
# test function
omdb_api(title = "Sharknado")
Scraping Sharknado...
# A tibble: 2 × 25
Title Year Rated Released Runtime Genre Director Writer Actors Plot Language
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Shar… 2013 Not … 11 Jul … 86 min Acti… Anthony… Thund… Ian Z… When… English
2 Shar… 2013 Not … 11 Jul … 86 min Acti… Anthony… Thund… Ian Z… When… English
# ℹ 14 more variables: Country <chr>, Awards <chr>, Poster <chr>,
# Ratings <list>, Metascore <chr>, imdbRating <chr>, imdbVotes <chr>,
# imdbID <chr>, Type <chr>, DVD <chr>, BoxOffice <chr>, Production <chr>,
# Website <chr>, Response <chr>
# titles of films
sharknados <- c(
"Sharknado", "Sharknado 2", "Sharknado 3",
"Sharknado 4", "Sharknado 5"
)
# iterate over titles and query API
sharknados_df <- map(.x = sharknados, .f = omdb_api) |>
list_rbind()
Scraping Sharknado...
Waiting 4s for throttling delay ■■■■■■■■
Waiting 4s for throttling delay ■■■■■■■■■■■■■■■
Waiting 4s for throttling delay ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
Scraping Sharknado 2...
Waiting 4s for throttling delay ■■■■■■■■
Waiting 4s for throttling delay ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
Waiting 4s for throttling delay ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
Scraping Sharknado 3...
Waiting 4s for throttling delay ■■■■■■■■■■■■■■■■■■■■■
Waiting 4s for throttling delay ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
Scraping Sharknado 4...
Waiting 4s for throttling delay ■■■■■■■■■■■■■■
Waiting 4s for throttling delay ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
Scraping Sharknado 5...
Waiting 4s for throttling delay ■■■■■■■■
Waiting 4s for throttling delay ■■■■■■■■■■■■■■■■■■■■■■■■■■■■
Waiting 4s for throttling delay ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
sharknados_df
# A tibble: 10 × 25
Title Year Rated Released Runtime Genre Director Writer Actors Plot
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Sharknado 2013 Not … 11 Jul … 86 min Acti… Anthony… Thund… Ian Z… When…
2 Sharknado 2013 Not … 11 Jul … 86 min Acti… Anthony… Thund… Ian Z… When…
3 Sharknado 2:… 2014 TV-14 30 Jul … 95 min Acti… Anthony… Thund… Ian Z… Fin …
4 Sharknado 2:… 2014 TV-14 30 Jul … 95 min Acti… Anthony… Thund… Ian Z… Fin …
5 Sharknado 3:… 2015 TV-14 22 Jul … 93 min Acti… Anthony… Thund… Ian Z… A mo…
6 Sharknado 3:… 2015 TV-14 22 Jul … 93 min Acti… Anthony… Thund… Ian Z… A mo…
7 Sharknado 4:… 2016 TV-14 31 Jul … 95 min Acti… Anthony… Thund… Ian Z… Fin,…
8 Sharknado 4:… 2016 TV-14 31 Jul … 95 min Acti… Anthony… Thund… Ian Z… Fin,…
9 Sharknado 5:… 2017 TV-14 06 Aug … 93 min Acti… Anthony… Thund… Ian Z… With…
10 Sharknado 5:… 2017 TV-14 06 Aug … 93 min Acti… Anthony… Thund… Ian Z… With…
# ℹ 15 more variables: Language <chr>, Country <chr>, Awards <chr>,
# Poster <chr>, Ratings <list>, Metascore <chr>, imdbRating <chr>,
# imdbVotes <chr>, imdbID <chr>, Type <chr>, DVD <chr>, BoxOffice <chr>,
# Production <chr>, Website <chr>, Response <chr>
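Every column in sharknados_df is character, and each film appears twice. A minimal cleaning sketch on a small tibble whose values are copied from the Sharknado rows above. With the real data, drop the Ratings list-column first (select(-Ratings)), because it differs between the two copies of each film and would stop distinct() from removing them:

```r
library(dplyr)
library(readr)

# Two copies of the Sharknado row (values copied from the output above)
films <- tibble(
  Title = c("Sharknado", "Sharknado"),
  Year = c("2013", "2013"),
  Runtime = c("86 min", "86 min"),
  imdbRating = c("3.3", "3.3"),
  imdbVotes = c("53,662", "53,662"),
  Metascore = c("N/A", "N/A")
)

films_clean <- films |>
  # remove the duplicated row
  distinct() |>
  # parse_number() drops units ("min") and grouping commas;
  # "N/A" is treated as missing
  mutate(across(
    c(Year, Runtime, imdbRating, imdbVotes, Metascore),
    \(x) parse_number(x, na = c("", "NA", "N/A"))
  ))
```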
Acknowledgments
- These exercises draw substantially on the httr2 vignettes and reference documentation.
sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.4.1 (2024-06-14)
os macOS Sonoma 14.6.1
system aarch64, darwin20
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/New_York
date 2024-10-17
pandoc 3.4 @ /usr/local/bin/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────
! package * version date (UTC) lib source
cli 3.6.3 2024-06-21 [1] RSPM (R 4.4.0)
P codetools 0.2-20 2024-03-31 [?] CRAN (R 4.4.1)
P colorspace 2.1-0 2023-01-23 [?] CRAN (R 4.3.0)
curl 5.2.2 2024-08-26 [1] RSPM (R 4.4.0)
P digest 0.6.35 2024-03-11 [?] CRAN (R 4.3.1)
P dplyr * 1.1.4 2023-11-17 [?] CRAN (R 4.3.1)
P evaluate 0.24.0 2024-06-10 [?] CRAN (R 4.4.0)
P fansi 1.0.6 2023-12-08 [?] CRAN (R 4.3.1)
P fastmap 1.2.0 2024-05-15 [?] CRAN (R 4.4.0)
P forcats * 1.0.0 2023-01-29 [?] CRAN (R 4.3.0)
P generics 0.1.3 2022-07-05 [?] CRAN (R 4.3.0)
P ggplot2 * 3.5.1 2024-04-23 [?] CRAN (R 4.3.1)
P glue 1.7.0 2024-01-09 [?] CRAN (R 4.3.1)
P gtable 0.3.5 2024-04-22 [?] CRAN (R 4.3.1)
P here 1.0.1 2020-12-13 [?] CRAN (R 4.3.0)
P hms 1.1.3 2023-03-21 [?] CRAN (R 4.3.0)
P htmltools 0.5.8.1 2024-04-04 [?] CRAN (R 4.3.1)
P htmlwidgets 1.6.4 2023-12-06 [?] CRAN (R 4.3.1)
P httpuv 1.6.15 2024-03-26 [?] CRAN (R 4.3.1)
P httr2 * 1.0.1 2024-04-01 [?] CRAN (R 4.3.1)
P jsonlite * 1.8.8 2023-12-04 [?] CRAN (R 4.3.1)
P knitr 1.47 2024-05-29 [?] CRAN (R 4.4.0)
P later 1.3.2 2023-12-06 [?] CRAN (R 4.3.1)
P lifecycle 1.0.4 2023-11-07 [?] CRAN (R 4.3.1)
P lubridate * 1.9.3 2023-09-27 [?] CRAN (R 4.3.1)
P magrittr 2.0.3 2022-03-30 [?] CRAN (R 4.3.0)
P munsell 0.5.1 2024-04-01 [?] CRAN (R 4.3.1)
P pillar 1.9.0 2023-03-22 [?] CRAN (R 4.3.0)
P pkgconfig 2.0.3 2019-09-22 [?] CRAN (R 4.3.0)
P promises 1.3.0 2024-04-05 [?] CRAN (R 4.3.1)
P purrr * 1.0.2 2023-08-10 [?] CRAN (R 4.3.0)
P R6 2.5.1 2021-08-19 [?] CRAN (R 4.3.0)
P rappdirs 0.3.3 2021-01-31 [?] CRAN (R 4.3.0)
P Rcpp 1.0.12 2024-01-09 [?] CRAN (R 4.3.1)
P readr * 2.1.5 2024-01-10 [?] CRAN (R 4.3.1)
renv 1.0.7 2024-04-11 [1] CRAN (R 4.4.0)
P rlang 1.1.4 2024-06-04 [?] CRAN (R 4.3.3)
P rmarkdown 2.27 2024-05-17 [?] CRAN (R 4.4.0)
P rprojroot 2.0.4 2023-11-05 [?] CRAN (R 4.3.1)
P rstudioapi 0.16.0 2024-03-24 [?] CRAN (R 4.3.1)
P scales 1.3.0.9000 2024-05-07 [?] Github (r-lib/scales@c0f79d3)
P sessioninfo 1.2.2 2021-12-06 [?] CRAN (R 4.3.0)
P stringi 1.8.4 2024-05-06 [?] CRAN (R 4.3.1)
P stringr * 1.5.1 2023-11-14 [?] CRAN (R 4.3.1)
P tibble * 3.2.1 2023-03-20 [?] CRAN (R 4.3.0)
P tidyr * 1.3.1 2024-01-24 [?] CRAN (R 4.3.1)
P tidyselect 1.2.1 2024-03-11 [?] CRAN (R 4.3.1)
P tidyverse * 2.0.0 2023-02-22 [?] CRAN (R 4.3.0)
P timechange 0.3.0 2024-01-18 [?] CRAN (R 4.3.1)
P tzdb 0.4.0 2023-05-12 [?] CRAN (R 4.3.0)
P utf8 1.2.4 2023-10-22 [?] CRAN (R 4.3.1)
P vctrs 0.6.5 2023-12-01 [?] CRAN (R 4.3.1)
withr 3.0.1 2024-07-31 [1] RSPM (R 4.4.0)
P xfun 0.45 2024-06-16 [?] CRAN (R 4.4.0)
P yaml 2.3.8 2023-12-11 [?] CRAN (R 4.3.1)
[1] /Users/soltoffbc/Projects/info-5001/course-site/renv/library/macos/R-4.4/aarch64-apple-darwin20
[2] /Users/soltoffbc/Library/Caches/org.R-project.R/R/renv/sandbox/macos/R-4.4/aarch64-apple-darwin20/f7156815
P ── Loaded and on-disk path mismatch.
──────────────────────────────────────────────────────────────────────────────