Application Programming Interfaces

Lecture 13

Dr. Benjamin Soltoff

Cornell University
INFO 5001 - Fall 2024

October 10, 2024

Announcements

  • Project proposals
  • Lab 04 this week
  • Homework 04 next week
  • Exam in two weeks

Methods for obtaining data online

  • Click and download
  • Install and play
  • API query
  • Web scraping

Click and download

  • readr::read_csv()
  • downloader package or curl
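
As a minimal sketch of doing "click and download" in code rather than by hand, the helper below saves a file locally and then reads it with readr::read_csv(). The name fetch_csv() and the URL in the usage note are illustrative assumptions, not part of the lecture.

```r
library(readr)

# Download a CSV to a local file, then read it into a tibble.
# `fetch_csv()` is a hypothetical helper name used for illustration.
fetch_csv <- function(url, dest = tempfile(fileext = ".csv")) {
  # mode = "wb" writes the file in binary mode, which avoids
  # line-ending corruption on Windows
  download.file(url, destfile = dest, mode = "wb", quiet = TRUE)
  read_csv(dest, show_col_types = FALSE)
}
```

A call would look like fetch_csv("https://example.com/data.csv") (placeholder URL). read_csv() can also read straight from a URL; saving the file first keeps a local copy of exactly what was downloaded.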

Application programming interface (API)

  • Representational State Transfer (REST)
  • Uniform Resource Locator (URL)
  • HTTP methods
    • GET
    • POST
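
To make the two methods concrete, here is a sketch using httr2 (the package named in the recap). The endpoint URL, query parameter, and body fields are made up for illustration; nothing is sent over the network because the requests are only built, not performed.

```r
library(httr2)

# Hypothetical endpoint, used only for illustration
base <- request("https://api.example.com/v1/items")

# GET: retrieve data; parameters travel in the URL's query string
req_get <- base |>
  req_url_query(category = "books")

# POST: send data to the server in the request body
req_post <- base |>
  req_method("POST") |>
  req_body_json(list(title = "New item"))

# Inspect what would be sent
req_get$url
req_post$method
```

Passing either object to req_perform() would actually send the request.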

RESTful queries

  1. Submit request to server via URL
  2. Return result in a structured format
  3. Parse results into a local format
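
The three steps above can be sketched with httr2 and jsonlite. The endpoint and the JSON payload are invented stand-ins, so the example runs without a network connection; with a real API, step 2 would be a call to req_perform().

```r
library(httr2)
library(jsonlite)

# 1. Submit request to server via URL (hypothetical endpoint and parameter)
req <- request("https://api.example.com/v1/records") |>
  req_url_query(country = "Sweden")

# 2. The server returns a result in a structured format, typically JSON.
#    With a real endpoint: resp <- req_perform(req)
#    Here a hard-coded, made-up string stands in for the response body.
body <- '[{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]'

# 3. Parse the result into a local format (here, a data frame)
records <- fromJSON(body)
records
```

With a live response object, resp_body_json(resp, simplifyVector = TRUE) performs the same parsing as step 3.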

Install and play packages

  • Packages with R functions written for existing APIs
  • Useful because
    • Reproducible
    • Up-to-date (ideally)
    • Provenance (can blame someone if something goes wrong)
    • Ease of access

Using APIs with existing R packages

manifestoR

  • Collects and organizes political party manifestos from around the world
  • Over 1000 parties from 1945 until today in over 50 countries on five continents
  • Accessed from R with the manifestoR package

API authentication

  • Verifying user access
  • Different methods
    • Username/password (out of favor)
    • Key/token (e.g. GitHub and usethis)
    • OAuth (e.g. what googlesheets4 tried to do)
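
For the key/token method, the credential usually travels either in the URL or in a request header; which one applies depends on the API's documentation. The sketch below shows both with httr2. The helper names, the api_key parameter name, the EXAMPLE_API_KEY environment variable, and the endpoint are all assumptions made for illustration.

```r
library(httr2)

# Option 1: key sent as a query parameter.
# The parameter name ("api_key" here) varies by API.
auth_query <- function(req, key = Sys.getenv("EXAMPLE_API_KEY")) {
  req_url_query(req, api_key = key)
}

# Option 2: token sent in the Authorization header as a bearer token
auth_bearer <- function(req, token = Sys.getenv("EXAMPLE_API_KEY")) {
  req_auth_bearer_token(req, token)
}

# Hypothetical endpoint; the request is built but not sent
req <- request("https://api.example.com/v1/records")
```

Either helper slots into a pipeline, for example req |> auth_bearer() |> req_perform().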

API keys

  • Random string of alphanumeric characters unique to each user
  • Access restrictions based on needs
  • Obtain key
  • Store in .Rprofile or .Renviron
  • Never store directly in an R script or Quarto document

Storing API keys

Edit with usethis::edit_r_profile()

# in .Rprofile
options(this_is_my_key = "value")

# later, in the R script:
key <- getOption("this_is_my_key")

Edit with usethis::edit_r_environ()

# in .Renviron
this_is_my_key=value

# later, in the R script
key <- Sys.getenv("this_is_my_key")
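
Sys.getenv() returns an empty string when a variable is not set, so a missing key can slip through silently. A small wrapper, sketched here with the hypothetical name get_api_key(), fails early with a helpful message instead.

```r
# Retrieve a key from an environment variable, stopping if it is missing.
# `get_api_key()` is an illustrative helper, not part of any package.
get_api_key <- function(var = "this_is_my_key") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop(
      "Environment variable '", var, "' is not set. ",
      "Add it with usethis::edit_r_environ() and restart R.",
      call. = FALSE
    )
  }
  key
}
```

In a script this replaces the bare call: key <- get_api_key().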

Load library and set API key

library(manifestoR)

# retrieve API key stored in .Rprofile
mp_setapikey(key = getOption("manifesto_key"))

Retrieve the database

(mpds <- mp_maindataset())
Connecting to Manifesto Project DB API... 
Connecting to Manifesto Project DB API... corpus version: 2024-1 
# A tibble: 5,151 × 175
   country countryname oecdmember eumember edate        date party partyname    
     <dbl> <chr>            <dbl>    <dbl> <date>      <dbl> <dbl> <chr>        
 1      11 Sweden               0        0 1944-09-17 194409 11220 Communist Pa…
 2      11 Sweden               0        0 1944-09-17 194409 11320 Social Democ…
 3      11 Sweden               0        0 1944-09-17 194409 11420 People’s Par…
 4      11 Sweden               0        0 1944-09-17 194409 11620 Right Party  
 5      11 Sweden               0        0 1944-09-17 194409 11810 Agrarian Par…
 6      11 Sweden               0        0 1948-09-19 194809 11220 Communist Pa…
 7      11 Sweden               0        0 1948-09-19 194809 11320 Social Democ…
 8      11 Sweden               0        0 1948-09-19 194809 11420 People’s Par…
 9      11 Sweden               0        0 1948-09-19 194809 11620 Right Party  
10      11 Sweden               0        0 1948-09-19 194809 11810 Agrarian Par…
# ℹ 5,141 more rows
# ℹ 167 more variables: partyabbrev <chr>, parfam <dbl>, candidatename <chr>,
#   coderid <dbl>, manual <dbl>, coderyear <dbl>, testresult <dbl>,
#   testeditsim <dbl>, pervote <dbl>, voteest <dbl>, presvote <dbl>,
#   absseat <dbl>, totseats <dbl>, progtype <dbl>, datasetorigin <dbl>,
#   corpusversion <chr>, total <dbl>, peruncod <dbl>, per101 <dbl>,
#   per102 <dbl>, per103 <dbl>, per104 <dbl>, per105 <dbl>, per106 <dbl>, …

Download manifestos

Census data with tidycensus

  • API to access data from US Census Bureau
    • Decennial census
    • American Community Survey
  • Returns tidy data frames with (optional) sf geometry
  • Search for variables with load_variables()
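
load_variables() returns a data frame with name, label, and concept columns, so searching it is an ordinary dplyr filter. The helper below is a sketch; search_variables() is a hypothetical name, not a tidycensus function.

```r
library(dplyr)
library(stringr)

# Keep the rows of a variable table whose label or concept matches a
# search term (case-insensitive).
search_variables <- function(vars, pattern) {
  rx <- regex(pattern, ignore_case = TRUE)
  filter(vars, str_detect(label, rx) | str_detect(concept, rx))
}
```

Typical use, which needs an internet connection: load_variables(2022, "acs5", cache = TRUE) |> search_variables("median household income").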

Store API key

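library(tidycensus)

# run once in the console, not in a script;
# install = TRUE saves the key to .Renviron as CENSUS_API_KEY
census_api_key("YOUR API KEY GOES HERE", install = TRUE)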

Obtain data

usa_inc <- get_acs(
  geography = "state",
  variables = c(medincome = "B19013_001"),
  year = 2022
)
usa_inc
# A tibble: 52 × 5
   GEOID NAME                 variable  estimate   moe
   <chr> <chr>                <chr>        <dbl> <dbl>
 1 01    Alabama              medincome    59609   377
 2 02    Alaska               medincome    86370  1083
 3 04    Arizona              medincome    72581   450
 4 05    Arkansas             medincome    56335   422
 5 06    California           medincome    91905   277
 6 08    Colorado             medincome    87598   508
 7 09    Connecticut          medincome    90213   730
 8 10    Delaware             medincome    79325  1227
 9 11    District of Columbia medincome   101722  1569
10 12    Florida              medincome    67917   259
# ℹ 42 more rows
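
The moe column is the margin of error around each estimate (tidycensus reports it at the 90% confidence level by default). As a sketch of working with the result, the code below re-enters the first three rows of the output above by hand, so it runs without an API key, then adds interval bounds and ranks the states.

```r
library(dplyr)

# First three rows of the get_acs() output shown above
usa_inc_sample <- tibble::tribble(
  ~GEOID, ~NAME,     ~variable,   ~estimate, ~moe,
  "01",   "Alabama", "medincome", 59609,     377,
  "02",   "Alaska",  "medincome", 86370,     1083,
  "04",   "Arizona", "medincome", 72581,     450
)

# Add lower/upper bounds and sort from highest to lowest estimate
ranked <- usa_inc_sample |>
  mutate(lower = estimate - moe, upper = estimate + moe) |>
  arrange(desc(estimate))
ranked
```

The same pipeline applies unchanged to the full usa_inc tibble.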

Visualize data

Application exercise

Writing an API function

ae-11

  • Go to the course GitHub org and find your ae-11 (repo name will be suffixed with your GitHub name).
  • Clone the repo in RStudio, run renv::restore() to install the required packages, open the Quarto document in the repo, and follow along and complete the exercises.
  • Render, commit, and push your edits by the AE deadline (end of the day).

Recap

  • Web APIs (application programming interfaces): a website offers a set of structured HTTP requests that return JSON or XML data
  • Use pre-written packages in R to access APIs when available
  • Use httr2 to write your own API functions
  • Store API keys securely in .Rprofile or .Renviron
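
One possible shape for a hand-written API function with httr2 is sketched below. It separates building the request, which can be inspected offline, from performing it. The base URL, the api_key parameter, the EXAMPLE_API_KEY variable, and both function names are assumptions for illustration; a real API's documentation determines the actual values.

```r
library(httr2)

# Build (but do not send) a request to a hypothetical JSON API.
# Extra named arguments in ... become query parameters.
build_request <- function(endpoint, ..., key = Sys.getenv("EXAMPLE_API_KEY")) {
  request("https://api.example.com/v1") |>
    req_url_path_append(endpoint) |>
    req_url_query(..., api_key = key) |>
    req_user_agent("INFO 5001 example") |>
    req_retry(max_tries = 3)
}

# Send the request and parse the JSON body into R objects
fetch_json <- function(endpoint, ...) {
  build_request(endpoint, ...) |>
    req_perform() |>
    resp_body_json(simplifyVector = TRUE)
}
```

Reading the key from an environment variable by default keeps it out of the script, in line with the last point above.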