AE 09: Scraping articles from the Cornell Review

Application exercise

Packages

We will use the following packages in this application exercise.

  • tidyverse: For data import, wrangling, and visualization.
  • rvest: For scraping HTML files.
  • lubridate: For formatting date variables.
  • robotstxt: For checking whether we are allowed to scrape a website.

library(tidyverse)
library(rvest)
library(lubridate)
library(robotstxt)
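
Before scraping, it helps to confirm that the site's robots.txt permits it. The following is a minimal sketch of that check, not the exercise's official solution. It parses a made-up robots.txt supplied as text so that it runs without a network connection; the rules and the paths tested are hypothetical. Against a live site you would more likely call paths_allowed() with the site's URL (for example "https://www.thecornellreview.org/", which is an assumed address here).

```r
library(robotstxt)

# Hypothetical robots.txt contents, supplied as text so no download is needed.
# A real check would be: paths_allowed("https://www.thecornellreview.org/")
# (the URL is an assumption; substitute the address you intend to scrape).
rules <- robotstxt(text = "User-agent: *\nDisallow: /private/\n")

# Ask whether a generic bot ("*") may fetch each path.
allowed_articles <- rules$check(paths = "/articles/", bot = "*")
allowed_private  <- rules$check(paths = "/private/page", bot = "*")

allowed_articles
allowed_private
```

If the check returns FALSE for the pages you need, do not scrape them.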

Data scraping

Write the scraping code in the scrape-cornell-review.R script, then save the resulting data frame in the data folder.
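
As a rough illustration of what such a script might contain, the sketch below parses a small inline HTML snippet instead of the live site, so it runs offline. The markup, the CSS selectors (.article, .title, .author, .date), the column names, and the output file name data/cornell-review.csv are all assumptions made for illustration; the real page will need selectors found by inspecting its HTML.

```r
library(tidyverse)
library(rvest)
library(lubridate)

# Hypothetical stand-in for a page of article listings. In the real script
# this would be read_html("<site URL>") after the robots.txt check.
page <- minimal_html('
  <div class="article">
    <h2 class="title"><a href="https://example.com/a1">First headline</a></h2>
    <span class="author">Jane Doe</span>
    <span class="date">March 4, 2024</span>
  </div>
  <div class="article">
    <h2 class="title"><a href="https://example.com/a2">Second headline</a></h2>
    <span class="author">John Roe</span>
    <span class="date">February 27, 2024</span>
  </div>
')

# One node per article, so every column has one value per article.
articles <- page |> html_elements(".article")

review <- tibble(
  title  = articles |> html_element(".title a") |> html_text2(),
  url    = articles |> html_element(".title a") |> html_attr("href"),
  author = articles |> html_element(".author") |> html_text2(),
  date   = articles |> html_element(".date") |> html_text2() |> mdy()
)

# Save the data frame in the data folder (file name is an assumption).
dir.create("data", showWarnings = FALSE)
write_csv(review, "data/cornell-review.csv")
```

Selecting the article containers first and then using html_element() within each one keeps the columns aligned even when an article is missing a field, since a missing field becomes NA instead of shifting later values up.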