Importing and recoding data

Lecture 8

Dr. Benjamin Soltoff

Cornell University
INFO 5001 - Fall 2025

September 18, 2025

Announcements

  • Homework 03
  • Quiz 01

Popstars’ height comparison

Learning objectives

  • Identify common methods for reading data from a file
  • Clean and wrangle data frames to facilitate analysis tasks

Data “wrangling”

A screenshot of a New York Times article.

A screenshot of 'Data Carpentry' by David Mimno.

Reading data into R

  • Local data files
  • Databases
  • Web scraping
  • Application programming interfaces (APIs)

Reading rectangular data
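
As a minimal sketch of reading a rectangular file, the example below writes a tiny CSV to a temporary file and reads it back with base R's read.csv(). The file contents and column names (draw_date, winning_numbers, multiplier) are made up for illustration and are not the application exercise data. The tidyverse function readr::read_csv() plays a similar role and returns a tibble instead of a base data frame.

```r
# Hypothetical data: a tiny CSV written to a temporary file for illustration.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c(
    "draw_date,winning_numbers,multiplier",
    "2020-01-01,01 12 23 34 45 06,2",
    "2020-01-04,05 10 15 20 25 07,NA"
  ),
  csv_path
)

# Read the file: one row per observation, one column per variable.
# Declaring column types up front avoids surprises from type guessing.
draws <- read.csv(
  csv_path,
  colClasses = c("character", "character", "integer"),
  na.strings = "NA"
)

# Convert the date column from text to a Date object.
draws$draw_date <- as.Date(draws$draw_date, format = "%Y-%m-%d")

# Inspect the column types to confirm the import worked as intended.
str(draws)
```

Checking the structure right after import is a cheap way to catch columns that were read as the wrong type before any recoding begins.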

Application exercise

Powerball Lottery

ae-06

Note

  • Go to the course GitHub org and find your ae-06 (repo name will be suffixed with your GitHub name).
  • Clone the repo in Positron, run renv::restore() to install the required packages, open the Quarto document in the repo, and follow along and complete the exercises.
  • Render, commit, and push your edits by the AE deadline (end of the day).

Wrap up

Recap

  • Simplify your life – get the data in as simple a format as possible
  • Ensure all data cleaning is reproducible. Do not overwrite your raw data files.
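
One way to put the recap into practice, sketched here under assumptions: keep raw files in one folder, do all cleaning in code, and write the cleaned result to a separate file. The folder names (data-raw, data), the file names, the example values, and the use of -999 as a missing-value code are all hypothetical.

```r
# Hypothetical project layout, created inside a temporary directory.
project_dir <- tempfile("project_")
raw_dir <- file.path(project_dir, "data-raw")
clean_dir <- file.path(project_dir, "data")
dir.create(raw_dir, recursive = TRUE)
dir.create(clean_dir, recursive = TRUE)

# Made-up raw file in which -999 stands in for a missing height.
raw_path <- file.path(raw_dir, "heights.csv")
writeLines(c("name,height_cm", "A,170", "B,-999", "C,158"), raw_path)

# Read the raw data; the raw file itself is never modified.
raw <- read.csv(raw_path, stringsAsFactors = FALSE)

# Recode the placeholder value to NA in a copy of the data.
clean <- raw
clean$height_cm[clean$height_cm == -999] <- NA

# Write the cleaned data to a different file, leaving the raw file intact.
clean_path <- file.path(clean_dir, "heights-clean.csv")
write.csv(clean, clean_path, row.names = FALSE)
```

Because every cleaning step lives in the script, rerunning it on the untouched raw file reproduces the cleaned file exactly.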