Data types and classes

Lecture 8

Dr. Benjamin Soltoff

Cornell University
INFO 5001 - Fall 2023

2023-09-18

Announcements

  • Lab 02 due today
  • Homework 02 due Wednesday
  • No class Wednesday

Types and classes

  • Type is how an object is stored in memory, e.g.,

    • double: a real number stored in double-precision floating point format.
    • integer: a whole number (positive, negative, or zero).
  • Class is metadata about the object that can determine how common functions operate on that object, e.g.,

    • factor
    • date
    • date-time
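A minimal sketch of the distinction, using only base R. The object names `x` and `d` are illustrative. Both objects are stored as doubles, but the class on `d` changes how printing and arithmetic behave.

```r
x <- c(1.5, 2, 3)            # a plain double vector
d <- as.Date("2023-09-18")   # also stored as a double, but carries the class "Date"

typeof(x)   # "double"
typeof(d)   # "double"
class(x)    # "numeric"
class(d)    # "Date"

# The class determines how common functions treat the object:
# adding 1 to a Date moves it forward one day and keeps the class.
d + 1
class(d + 1)
```

`typeof()` reports the storage type, while `class()` reports the metadata that functions such as `print()` and `format()` dispatch on.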

Types of vectors

You’ll commonly encounter:

  • logical
  • integer
  • double
  • character

You’ll less commonly encounter:

  • list
  • NULL
  • complex
  • raw

We don’t typically think of them as lists, but data frames are lists in which every element is a vector (a column), and all of those vectors have the same length.
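To see this in practice, here is a small sketch with a made-up data frame (the column names and values are hypothetical):

```r
# A small data frame used only for illustration.
df <- data.frame(
  name = c("a", "b", "c"),
  score = c(90.5, 82, 77.25),
  stringsAsFactors = FALSE   # keep `name` as character on older versions of R
)

typeof(df)    # "list": the underlying storage
class(df)     # "data.frame": the class layered on top
is.list(df)   # TRUE

# Each element of the list is a column vector with its own type.
typeof(df$name)    # "character"
typeof(df$score)   # "double"

# Every column has the same length, equal to the number of rows.
lengths(df)
nrow(df)
```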

Coercing vectors into different types

Base R functions

  • as.logical()
  • as.integer()
  • as.double()
  • as.character()

readr functions

  • parse_logical()
  • parse_integer()
  • parse_double()
  • parse_character()
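A short sketch of the base R functions on a few hand-picked inputs (the inputs are illustrative, not from the lecture):

```r
# Only a fixed set of strings (e.g. "TRUE", "true", "T", "FALSE", "false", "F")
# is recognized; anything else becomes NA.
lgl <- as.logical(c("TRUE", "false", "T", "yes"))
lgl

# Character strings that look like numbers convert cleanly.
int <- as.integer(c("1", "2", "3"))
dbl <- as.double("2.5")

# Converting a double to an integer truncates toward zero; it does not round.
trunc_int <- as.integer(3.9)
trunc_int

# Any atomic vector can be converted to character.
chr <- as.character(c(1.5, 2, 3))
chr
```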

Coercing vectors into different types

library(readr)

x <- c("3", "5", "alpha")

as.double(x)
Warning: NAs introduced by coercion
[1]  3  5 NA
parse_double(x)
Warning: 1 parsing failure.
row col expected actual
  3  -- a double  alpha
[1]  3  5 NA
attr(,"problems")
# A tibble: 1 × 4
    row   col expected actual
  <int> <int> <chr>    <chr> 
1     3    NA a double alpha 
y <- c("$23", "$17.67", "$123,000")

as.numeric(y)
Warning: NAs introduced by coercion
[1] NA NA NA
parse_number(y)
[1]     23.00     17.67 123000.00
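as.numeric() fails on these strings because of the "$" and "," characters. The sketch below is a rough base R approximation for this specific input. It is not how readr implements parse_number(), which handles many more formats.

```r
y <- c("$23", "$17.67", "$123,000")

# Strip dollar signs and thousands separators, then coerce.
# Inside a character class, "$" is treated as a literal character.
cleaned <- gsub("[$,]", "", y)
cleaned

amounts <- as.numeric(cleaned)
amounts
```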

Types of functions

Functions have types too, but you don’t need to worry about the differences in the context of doing data science.

typeof(mean) # regular function
[1] "closure"
typeof(`$`) # primitive function (special)
[1] "special"
typeof(sum) # primitive function (builtin)
[1] "builtin"
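For the curious, both "special" and "builtin" functions are primitives implemented in C, while closures are ordinary functions written in R. A quick check with base R's is.primitive():

```r
is.primitive(mean)   # FALSE: a closure
is.primitive(`$`)    # TRUE
is.primitive(sum)    # TRUE

# All three are still functions and are called the same way.
all_functions <- is.function(mean) && is.function(`$`) && is.function(sum)
all_functions
```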

Factors

A factor is a vector that can contain only predefined values. It is used to store categorical data.

x <- factor(c("a", "b", "b", "a"))
x
[1] a b b a
Levels: a b
typeof(x)
[1] "integer"
attributes(x)
$levels
[1] "a" "b"

$class
[1] "factor"
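A brief sketch of what "only predefined values" means in practice. The names `x2` and `x3` are illustrative. suppressWarnings() is used only to keep the sketch quiet; without it, R warns that an invalid factor level generated an NA.

```r
x <- factor(c("a", "b", "b", "a"))

as.integer(x)   # the underlying integer codes
levels(x)       # the lookup table those codes refer to

# Assigning a value that is not an existing level produces NA.
x2 <- x
suppressWarnings(x2[1] <- "c")
x2

# To allow a new value, declare it as a level first.
x3 <- factor(x, levels = c("a", "b", "c"))
x3[1] <- "c"
x3
```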

Other classes

Just a couple of examples…

Date:

today <- Sys.Date()
today
[1] "2023-09-18"
typeof(today)
[1] "double"
attributes(today)
$class
[1] "Date"

Date-time:

now <- as.POSIXct("2023-09-18 9:55", tz = "EST")
now
[1] "2023-09-18 09:55:00 EST"
typeof(now)
[1] "double"
attributes(now)
$class
[1] "POSIXct" "POSIXt" 

$tzone
[1] "EST"
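What do those doubles represent? A small sketch: a Date stores the number of days since 1970-01-01, and a POSIXct stores the number of seconds since 1970-01-01 00:00:00 UTC. UTC is used here (an assumption made for illustration) so the result does not depend on a local time zone.

```r
d <- as.Date("2023-09-18")
unclass(d)        # 19618 days since 1970-01-01

dt <- as.POSIXct("2023-09-18 09:55", tz = "UTC")
as.numeric(dt)    # 1695030900 seconds since the start of 1970 (UTC)

# Because the storage is numeric, arithmetic works in those units.
d + 7             # one week later, still a Date
dt + 60 * 5       # five minutes later, still a POSIXct
```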

Application exercise

ae-06

  • Go to the course GitHub org and find your ae-06 (repo name will be suffixed with your GitHub name).
  • Clone the repo in RStudio Workbench, open the Quarto document in the repo, and follow along and complete the exercises.
  • Render, commit, and push your edits by the AE deadline – end of tomorrow.

Recap of AE

  • Vectors have types and classes. In general we don’t need to be too concerned with this distinction, but make sure you use an appropriate type/class for each variable (e.g., don’t store year as a character type).
  • forcats is a powerful package for working with factors
  • Check out the tidyverse and other packages for working with different classes of vectors
    • lubridate for date and date-time objects
    • forecast and zoo for time-series objects
    • stringr for character strings
    • sf for spatial objects
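Why the type matters, sketched in base R with made-up values (the variable names `year_chr` and `year_int` are hypothetical):

```r
# Years read in as text.
year_chr <- c("1999", "2005", "2012")
year_int <- as.integer(year_chr)

# Arithmetic fails on character vectors but works on integers.
result <- tryCatch(year_chr + 1, error = function(e) "error")
result
next_year <- year_int + 1L
next_year

# Character values sort as text, which misorders numbers of different widths.
sorted_chr <- sort(c("9", "10", "100"))
sorted_int <- sort(c(9L, 10L, 100L))
sorted_chr
sorted_int
```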
