An introduction to LLMs

Lecture 23

Dr. Benjamin Soltoff

Cornell University
INFO 5001 - Fall 2024

November 19, 2024

Announcements


  • Homework 06
  • Project draft

Language models

Language model

A language model estimates the probability of a token or sequence of tokens occurring within a longer sequence of tokens.

When I hear rain on my roof, I _______ in my kitchen.
Probability  Token(s)
9.4%         cook soup
5.2%         warm up a kettle
3.6%         cower
2.5%         nap
2.2%         relax
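
The table above shows a single conditional probability. A language model assigns a probability to an entire token sequence by chaining such conditionals together (the standard chain rule of probability):

\[
P(w_1, w_2, \dots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1})
\]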

Context matters

Context is helpful information before or after the target token.

  • \(n\)-grams (see the bigram sketch after this list)
  • Recurrent neural networks (RNNs)
  • Transformer
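
As a minimal sketch of the \(n\)-gram idea, the code below estimates next-word probabilities from bigram counts. The toy corpus, variable names, and tidyverse approach are illustrative only, not taken from the lecture materials.

  library(tidyverse)

  # toy corpus (made-up sentences, for illustration only)
  corpus <- c(
    "when i hear rain i cook soup in my kitchen",
    "when i hear rain i nap on the couch",
    "i cook soup when it is cold"
  )

  # split each sentence into words, then pair each word with the word that follows it
  bigrams <- corpus |>
    str_split(" ") |>
    map(\(words) tibble(current = head(words, -1), next_word = tail(words, -1))) |>
    list_rbind()

  # estimate P(next word | current word) from the bigram counts
  bigram_probs <- bigrams |>
    count(current, next_word) |>
    group_by(current) |>
    mutate(probability = n / sum(n)) |>
    ungroup()

  # most likely words to follow "i" in this toy corpus
  bigram_probs |>
    filter(current == "i") |>
    arrange(desc(probability))

With a context of only one preceding word, the estimates are crude; RNNs and transformers exist precisely to make use of much longer contexts.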

Large language models (LLMs)

What’s a transformer?

A transformer is a deep learning model that uses attention to weigh the influence of different parts of the input sequence on each other.

  • Encoder
  • Decoder

Self-attention

Self-attention is a mechanism that allows each position in the input sequence to attend to all positions in the input sequence.

How much does each other token in the input affect the interpretation of this token?

Self-attention is learned through the training of the encoder and decoder. These models typically contain hundreds of billions or trillions of parameters (weights).
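
A minimal sketch of scaled dot-product self-attention, the core computation \( \text{softmax}(QK^\top / \sqrt{d_k})\,V \). The random matrices below stand in for learned weights (and \(d_k\) equals the embedding dimension here), so the numbers are purely illustrative.

  set.seed(123)

  n_tokens <- 4   # length of the toy input sequence
  d_model  <- 8   # embedding dimension for each token

  # toy token embeddings (rows = tokens); in a real model these come from training
  X <- matrix(rnorm(n_tokens * d_model), nrow = n_tokens)

  # projection matrices for queries, keys, and values (learned weights in a real model)
  W_q <- matrix(rnorm(d_model * d_model), nrow = d_model)
  W_k <- matrix(rnorm(d_model * d_model), nrow = d_model)
  W_v <- matrix(rnorm(d_model * d_model), nrow = d_model)

  Q <- X %*% W_q
  K <- X %*% W_k
  V <- X %*% W_v

  # attention scores: how strongly each token attends to every token in the sequence
  scores <- (Q %*% t(K)) / sqrt(d_model)

  # softmax each row so every token's attention weights sum to 1
  attention_weights <- sweep(exp(scores), 1, rowSums(exp(scores)), "/")

  # each output row is a weighted mix of all tokens' value vectors
  output <- attention_weights %*% V

  round(attention_weights, 2)

Each row of attention_weights sums to 1 and records how much that token draws on every token in the sequence (including itself) when forming its output representation.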

Generating output

LLMs are functionally similar to auto-complete mechanisms.

Given the current token, what is the next most likely token?

My dog, Max, knows how to perform many traditional dog tricks.
___ (masked sentence)
Probability  Word(s)
3.1%         For example, he can sit, stay, and roll over.
2.9%         For example, he knows how to sit, stay, and roll over.

Generating output

  • Sufficiently large LLMs can generate probabilities for entire paragraphs, essays, or code blocks.
  • Responses are probabilistic: the model produces a probability distribution over possible outputs and samples from it, favoring the most likely candidates.
  • There is an inherent element of randomness in this draw, so LLMs do not always generate the same output for the same input (see the sampling sketch after this list).
  • The size of the context window matters: it limits how much input the model can take into account when generating a response.
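
A minimal sketch of that probabilistic draw, reusing the hypothetical candidate tokens and probabilities from the kitchen example earlier:

  set.seed(20241119)  # fix the seed so this example is reproducible

  # hypothetical next-token candidates and probabilities from the kitchen example
  tokens <- c("cook soup", "warm up a kettle", "cower", "nap", "relax")
  probs  <- c(0.094, 0.052, 0.036, 0.025, 0.022)

  # renormalize so the displayed candidates' probabilities sum to 1
  probs <- probs / sum(probs)

  # five independent draws: the selected token can differ from run to run
  replicate(5, sample(tokens, size = 1, prob = probs))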

Types of inputs/outputs LLMs can accommodate

  • Human language (i.e. text)
  • Code
  • Images
  • Audio
  • Video

Challenges with LLMs

Training

  • Requires an enormous volume of data
  • Extremely time-intensive
  • Consumes enormous computational resources and electricity

Inference/prediction

  • Hallucinations
  • Biases
  • Unethical usage

Foundational LLMs


LLMs trained on a large and diverse enough corpus to generate a wide range of outputs across many domains.

Aka base LLMs or pre-trained LLMs.

Examples of foundational LLMs

LLM        Developer  Inputs                     Outputs                 Access
GPT        OpenAI     Text, image, data          Text                    Proprietary
DALL·E     OpenAI     Text                       Image                   Proprietary
Gemini     Google     Text, image, audio, video  Text, image             Proprietary
Gemma      Google     Text                       Text                    Open
Llama      Meta       Text                       Text                    Open
Claude     Anthropic  Text, audio, image, data   Text, computer control  Proprietary
Ministral  Mistral    Text, image                Text                    Proprietary/open
Phi        Microsoft  Text                       Text                    Open
BERT       Google     Text                       Text                    Open

Accessing foundational LLMs

Use an application programming interface!
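
As a hedged sketch of what that looks like from R: the request below targets OpenAI's chat completions endpoint using httr2 and assumes an API key is stored in the OPENAI_API_KEY environment variable. The endpoint, model name, and response structure are specific to that provider (as of late 2024) and will differ for others.

  library(httr2)

  # build and send a request to a chat-style LLM endpoint
  # (assumes OPENAI_API_KEY is set in your environment, e.g. via .Renviron)
  resp <- request("https://api.openai.com/v1/chat/completions") |>
    req_auth_bearer_token(Sys.getenv("OPENAI_API_KEY")) |>
    req_body_json(list(
      model = "gpt-4o-mini",
      messages = list(
        list(role = "user", content = "Explain self-attention in one sentence.")
      )
    )) |>
    req_perform()

  # extract the generated text from the JSON response
  resp |>
    resp_body_json() |>
    purrr::pluck("choices", 1, "message", "content")

Provider-specific R packages can wrap these HTTP details, but the underlying request/response pattern is the same.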

Application exercise

ae-20

  • Go to the course GitHub org and find your ae-20 (repo name will be suffixed with your GitHub name).
  • Clone the repo in RStudio, run renv::restore() to install the required packages, then open the Quarto document in the repo and follow along to complete the exercises.
  • Render, commit, and push your edits by the AE deadline (end of the day).

Wrap-up

Recap

  • Language models and LLMs are extremely complicated models that can generate text, code, images, audio, and video.
  • Foundational LLMs are pre-trained models that can be used to generate a wide range of outputs.
  • LLMs do not reason the way humans do, and they are not capable of understanding things the way a human would.
  • LLM performance can often be improved through prompt engineering.