Improving LLM output

Lecture 23

Dr. Benjamin Soltoff

Cornell University
INFO 5001 - Fall 2025

November 18, 2025

Announcements

Announcements

  • Project 01 draft

Learning objectives

  • Define prompt engineering
  • Identify best practices for prompt engineering
  • Apply prompt engineering techniques to improve LLM outputs
  • Define Retrieval-Augmented Generation (RAG)
  • Explain how RAG improves LLM outputs
  • Implement a simple RAG system

Application exercise

ae-21

Instructions

  • Go to the course GitHub org and find your ae-21 (repo name will be suffixed with your GitHub name).
  • Clone the repo in Positron, run renv::restore() to install the required packages, open the Quarto document in the repo, and follow along and complete the exercises.
  • Render, commit, and push your edits by the AE deadline – end of the day

🔓 Decrypt the .Renviron.secret.Renviron

  1. Run secret.R
  2. The special phrase is:
    info-5001

Prompt engineering

⌨️ 12_plot-image-1

Instructions

  1. {ellmer} lets you show the model your plots!

  2. Create a basic penguins scatter plot and ask Claude 4 Sonnet to interpret it.

  3. How does it do?

03:00

⌨️ 13_plot-image-2

Instructions

  1. Replace the scatter plot with random noise.

  2. Show this new plot to Claude 4 Sonnet and ask it to interpret it. How does it do this time?

  3. Work with your neighbor to see if you can improve the prompt to get a better answer.

  4. Share your best prompt with the class on this discussion post.

07:00

Prompt engineering

Three questions to ask yourself

  1. Did you use the best models?

  2. Did you clearly explain what you want the model to do in the system prompt?

  3. Did you provide examples of what you want?

System prompt vs. user prompt

Short answer: put instructions and background knowledge in the system prompt.

More tips

Use LLMs to help draft or improve your prompts.

E.g., this input to Claude’s prompt generator:

Make a data science agent that can run R data analysis code via a tool. Make the agent maniacally focused on data quality issues, such as missing data, misspelled categorical values, inconsistent data types, outlier values, impossible values (like negative physical dimensions), etc.

Generates this prompt:

You are a meticulous data science agent with an obsessive focus on data quality. 
You have been given access to an R code execution tool that allows you to run R analysis code. 
Your primary mission is to identify and address data quality issues before conducting any requested analysis.

<dataset_description>
{{DATASET_DESCRIPTION}}
</dataset_description>

<analysis_request>
{{ANALYSIS_REQUEST}}
</analysis_request>

You have access to the following function to execute R code:

<function>
<function_name>run_r_code</function_name>
<function_description>Executes R code and returns the output</function_description>
<required_argument>code (str): The R code to execute</required_argument>
<returns>str: The output from executing the R code, including any plots, summaries, or error messages</returns>
<example_call>run_r_code(code="summary(mtcars)")</example_call>
</function>

## Your Data Quality Obsessions

You must be maniacally focused on identifying and documenting these data quality issues:

1. **Missing Data**: Check for NA, NULL, empty strings, spaces-only strings
2. **Data Type Inconsistencies**: Variables that should be numeric but contain text, dates stored as strings, etc.
3. **Categorical Value Issues**: Misspellings, inconsistent capitalization, extra whitespace, similar values that should be the same
4. **Impossible/Illogical Values**: Negative values for physical dimensions, ages over 150, dates in the future when they shouldn't be, etc.
5. **Outliers**: Values that are technically possible but suspiciously extreme
6. **Duplicate Records**: Exact duplicates or near-duplicates that might indicate data entry errors
7. **Inconsistent Formatting**: Mixed date formats, inconsistent decimal places, mixed units
8. **Range Violations**: Values outside expected or logical ranges

## Workflow

1. **MANDATORY FIRST STEP**: Conduct a comprehensive data quality assessment before any analysis
2. **Document Issues**: Create a detailed inventory of all data quality problems found
3. **Propose Solutions**: Suggest specific remediation steps for each issue
4. **Clean Data**: Implement cleaning steps where appropriate
5. **Verify Cleaning**: Confirm that cleaning steps worked as intended
6. **Conduct Analysis**: Only after data quality is addressed, proceed with the requested analysis
7. **Final Validation**: Double-check that results make sense given the data quality context

## R Code Patterns to Use

For your data quality checks, use comprehensive R code such as:
- `summary()`, `str()`, `head()`, `tail()` for initial exploration
- `is.na()`, `complete.cases()` for missing data
- `duplicated()` for duplicate detection
- `table()`, `unique()` for categorical variable inspection
- `range()`, `quantile()` for outlier detection
- `class()`, `typeof()` for data type verification

Use <scratchpad> tags to plan your data quality assessment strategy before executing any code. 
Think through what specific issues might be present given the dataset description and what R code you'll need to detect them.

Your final response should include:
1. A comprehensive data quality report with specific issues found
2. The R code used for assessment and cleaning
3. Documentation of any data cleaning steps taken
4. The requested analysis results
5. Caveats about how data quality issues might affect interpretation

Remember: You are OBSESSED with data quality. Do not proceed with analysis until you have thoroughly investigated and documented data quality issues. 
If you find serious data quality problems, spend significant effort addressing them before moving to the analysis phase.

Begin your response with <scratchpad> tags to plan your data quality assessment approach, then use function calls to execute R code. 
Provide your final comprehensive response covering both data quality findings and the requested analysis.

More tips

  • Use Markdown headings and XML tags to give structure to your prompts.
  • Use variables to insert dynamic content into your prompts–BUT be aware of prompt injection!
Your task is to provide feedback on a research paper summary.
Here is a summary of a medical research paper:
<summary>
{{SUMMARY}}
</summary>

Here is the research paper:
<paper>
{{RESEARCH_PAPER}}
</paper>

Review this summary for accuracy, clarity, and completeness on
a graded A-F scale.

More tips

Get large prompts out of the code and into separate files.

  • Easier to read (both locally and on GitHub)

  • Easier to read diffs in version control

  • We will do this in one of our exercises later

More tips

(Advanced) Force the model to say things out loud.

E.g., “Use no more than three rounds of tool calls” => “Before answering, note how many tool calls you have made inside tags. If you have made three, stop and answer.”

More tips

See Anthropic’s Prompt Engineering Overview and OpenAI’s OpenAI Cookbook are excellent, and contain lots of tips and examples.

Google’s Prompt Design Strategies may also be useful.

⌨️ 14_quiz-game-1

Instructions

  1. Your job: teach the model to play a quiz game with you:

  2. The user picks a theme from a short list provided by the model.

  3. They then answer multiple choice questions on that theme.

  4. After each question, tell the user if they were right or wrong and why. Then go to the next question.

  5. After 5 questions, end the round and tell the user they won, regardless of their score. Then, start a new round.

  6. Share your best prompt with the class on this discussion post.

12:00

⌨️ 15_coding-assistant

Instructions

  1. Use Claude 3.7 Sonnet to write a function that gets the weather. The first time, use Claude on its own.

  2. Do some basic research for Claude about how to use a specific package to get the weather.

  3. How does Claude do with the same task now?

06:00

Augmented Generation

Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation (RAG)

How do we find relevant documents?

Answer: word vector embeddings → turn words into vectors

🤴 - 🧔‍♂️ + 💁‍♀️ = ❓

🤴 - 🧔‍♂️ = 👑
👑 + 💁‍♀️ = ❓

🤴 - 🧔‍♂️ = 👑
👑 + 💁‍♀️ = 👸

OpenAI: text-embedding-3-small

embed_openai("dplyr::left_join")
#> [-0.0384574,  0.00796838,  0.04896307, ..., -0.01687562, 0.00051399,  0.01020856]
embed_openai("LEFT JOIN")
#> [-0.0114895,  0.01873610,  0.04436858, ...,  0.0055124, 0.01100459, -0.00588281],
embed_openai("suitcase")
#> [ 0.01323017, -0.00844115, -0.02530578, ..., -0.00054488, -0.0285338, -0.02933492]

Two ways that users encounter RAG

  1. Every prompt you send gets passed through a RAG system and is augmented

  2. The LLM can decide when to call the RAG system

In R…

⌨️ 16_rag

Instructions

Follow the steps in the 16_rag exercise, which are roughly:

  1. First, you’ll create a vector database from R for Data Science (R4DS)

  2. Test out the vector database with a simple query.

  3. Attach a retrieval tool to a chat client and try it in a Shiny app.

15:00

Wrap-up

Recap

  • Prompt engineering is the art and science of crafting effective prompts to get the desired output from LLMs
  • Use the best models, clear system prompts, and examples to improve results
  • Prompt engineering is an iterative process that often requires experimentation and refinement
  • Many techniques and best practices exist to help you get the most out of LLMs
  • RAG improves LLM outputs by adding relevant context
  • RAG systems use vector embeddings to find relevant documents
  • You can implement RAG in R with ragnar

Acknowledgments