Project description
Important dates
- Proposal due Thu, Oct 10th
- Exploration due Thu, Oct 31st 🎃
- Draft product due Thu, Nov 21st
- Peer review due Fri, Nov 22nd
- Presentation + slides due on Fri, Dec 6th
- Final report + product due on Mon, Dec 9th
The details will be updated as the project deadlines approach.
Introduction
TL;DR: Create something related to data science.
This is intentionally vague: part of the challenge is to design a project that best showcases your interests and strengths.
One requirement is that your project should feature some element that you had to learn on your own. This could be a package you use that we didn’t teach in class (e.g., a package for building interactive web applications) or a workflow (e.g., making a package) or anything else.
If you’re not sure if your “new” thing counts, just ask!
Ideas
Identify a goal for your project that leverages the skills you develop in this class. Some possible ideas include:
- Develop educational content introducing and presenting a technical topic from statistics or mathematics (e.g. gradient descent, neural networks, decision trees) and publish as a Quarto website
- Create online tutorials for a specific R package or data science technique using WebAssembly and Quarto Live
- Build a Shiny web application for visualizing and exploring a complex dataset
- Create an R package that provides enhanced functionality for ggplot2
- Build an R package to provide a straightforward interface to an API
- Construct a chatbot and build an API to provide programmatic access
- Develop a machine learning model and deploy it as an API using plumber
Most importantly, be prepared to brainstorm many ideas and discard most of them until you settle on a topic that everyone on the team is happy with. A good topic showcases what you’ve learned in this class and how you can build on it to learn something new and implement it in your project.
The project is very open ended. Neatness, coherency, and clarity will count. All analyses must be done in RStudio, using R, and all components of the project must be reproducible.
You will work on the project with your lab teams.
Deliverables
The four primary deliverables for the final project are:
- A project proposal with three ideas.
- A final report that explains the process and results.
- A reproducible product in a format based upon the type of project you propose (e.g. R package, interactive web application, custom-built API), with one required draft along the way.
- A presentation with slides.
There will be additional submissions throughout the semester to facilitate completion of the final product and presentation.
The files in your repository are organized as a Quarto Project. This lets you render all Quarto documents within the project folder with a single command and share YAML configurations across multiple documents. To render the project, go to the Build tab in RStudio and click “Render”.
Teams
Projects will be completed in teams of 3-5 students. Every team member should be involved in all aspects of planning and executing the project. Each team member should make an equal contribution to all parts of the project. The scope of your project is based on the number of contributing team members on your project. If you have 4 contributing team members, we will expect a larger project than from a team of 3 contributing team members.
Some lab section meetings will be devoted to work on the project, so all teams will be formed within each lab section (i.e. only students in your lab section can be your team members). The course staff will assign students to teams. To facilitate this process, we will provide a short survey about your study and communication habits. Once teams are assigned, they cannot be changed.
Team conflicts
Conflict is a healthy part of any team relationship. If your team doesn’t have conflict, then your team members are likely not communicating their issues with each other. Use your team contract (written at the beginning of the project) to help keep your team dynamic healthy.
When you have conflict, you should follow this procedure:
- Refer to the team contract and follow it to address the conflict.
- If you resolve the conflict without issue, great! Otherwise, update the team contract and try to resolve the conflict yourselves.
- If your team is unable to resolve your conflict, please contact soltoffbc@cornell.edu and explain your situation.
- We’ll ask to meet with all the group members and figure out how we can work together to move forward.
Please do not avoid confrontation if you have conflict. If there’s a conflict, the best way to handle it is to bring it into the open and address it.
Project grade adjustments
Remember, do not do the work for a slacking team member. This only rewards their bad behavior. Simply leave their work unfinished. (We will not increase your grade during adjustments for doing more than your fair share.)
Your team will initially receive a final grade assuming that all team members contributed to your project. If you have a 5-person team, but only 3 people contributed, your team will likely receive a lower grade initially because only three people’s worth of effort exists for a 5-person project. About a week after the initial project grades are released, adjustments will be made to each individual team member’s group project grade.
We use your project’s Git history (to view the contributions of each team member) and the peer evaluations to adjust each team member’s grade. Adjustments may increase or decrease your grade, depending on your individual contribution.
For example, if you have a 4-person team, but only 3 contributing members, the 3 contributing members may have their grades increased to reflect the effort of only 3 contributing members. The non-contributing member will likely have their grade decreased significantly.
I am serious about every member of the team equitably contributing to the project. Students who fail to contribute equitably may receive up to a 100% deduction on their project grade.
Please be patient with the grade adjustments; they take time to do fairly. I handle this entire process myself, and I take it very seriously. If you think your initial group project grade is unfair, please wait for your grade adjustment before you contact us.
The slacking team member
Please do not cover for a slacking/freeloading team member. Please do not do their work for them! This only rewards their bad behavior. Simply leave their work unfinished. (We will not increase your grade during adjustments for doing more than your fair share.)
Remember, we have your Git history. We can see who contributes to the project and who doesn’t. If a team member rarely commits to Git and only makes very small commits, we can see that they did not contribute their fair share.
All students should make their project contributions through their own GitHub account. Do not commit changes to the repository from another team member’s GitHub account. Your Git history should reflect your individual contributions to the project.
Proposal
There are two main purposes of the project proposal:
- To help you think about the project early, so you can get a head start on finding data, reading relevant literature, thinking about the questions you wish to answer, etc.
- To ensure that the topic you wish to analyze, methods you plan to use, and the scope of your analysis are feasible and will allow you to be successful for this project.
Identify 3 topics you’re interested in potentially using for the project. At least two of the three topics must utilize real-world data. If you’re unsure where to find data, you can use the list of potential data sources in the Tips + Resources section as a starting point. It may also help to think of topics you’re interested in investigating and find datasets on those topics.
Write the proposal in the proposal.qmd file in your project repo.
You must use one of the topics in the proposal for the final project, unless instructed otherwise when given feedback.
Criteria for datasets
The datasets should meet the following criteria:
- At least 500 observations
- At least 8 columns
- At least 6 of the columns must be useful and unique explanatory variables.
  - Identifier variables such as “name”, “social security number”, etc. are not useful explanatory variables.
  - If you have multiple columns with the same information (e.g. “state abbreviation” and “state name”), then they are not unique explanatory variables.
- You may not use data that has previously been used in any course materials, or any derivation of data that has been used in course materials.
You may not use data from a secondary data archive. In plainest terms, do not use datasets you find from Kaggle or the UCI Machine Learning Repository. Your data should come from your own collection process (e.g. API or web scraping) or the primary source (e.g. government agency, research group, etc.).
Please ask a member of the course staff if you’re unsure whether your dataset meets the criteria.
If you have your heart set on a dataset that has fewer observations or variables than what’s suggested here, that might still be okay; use these numbers as guidance for a successful proposal, not as minimum requirements.
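If your data is already loaded into R, a quick check against the numeric guidance above can save a round of feedback. The sketch below is a hypothetical helper, not a course-provided tool: the name check_dataset_criteria() and its default thresholds are assumptions chosen to mirror the guidance (500 rows, 8 columns). It also flags columns whose non-missing values are all distinct (possible identifiers) and pairs of columns that map one-to-one (possible duplicates, like a state abbreviation and a state name). Both flags are heuristics: a continuous measurement can also have all-distinct values, so review anything flagged by hand.

```r
# Hypothetical helper (not part of the course materials) that checks a data
# frame against the proposal guidance: at least 500 rows and 8 columns.
check_dataset_criteria <- function(df, min_rows = 500, min_cols = 8) {
  # Columns whose non-missing values are all distinct are often identifiers
  # (e.g. names or IDs). Continuous measurements can match too, so treat
  # these as candidates to review, not a verdict.
  is_all_distinct <- function(col) {
    vals <- col[!is.na(col)]
    length(vals) > 0 && anyDuplicated(vals) == 0
  }
  likely_identifiers <- names(df)[vapply(df, is_all_distinct, logical(1))]

  # Pairs of columns that map one-to-one (e.g. "state abbreviation" and
  # "state name") carry the same information.
  redundant_pairs <- character(0)
  cols <- names(df)
  if (length(cols) > 1) {
    for (i in seq_len(length(cols) - 1)) {
      for (j in (i + 1):length(cols)) {
        a <- df[[cols[i]]]
        b <- df[[cols[j]]]
        # Number of distinct (a, b) combinations
        n_pairs <- nrow(unique(data.frame(a = a, b = b)))
        if (length(unique(a)) == n_pairs && length(unique(b)) == n_pairs) {
          redundant_pairs <- c(redundant_pairs, paste(cols[i], "<->", cols[j]))
        }
      }
    }
  }

  list(
    n_rows = nrow(df),
    n_cols = ncol(df),
    enough_rows = nrow(df) >= min_rows,
    enough_cols = ncol(df) >= min_cols,
    likely_identifiers = likely_identifiers,
    redundant_pairs = redundant_pairs
  )
}
```

For example, check_dataset_criteria(read.csv("data/your-file.csv")) (the file name is a placeholder) returns a list you could print in proposal.qmd next to your skimr::skim() output. Passing the check does not settle whether columns are useful explanatory variables; that still takes judgment.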
Questions for project mentor
Include specific, relevant questions you have for the project mentor about your proposed topics. These questions should be about the feasibility of the project, the quality of the data, the potential for interesting analysis, etc.
Resources for datasets
You can find data wherever you like, but here are some recommendations to get you started. You shouldn’t feel constrained to datasets that are already in a tidy format; you can start with data that needs cleaning and tidying, scrape data off the web, or collect your own data.
- Awesome public datasets
- CDC
- Chicago Open Data Portal
- Data.gov
- Data is Plural
- Election Studies
- European Statistics
- FiveThirtyEight
- General Social Survey
- Goodreads
- Google Dataset Search
- Harvard Dataverse
- International Monetary Fund
- IPUMS survey data from around the world
- Los Angeles Open Data
- National Weather Service
- NHS Scotland Open Data
- NYC OpenData
- Open access to Scotland’s official statistics
- Pew Research
- Project Gutenberg
- Reddit posts and/or comments
- Sports Reference
- Statistics Canada
- The National Bureau of Economic Research
- UK Government Data
- UNICEF Data
- United Nations Data
- United Nations Statistics Division
- US Census Data
- World Bank Data
- Youth Risk Behavior Surveillance System (YRBSS)
Proposal components
For each topic, include the following:
Problem or question
What is the problem you will solve?
For each topic, include the following:
- A well-formulated objective. (You may include more than one idea if you want to receive feedback on different ideas for your project. However, at least one per topic is required.)
- A statement on why this topic is important.
- Identify the types of variables you will use. Categorical? Quantitative?
- What will be the major product(s)? A published website? An interactive web application a la Shiny? An R package? A deployable API?
Introduction and data
For each dataset (if one is provided):
- Identify the source of the data.
- State when and how it was originally collected (by the original data curator, not necessarily how you found the data).
- Write a brief description of the observations.
- Address ethical concerns about the data, if any.
Glimpse of data
For each dataset (if one is provided):
- Place the file containing your data in the data folder of the project repo.
- Use the skimr::skim() function to provide a glimpse of the dataset.
Exploration
Settle on a single idea and state your objective(s) clearly. You will carry out most of your data collection and cleaning, compute some relevant summary statistics, and show some plots of your data as applicable to your objective(s).
Write up your explanation in the explore.qmd file in your project repo. It should include the following sections:
- Objective(s). State the question(s) you are answering or the problem(s) you are solving clearly.
- Data collection and cleaning.[1] Have an initial draft of your data cleaning appendix. Document every step that takes your raw data file(s) and turns it into the analysis-ready dataset that you would submit with your final project. Include a text narrative describing your data collection (downloading, scraping, surveys, etc.) and any additional data curation/cleaning (merging data frames, filtering, transformations of variables, etc.). Include code for data curation/cleaning, but not collection.[2]
- Data description. Have an initial draft of your data description section. Your data description should be about your analysis-ready data.
- Data limitations. Identify any potential problems with your dataset.
- Exploratory data analysis. Perform an (initial) exploratory data analysis.
- Questions for reviewers. List specific questions for your project mentor to answer in giving you feedback on this phase.
[1] If you are using real-world data. If you are generating synthetic data, define the process here.
[2] If you have written code to collect your data (e.g. using an API or web scraping), store it in a separate .qmd file or .R script in the repo.
If your project does not make substantial use of real-world data, you should develop your plan for the products. Who is the audience for your product? What functions or features will you need to incorporate? How will you go about designing and implementing these features?
Thorough EDA requires substantial review and analysis of your data. You should not expect to complete this phase in a single day. You should expect to iterate through 20-30 charts, sets of summary statistics, etc., to get a good understanding of your data.
Visualizations are not expected to look perfect at this point since they are mainly intended for you and your team members. Standard expectations for visualizations (e.g. clearly labeled charts and axes, optimized color palettes) are not necessary at the EDA stage.
Draft
The purpose of the draft and peer review is to give you an opportunity to get early feedback on your analysis. Therefore, the draft and peer review will focus primarily on the exploratory analysis and initial drafts of the final product(s).
Write the draft write-up in the report.qmd file in your project repo. Be sure to explicitly identify how to access the draft product (e.g. a link to a published web page or Shiny app).
You should have a functional product at this stage, but it is okay to have some incomplete or partial components. The more progress you have made by this point, the more useful the feedback you are likely to receive.
Peer review
Critically reviewing others’ work is a crucial part of the scientific process, and INFO 5001 is no exception. You will be assigned two teams to review. This feedback is intended to help you create a high quality final project, as well as give you experience reading and constructively critiquing the work of others.
During the peer feedback process, you will be provided read-only access to your partner team’s GitHub repo. You will provide your feedback in the form of GitHub issues to your partner team’s GitHub repo.
Peer review process and questions are outlined in the relevant lab instructions.
Peer reviews will be graded on the extent to which they comprehensively and constructively address the components of the reviewed team’s report. Specifics of peer review grading are also outlined in the relevant lab instructions.
Final product
You will create a functioning end product constructed using a reproducible workflow. The form of your product will vary depending on your objectives. Examples of potential products include (but are not limited to):
- A multi-page website constructed using Quarto
- Shiny web application
- R package with published documentation site
- Application programming interface (API) constructed using Plumber and deployed publicly
Regardless of format, the product should be accessible to a public audience. That means users should be able to access the content through a web interface (and not have to clone your Git repo to access and run files).
Your final product must be reproducible. All team members should contribute to the GitHub repository, with regular meaningful commits.
Your final product will be evaluated based on degree of difficulty and execution. You will receive feedback during the proposal stage as to the perceived level of difficulty of your project.
Report
Your written report must be completed in the report.qmd file.
Before you finalize your write-up, make sure the printing of code chunks is turned off by setting the option echo: false in the YAML.
The report should be between 1000 and 2000 words. There is no expectation that you get close to the upper limit; anywhere in that range is fine as long as you have clearly explained yourself. The limits are provided to help you, not to set stressful expectations.
Be selective in what you include in your final write-up. The goal is to write a cohesive narrative that demonstrates a thorough and comprehensive workflow. This includes (but is not limited to) addressing the following items.
Feel free to add additional sections and/or structure to your report where necessary. We will take that into account when we grade.
Introduction
Identify the project motivation, data, and objectives. What is the context of the work? What problem are you trying to solve? What are your main conclusions?
Justification of approach
Describe the product(s). What did your team create? Who is the intended audience? How will the product(s) meet their needs?
Data description
If using real-world data, describe it. A good model for this is presented in Gebru et al., 2018. Answer any relevant questions from sections 3.1-3.5 of the Gebru et al. article, especially the following questions:
- What are the observations (rows) and the attributes (columns)?
- Why was this dataset created?
- Who funded the creation of the dataset?
- What processes might have influenced what data was observed and recorded and what was not?
- What preprocessing was done, and how did the data come to be in the form that you are using?
- If people are involved, were they aware of the data collection and if so, what purpose did they expect the data to be used for?
Design process
Summarize your design process for the product(s). Explain the key design challenges you encountered in creating the main product(s). What were the most important considerations your team faced in designing and constructing the final product?
Limitations
Assess the limitations of your work. What hurdles did you fail to overcome? If you had the opportunity to do this again, how would you improve on your product(s)?
Acknowledgments
Recognize any people or online resources that you found helpful. These can be tutorials, software packages, Stack Overflow questions, peers, and data sources. Showing gratitude is a great way to feel happier! But it also has the nice side-effect of reassuring us that you’re not passing off someone else’s work as your own. Crossover with other courses is permitted and encouraged, but it must be clearly stated, and it must be obvious what parts were and were not done for 5001. Copying without attribution robs you of the chance to learn, and wastes our time investigating.
Appendices
You are welcome to include an appendix with additional work at the end of the written report document; however, grading will largely be based on the content in the main body of the report. You should assume the reader will not see the material in the appendix unless prompted to view it in the main body of the report. The appendix should be neatly formatted and easy for the reader to navigate. It is not included in the 1000-2000 word limit.
You should submit your appendix(-ces) in the appendices.qmd file in your project repo.
- At minimum, you should have an appendix for your data cleaning. Submit an updated version of your data cleaning description from the exploration phase that describes all data cleaning steps performed on your raw data to turn it into the analysis-ready dataset submitted with your final project. When rendered, it should output the dataset you submit as part of your project (e.g. written as a .csv file).
- (Optional) Other appendices. You will almost certainly feel that you have done a lot of work that didn’t end up in the final report. We want you to edit and focus, but we also want to make sure that there’s a place for work that didn’t work out or that didn’t fit in the final presentation. You may include any analyses you tried that were tangential to the final direction of your main report. Graders may briefly look at these appendices, but they also may not. You want to make your final report interesting enough that the graders don’t feel the need to look at other things you tried. “Interesting” doesn’t necessarily mean that the results in your final report were all statistically significant; it could be that your results were not significant but you were able to interpret them in an interesting and informed way.
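To illustrate the requirement that rendering the data cleaning appendix outputs the submitted dataset, the sketch below cleans a small made-up raw data frame and writes the analysis-ready result to a CSV file. Everything here is hypothetical: the column names, the cleaning steps, and the function names clean_raw_data() and write_analysis_data(). In appendices.qmd you would read your own raw file from the data folder and write to a path such as data/analysis-data.csv instead of tempdir().

```r
# Hypothetical cleaning steps for an appendices.qmd code chunk. The raw
# columns ("Date", "Value") and file names are placeholders.
clean_raw_data <- function(raw) {
  cleaned <- raw
  # Standardize column names to lower snake_case
  names(cleaned) <- tolower(gsub("[^A-Za-z0-9]+", "_", names(cleaned)))
  # Drop rows where the measurement is missing
  cleaned <- cleaned[!is.na(cleaned$value), , drop = FALSE]
  # Store dates as Date objects rather than text
  cleaned$date <- as.Date(cleaned$date)
  rownames(cleaned) <- NULL
  cleaned
}

# Write the analysis-ready data so that rendering the appendix reproduces it
write_analysis_data <- function(cleaned, path) {
  dir.create(dirname(path), showWarnings = FALSE, recursive = TRUE)
  write.csv(cleaned, path, row.names = FALSE)
  invisible(path)
}

# In a real project, read your raw file from the data folder instead
raw <- data.frame(
  Date = c("2024-01-01", "2024-01-02", "2024-01-03"),
  Value = c(1.5, NA, 3.2)
)
cleaned <- clean_raw_data(raw)

# tempdir() keeps this sketch self-contained; a project would use a path
# such as "data/analysis-data.csv"
out_path <- file.path(tempdir(), "data", "analysis-data.csv")
write_analysis_data(cleaned, out_path)
```

Keeping the cleaning in functions and writing the file as the last step means anyone who renders the appendix regenerates the same CSV from the raw data, which is what makes the submitted dataset reproducible.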
Organization + formatting
While not a separate written section, you will be assessed on the overall presentation and formatting of the written report. A non-exhaustive list of criteria includes:
- The report is neatly written and organized with clear section headers and appropriately sized figures with informative labels.
- Numerical results are displayed with a reasonable number of digits, and all visualizations are neatly formatted.
- All citations and links are properly formatted.
- If there is an appendix, it is reasonably organized and easy for the reader to find relevant information.
- All code, warnings, and messages are suppressed.
- The main body of the written report (not including the appendix) is no longer than 10 pages.
Presentation + slides
Slides
In addition to the written report, your team will also create an oral presentation that summarizes and showcases your project. Using a slide presentation, you will introduce your objective(s) and dataset, showcase visualizations, and discuss the primary outcomes. These slides should serve as a brief visual addition to your written report and will be graded for content and quality.
Your presentation will be created using Quarto, which allows you to write slides using the same reproducible document structure you’re used to.
The slide deck should have no more than 6 content slides + 1 title slide. Here is a suggested outline as you think through the slides; you do not have to use this exact format for the 6 slides.
- Title Slide
- Slide 1: Introduce the topic and motivation
- Slide 2: Introduce the data
- Slide 3: Highlights from EDA
- Slides 4-5: Inference/modeling/other analysis
- Slide 6: Conclusions + future work
Presentation
Presentations will take place in class during the last lab of the semester. The presentation must be no longer than 10 minutes.
Evaluation
Presentations will be evaluated by the course staff (15 points) and by your peers in your lab section (5 points). Students will receive access to a Google Form where they will provide (confidential) feedback on their peer groups’ presentations. Students will evaluate their own presentations.
Reproducibility + organization
All written work should be reproducible, and the GitHub repo should be neatly organized.
- Points for reproducibility + organization will be based on the reproducibility of the entire repository and the organization of the project GitHub repo.
- The repo should be neatly organized as described above: there should be no extraneous files, and all text in the README should be easily readable.
Teamwork
Every team member should make an equal contribution to all parts of the project. Every team member should have an equal experience designing, coding, testing, etc.
At the completion of the project, you will be asked to fill out a survey where you rate the contribution and teamwork of each team member by assigning a contribution percentage for each team member. Working as a team is every team member’s responsibility.
If you are suggesting that an individual did less than half the expected contribution given your team size (e.g., for a team of four students, if a student contributed less than 12.5% of the total effort), please provide some explanation. If any individual gets an average peer score indicating that they underperformed on the project, we will conduct further analysis and their overall project grade may be adjusted accordingly.
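The 12.5% figure above follows from the equal share for a team of n members, assuming contributions are reported as percentages of total team effort that sum to 100%:

```latex
\text{equal share} = \frac{100\%}{n}, \qquad
\text{explanation threshold} = \frac{1}{2} \cdot \frac{100\%}{n} = \frac{100\%}{2n}
```

So the threshold is about 16.7% for a team of three, 12.5% for a team of four, and 10% for a team of five.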
Overall grading
Component | Points |
---|---|
Project proposal | 10 pts |
Exploration | 15 pts |
Draft | 10 pts |
Peer review | 5 pts |
Final report | 20 pts |
Final product(s) | 60 pts |
Slides + presentation | 15 pts |
Slides + presentation (peer) | 5 pts |
Reproducibility + organization | 10 pts |
Total | 150 pts |
Evaluation criteria
Project proposal
Category | Less developed projects | Typical projects | More developed projects |
---|---|---|---|
Dataset ideas | Fewer than three topics are included. Topic ideas are vague and impossible or excessively difficult to collect. | Three topic ideas are included and all or most datasets could feasibly be collected or accessed by the end of the semester. Each dataset is described alongside a note about availability with a source cited. | Three topic ideas are included and all or most datasets could feasibly be collected or accessed by the end of the semester. Each dataset is described alongside a note about availability with (possibly multiple) sources cited. Each dataset could reasonably be part of a data science project, driven by an interesting research question. |
Questions for reviewers | The questions for reviewers are vague or unclear. | The questions for reviewers are specific to the datasets and are based on group discussions between team members. | The questions for reviewers are specific to the datasets and are based on group discussions between team members. Questions for reviewers look toward the next stage of the project. |
Exploration
Category | Less developed projects | Typical projects | More developed projects |
---|---|---|---|
Objective(s) | Objective is not clearly stated or significantly limits potential analysis. | Clearly states the objective(s), which have moderate potential for relevant impact. | Clearly states complex objective(s) that lead to significant potential for relevant impact. |
Data cleaning | Data is minimally cleaned, with little documentation and description of the steps undertaken. | Completes all necessary data cleaning for subsequent analyses. Describes cleaning steps with some detail. | Completes all necessary data cleaning for subsequent analyses. Describes all cleaning steps in full detail, so that the reader has an excellent grasp of how the raw data was transformed into the analysis-ready dataset. |
Data description | Simple description of some aspects of the dataset, little consideration for sources. The description is missing answers to applicable questions detailed in the “Datasheets for Datasets” paper. | Answers all relevant questions in the “Datasheets for Datasets” paper. | All expectations of typical projects + credits and values data sources. |
Data limitations | The limitations are not explained in depth. There is no mention of how these limitations may affect the meaning of results. | Identifies potential harms and data gaps, and describes how these could affect the meaning of results. | Creatively identifies potential harms and data gaps, and describes how these could affect the meaning of results, and the impact of results on people. It is evident that significant thought has been put into the limitations of the collected data. |
Exploratory data analysis | Motivation for choice of analysis methods is unclear. Does not justify decisions to either confirm / update objective and data description. | Sufficient plots (20-30) and summary statistics to identify typical values in single variables and connections between pairs of variables. Uses exploratory analysis to confirm/update objectives and data description. | All expectations of typical projects + analysis methods are carefully chosen to identify important characteristics of data. |
Draft
Category | Less developed projects | Typical projects | More developed projects |
---|---|---|---|
Functional prototype | Product is non-functional or broken. | Product is reasonably functional. It need not be perfect or without errors, but is mostly working and includes most substantive parts. | Product is functional and performs without errors. All major components have been incorporated. Still lacks polish and finishing touches. |
Progress | It is unclear whether or not the project will be completed by the deadline. | The team has made progress on the project at this point and is on track to finish by the deadline. | The team has made substantial progress on the project at this point and is on track to finish ahead of the deadline. |
Reproducibility | Source code is unclear. Project files are missing or hard to find. Project files cannot be rendered. | Source code is easy to read, properly formatted, and appropriately documented. Project files are generally organized in the repository and easy to find. Project files generally render with minimal errors. | All expectations of typical projects + all required files are provided. Project files (e.g. Quarto, Shiny apps, R scripts) render without issues and reproduce the necessary outputs. |
Peer review
- Peer review issues open
- Reviews are constructive, actionable, and sufficiently thorough
Final report
Category | Less developed projects | Typical projects | More developed projects |
---|---|---|---|
Introduction | Less focused and organized. They may jump to technical details without explaining why results are important. Research questions are not clearly stated and/or results are not clearly summarized at the end of the introduction. | Provides background information and context. Introduces key terms and data sources. Outlines research question(s). Ends with a brief summary of findings. | All expectations of typical projects + clearly describes why the setting is important and what is at stake in the results of the analysis. Even if the reader doesn’t know much about the subject, they know why they care about the results of your analysis. |
Justification of approach | | | |
Limitations | The limitations are not explained in depth. There is no mention of how these limitations may affect the meaning of results. | Identifies potential harms and data gaps, and describes how these could affect the meaning of results. | Creatively identifies potential harms and data gaps, and describes how these could affect the meaning of results, as well as the impact of results on people. |
Final product(s)
Category | Less developed projects | Typical projects | More developed projects |
---|---|---|---|
Design + visualization | | | |
Functionality | | | |
Code | | | |
Impact | | | |
Slides + presentation
Category | Less developed projects | Typical projects | More developed projects |
---|---|---|---|
Time management | Only some members speak during the presentation. Team does not manage time wisely (e.g. runs out of time, finishes early without adequately presenting their project). | All members speak during the presentation. Team does not exceed the five minute limit. | Team maximally uses their five minutes. Clearly communicates their objectives and outcomes from the project. |
Professionalism | Presentation is slapped together or haphazard. Seems like independent pieces of work patched together. | Presentation appears to be rehearsed. There is cohesion to the presentation. | All elements of typical projects + everyone says something meaningful about the project. |
Slides | Slides contain excessive text and/or content. Team relies too heavily on slides for their presentation. | Slides are well-organized. Slides are used as a tool to assist the oral presentation. | All elements of typical projects + graphics and tables follow best practices (e.g. all text is legible, appropriate use of color and legends). Slides are not crammed full of text. |
Creativity/originality | Project meets the minimum requirements but not much else. Project is incomplete or does not meet the team’s objectives. | Project appears carefully thought out. Time and effort seem to have gone into the planning and implementation of the project. | All elements of typical projects + project goes above and beyond the minimum requirements. Addresses a truly important social issue or noteworthy goal. |
Content | Objective is unclear. Product(s) do not clearly address the research question. Limitations are glossed over or ignored entirely. | Objective is stated. Product(s) address the objective. Limitations are noted. | Objective is clearly stated. Product(s) clearly address the objective. Limitations are carefully considered and articulated. |
Slides + presentation (peer)
- Content: Is/are the objective(s) clearly articulated, and can the product(s) accomplish it/them?
- Content: Did the team effectively meet the objective(s)?
- Creativity and critical thought: Is the project carefully thought out? Are the limitations carefully considered? Does it appear that time and effort went into the planning and implementation of the project?
- Slides: Are the slides well organized, readable, not full of text, featuring figures with legible labels, legends, etc.?
- Professionalism: How well did the team present? Does the presentation appear to be well practiced? Are they reading off of a script? Did everyone get a chance to say something meaningful about the project?
Reproducibility + organization
Category | Less developed projects | Typical projects |
---|---|---|
Reproducibility | Required files are missing. Quarto files do not render successfully (except when a package needs to be installed). | All required files are provided. Project files (e.g. Quarto, Shiny apps, R scripts) render without issues and reproduce the necessary outputs. |
Data documentation | Codebook is missing. No local copies of data files. | All datasets are stored in a data folder, a codebook is provided, and a local copy of the data file is used in the code where needed. |
File readability | Documents lack a clear structure. There are extraneous materials in the repo and/or files are not clearly organized. | Documents (Quarto files and R scripts) are well structured and easy to follow. No extraneous materials. |
Issues | Issues have been left open, or are closed mostly without specific commits addressing them. | All issues are closed, mostly with specific commits addressing them. |
Late work policy
There is no late work accepted on this project. Be sure to turn in your work early to avoid any technological mishaps.