Lecture 1
Cornell University
INFO 5001 - Fall 2023
2023-08-21
Dr. Benjamin Soltoff
Lecturer in Information Science
Gates Hall 216
Data science is an exciting discipline that allows you to turn raw data into understanding, insight, and knowledge.
[A]n interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract or extrapolate knowledge and insights from noisy, structured and unstructured data, and apply knowledge from data across a broad range of application domains.[1]
We’re going to learn to do this in a tidy way – more on that later!
This is a course on computing applications for data science workflows
Or more like a demo for today…
All linked from the course website: https://info5001.infosci.cornell.edu/
Important: Make sure you can access RStudio Workbench before lab on Friday.
Prepare: Introduce new content and prepare for lectures by completing the readings
Participate: Attend and actively participate in lectures and labs, office hours, team meetings
Practice: Practice applying statistical concepts and computing with application exercises during lecture, graded for completion
Perform: Put together what you’ve learned to analyze real-world data
| Category | Percentage |
|---|---|
| Homework | 30% |
| Project | 30% |
| Labs | 15% |
| Exam | 15% |
| Application Exercises | 10% |
See course syllabus for how the final letter grade will be determined.
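To make the weighting concrete, here is a minimal sketch of how the category percentages in the grading table combine into one overall percentage. It assumes every category is scored from 0 to 100 and ignores anything the syllabus may add (dropped scores, late penalties, letter-grade cutoffs). The function name `weighted_grade` and the example scores are hypothetical.

```python
# Category weights taken from the grading table (percent of the final grade).
WEIGHTS = {
    "Homework": 30,
    "Project": 30,
    "Labs": 15,
    "Exam": 15,
    "Application Exercises": 10,
}


def weighted_grade(scores: dict[str, float]) -> float:
    """Return the overall percentage given per-category scores on a 0-100 scale.

    Assumption: a plain weighted average; the syllabus may define extra rules.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(WEIGHTS.values())  # 100 for the table above
    return sum(WEIGHTS[c] * scores[c] for c in WEIGHTS) / total_weight


# Hypothetical scores, for illustration only.
example_scores = {
    "Homework": 90,
    "Project": 85,
    "Labs": 100,
    "Exam": 80,
    "Application Exercises": 100,
}
print(weighted_grade(example_scores))  # 89.5
```

This only gives the numeric percentage; the mapping to a letter grade is defined in the syllabus.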
I want this course to be accessible to students of all abilities. Please feel free to let me know if there are circumstances affecting your ability to participate in class.
Only work that is clearly assigned as team work should be completed collaboratively.
Homework must be completed individually. You may not directly share answers or code with others; however, you are welcome to discuss the problems in general and ask for advice.
Exams must be completed individually. You may not discuss any aspect of the exam with peers.
We are aware that a huge volume of code is available on the web, and many tasks may have solutions posted.
Unless explicitly stated otherwise, this course’s policy is that you may make use of any online resources (e.g., RStudio Community, Stack Overflow, or generative AI such as ChatGPT or Copilot), but you must explicitly cite where you obtained any code you directly use or use as inspiration in your solution(s).
Any recycled code that is discovered and is not explicitly cited will be treated as plagiarism, regardless of source.
All code must be written by you, the human being.
Ask if you’re not sure if something violates a policy!
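As an illustration of the citation policy above, one way to cite reused code is a comment placed directly above it. This is only a sketch: the slides do not prescribe a citation format, so the comment layout is an assumption, and both the cited source and the function `average_or_none` are hypothetical.

```python
from statistics import mean


# Source: adapted from a Stack Overflow answer (hypothetical citation;
# replace this line with the actual URL and the date you accessed it).
# Changes: renamed variables and added handling for empty input.
def average_or_none(values):
    """Return the mean of values, or None when the list is empty."""
    return mean(values) if values else None
```

The same pattern works for generative AI: name the tool and briefly describe what you asked it and what you changed.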