DS 210 - Programming for Data Science
This course builds on DS 110 (Python for Data Science) by expanding on the programming language, systems, and algorithmic concepts introduced in that course. The course begins by exploring the different types of programming languages and introducing students to important systems-level concepts such as computer architecture, compilers, file systems, and using the command line. It introduces Rust, a safe and high-performance compiled language, and shows how to use it to implement and understand a number of fundamental data structures, algorithms, and systems concepts.
While DS 110 focuses on writing small, standalone Python scripts for data science, DS 210 aims to expose students to designing and implementing larger programs and software packages, as well as testing, optimizing, and evaluating these programs.
Prerequisites: CDS 110 or equivalent
Learning Objectives
By the end of this course students will:
- Be familiar with the basics of computer organization and how it affects the correctness and performance of programs, including basic computer structure, memory management and safety, and basic concurrency and synchronization.
- Understand how to implement, evaluate, and optimize high-performance code.
- Become more comfortable designing and implementing moderately complex software packages.
- Understand how to evaluate and improve the quality of code with respect to readability, maintenance, performance, and modularity.
- Learn basic data structures and algorithmic concepts, e.g., vectors, linked lists, stacks, and hashmaps.
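As a small taste of the last objective, here is a minimal sketch of two of these structures in Rust: a vector used as a stack, and a hashmap used for counting. The function names `word_counts` and `reverse_with_stack` are hypothetical and chosen for illustration only; this is not taken from the course materials.

```rust
use std::collections::HashMap;

/// Counts how often each whitespace-separated word appears in `text`.
/// (Hypothetical helper, for illustration only.)
fn word_counts(text: &str) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for word in text.split_whitespace() {
        // `entry` inserts 0 the first time a word is seen, then we increment.
        *counts.entry(word.to_string()).or_insert(0) += 1;
    }
    counts
}

/// Reverses a slice by pushing every item onto a Vec used as a stack
/// and then popping them off (last in, first out).
/// (Hypothetical helper, for illustration only.)
fn reverse_with_stack(items: &[i32]) -> Vec<i32> {
    let mut stack: Vec<i32> = Vec::new();
    for &item in items {
        stack.push(item);
    }
    let mut reversed = Vec::with_capacity(items.len());
    while let Some(top) = stack.pop() {
        reversed.push(top);
    }
    reversed
}

fn main() {
    let counts = word_counts("rust is safe and rust is fast");
    println!("'rust' appears {} times", counts["rust"]);
    println!("{:?}", reverse_with_stack(&[1, 2, 3]));
}
```

Both helpers borrow their input (`&str`, `&[i32]`) rather than taking ownership, which is one of the memory-management ideas listed in the objectives above.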
Why are these concepts important?
Jobs/careers: In an increasingly competitive job market, a strong technical background in building high-quality, performant software is crucial for landing and succeeding in data science, data engineering, and software engineering jobs at reputable employers. This course teaches the fundamental concepts and practical skills that such employers and jobs require. Students who do not master the material in this course will struggle to succeed in such jobs (or even get their careers started).
Technical interviews: Technical interviews almost always include coming up with an effective solution for a data structure or algorithmic problem and implementing it effectively in clean, good quality code. This course helps students practice this skill.
The CDS curriculum: DS 210 plays an important role in the CDS curriculum, with many higher-level courses depending on it as a prerequisite. For many students, it is the only course where they will encounter computer organization and systems concepts that are nonetheless crucial for their success in other practice-oriented upper-level courses and practicums.
Student growth and technical background: A data scientist or engineer who does not understand the basics of how a computer works cannot effectively understand why their programs behave the way that they do. The material in this course will help students move past the stage where the data science tools they use are mystery boxes. Instead, students will have a greater understanding of how and why these tools work, and of the behind-the-scenes reasons for why they are designed the way that they are. This helps students use these tools (and learn new ones) more effectively.
Lectures and Discussions
Lectures: Tuesdays, Wednesdays, and Thursdays 1:00pm – 3:30pm, 808 Commonwealth Ave FLR 123
Discussions: Tuesdays and Thursdays 4:00pm – 5:00pm, 590 Commonwealth Ave SCI 115
Consistently attending and participating in both lectures and discussions is expected and constitutes a sizable part of your grade in this course.
Course Content Overview
- Part 1: Why Rust and why should you care? Foundations: command line, git, and basic Rust syntax and features. (Weeks 1-2)
- Part 2: Core Rust concepts. Memory management. Data structures and algorithms. (Weeks 3-4)
- Midterm (~Week 4)
- Part 3: Advanced Rust. Parallelism and concurrency. Data science with Rust. (Weeks 5-6)
- Final exam (Week 6)
For a complete list of modules and topics that will be kept up-to-date as we go through the term, see the lectures schedule and the list of deadlines.
Course Format
Lectures: our lectures will involve extensive hands-on practice. Each class includes:
- Interactive presentations of new concepts, including live coding and visualizations
- Small-group exercises and problem-solving activities
- Discussion and Q&A
Because of this active format, regular attendance and participation are important and count for a significant portion of your grade (20%).
Discussions: the instructor will work with you on programming exercises and provide project support. Discussion sections will also be used for code reviews.
The discussions count towards the attendance portion of your grade.
Pre-work: we will generally assign short, light readings or small exercises ahead of class to help you prepare for in-class activities, and may include short quizzes (graded for completion) to help us keep track of the class’s progress. We will also periodically ask for feedback and reflections on the course between lectures.
Mini Projects: a key goal of this course is to have you write significant code, both in size and complexity. The only way to master programming is with practice. You can find a tentative schedule of our assignments and their deadlines on this page.
These constitute 50% of your grade:
- Mini projects: we will have 5 mini projects throughout the course. These are individual assignments that span multiple weeks and are due one week at a time.
- Code reviews/oral examinations: we will also conduct a code review/oral examination with every student about their solution during the discussion section immediately following each due date. This process mimics code review practices in industry and offers students feedback on how they can improve their solutions and programming skills. The reviews also serve as a check to ensure students only hand in code they have written (and understood) themselves.
Find more details about the mini projects, code reviews, and their policies here.
Exams: One midterm and a cumulative final exam covering the concepts we see in class and in the mini projects. The exams constitute 30% of the final grade.
The course emphasizes learning through practice and receiving feedback on assignments.
Time Commitment
In a typical week, students are expected to attend the three lectures and the one to two discussion sections (~9 hours) and allocate ~0.5-1 hours for class pre-work. This is a coding-heavy course, and students will need to allocate 6-8 hours per week on average to work on the mini projects.
The best way to practice for the exams is by doing the mini projects and engaging in the lecture pre-work and in-class activities. However, we will adjust the assigned workload during the week of the midterm and final to allow students some time to review the lecture notes and material, if they so wish.
In total, we expect students to allocate 16-20 hours per week to DS 210 across lectures, discussions, pre-work, and projects.
This is a programming-heavy course that pushes students to practice and improve their programming skills. We believe this is crucial to students’ careers, growth, and job prospects. In return, the teaching staff commits to helping students in and outside of class, and to providing them with feedback, resources, and guidance to ensure they succeed.
We are on your side as you battle programming, computers, and the Rust compiler.