DS 210 A1 - Programming for Data Science
This course builds on DS 110 (Python for Data Science) by expanding on the programming language, systems, and algorithmic concepts introduced in the prior course. The course begins by exploring the different types of programming languages and introducing students to important systems-level concepts such as computer architecture, compilers, file systems, and using the command line. It introduces Rust, a safe and high-performance compiled language, and shows how to use it to implement and understand a number of fundamental data structures, algorithms, and systems concepts.
While DS 110 focuses on writing small, standalone Python scripts for data science, DS 210 aims to expose students to designing and implementing larger programs and software packages, as well as testing, optimizing, and evaluating these programs.
Prerequisites: CDS 110 or equivalent
Learning Objectives
By the end of this course students will:
- Be familiar with the basics of computer organization and how it affects the correctness and performance of programs, including basic computer structure, memory management and safety, and basic concurrency and synchronization.
- Understand how to implement, evaluate, and optimize high-performance code.
- Become more comfortable designing and implementing moderately complex software packages.
- Understand how to evaluate and improve the quality of code with respect to readability, maintainability, performance, and modularity.
- Learn basic data structures and algorithmic concepts, e.g., vectors, linked lists, stacks, and hashmaps.
Why are these concepts important?
Jobs/careers: Given the increasing competitiveness of the job market, a strong technical background in building high-quality, performant programs and software is crucial to landing and succeeding in data science, data engineering, and software engineering jobs at reputable employers. This course teaches fundamental concepts and practical skills that such employers and jobs require. Students who do not master the material in this course will struggle to succeed in such jobs (or even get their careers started).
Technical interviews: Technical interviews almost always involve coming up with an effective solution to a data structure or algorithmic problem and implementing it in clean, good-quality code. This course helps students practice this skill.
The CDS curriculum: DS 210 plays an important role in the CDS curriculum, with many higher-level courses depending on it as a prerequisite. For many students, it is the only course where they will encounter computer organization and systems concepts that are nonetheless crucial for their success in other practice-oriented upper-level courses and practicums.
Student growth and technical background: A data scientist or engineer who does not understand the basics of how a computer works cannot effectively understand why their programs behave the way that they do. The material in this course will help students move past the stage where the data science tools they use are mystery boxes. Instead, students will have a greater understanding of how and why these tools work, and of the behind-the-scenes reasons for why they are designed the way that they are. This helps students use these tools (and learn new ones) more effectively.
Lectures and Discussions
A1 Lectures: Mondays, Wednesdays, and Fridays, 12:20pm – 1:10pm, WED 130 (2 Silber Way)
Section A Discussions:
- A2: Wednesdays, 1:25pm – 2:25pm, FLR 122 (808 Commonwealth Ave)
- A3: Wednesdays, 2:30pm – 3:20pm, IEC B10 (888 Commonwealth Ave)
- A4: Wednesdays, 3:35pm – 4:25pm, IEC B10 (888 Commonwealth Ave)
Note: There are two sections of this course. They cover similar material, but their schedules, homework, and discussion sections are different, and they are not interchangeable. You must attend the lecture and discussion section you are registered for!
Consistently attending and participating in both lectures and discussions is expected and constitutes a sizable part of your grade in this course.
Course Content Overview
- Part 1: Why Rust and why should you care? Foundations: command line, git, and basic Rust syntax and features. (Weeks 1-3)
- Part 2: Core Rust concepts. Evaluating code quality and performance. (Weeks 4-5)
- Midterm 1 (~Week 5)
- Part 3: Memory management. Data structures and algorithms. (Weeks 6-10)
- Midterm 2 (~Week 10)
- Part 4: Advanced Rust. Parallelism and Concurrency. (~Weeks 11-13)
- Part 5: Data Science & Rust in Practice (~Weeks 14-15)
- Final exam during exam week
For a complete list of modules and topics, which will be kept up to date as we go through the term, see the lecture schedule and the list of homework and exam deadlines.
Course Format
Lectures: our lectures will involve extensive hands-on practice. Each class includes:
- Interactive presentations of new concepts, including live coding and visualizations
- Small-group exercises and problem-solving activities
- Discussion and Q&A
Because of this active format, regular attendance and participation are important and count for a significant portion of your grade (15%).
Discussions: our TAs will work with you on technical interview-style programming exercises in small groups and provide homework support. Discussion sections will also occasionally be used for oral code reviews of homework solutions.
The discussions count towards the attendance portion of your grade.
Pre-work: we will generally assign short, light readings or small exercises ahead of class to help you prepare for in-class activities. These may include short quizzes (graded for completion) to help us keep track of the class's progress. We will also periodically ask for feedback and reflections on the course between lectures.
Homework/Mini Projects: a key goal of this course is to have you write significant code, both in size and complexity. The only way to master programming is with practice. You can find a tentative schedule of our assignments and their deadlines on this page.
Assignments are split into the following categories, which together constitute 40% of your grade:
- Homework: we will have small weekly assignments for the first 4 weeks. These will help you get set up and familiarize yourself with the tools we will use throughout the course.
- Mini projects: after the first 4 weeks, the course will move to mini projects. These are group assignments for groups of 2-3 students. They span multiple weeks and contain multiple parts that build on each other, with one part due each week.
- Code reviews/oral examination: we will also conduct a code review/oral examination with every group about their solution during the discussion section immediately following the due date. This process mimics code review practices in industry and offers students feedback on how they can improve their solutions and programming skills. It also serves as a check to ensure that students only hand in code they have written (and understood) themselves and that group members are collaborating effectively.
- Corrections/feedback: students will have the option to submit revised solutions for up to two mini projects of their choosing after their code reviews. This allows them to address feedback given by the teaching staff during review, earning back up to 50% of the missed grade. This mimics the process for improving, approving, and merging code in industry.
Find more details about the mini projects, code reviews, and their policies here.
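As a rough illustration of the corrections policy above, the sketch below computes a revised score under one possible reading of "up to 50% of the missed grade": the revision recovers at most half of the points lost on the original submission. The function name `revised_score`, its parameters, and this exact formula are assumptions made for illustration only; the mini project policy page is the authoritative source.

```rust
/// Hypothetical helper (not an official grading formula): the score after a
/// revision, assuming the revision earns back at most half of the points
/// missed on the original submission.
fn revised_score(original: f64, revised: f64, max_points: f64) -> f64 {
    // Points lost on the original submission (never negative).
    let missed = (max_points - original).max(0.0);
    // Improvement shown by the revision, limited to what was missed.
    let improvement = (revised - original).clamp(0.0, missed);
    // Assumption: only half of the improvement counts toward the grade.
    original + 0.5 * improvement
}

fn main() {
    // Original 70/100 with a perfect revision: 70 + 0.5 * 30.
    println!("{}", revised_score(70.0, 100.0, 100.0));
    // Original 70/100 with a revision worth 80: 70 + 0.5 * 10.
    println!("{}", revised_score(70.0, 80.0, 100.0));
}
```

Under this reading, a revision can never lower a grade, and a submission that originally earned 70 out of 100 can reach at most 85.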
Exams: there will be two midterms and a cumulative final exam covering the concepts we see in class and in the homework and mini projects. The exams also include short hand-coding problems (which we will practice in class!). The exams constitute 30% of the final grade.
The course emphasizes learning through practice, with opportunities for corrections and growth after receiving feedback on assignments.
Time Commitment
In a typical week, students are expected to attend the three lectures and the discussion section (~3.5 hours) and allocate 1-1.5 hours for class pre-work. This is a coding-heavy course, and students will need to allocate 7-8 hours per week on average to work on the homework and mini projects.
The best way to practice for the exams is by doing the homework and mini projects and engaging in the lecture pre-work and in-class activities. However, we will adjust the assigned workload during the week of exams to allow students some time to review the lecture notes and material, if they so wish.
In total, we expect students to allocate 12-13 hours per week to DS 210 between attending lectures, discussion, and assignments.
This is a programming-heavy course that pushes students to practice and improve their programming skills. We believe this is crucial to students' careers, growth, and job prospects. In return, the teaching staff commits to dedicating themselves to helping students in and outside of class, and to providing them with feedback, resources, and guidance to ensure they succeed.
We are on your side as you battle programming, computers, and the Rust compiler.