
Comparing and Evaluating Programs

Lecture 10: Friday, February 13, 2026, and
Lecture 11: Tuesday, February 17, 2026.
Code examples

What Makes a Good Program – How To Evaluate Different Programs That Do The Same Thing

One problem can usually be solved in many different ways. These differences may stem from following an entirely different algorithm for how to solve the problem, structuring and organizing the code in a different way, or even the unique “artistic” style of the program author.

So, if you are looking at different programs that aim to solve the same problem, how do you know which one is better? In other words, how do you choose which one to use? This also applies when you are developing the solution yourself. You may brainstorm different competing approaches, but you will eventually have to choose one to implement.

There are several metrics by which you can judge a program. Sometimes, these metrics may be at odds, and it is up to you to prioritize some over others based on the context of the problem at hand. These metrics include:

  1. Correctness: Does the program actually solve the problem? Does it solve it correctly for all possible inputs? Does it produce accurate results consistently? This is often (but not always) non-negotiable, and in this course, you are always expected to implement correct programs.
  2. Performance: How fast does the program run? Faster programs are often desirable because they save money! However, this is not always clear cut to evaluate: some programs may be faster for certain kinds or sizes of inputs while being slower for others.
  3. Memory consumption: How much memory/space does the program consume? This often has implications on program performance, but can be important on its own if dealing with really large inputs that strain the available memory on a computer.
  4. Readability: Is the program easy to understand by someone else if they read it? Is the main idea behind it, and why it works, clear? This is often important in the real world, where programs need to be developed and maintained by a team of different programmers.
  5. Ease of Development: How complex is it to implement this program? How long would it take the programmer to implement, optimize, test, and document the program to a satisfactory degree?
  6. Ease of Maintenance or Updating: How easy is it to update the solution if the problem is changed slightly or if the circumstances change a little? Can the solution be generalized to other similar problems or other contexts?

Some of these metrics are more objective than others. For example, you can design experiments to benchmark exactly how long a program takes to execute in different setups and workloads (performance), but two reasonable programmers may disagree on how readable a particular program is, or how much effort is required to develop it.

We will explore this further using two examples.

Example 1 - Contains Duplicate

Remember the contains duplicate problem that you had to solve for homework 2. In this problem, we are given a vector with n elements, and we need to discover if some element(s) in it are duplicated. For example, the vector [1, 2, 3, 1, 5] has a duplicate (1) but [1, 3, 2, 5, 4] does not.

The Solutions

There are different ways of solving this problem; we will explore three approaches, which you can find in the course’s code repository.

Solution 1

Our first idea is to use sorting to order the elements of the vector. For example, if our input is [1, 2, 3, 1, 5], we can sort it so that it becomes [1, 1, 2, 3, 5].

Note that after sorting, any duplicates would appear consecutively in the vector. Thus, we can simply check if any consecutive elements are the same.

#![allow(unused)]
fn main() {
fn contains_duplicate(mut numbers: Vec<i32>) -> bool {
    // After sorting, any duplicates end up in consecutive positions.
    numbers.sort();

    // We start at index 1 and look back one position. Writing the
    // range as 0..numbers.len() - 1 would underflow (and panic) on an
    // empty vector, since len() is an unsigned integer.
    for i in 1..numbers.len() {
        if numbers[i - 1] == numbers[i] {
            return true;
        }
    }
    return false;
}
}
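As an aside (this variant is not in the course repository), the same adjacent-pair check can be expressed with the slice method windows, which yields every pair of consecutive elements:

```rust
// A sketch of the same idea using windows(2): it yields each pair of
// adjacent elements, and `any` stops at the first matching pair.
fn contains_duplicate(mut numbers: Vec<i32>) -> bool {
    numbers.sort();
    numbers.windows(2).any(|pair| pair[0] == pair[1])
}

fn main() {
    assert!(contains_duplicate(vec![1, 2, 3, 1, 5]));
    assert!(!contains_duplicate(vec![1, 3, 2, 5, 4]));
    // windows(2) on an empty slice yields nothing, so this is safe.
    assert!(!contains_duplicate(vec![]));
}
```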

Solution 2

A different idea is to count how many times each element appears in the vector. Essentially, we can construct a hash map that maps each element to how many times it appears in the vector. For example, if the input vector was [1, 2, 3, 1, 5], we would construct the following counts map:

{
  1: 2,
  2: 1,
  3: 1,
  5: 1
}

After constructing this map, finding out whether there are duplicates becomes easy: we can look at all the elements and their counts, and check if any have a count that is greater than 1. Here’s an implementation of this idea below:

#![allow(unused)]
fn main() {
use std::collections::HashMap;

fn contains_duplicate(numbers: Vec<i32>) -> bool {
    // Construct a map from element to its count
    let mut map = HashMap::new();
    for num in numbers {
        let value_in_map = map.get(&num);
        match value_in_map {
            None => {
                map.insert(num, 1);
            },
            Some(count) => {
                map.insert(num, count + 1);
            }
        }
    }

    // Now that we have counted all of the elements,
    // we can check if any have a count that is greater
    // than 1.
    for (_num, count) in map {
        if count > 1 {
            return true;
        }
    }

    return false;
}
}
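As an aside (this is not one of the lecture's three solutions), the counting loop above can be written more concisely with HashMap's entry API, which fetches or inserts a count in a single call:

```rust
use std::collections::HashMap;

// The counting loop from Solution 2, condensed with the entry API:
// entry(num).or_insert(0) returns a mutable reference to the count,
// inserting 0 first if the key was absent.
fn contains_duplicate(numbers: Vec<i32>) -> bool {
    let mut map = HashMap::new();
    for num in numbers {
        *map.entry(num).or_insert(0) += 1;
    }
    // Check whether any element was counted more than once.
    map.values().any(|&count| count > 1)
}

fn main() {
    assert!(contains_duplicate(vec![1, 2, 3, 1, 5]));
    assert!(!contains_duplicate(vec![1, 3, 2, 5, 4]));
}
```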

Solution 3

We can improve solution 2 by using the following observation: we actually do not need the full count of every element, just whether the element has a count greater than 1. A different way of phrasing it is that while we are iterating through the vector, we just need to know if the current element is one we have seen before or not, and not the exact count of how many times we have seen it.

We can use a HashSet to realize this. Unlike a HashMap, a HashSet stores a set of keys without associating them with any values. Inserting an element into a HashSet as well as retrieving an element from it are both fast, constant-time operations.

You can think of the HashSet<T> as similar to a HashMap<T, bool> that maps elements that exist in the set to true, and does not contain any mappings for elements outside the set.

We can use HashSet to come up with the code below:

#![allow(unused)]
fn main() {
use std::collections::HashSet;

fn contains_duplicate(numbers: Vec<i32>) -> bool {
    let mut set = HashSet::new();

    for num in numbers {
        // Have we seen this element before?
        if set.contains(&num) {
            return true;
        }
        set.insert(num);
    }

    return false;
}
}

Comparing The Solutions

Performance (Analytical)

Solution 1’s performance depends on sorting. Most good sorting algorithms take O(n log(n)) steps to complete their sort, where n is the size of the vector. The loop following the sort simply goes through the vector once from start to end, and thus takes roughly n steps. So in total, this program needs roughly O(n log(n)) steps.

Solutions 2 and 3, on the other hand, take O(n) steps. Solution 2 contains two loops: the first goes over the vector and manipulates the HashMap. Inserting and retrieving elements from a HashMap can each be done in constant time (i.e., roughly one step), thus this loop takes roughly O(n) steps. The second loop is similar: it iterates over the map, which contains at most n entries.

Solution 3 contains just one simple loop with n steps, since inserting and retrieving elements from a HashSet is also constant time (~ one step).

So, analytically, solution 1 takes O(n log(n)) steps while solutions 2 and 3 take O(n). It appears that solutions 2 and 3 are equivalent and solution 1 is slower.

Performance (Empirical)

The code we provide on GitHub benchmarks the solutions to see how long they take to execute with a vector with 10,000,000 elements and no duplicates. Surprisingly, we find that solution 1 is a little faster than solution 3, and much faster than solution 2. How come?

While the analytical analysis we did above with the big O notation is often very helpful, it is a rough analysis and skips over important details. Specifically, it glosses over what the “steps” actually entail and treats them as the same.

Sorting entails performing a bunch of comparisons (i.e., <) and swaps. These are really fast and simple operations that a computer can execute blazingly quickly. Inserting into a HashMap is more complicated: it involves allocating memory, freeing memory, and hashing keys behind the scenes. While that is constant time (i.e., a fixed number of steps that does not depend on the size of the map), it is more complicated than a single <. Furthermore, log(n) is actually quite a small quantity: for n = 10,000,000, log_2(n) is roughly 23. While a factor of 23 can be significant, the simplicity of the operations associated with it in solution 1 still wins over solution 2. With that said, if n gets big enough, solution 2 will eventually win out, as the log(n) factor grows and eventually overtakes the constant-factor cost of the HashMap operations.

Solution 3 and Solution 1 are much closer in terms of runtime. Solution 3 can be optimized a little further by issuing only one HashSet operation: it turns out that insert returns whether the element is new or not, so we can use its return value and skip the contains() call.
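That optimization might look like the following sketch (this exact code is not necessarily what the repository contains):

```rust
use std::collections::HashSet;

// Solution 3 with the optimization described above: HashSet::insert
// returns false when the element was already in the set, so a single
// call replaces the separate contains() check.
fn contains_duplicate(numbers: Vec<i32>) -> bool {
    let mut set = HashSet::new();
    for num in numbers {
        if !set.insert(num) {
            // insert returned false: we have seen this element before.
            return true;
        }
    }
    false
}

fn main() {
    assert!(contains_duplicate(vec![1, 2, 3, 1, 5]));
    assert!(!contains_duplicate(vec![1, 3, 2, 5, 4]));
}
```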

Furthermore, Solution 3 has a subtle performance advantage: it will stop as soon as it detects a duplicate. Imagine that the input vector is very large, but starts with two duplicates immediately. Solution 3 will detect that after two iterations and immediately return, while solutions 1 and 2 will need to finish sorting the entire vector and counting all of its elements, respectively, before they can detect the duplicates.

So, which of these solutions is actually faster? Well, that depends on the parameters of the problem:

  1. How big is the vector? If the vector is incredibly large, solution 1 will eventually become slower, but otherwise, it is faster than solution 2.
  2. Do we expect to have a lot of duplicates in our inputs? If so, solution 3 will be faster as it can stop as soon as it detects a duplicate!
  3. Are our input vectors small or medium in size (e.g. millions of elements)? Do they have no or few duplicates? Then, Solution 1 is probably fastest.

Readability, Ease of Development and Maintenance

Which of these programs is simplest to understand? Solutions 1 and 3 are both much shorter, so it is easier to understand what they do. However, to understand why solution 1 works, you need to understand how sorting affects duplicates and makes them appear in consecutive locations.

What about scenarios where the problem is changed slightly? Say we want to instead find whether an element appears 3 or more times, rather than just duplicates (a count of 2). Solution 3 will no longer work, as it does not have the ability to track how many times an element appears. Solution 1 could still work, but now it needs to check whether any 3 consecutive elements are the same. For example, by changing the loop to:

#![allow(unused)]
fn main() {
for i in 2..numbers.len() {
  if numbers[i - 2] == numbers[i - 1] && numbers[i - 1] == numbers[i] {
    return true;
  }
}
}

What if we need to find if an element appears 100 times? It is still possible to do that with solution 1, but it can get really hairy: you either need to extend the condition above to 100 comparisons, or use a nested loop to check each window of 100 elements. This also makes it much slower!

On the other hand, solution 2 is really easy to update to match these new requirements. You can simply change one line in its last loop as follows:

#![allow(unused)]
fn main() {
for (num, count) in map {
  if count > 100 {
    return true;
  }
}
}

We will see another example of this below.

Example 2 - Majority Element

Let’s move on to the majority element problem, which you also had to solve in homework 2.

In this problem, we are given a vector with n elements, and we need to return its majority element. The majority element is the element that appears more than half the time in the vector.

For example, [1, 2, 3, 1, 1, 1] has majority element 1, and [1, 3, 3, 2, 3] has majority element 3. By definition, the majority element has to appear more than n/2 times.

On the other hand, vector [1, 2, 1, 3, 1, 2] does not have a majority element. Sure, 1 is the most common element in it, but it is not the majority as it appears exactly n/2 times, not greater than n/2 times.

The problem states that you can assume that a majority element exists in every vector you are given.

You can find our solutions here.

Solution 1

We can adapt solution 2 from contains duplicate above, which highlights how flexible that approach is.

#![allow(unused)]
fn main() {
use std::collections::HashMap;

fn majority_element(numbers: Vec<i32>) -> i32 {
    let n = numbers.len();

    // Construct a map from element to its count
    let mut map = HashMap::new();
    for num in numbers {
        let value_in_map = map.get(&num);
        match value_in_map {
            None => {
                map.insert(num, 1);
            },
            Some(count) => {
                map.insert(num, count + 1);
            }
        }
    }

    // Now that we have counted all of the elements,
    // we can check if any have a count that is greater
    // than n / 2.
    for (num, count) in map {
        if count > n / 2 {
            return num;
        }
    }

    panic!("no majority element");
}
}

Notice how we only needed to change a couple of lines!

Solution 2

A different approach is to rely on sorting. If a majority element exists (which the problem says we can assume), and we sort the vector, then the majority element will be guaranteed to appear at the midpoint of the vector.

For example, if you sort [1, 2, 3, 1, 1, 1], you will get [1, 1, 1, *1*, 2, 3]. Similarly, if you sort [1, 3, 3, 2, 3], you will get [1, 2, *3*, 3, 3]. We highlight the middle element (i.e. the element at index n/2) using stars.

Notice that in both cases, the middle element was the majority element.

With some more thinking, you can even mathematically prove this: the majority element appears more than n/2 times. When you sort the vector, all these occurrences appear consecutively. So they must occupy a contiguous chunk of size greater than n/2. There is no way to fit more than n/2 contiguous elements in a vector of size n without covering the midpoint (index n/2)!

We can code this solution as follows:

#![allow(unused)]
fn main() {
fn majority_element(mut numbers: Vec<i32>) -> i32 {
    numbers.sort();
    return numbers[numbers.len() / 2];
}
}

Solution 3

Finally, we consider a radically different solution. The majority element is, by definition, very common. So, if we pick a random element, chances are, that element is the majority element.

So, we can pick a random element, check if it is the majority element, return it if it is, and otherwise, repeat the process with a new random element, and so on.

The odds of us getting unlucky and not choosing the majority element become smaller and smaller as we make more tries. In fact, you can even prove that the probability of still being unlucky after k tries is at most 0.5^k, since the probability of selecting the majority element at any step is at least 0.5.

To give you a sense of how quickly this probability shrinks: when k = 1, the probability is less than 1/2. For k = 3, the probability is less than 1/8. For k = 10, the probability is less than 0.001. For k = 100, the probability is so so tiny that I would bet my arm, leg, and favorite guitar against it happening.

We can code this solution up as follows:

#![allow(unused)]
fn main() {
fn majority_element(numbers: &Vec<i32>) -> &i32 {
    loop {
        // println!("Guessing again");
        let random_number = helpers::random_element(numbers);

        let mut count_of_random_number = 0;
        for i in numbers {
            if random_number == i {
                count_of_random_number += 1;
            }
        }

        if count_of_random_number > numbers.len() / 2 {
            return random_number;
        }
    }
}
}

Comparing Solutions

Performance

Solutions 1 and 2 take O(n) and O(n log(n)) steps respectively, for similar reasons to example 1. Solution 3 takes roughly O(k * n) steps, where k is the number of guesses taken until the majority element is found. Realistically, k will be quite small (about 2 on average), and importantly, it does not depend on how large the vector is.

Run the provided code using these commands and see which solutions are fastest, and by how much:

cargo run --bin sol1 --release
cargo run --bin sol2 --release
cargo run --bin sol3 --release

Readability and Understandability

Which of these solutions is easiest to understand? Both in terms of what the code does, and in terms of why it works.

Imagine you were not provided with the explanation about why sorting results in the majority element showing up at the midpoint, or the explanation about the probability of finding the majority element by guessing randomly. Would it be difficult for you to understand these concepts on your own?

Ease of Maintenance

Imagine that the problem was updated in the future so that you cannot assume that the majority element exists. Instead, you must return a special value, e.g., -1 or None, when a majority element does not exist (or produce an error). How easy is it to modify each of these solutions to do that?
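As one hypothetical sketch (not the course's provided code), Solution 1 adapts naturally to this change: replace the panic with an Option return value.

```rust
use std::collections::HashMap;

// A sketch of Solution 1 adapted to inputs that may lack a majority
// element: return None instead of panicking when no count exceeds n/2.
// The counting loop uses the entry API for brevity.
fn majority_element(numbers: Vec<i32>) -> Option<i32> {
    let n = numbers.len();
    let mut map = HashMap::new();
    for num in numbers {
        *map.entry(num).or_insert(0) += 1;
    }
    for (num, count) in map {
        if count > n / 2 {
            return Some(num);
        }
    }
    None
}

fn main() {
    assert_eq!(majority_element(vec![1, 2, 3, 1, 1, 1]), Some(1));
    // 1 appears exactly n/2 times here, so there is no majority element.
    assert_eq!(majority_element(vec![1, 2, 1, 3, 1, 2]), None);
}
```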

What if the problem is changed to finding the most common element, even if it is not the majority element?
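For that variant, the counting approach again adapts easily. The sketch below (an illustration, not the course's code) counts every element and takes the key with the maximum count; if several elements tie, any one of them may be returned.

```rust
use std::collections::HashMap;

// A sketch of the "most common element" variant: count everything,
// then take the key with the maximum count. max_by_key returns None
// only when the map is empty (i.e., the input vector was empty).
fn most_common_element(numbers: Vec<i32>) -> Option<i32> {
    let mut map = HashMap::new();
    for num in numbers {
        *map.entry(num).or_insert(0) += 1;
    }
    map.into_iter()
        .max_by_key(|&(_, count)| count)
        .map(|(num, _)| num)
}

fn main() {
    // 1 appears 3 times out of 6: the most common, but not a majority.
    assert_eq!(most_common_element(vec![1, 2, 1, 3, 1, 2]), Some(1));
    assert_eq!(most_common_element(vec![]), None);
}
```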

Consider these metrics and pick your favorite solution. As with most things that have to do with taste, this is subjective and there might be many acceptable answers, but you have to be ready to provide your justifications for why that is the case!