Module 14: Enums and Box

Lecture 32: Friday, April 17, 2026.

In this module, we will learn how we can use Enums and Rust Boxes!

Motivation

In the previous module, we saw how in Python, everything is a reference! One consequence of this is that Python supports lists (and other collections) that contain elements of different types.

For example, the list x below contains a string and a number.

x = ['hello', 10]
print(x[0])
print(x[1])

How is this related to Python references? Well, remember FastVec, and specifically, how FastVec implements get(<index>) to retrieve an element at the given index.

Rust vectors and Python lists allocate a chunk of memory on the heap that they can dynamically resize to add or remove elements. They keep track of the base pointer / address of this region (i.e., the address of the first element). Let’s call that address base_addr.

Python lists, Rust’s Vec, and even FastVec support accessing an element at some index i in a fast constant-time manner (i.e. O(1)). Let’s think about what a vector does when we execute something like x[i]:

  1. It retrieves base_addr, the address of element 0.
  2. It adds an offset to the base_addr to find the address of element i.
  3. It dereferences the resulting address to retrieve element i.

Imagine that the vector x contains elements of type u8. Each of these elements is one byte in size. So, the address of element i would be base_addr + i. On the other hand, if the elements were of type u64, they would each be 8 bytes in size, and so the address of element i would be base_addr + (i * 8).

More generally, if a vector contains elements of a type whose size is b bytes, the formula for the address of the ith element would be base_addr + (i * b).

But, what if the elements were of different types, as in our Python example before? Well, the formula no longer works, because there is no single element size b to plug into it! This is one of the reasons Rust does not allow mixing different arbitrary types inside vectors.
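The formula can be checked directly in Rust. The sketch below is only an illustration, not how Vec is implemented internally, and the helper name element_addr is made up for this example. It computes base_addr + (i * b) by hand and compares the result to the address Rust reports for &v[i].

```rust
use std::mem::size_of;

// Hypothetical helper: the address formula from the text.
fn element_addr(base_addr: usize, i: usize, b: usize) -> usize {
    base_addr + (i * b)
}

fn main() {
    let v: Vec<u64> = vec![10, 20, 30, 40];

    // Address of element 0, viewed as a plain number.
    let base_addr = v.as_ptr() as usize;
    // Size of each element in bytes.
    let b = size_of::<u64>();

    for i in 0..v.len() {
        let computed = element_addr(base_addr, i, b);
        // The address Rust itself uses for element i.
        let actual = &v[i] as *const u64 as usize;
        assert_eq!(computed, actual);
        println!("element {} lives at address {:#x}", i, computed);
    }
}
```

The exact addresses change from run to run, but consecutive elements are always b bytes apart.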

In Python however, the list merely stores a reference to the elements, and while the elements may have different types (and thus different sizes), the references all have the same size – 8 bytes!

So, what can we do in Rust if we want to store elements of different types in a vector? We need to find a way to overcome the issue of having different element sizes! We have two approaches:

  1. Store the elements separately on the heap and only store a reference to them inside the vector itself – similar to Python. This is the Box approach.
  2. Pad the size of each element to the size of the maximum type we want to store in the vector. This is the Enum approach.

Enums

Start by reading sections 6, 6.1, 6.2, and 6.3 in the Rust book.

Rust Enums provide a way for us to say that a value is one of a predefined set of possible values, similar to how a bool is either true or false.

This also allows us to say that a value may be one of several types! For example, the enum below can hold a string or a u64.

#[derive(Debug)]
enum StringOrNumber {
  StringCase(String),
  NumberCase(u64)
}

fn main() {
  let x: StringOrNumber = StringOrNumber::StringCase(String::from("hello"));
  let y: StringOrNumber = StringOrNumber::NumberCase(10);
  println!("{:?}", x);
  println!("{:?}", y);
}

Exercise: edit the code above to support a third case for a boolean value!

The enum above may hold a string or a u64. However, in either case, instances of this enum have type StringOrNumber, as we saw above with x and y. It is the same type! This means we can use this enum to have a vector that mixes strings and numbers!

#[derive(Debug)]
enum StringOrNumber {
  StringCase(String),
  NumberCase(u64)
}

fn main() {
  let v = vec![
    StringOrNumber::StringCase(String::from("hello")),
    StringOrNumber::NumberCase(10),
    StringOrNumber::StringCase(String::from("bye"))
  ];
  println!("{:?}", v);
}

Matching on Enums

The one complexity with this approach is that we have to match on every element in that vector to discover whether it is a StringCase or a NumberCase in order to be able to use it.

For example, imagine we wanted to write a function that combines all the elements of a vector of strings and numbers into one big string.

#[derive(Debug)]
enum StringOrNumber {
  StringCase(String),
  NumberCase(u64)
}

fn combine_to_string(v: &Vec<StringOrNumber>) -> String {
    let mut result = String::from("");
    for i in 0..v.len() {
        let e: &StringOrNumber = &v[i];
        match e {
            StringOrNumber::StringCase(the_string) => {
                result += the_string;
            },
            StringOrNumber::NumberCase(the_number) => {
                result += &the_number.to_string();
            }
        }
    }
    return result;
}

fn main() {
  let v = vec![
    StringOrNumber::StringCase(String::from("hello")),
    StringOrNumber::NumberCase(10),
    StringOrNumber::StringCase(String::from("bye"))
  ];
  println!("{}", combine_to_string(&v));
}

Notice how after we retrieve the ith element in the loop and store it in variable e, we need to manually match on the possible cases of our enum to be able to retrieve the String or u64 inside e.

This can get a little verbose at times, but it prevents accidental mistakes, since it forces programmers to handle all the possible cases!
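To see the compiler enforce this, consider the small sketch below. The describe function is a made-up helper for illustration. It compiles because both cases are handled; deleting either arm makes the compiler reject the program with a "non-exhaustive patterns" error.

```rust
enum StringOrNumber {
    StringCase(String),
    NumberCase(u64),
}

// Hypothetical helper: returns a short description of the element.
fn describe(e: &StringOrNumber) -> String {
    match e {
        StringOrNumber::StringCase(s) => format!("string of length {}", s.len()),
        StringOrNumber::NumberCase(n) => format!("number {}", n),
        // Removing one of the arms above is a compile-time error:
        // Rust insists that every case of the enum is covered.
    }
}

fn main() {
    println!("{}", describe(&StringOrNumber::StringCase(String::from("hello"))));
    println!("{}", describe(&StringOrNumber::NumberCase(10)));
}
```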

How Does This Work?

So, why does using an Enum this way allow us to overcome the issue with element sizes in the vector?

Well, let’s look at the size of a StringOrNumber.

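// Needed on older Rust toolchains, where these functions are not in the prelude.
use std::mem::{size_of, size_of_val};

#[derive(Debug)]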
enum StringOrNumber {
  StringCase(String),
  NumberCase(u64)
}

fn main() {
    println!("The size of a StringOrNumber is {} bytes",
             size_of::<StringOrNumber>());
    println!("The size of a String is {} bytes",
             size_of::<String>());
    println!("The size of a u64 is {} bytes",
             size_of::<u64>());

    let x: StringOrNumber = StringOrNumber::StringCase(String::from("hello"));
    let y: StringOrNumber = StringOrNumber::NumberCase(10);
    println!("Size of x {} bytes", size_of_val(&x));
    println!("Size of y {} bytes", size_of_val(&y));
}

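Notice how the size of StringOrNumber is the size of its biggest case! Even when we store a u64 inside the enum (as in y), its size is still padded to match the size of String! In general, an enum is at least as large as its biggest case, and may need a few extra bytes to record which case it currently holds.

So, with an enum, all the elements in the vector get padded to the same size, which is determined by the largest of their cases. This means we can use the formula from before, base_addr + (i * b), where b is the size of the enum.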
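As a rough check, the sketch below walks over a vector that mixes both cases and confirms that each element sits exactly where the formula predicts. The helper name follows_formula is an assumption made for this example.

```rust
use std::mem::size_of;

#[allow(dead_code)]
enum StringOrNumber {
    StringCase(String),
    NumberCase(u64),
}

// Hypothetical helper: checks that element i lives at base_addr + (i * b),
// where b is the size of the enum itself.
fn follows_formula(v: &Vec<StringOrNumber>) -> bool {
    let base_addr = v.as_ptr() as usize;
    let b = size_of::<StringOrNumber>();
    for i in 0..v.len() {
        let actual = &v[i] as *const StringOrNumber as usize;
        if actual != base_addr + (i * b) {
            return false;
        }
    }
    true
}

fn main() {
    let v = vec![
        StringOrNumber::StringCase(String::from("hello")),
        StringOrNumber::NumberCase(10),
        StringOrNumber::StringCase(String::from("bye")),
    ];
    println!("every element follows the formula: {}", follows_formula(&v));
}
```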

Boxes

Read chapter 15 and 15.1 in the Rust book.

In Rust, a Box allows storing data on the heap while maintaining ownership and permissions for it. A Box is often called a smart pointer. Unlike a raw pointer, which may dangle, a Box owns the data on the heap that it points to, and that data stays alive for as long as the Box does, so the pointer never dangles.

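// Needed on older Rust toolchains, where this function is not in the prelude.
use std::mem::size_of_val;

fn main() {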
    // Regular variable on the stack.
    let x: String = String::from("hello");
    println!("address of x on the stack {:p}", &x);

    // Now, we use a box to move the data from the
    // stack to the heap!
    // The box is made out of two parts:
    // 1. the String stored on the heap,
    // 2. the address of that String (i.e. a pointer) stored on the stack.
    let b: Box<String> = Box::new(x);
    println!("Address of the box {:p}", &b);
    println!("Address of what's inside the box {:p}", &(*b));

    println!("Size of box (on stack) is {} bytes", size_of_val(&b));
    println!("Size of string inside box (on heap) is {} bytes", size_of_val(&(*b)));
}

Why Are Boxes Useful?

Box allows us to store data on the heap and keep a pointer to it. This has many uses:

  1. If we want to have recursive (self-referential) data types, e.g., structs or enums that store instances of themselves inside of them. We can use a Box to put the inner instances on the heap, and only store a pointer to them inside the struct or enum. The Condition enum in project 3 is an example of this!
  2. If we want to work with data of dynamic or unknown size: everything on the stack needs to have a fixed size that is known at compile time. If we have some data whose exact size is not known ahead of time or may vary, we should store it on the heap using a Box!
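As a small illustration of the first use, here is a sketch of a recursive enum. The Expr type and the eval function are made up for this example and are not taken from the project.

```rust
// Hypothetical recursive type: an expression is either a number
// or the sum of two smaller expressions.
enum Expr {
    Number(u64),
    // Without the Box, an Expr would contain two whole Exprs directly,
    // and Rust could not compute a finite size for it.
    Add(Box<Expr>, Box<Expr>),
}

// Recursively computes the value of an expression.
fn eval(e: &Expr) -> u64 {
    match e {
        Expr::Number(n) => *n,
        Expr::Add(left, right) => eval(left) + eval(right),
    }
}

fn main() {
    // Represents 1 + (2 + 3).
    let e = Expr::Add(
        Box::new(Expr::Number(1)),
        Box::new(Expr::Add(
            Box::new(Expr::Number(2)),
            Box::new(Expr::Number(3)),
        )),
    );
    println!("{}", eval(&e));
}
```

Each Add case only stores two pointers, so every Expr has the same small, fixed size no matter how deeply nested the expression is.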

How Can We Use a Box to Mix Different Types in a Vec?

Box itself is a generic type. Specifically, when we use it, we must specify what type of data it points to on the heap.

fn main() {
    let b1: Box<String> = Box::new(String::from("hello"));
    let b2: Box<u64> = Box::new(10);
}

So, a box that points to a string and a box that points to a number have different types, and we cannot directly combine them in a vector (what would the type of that vector be? Vec<Box<String>> or Vec<Box<u64>>? Neither works).

However, there is a workaround!

Remember our previous example with an enum. We wanted to have a vector that could store strings and numbers, with the goal of being able to combine all of them together to form one big string.

If that’s all we need to do, then, we can use Box by thinking creatively and outside the box.

It does not matter to us what exact type the Box points to in this case. What matters is that it is a type that can be turned into a string (so that we can combine it with the other elements).

Fortunately, there is a trait that describes this behavior (or contract): the ToString trait! So, we can view the elements as boxes that point to some dynamic type that’s unknown ahead of time, but that implements the ToString trait.

We can describe that to Rust by saying Box<dyn ToString>: a Box that points to some dynamic data that implements ToString:

  1. dyn stands for dynamic, and indicates that the exact data type is dynamic: it may not be known ahead of time and may depend on user inputs or other runtime data.
  2. Earlier editions of Rust accepted Box<ToString> without dyn, but this was discouraged, and it is an error starting with the 2021 edition. ToString is not an actual type, it is merely a trait. By explicitly using dyn, we make that distinction clear.

fn combine_to_string(v: &Vec<Box<dyn ToString>>) -> String {
    let mut result = String::from("");
    for i in 0..v.len() {
        // we do not know what the type is exactly here,
        // but we know that e refers to some element of some
        // type that implements ToString!
        let e: &Box<dyn ToString> = &v[i];
        result += &e.to_string();
    }
    return result;
}

fn main() {
    // All of the boxes below point to data whose type implements
    // ToString! So, we can refer to all of them as `Box<dyn ToString>`
    let v: Vec<Box<dyn ToString>> = vec![
        Box::new(String::from("hello")),
        Box::new(10),
        Box::new(String::from("bye"))
    ];

    println!("{}", combine_to_string(&v));
}

Why Do We Need a Box For dyn?

Can’t we just say Vec<dyn ToString>?

No!

dyn ToString is not a regular type with a fixed size. Crucially, since a dyn ToString can be a value of any type (as long as that type implements ToString), there is no way for Rust to know what size it will be. Because the size of dyn ToString is unknown at compile time, the vector formula we described above would not work if the type was Vec<dyn ToString>.

fn main() {
    // Does not compile: the size of `dyn ToString` is not known at compile time.
    println!("Size of dyn ToString {}", size_of::<dyn ToString>());
}

It also means that Rust cannot create regular stack-allocated variables of type dyn ToString, since Rust would not know how much space they would need on the stack, or how to delete them when they go out of scope!

fn main() {
    // Does not compile: Rust does not know how much stack space x needs.
    let x: dyn ToString = String::from("hello");
}

Thus, a dynamically typed object, such as dyn ToString, can only be used through a pointer whose size is known. To own such an object, we store it on the heap, where memory can be dynamically allocated, and keep a Box that points to it.
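In contrast to dyn ToString itself, a Box<dyn ToString> does have a size that is known at compile time. The sketch below prints it next to the size of a Box<String>. The helper name box_sizes is made up for this example. On current Rust, a Box that points to a trait object is two pointers wide: one pointer to the data and one to the table of methods for the concrete type.

```rust
use std::mem::size_of;

// Hypothetical helper: returns (size of Box<String>, size of Box<dyn ToString>).
fn box_sizes() -> (usize, usize) {
    (size_of::<Box<String>>(), size_of::<Box<dyn ToString>>())
}

fn main() {
    let (plain, dynamic) = box_sizes();
    println!("Size of Box<String> is {} bytes", plain);
    println!("Size of Box<dyn ToString> is {} bytes", dynamic);

    // The boxes have the same size regardless of what they point to,
    // so they can sit side by side in a vector.
    let v: Vec<Box<dyn ToString>> = vec![
        Box::new(String::from("hello")),
        Box::new(10),
    ];
    for e in &v {
        println!("{}", e.to_string());
    }
}
```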

The technical term for things like dyn ToString (or dyn with any other traits) is a Trait Object Type.

Which Do You Prefer?

There are several tradeoffs between going the Enum route and going the Box + dyn trait route:

  1. Memory: Enums pad all their instances to the maximum size of all their cases. If your Enums may contain types that differ significantly in their size, this may waste a lot of memory. A Box does not pad elements, but it requires storing an additional pointer to keep track of the address of the heap data: 8 bytes for a regular Box on a 64-bit machine, and 16 bytes for a Box<dyn Trait>, which also keeps a pointer to the methods of the concrete type.
  2. Speed: Box allocates, accesses, and deletes data on the heap, which is a little slower than direct access. It may also have some implications on the memory access patterns (and thus caching behavior) of the program.
  3. Complexity: Enums require verbose match statements. A Box with a dyn trait avoids the match statement, but only exposes the behavior or contract codified by the trait in question.
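To make the memory tradeoff concrete, the sketch below uses a made-up enum, SmallOrBig, whose cases differ a lot in size. The helper vec_payload_bytes is also an assumption for this example: it estimates how many bytes a vector needs for n elements, not counting the heap data behind any boxes.

```rust
use std::mem::size_of;

// Hypothetical enum with one tiny case and one large case.
#[allow(dead_code)]
enum SmallOrBig {
    Small(u8),
    Big([u8; 1024]),
}

// Hypothetical helper: bytes needed to hold n elements of type T in a vector.
fn vec_payload_bytes<T>(n: usize) -> usize {
    n * size_of::<T>()
}

fn main() {
    // Every SmallOrBig is at least as large as its Big case,
    // even when it only holds a single u8.
    println!("Size of SmallOrBig is {} bytes", size_of::<SmallOrBig>());
    // A Box is just a pointer, no matter how large its target is.
    println!("Size of Box<[u8; 1024]> is {} bytes", size_of::<Box<[u8; 1024]>>());

    println!("1000 enums need {} bytes",
             vec_payload_bytes::<SmallOrBig>(1000));
    println!("1000 boxes need {} bytes (plus the heap data they point to)",
             vec_payload_bytes::<Box<[u8; 1024]>>(1000));
}
```

If most elements are Small, the enum version spends most of its memory on padding, while the boxed version pays for one pointer and one heap allocation per element.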

Post-Reading Reflections

  • What if the shared behavior between the different types is more complicated, e.g., you want to be able to use multiple functions that do custom logic? What if no builtin trait offers all this behavior?

  • What if you do not know all the possible types you may want to use or mix into your vector? Alternatively, what if you know that you constantly need to support new types?