Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Mini Project 3: Build your own Vec!

In this assignment, you will:

  1. Build two versions of vector, we will start with an easier but slower one, and then implement a more complex but much faster one.
  2. Get familiar with memory management including manual memory allocations and pointers.

Part 1: SlowVec

Your task in part 1 of the project is to complete the implementation of SlowVec (located in project_3_vec/slow_vec/src/lib.rs). Our very first implementation of our own vector type.

This implementation will be a little slower, but it is easier to implement, so we will start with it.

You will implement a faster version in part 2.

Step 0 - GitHub Repository Setup

You should already have a fork of our GitHub repository from an earlier project. Synchronize your fork with our repository to receive the latest code, then pull main to your local machine. You should confirm that your fork contains project_3_vec.

Create a branch called vec off of main. You will do all of your work on this branch and submit it at the end.

Step 1 - Getting Started

Open the slow_vec folder using VSCode and navigate to slow_vec/src/main.rs.

main.rs

First, run the main function inside that file using VSCode or using the following command:

cd project_3_vec/slow_vec
cargo run --bin main

You will notice the program prints some output and then crashes with an error. This is because you have not completed the implementation of this Part 1 yet!

Let’s understand what this file contains.

fixed_sized_array(): The fixed_sized_array() function (line 9) demonstrates the FixedSizeArray type does and how it works. FixedSizeArray is fully implemented for you, you do not have to change it. Although you are welcome to read its source code (located under project_3_vec/fixed/src/lib.rs) if you are curious.

FixedSizeArray<i32> provides 4 important functions:

  1. ::allocate(n: usize): creates a new FixedSizedArray that can store exactly n elements. It cannot store any more!
  2. .put(element: i32, i: usize): stores the given element at index i in the array.
  3. .get(i: usize) -> &i32: returns a reference to the element located at index i.
  4. .move_out(i: usize) -> i32: moves the element at index i out of the array and return it.

get returns &i32 (the element stays in the array), while move_out returns i32 (the element is moved out and removed).

Lines 17–18 call get twice; line 21 calls move_out. Try uncommenting line 22 to call move_out twice. What happens? What if you call get after move_out?

Finally, slow_vec_push and slow_vec_remove test out .push and .remove

slow_vec_basics(): This function (line 33) demonstrates the SlowVec type:

  1. Line 35 creates a new variable of type SlowVec<i32>. You will have to complete the implementation of this type in the following steps. For now, you just need to get a sense of what its API looks like.
  2. Line 35 already puts 10, 20, and 30 in that SlowVec.
  3. Notice that you can print the content of SlowVec using println!.
  4. Line 38 to 40 iterates of the SlowVec and prints out its elements one at a time.

Behind the scenes, SlowVec is implemented using FixedSizeArray. But it’s implementation is missing two key functions push() and remove(), you will have to implement those in the next step.

slow_vec_push() and slow_vec_remove() demo pushing and removing elements from SlowVec.

They currently produce errors because you do not have push() and remove() implemented. When you do, go back and run these functions to make sure everything is correct!

lib.rs

After you are done running and tinkering with main.rs, let us look at project_3_vec/slow_vec/src/lib.rs.

Lines 11-13 define the SlowVec type using a struct! This is Rust’s way of defining custom types. Feel free to refer back to the Rust book for information about structs!

SlowVec<T> is generic over its element type. For example:

let slow_vec: SlowVec<bool> = SlowVec::from_vec(vec![true, false]);
println!("{slow_vec}");
let element = slow_vec.get(0);  // element has type &bool

The SlowVec struct provides a bunch of methods (lines 16-70). Many of them are provided to you. You do not need to modify these functions and you can assume they are correct:

  1. new(): creates a SlowVec that contains an empty FixedSizeArray of size 0.
  2. into_vec(...) and from_vec(...): transform between SlowVec and Rust’s built in Vec. You do not need to use or understand these functions, we provide them mostly for testing and tinkering.
  3. len(...): returns the current length of the SlowVec.
  4. clear(...): clears all the elements in the SlowVec, resetting its contents to an empty FixedSizeArray.
  5. get(...): retrieves a reference to the element at index i. It does so by simply calling get on the underlying FixedSizeArray.

Step 2 - Implementation

Implement both the push function (line 62) and the remove function.

The trick to both functions is similar. The fixed field is of type FixedSizeArray, so we cannot change its size. We cannot add more elements to it nor remove any elements from it.

Thus, we have to create a new FixedSizeArray with the new desired length, move over the elements from the old fixed to it, as well as adding the new element (in the case of push) or skipping the removed element (in the case of remove) to it, and then replace self.fixed with it.

So, the solution to both functions have a similar structure:

// Create a new FixedSizeArray of a different length
// If pushing, length should be old length + 1
// If removing, length should be old length - 1
// Look at the code in `lib.rs`, is there some function
// that can tell us what the old length is?
let tmp = FixedSizeArray::allocate(<<<<new length>>>>);

// loop over self.fixed and move over its elements to tmp
// either skip the one that should be removed (in case of remove)
// or add the new element to the end of tmp (in case of push)
...

// get rid of the old fixed field and replace it with tmp!
self.fixed = tmp;

You can test out your implementation using slow_vec_push() and slow_vec_remove() in main.rs.

Step 3 - Testing and Experimenting

After finishing both solutions, run the tests:

cargo test -- --test-threads=1

If you pass all the tests, then your solutions are correct and you will get full credit. If you fail some tests, go to the corresponding file under tests/, read the failing test, and try to find out what went wrong so you can fix it.

Also look at the content of src/memory.rs and then run it using this command:

cargo run --bin memory

This program confirms that SlowVec does not mismanaged memory, e.g. by losing track of data or forgetting to remove elements in memory. You have not used any features that corrupts memory. So this is actually guaranteed by Rust.

However, in part 2, you will use some unsafe Rust features that may corrupt memory. These features allows us to implement a faster vector. However, they are dangerous, and if you make mistakes in that implementation, your vector may lose elements or forget to clean them up. memory.rs will come back and help you find and fix such errors in step 2.

Part 2: FastVec

Your task in part 2 of the project is to complete the implementation of FastVec (located in project_3_vec/fast_vec/src/lib.rs).

This implementation will be much faster than part 1, but it will be more complicated to implement.

Step 0 - Background

Before you get to coding, you need to understand our strategy for how to make FastVec be (significantly) faster than SlowVec. This strategy relies on two parts:

  1. Doubling the size of the vector when pushing to a full vector.
  2. Manually managing memory (instead of relying on FixedSizeArray).

Doubling Vector Size on Resize.

The main idea behind push with SlowVec is to create a new FixedSizeArray, whose length is one plus the old length, then moves over all the previous elements to the new, bigger array. This makes it so push always takes O(n) steps, where n is the current length of your vector. However, with Rust’s builtin Vec (and equivalent in other languages, such as a list in Python), push is O(1) (on average). How come?

These faster implementations distinguish two notions:

  1. The length of the vector: the number of elements currently in the vector,
  2. The capacity of the vector: how many elements the vector can have at most before it needs to be resized.

In SlowVec, these two were the same, thus, for every push, the SlowVec needed to be resized. In FastVec, you will notice that the struct contains len and capacity fields. In fact, if you create a new FastVec using FastVec::new(), you will notice it starts with capacity 1 (and len 0, because it is empty). This allows you to push one element “for free”, as the FastVec already has space for it.

The key idea is that when the capacity equals the len, the vector is full, and any future push will need to make it bigger so that it can add a new element to it. In that case, we double the capacity (i.e., the size), rather than just increasing it by 1 (as in SlowVec).

For example, consider starting with a fresh new FastVec, and then issuing 8 pushes to it:

  • FastVec::new returns an empty vector with capacity 1: len = 0, capacity = 1
  • First push: the vector has enough capacity to simply put the element in, without resizing. This push is free, i.e., takes one step. len = 1, capacity = 1.
  • Second push: len equals capacity, the vector is full, we resize it so that capacity = 2*1 = 2, and move all previous elements to the new resized vector (similar to SlowVec), then add the new element at the end. This takes two steps (one to move the previous element, one to add the new element). len = 2, capacity = 2.
  • Third push: len equals capacity, the vector is full, we resize it so that capacity = 2*2 = 4, and move all previous elements. This takes three steps. len = 3, capacity = 4.
  • Fourth push: len < capacity, we simply add the new element at the end. This takes one step. len = 4, capacity = 4.
  • Fifth push: len equals capacity, vector is full, resize it to capacity = 4*2 = 8. This takes 5 steps. len = 5, capacity = 8.
  • Sixth, seventh, and eight pushes: len < capacity for each of these pushes, so they each take only one step, after all of them, len = 8 and capacity = 8.

Using this strategy, we pushed 8 elements, and it took 1+2+3+1+5+1+1+1 = 15 steps in total. Which averages out to less than 2 steps per element. In fact, if we use this strategy to push infinitely many elements, the average converges to 1 step per element.

If you want a more visual demonstration of what this looks like, take a look at this helpful video.

Manually Managing Memory

We will not use FixedSizeArray in this part, and instead we will directly manage the underlying memory using pointers. This is more complex to implement (and can be dangerous if not implemented correctly as it can result in various memory corruptions and problems!). However, it gives you way more control about when and how to allocate memory, including allocating memory without using it directly, but rather, saving some of it free for the future.

Using the provided memory allocator library (MALLOC)

To help you with memory management, we provide a memory allocator library as part of the stencil. lib.rs already uses it:

use malloc::MALLOC;

The library provides you with two important functions that you will need to use:

// Allocate new memory with the given size (in bytes).
// Returns a pointer to this newly allocated memory.
unsafe fn malloc(bytes: usize) -> *mut u8

// Free a previously allocated region of memory
// You must pass a pointer than was returned by malloc
// and free will free the corresponding memory (all of it).
// free remembers how many bytes that allocation was, and will
// automatically free all these bytes.
unsafe fn free(ptr: *mut u8)

Here is an example:

// Allocate four bytes
let ptr1: *mut u8 = unsafe { MALLOC.malloc(4) };
// We can now do things with these four bytes
// When we are done, we should free them.
unsafe { MALLOC.free(ptr1) };

Look closely at the type of ptr1, it is *mut u8. u8 represents a single byte in Rust. So, *mut u8 is a mutable pointer to byte(s). But what if we want to deal with other types that are not one byte, for example, an i32? Fortunately, everything boils down to bytes on a computer, because everything is simply zeros and ones. We just need to make sure we have allocated the correct size for that type.

An i32 is 4 bytes in size, so the above exam allocates enough memory for exactly one i32. However, it is not a good idea to try to memorize the sizes of different types. Furthermore, we may not even know what the type is (e.g., FastVec like SlowVec is generic over any type the user of the vector chooses, denoted by T). Fortunately, Rust has a helpful function we can use that tells us the size of a type.

let size_of_i32 = size_of::<i32>();
let size_of_t = size_of::<T>();

Now, we can allocate pointers to these other, more helpful types! We have to use the correct size (in bytes) and then cast the resulting pointer to our desired type using the as keyword:

let ptr1: *mut i32 = unsafe { MALLOC.malloc(size_of_i32) as *mut i32 };
let ptr2: *mut T = unsafe { MALLOC.malloc(size_of_t) as *mut T };

Now, what if we want to allocate memory for more than one element? It’s all bytes: we just need to make sure that we have enough of them.

// ptr3 can hold up to 10 elements of type i32s.
let ptr3: *mut i32 = unsafe { MALLOC.malloc(size_of_i32 * 10) as *mut i32 };
// ptr4 can hold up to n elements of type T.
let ptr4: *mut T = unsafe { MALLOC.malloc(size_of_t * n) as *mut T };

Using the pointers after they are allocated

Now, we know how to allocate and free pointers. But how do we use them?

There are three ways to use a pointer:

  1. Writing to a pointer with ptr::write(): this overwrites the data at the pointer with the given data. It will not free or destruct the old data, it will simply overwrite it.
  2. Moving data out of a pointer with ptr::read(): this will move out the data pointed to by the pointer and destroy/free that data.
  3. Getting a reference to data pointer to by a pointer: &*[name of your ptr]: this returns a reference to the data, it does not move out nor destroy it.

The difference: &* borrows the data (the pointer remains valid, you can read again), while ptr::read moves it out (the slot is empty until written again).

Importantly, to use these operations correctly, you need to make sure these conditions are upheld:

  1. Never use ptr::read or &* on a pointer before writing with ptr::write() to that pointer (since there is no data yet).
  2. Never use ptr::read or &* on a pointer after ptr::read() (since the data is moved out), until you write to it again.
  3. If a pointer has data, you must first use ptr::read() to move it out, before using ptr::write() to write new data to it.

Rust cannot help you ensure that these conditions are met. Instead, you must do so yourself by thinking hard about your code. In fact, all of these operations are unsafe. These are special Rust operations that Rust allows us to execute to give us flexibility, but Rust cannot guarantee that they will not create problems (the technical term is undefined behavior). Instead, we must ensure that they will not create problems by carefully thinking about the code and ensuring it is correct.

To use these features, we must explicitly use the unsafe keyword, to tell Rust that we know what we are doing. If we do not use unsafe, the Rust compiler will give an error about these operations.

let my_pointer: *mut i32 = unsafe { MALLOC.malloc(size_of::<i32>()) as *mut i32 };

// This tells Rust we know what we are doing and we want to use unsafe
// operations.
unsafe {
    // First write.
    ptr::write(my_pointer, 5);  // Now, my_pointer points to 5.

    // Let us say we want to overide the data with 10.
    // We should first move the previous value out!
    // Then write to it.
    let _old_data = ptr::read(my_pointer);
    ptr::write(my_pointer, 10);


    let v: &i32 = &*my_pointer;
    println!("{v}");  // will print 10

    let v2: i32 = ptr::read(my_pointer);  // the data has been moved out of my_pointer
    // You should not try to read data from my_pointer again
    // (although it might still work with i32, but with more complicated types, it will not!)
    println!("{v2}");  // will print 10
}

What about when the pointer points to many elements? We can use add() to select which element we want to look at.

let my_pointer_2: *mut i32 = unsafe { MALLOC.malloc(size_of::<i32>() * 10) as *mut i32 };

unsafe {
    // This points to element at index 2, i.e. the third element.
    let ptr_2: *mut i32 = my_pointer_2.add(2);
    ptr::write(ptr2, 33);
    let this_is_33: i32 = ptr::read(ptr2);

    // This points to the element at index 9, i.e., the last element.
    let ptr_9: *mut i32 = my_pointer_2.add(9);
    ptr::write(ptr_9, 99);
    println!("{}", &*ptr_9);

    // add() is unsafe because if you misuse it, you might create problem,
    // what do you think will happen with this code?
    let ptr_10: *mut i32 = my_pointer_2.add(10);
    println1("{}", &*ptr_10);
}

main.rs and memory.rs

Just like in part 1, we provide fast_vec/src/main.rs and fast_vec/src/memory.rs to serve as a playground for you to experiment with MALLOC and with dealing with pointers. First, navigate to the fast_vec folder:

cd project_3_vec/fast_vec

Then you can run them using these commands:

cargo run --bin main
cargo run --bin memory

Take a look at main.rs, and specifically, the malloc_and_ptr() function. Run it and try to understand what it outputs. Feel free to modify it or add more code to it to experiment with other cases.

When you are confident that you understand the basics of MALLOC and pointers, move on to the next step.

Step 1 - Implement get

Implement get on your vec branch.

Start by assuming that the vector already has enough elements pushed to it. How would you use the pointer operations described in the background above to retrieve the element at index i from self.ptr_to_data?

Hint: You only want to get the data, you do not want to move out or destroy it. Users of your vector should be able to get the data again in the future if they want to. Consider using &*. Hint: You will need to use unsafe and you will need to use self.ptr_to_data.add([your index]).

You can try out your implementation using main.rs to see if it works.

After you make the basic version work, consider what would happen if the callers provided a bad index. E.g., if i is greater than or equal to self.len. Rather than returning random/garbage data or causing memory issues, it is best you create an error to inform users that they are using get wrong and providing a bad index.

Consider adding this code below to the top of your function implementation.

if i >= self.len {
    panic!("FastVec: get out of bounds");
}

After you are done, run the following test, and if it passes, move on to the next part.

cargo test get_strings -- --test-threads=1

Step 2 - Implement push and remove.

Implement remove and push on your vec branch.

Implement remove. Unlike in part 1, you do not need to resize the vector, since FastVec can support having a capacity different than its length. Instead, you can simply remove the element at the given index, and then move every element after it one step backwards. For example, assume you had a vector with [1, 3, 5, EMPTY] and remove(1) is called, you can implement your code such that the vector becomes [1, 5, EMPTY, EMPTY] without having to allocate new memory or resize the vector.

Consider what should happen if callers provide an index that is outside the length of the vector. Similar to get, you should panic in this case using this error message:

panic!("FastVec: remove out of bounds");

Hint: Remember to first move out the element to be removed using ptr::read. Hint: For every index j greater than the index you want to remove, you should read the element at index j using ptr::read, then write it to index j-1 using ptr::write.

Implement push. You should use the size doubling strategy we described above in the background.

Hint: Use MALLOC.malloc to allocate new memory of twice the size. Hint: Move over all the elements from the previous pointer to the new pointer using ptr::read and ptr::write Hint: Do not forget to write the new element using ptr::write and to update self.ptr_to_data, self.len, and self.capacity.

Run the tests using

cargo test -- --test-threads=1

If your implementation of both push and remove is correct, all the tests should pass except one test called clear_tracker (leave that one for the last steps).

If other tests fail, look at their code (you can find it under tests/, or by using grep -R "[name of test]").

Step 3 - Fix bug in clear

You will notice that the clear_tracker test fails even when you implement both push and remove correctly. This is because we have intentionally left a bug in the clear function we provided. You need to fix this bug.

Look at the output of running memory.rs. You will notice that some data is not free-ed even after calling clear(). You need to make sure that data is free-ed.

Hint: remember the rules of using pointers above. How can you destroy the data the pointer is pointing to? Hint: remember ptr::read()?

Step 4 - Run the provided experiment

Use the following command to run the provided experiment:

cargo run --bin experiment --release

This will produce a plot under plot.png in the fast_vec folder. Look at the plot. In this plot, the x-axis represents the size of the vector, and the y-axis represents the time needed to push one element to the vector with that size. For FastVec, the time is reported in microseconds, for SlowVec the time is in hundreds of microseconds.

Make sure you understand what you are seeing. E.g., which of the two implementations is faster, and what trends you see as vector sizes grow.

Step 5 - Submission

Make sure all of your code is in the vec branch and that you have pushed it to your fork on GitHub.

Please make sure your code compiles and passes all the tests and that you are able to run the experiment successfully.

When you are done, open a pull request from vec to main. Your instructor will review it.

Also submit your repository via Gradescope.