Module 9: References Basics

In this module, we will learn about:

  1. Using references as opposed to pointers.
  2. Passing data by reference, by move, and by copy.

References

References are really similar to pointers: they are also based on addresses. However, unlike pointers, they are safe to use!

fn main() {
    let x1: i32 = 10;
    let ref_x1: &i32 = &x1;
    println!("address of x1 {:p}", &x1);
    println!("ref_x1 refers to address {:p}", ref_x1);
    println!("ref_x1 refers to value {}", ref_x1);
}

Pointers can be created in many ways:

  1. They can be created by taking the address of an existing value/variable.
  2. They can be created using malloc.
  3. They can be created by manipulating other pointers (e.g. using ptr.add()).

By contrast, references can only be created by taking the address of an existing value or variable! This means that, unlike pointers, references are guaranteed to start out as a valid address referring to valid memory.

But even if a reference is valid in the beginning, how can Rust know it remains valid over time? To demonstrate this, let’s consider the following code.

fn main() {
    let mut v: Vec<String> = vec![String::from("str1"), String::from("str2")];
    // pointer to the first element.
    let ptr0: *const String = &v[0] as *const String;

    // We are inserting many elements to the vector.
    // This causes the vector to resize and changes the location
    // of its elements in memory.
    for i in 0..10 {
        v.push(format!("str{}", i));
    }

    // Now, the old ptr address is no longer valid.
    println!("address of first element used to be {:p}", ptr0);
    println!("address of first element became {:p}", &v[0]);

    unsafe {
        println!("dereferencing the pointer");
        println!("{}", *ptr0);
        println!("program done!");
    }
}

The above code is unsafe and quite dangerous. The pointer ptr0 becomes dangling after pushing new elements to the vector, because the vector moves its elements to a new location and frees the old one. Dereferencing ptr0 is therefore undefined behavior: the program may crash or print garbage.

By contrast, look at the code below. Rust realizes this code is potentially dangerous and does not let us compile it! Specifically, it realizes that after creating ref0, but before using it, the vector is mutated using push, which causes dangerous behavior.

Try to run the code and look at the compilation error.

fn main() {
    let mut v: Vec<String> = vec![String::from("str1"), String::from("str2")];
    // reference to the first element.
    let ref0: &String = &v[0];

    // We are inserting many elements to the vector.
    // This causes the vector to resize and changes the location
    // of its elements in memory.
    for i in 0..10 {
        v.push(format!("str{}", i));
    }

    // If this compiled, ref0 would now refer to an address that is no longer valid.
    println!("{}", ref0);
}

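This gives us the first rule of Rust references: we are not allowed to have a mutable reference and a const reference to the same data active at the same time! Here, push needs a mutable reference to v while ref0, a const reference into v, is still active. Rust checks for this by keeping track of the span during which each reference is active, i.e., from where it is created to where it is last used.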

Contrast the above code with the following one. Here, we print ref0 before mutating the variable. Rust correctly realizes that ref0 is no longer used after printing. So, it is no longer active.

This means we have no active references to v, and can mutate it by pushing.

fn main() {
    let mut v: Vec<String> = vec![String::from("str1"), String::from("str2")];
    // reference to the first element.
    let ref0: &String = &v[0];
    println!("{}", ref0);

    for i in 0..10 {
        v.push(format!("str{}", i));
    }

    println!("done");
}
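If we still need the first element after pushing, printing it early is not the only option. The sketch below shows two alternatives that also satisfy the rule: cloning the element so that no reference into v is active during the pushes, or taking a fresh reference once the mutation is done. The helper push_many is a hypothetical name introduced only for this illustration.

```rust
// Hypothetical helper (not part of the module's code):
// pushes `count` new strings onto the vector.
fn push_many(v: &mut Vec<String>, count: usize) {
    for i in 0..count {
        v.push(format!("str{}", i));
    }
}

fn main() {
    let mut v: Vec<String> = vec![String::from("str1"), String::from("str2")];

    // Option 1: clone the element. first_copy owns its own String,
    // so it is not a reference into v and does not block mutation.
    let first_copy: String = v[0].clone();
    push_many(&mut v, 10);
    println!("{}", first_copy);

    // Option 2: take the reference only after the mutation is done.
    let ref0: &String = &v[0];
    println!("{}", ref0);
}
```

Option 1 pays for a copy of one string; option 2 costs nothing extra but only works if the reference is not needed before the pushes.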

Rust also does not allow having more than one mutable reference to the same data active at the same time, for the same reason. However, it allows having many const references active at the same time: none of them can modify the data, so they are all safe.

fn main() {
    let mut x1: i32 = 10;
    let r1: &i32 = &x1;
    let r2: &i32 = &x1;
    println!("r1 refers to {}", r1);
    println!("r2 refers to {}", r2);
    // This code runs because all references are const.
    // Change one or both references to a mut reference
    // and see what happens!
    // e.g.,
    // let r1: &mut i32 = &mut x1;
}
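As a minimal sketch of the rule about mutable references, the code below creates two mutable references to the same variable, but never has both active at once. The helper add_one is a hypothetical function introduced only for this illustration.

```rust
// Hypothetical helper: increments the value behind a mutable reference.
fn add_one(r: &mut i32) {
    *r += 1;
}

fn main() {
    let mut x1: i32 = 10;

    let m1: &mut i32 = &mut x1;
    add_one(m1);
    // m1 is not used after this point, so it is no longer active.

    let m2: &mut i32 = &mut x1;
    add_one(m2);
    // Using m1 again here (e.g., add_one(m1);) would make m1 and m2
    // active at the same time, and the code would fail to compile.

    println!("x1 is now {}", x1);
}
```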

Rust also ensures that the data that a reference refers to remains alive for as long as the reference is active.
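The following sketch illustrates this check. The rejected version is kept in comments so that the rest of the code compiles; the function first is a hypothetical helper showing a case where returning a reference is fine because the data is owned by the caller and outlives the reference.

```rust
// Rejected by the compiler: x is destroyed at the end of the inner
// block, but r would still be used afterwards.
//
// fn broken() {
//     let r: &i32;
//     {
//         let x: i32 = 5;
//         r = &x;
//     }
//     println!("{}", r); // error: `x` does not live long enough
// }

// Hypothetical helper: the returned reference points into the caller's
// vector, which stays alive while the reference is in use.
fn first(v: &Vec<i32>) -> &i32 {
    &v[0]
}

fn main() {
    let numbers: Vec<i32> = vec![1, 2, 3];
    let r: &i32 = first(&numbers);
    println!("first is {}", r);
}
```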

We will learn a system based on permissions to help us understand how and why Rust does these checks about references in the next module.

Passing Data by Reference

One of the most popular uses of references is to simplify and speed up passing data to functions as parameters.

For example, consider the following helper function that returns the index of the mid point of a vector.

fn midpoint(v: &Vec<i32>) -> usize {
    return v.len() / 2;
}

use std::time::{Instant};

fn main() {
    // Make a big vector with 1,000,000 elements.
    let mut my_vec = Vec::with_capacity(1000000);
    for i in 0..1000000 {
        my_vec.push(i);
    }

    let time = Instant::now();
    let mid = midpoint(&my_vec); // pass by ref
    println!("Took {:?}", time.elapsed());
    println!("Mid point element is {}", my_vec[mid]);
}

Compare how long this took to the case where we pass the vector by cloning it.

fn midpoint2(v: Vec<i32>) -> usize {
    return v.len() / 2;
}

use std::time::{Instant};

fn main() {
    // Make a big vector with 1,000,000 elements.
    let mut my_vec = Vec::with_capacity(1000000);
    for i in 0..1000000 {
        my_vec.push(i);
    }

    let time = Instant::now();
    let mid = midpoint2(my_vec.clone()); // pass by clone/copy
    println!("Took {:?}", time.elapsed());
    println!("Mid point element is {}", my_vec[mid]);
}

Passing by ref is much faster: it does not create a copy of the elements of the vector. It merely passes the address of that vector to the function. On the other hand, clone() copies the elements of the vector one by one into a new vector and passes that new vector to the function, which takes a lot more time and space.

Alternatively, we can try to pass the vector by move.

fn midpoint3(v: Vec<i32>) -> usize {
    return v.len() / 2;
}

use std::time::{Instant};

fn main() {
    // Make a big vector with 1,000,000 elements.
    let mut my_vec = Vec::with_capacity(1000000);
    for i in 0..1000000 {
        my_vec.push(i);
    }

    let time = Instant::now();
    let mid = midpoint3(my_vec); // pass by move
    println!("Took {:?}", time.elapsed());
    println!("Mid point element is {}", my_vec[mid]);
}

Try to compile the above code. You will notice that the compiler produces an error: my_vec can no longer be used after calling midpoint3, because it has been moved! Moving passes ownership of the data over to the function completely. Moving does not copy the elements of the vector, so its performance is close to passing by ref. At the same time, it gives the function full control and ownership of the data, unlike a reference.

We can fix the above compiler error by changing the function slightly, e.g., so that it returns the mid element.

fn midpoint3(v: Vec<i32>) -> i32 {
    let mid = v[v.len() / 2];
    return mid;
}

use std::time::{Instant};

fn main() {
    // Make a big vector with 1,000,000 elements.
    let mut my_vec = Vec::with_capacity(1000000);
    for i in 0..1000000 {
        my_vec.push(i);
    }

    let time = Instant::now();
    let mid = midpoint3(my_vec); // pass by move
    println!("Took {:?}", time.elapsed());
    println!("Mid point element is {}", mid);
}

Compared to ref, move may appear slower. In reality, the difference arises because move passes ownership of the vector to midpoint3, so when midpoint3 completes, the vector goes out of scope and gets freed/destroyed, which takes some time. To focus on the time required to pass by move only, we can ask Rust not to destroy the vector using std::mem::forget. Note that this leaks the vector's memory, so it is only suitable for an experiment like this one.

fn midpoint3(v: Vec<i32>) -> i32 {
    let mid = v[v.len() / 2];
    // Do not destroy/free v.
    std::mem::forget(v);
    return mid;
}

use std::time::{Instant};

fn main() {
    // Make a big vector with 1,000,000 elements.
    let mut my_vec = Vec::with_capacity(1000000);
    for i in 0..1000000 {
        my_vec.push(i);
    }

    let time = Instant::now();
    let mid = midpoint3(my_vec); // pass by move
    println!("Took {:?}", time.elapsed());
    println!("Mid point element is {}", mid);
}

Now, the performance of move is comparable to reference.
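If the caller needs the vector again after a pass by move, one common pattern is for the function to hand ownership back in its return value. The sketch below assumes a hypothetical variant, midpoint4, that returns the vector together with the index; it is an illustration, not part of the module's code.

```rust
// Hypothetical variant: takes ownership of the vector and returns it
// to the caller along with the index of the mid point.
fn midpoint4(v: Vec<i32>) -> (Vec<i32>, usize) {
    let mid = v.len() / 2;
    (v, mid)
}

fn main() {
    // Make a big vector with 1,000,000 elements.
    let my_vec: Vec<i32> = (0..1_000_000).collect();

    // Ownership moves into midpoint4 and then back out again.
    let (my_vec, mid) = midpoint4(my_vec);
    println!("Mid point element is {}", my_vec[mid]);
}
```

For a function that only reads the data, passing by reference expresses the same intent more simply; returning ownership is mostly useful when the function genuinely needs to own the data for a while.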

Finally, remember that you can also pass data by mut reference, and not just by regular reference.

fn add_0_to_vec(v: &mut Vec<i32>) {
    v.push(0);
}
fn main() {
    let mut v: Vec<i32> = Vec::new();
    add_0_to_vec(&mut v);  // by mut ref
    println!("{:?}", v);
}

In summary, we have the following ways to pass data to a function:

  1. pass by copy/clone:
    • pros: gives the function a separate copy of the data it can control and modify without affecting the original data.
    • cons: slow and uses extra memory.
  2. pass by ref:
    • pros: very fast.
    • cons: with a const ref, the function cannot modify the data; with a mut ref, the function can modify the data, and the changes affect the original data as seen by the calling function.
  3. pass by move:
    • pros: very fast and gives the function control and ownership of the data.
    • cons: the data is moved to the new function; it cannot be used in the original function again.
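The options above can be seen side by side in the following sketch. The functions total, double_all, and with_zero are hypothetical names chosen only for this illustration.

```rust
// By const ref: the function can read the data but not modify it.
fn total(v: &Vec<i32>) -> i32 {
    v.iter().sum()
}

// By mut ref: changes are visible to the caller.
fn double_all(v: &mut Vec<i32>) {
    for x in v.iter_mut() {
        *x *= 2;
    }
}

// By value: the function owns the vector it receives, whether the
// caller passed a clone or moved the original in.
fn with_zero(mut v: Vec<i32>) -> Vec<i32> {
    v.push(0);
    v
}

fn main() {
    let mut data: Vec<i32> = vec![1, 2, 3];

    println!("total = {}", total(&data)); // pass by const ref

    double_all(&mut data); // pass by mut ref
    println!("after double_all: {:?}", data);

    let copy = with_zero(data.clone()); // pass by clone
    println!("copy = {:?}, original = {:?}", copy, data);

    let moved = with_zero(data); // pass by move
    println!("moved = {:?}", moved);
    // println!("{:?}", data); // error: data was moved and cannot be used
}
```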

Remember these pros and cons! We may ask you about them in the midterm :)

Consider the following exercise questions:

  1. If you are asked to build a function that prints a given vector, would you choose to pass the vector by clone, ref (const or mut), or move? Why?
  2. If you are asked to build a function that removes all even numbers from a vector, how would you pass the vector? Why?
  3. If you are asked to build a function that creates a sorted copy of a vector while keeping the original vector unchanged, how would you pass the vector? Why?