Entry API and Collections Deep Dive

About This Module

This module is a deep dive into Rust's standard collections beyond Vec and HashMap. It starts with the Entry API for efficient, single-lookup updates and grouping patterns, then introduces BTreeMap and BTreeSet for sorted iteration, range queries, and applications like histograms and nearest-neighbor lookups. It closes with BinaryHeap for priority queues (including min-heap via Reverse), plus a recap of when to reach for each collection and their operation costs.

Prework

Prework Reading

Please read the following:

Pre-lecture Reflections

  1. What's the difference between using .get() then .insert() vs using the Entry API?
  2. When would you want keys to be sorted (BTreeMap) vs unsorted (HashMap)?
  3. Why can't f64 be used directly as a HashMap/BTreeMap key?

Learning Objectives

By the end of this module, you will be able to:

  • Use the Entry API to efficiently update collections without double lookups
  • Choose between HashMap and BTreeMap based on requirements
  • Use BTreeMap for ordered data, range queries, and percentile calculations

Part 1: Mastering the Entry API

The Double-Lookup Problem

A common pattern when updating HashMaps:

#![allow(unused)]
fn main() {
use std::collections::HashMap;

fn count_words_inefficient(text: &str) -> HashMap<String, usize> {
    let mut counts = HashMap::new();

    for word in text.split_whitespace() {
        // DON'T: This does TWO lookups!
        if counts.contains_key(word) {
            let count = counts.get_mut(word).unwrap();
            *count += 1;
        } else {
            counts.insert(word.to_string(), 1);
        }
    }
    counts
}

let result = count_words_inefficient("rust is awesome rust is fast");
println!("{:?}", result);
}

Problem: We look up the key twice - once to check, once to modify.

The Entry API Solution

#![allow(unused)]
fn main() {
use std::collections::HashMap;

fn count_words_efficient(text: &str) -> HashMap<String, usize> {
    let mut counts = HashMap::new();

    for word in text.split_whitespace() {
        // DO: Single lookup with Entry API!
        *counts.entry(word.to_string()).or_insert(0) += 1;
    }
    counts
}

let result = count_words_efficient("rust is awesome rust is fast");
println!("{:?}", result);
}

Understanding Entry

The .entry() method returns an Entry enum:

#![allow(unused)]
fn main() {
use std::collections::HashMap;

let mut map: HashMap<&str, i32> = HashMap::new();

println!("{:?}", map.entry("key"));

// Entry can be Occupied or Vacant
match map.entry("key") {
    std::collections::hash_map::Entry::Occupied(entry) => {
        println!("Key exists with value: {}", entry.get());
    }
    std::collections::hash_map::Entry::Vacant(entry) => {
        println!("Key doesn't exist, inserting...");
        entry.insert(42); // returns the previous value if the key was already present
    }
}

println!("{:?}", map.entry("key"));
}

Entry API Methods

  • entry: Returns an Entry enum, which can be either Occupied or Vacant
  • insert: Inserts a value into the map, returning the previous value if the key was already present, or the inserted value if the key was not present
  • or_insert: Insert default if vacant, return mutable reference
  • or_insert_with: Insert computed value if vacant (lazy evaluation)
  • or_default: Insert Default::default() if vacant, e.g. 0 for i32, "" for String, etc. (types with Default trait)
  • and_modify: Modify existing value, or insert default
#![allow(unused)]
fn main() {
use std::collections::HashMap;

let mut scores: HashMap<String, Vec<i32>> = HashMap::new();

// or_insert: Insert default if vacant, return mutable reference
scores.entry("Alice".to_string()).or_insert(vec![]).push(95);
scores.entry("Alice".to_string()).or_insert(vec![]).push(87);

// or_insert_with: Insert computed value if vacant (lazy evaluation)
scores.entry("Bob".to_string()).or_insert_with(|| {
    println!("Computing default for Bob...");
    vec![100]  // This only runs if key is vacant
});

// or_default: Insert Default::default() if vacant
let mut counts: HashMap<String, usize> = HashMap::new();
*counts.entry("hello".to_string()).or_default() += 1;

// and_modify: Modify existing value, or insert default
counts.entry("hello".to_string())
    .and_modify(|c| *c += 1)
    .or_insert(1);

println!("Scores: {:?}", scores);
println!("Counts: {:?}", counts);
}

Entry API for Grouping (Split-Apply-Combine)

Perfect for grouping data by categories:

#![allow(unused)]
fn main() {
use std::collections::HashMap;

let data = vec![("A", 10), ("B", 20), ("A", 30), ("B", 40), ("A", 50)];

// Group values by category
let mut groups: HashMap<&str, Vec<i32>> = HashMap::new();

for (category, value) in data {
    groups.entry(category).or_default().push(value);
}

// Now calculate aggregates per group
for (category, values) in &groups {
    let sum: i32 = values.iter().sum();
    let mean = sum as f64 / values.len() as f64;
    println!("{}: values={:?}, sum={}, mean={:.1}", category, values, sum, mean);
}
}

Entry API Works with BTreeMap Too!

#![allow(unused)]
fn main() {
use std::collections::BTreeMap;

let mut sorted_counts: BTreeMap<String, usize> = BTreeMap::new();

for word in "rust is awesome rust is fast".split_whitespace() {
    *sorted_counts.entry(word.to_string()).or_insert(0) += 1;
}

// BTreeMap iterates in sorted key order!
for (word, count) in &sorted_counts {
    println!("{}: {}", word, count);
}
}

Part 2: BTreeMap for Ordered Data

Meet BTreeMap

BTreeMap<K, V> is Rust's ordered map:

  • like a HashMap, it stores key-value pairs,
  • but it always keeps its keys in sorted order.
#![allow(unused)]
fn main() {
use std::collections::BTreeMap;

let mut populations: BTreeMap<&str, u32> = BTreeMap::new();
populations.insert("Boston", 650_000);
populations.insert("Austin", 975_000);
populations.insert("Seattle", 750_000);

for (city, pop) in &populations {
    println!("{city}: {pop}");
}
}

Output (note: alphabetical, regardless of insertion order):

  • Keys must implement Ord (they can be compared), not Hash.
  • Same familiar API as HashMap: insert, get, remove, entry, ...
  • Plus extras that only make sense when data is ordered: range, first_key_value, last_key_value.

How is it Ordered? (A Peek Under the Hood)

A HashMap spreads keys across buckets using a hash function — fast, but the order is effectively random.

A BTreeMap instead stores keys in a B-tree: a balanced, sorted tree structure. That's where the name comes from.

HashMap (unordered buckets):         BTreeMap (sorted B-tree):

  ┌─────┬─────┬─────┬─────┐                 [ M ]
  │ ... │ Bos │ ... │ Aus │                /     \
  └─────┴─────┴─────┴─────┘           [A,B]     [S,T]
      hash(key) → bucket              sorted keys in each node

Practical consequences:

  • Insertions and lookups cost O(log n) instead of O(1) — still very fast, but slightly slower than HashMap for pure point lookups.
  • Iteration, range queries, and min/max are cheap because the data is already sorted.

HashMap vs BTreeMap

FeatureHashMapBTreeMap
LookupO(1)O(log n)
Insert/DeleteO(1)O(log n)
Iteration orderRandomSorted by key
Range queries❌ Not supported✅ Supported
Key requirementHash + EqOrd
MemoryLess predictableMore predictable
  • In HashMaps, memory growth is spiky, requiring rehashing and reallocation of 2X memory.
  • In BTreeMaps, memory growth is smoother, allocating new compound nodes as needed.
    • By storing multiple keys in each node, BTreeMaps have better cache locality than simple binary trees.

When to Use BTreeMap

Use BTreeMap when you need:

  • Sorted iteration over keys
  • Range queries (get all keys between X and Y)
  • Min/max key operations
  • Percentile calculations
  • Keys that don't implement Hash

BTreeMap: Sorted Iteration

#![allow(unused)]
fn main() {
use std::collections::BTreeMap;

let mut temps: BTreeMap<u32, f64> = BTreeMap::new();
temps.insert(2020, 14.9);
temps.insert(2018, 14.7);
temps.insert(2022, 15.1);
temps.insert(2019, 14.8);
temps.insert(2021, 15.0);

// Iteration is always in sorted order by key!
println!("Global temperatures by year:");
for (year, temp) in &temps {
    println!("  {}: {:.1}°C", year, temp);
}

// First and last entries
println!("\nFirst: {:?}", temps.first_key_value());
println!("Last: {:?}", temps.last_key_value());
}

Note the order of years inserted and the order from the iteration.

BTreeMap: Range Queries

One of BTreeMap's killer features:

#![allow(unused)]
fn main() {
use std::collections::BTreeMap;
use std::ops::Bound;

let mut events: BTreeMap<u64, String> = BTreeMap::new();
events.insert(100, "Login".to_string());
events.insert(150, "View page".to_string());
events.insert(200, "Click button".to_string());
events.insert(250, "Submit form".to_string());
events.insert(300, "Logout".to_string());

// Get events in time range [150, 250]
println!("Events from 150-250:");
for (time, event) in events.range(150..=250) {
    println!("  t={}: {}", time, event);
}

// Events before time 200
println!("\nEvents before 200:");
for (time, event) in events.range(..200) {
    println!("  t={}: {}", time, event);
}

// Using Bound for more control
use std::ops::Bound::{Included, Excluded};
println!("\nEvents in (150, 300):");
for (time, event) in events.range((Excluded(150), Excluded(300))) {
    println!("  t={}: {}", time, event);
}
}

BTreeMap for Histogram Bins

A histogram groups numeric values into fixed-width bins and counts how many values fall into each bin.

Why BTreeMap<i64, usize> is a natural fit:

  • Key = bin index (an integer we compute from value / bin_width and throw away the fractional part).
  • Value = count of values in that bin.
  • Bins are displayed in order from smallest to largest — free with BTreeMap.
  • The Entry API makes the "increment this bin's counter" step a one-liner.

We'll build this up as an in-class exercise on the next slides.

For example, for data:

[1.2, 2.5, 2.7, 3.1, 3.8, 4.2, 4.5, 5.0, 5.5]
bin_width = 1.0

the output should be:

Histogram (bin_width=1.0):
  [1.0, 2.0): * 1
  [2.0, 3.0): ** 2
  [3.0, 4.0): ** 2
  [4.0, 5.0): ** 2
  [5.0, 6.0): ** 2

In-Class Exercise: Build a Histogram with BTreeMap

Task: Bin a set of floating-point values into integer-indexed buckets and print the histogram in sorted order.

Use Rust Playground or VSCode to develop your solution. We'll build it in three small steps.

Step 1: Compute the Bin Index

Given a value and a bin width, which bin does it fall into?

The bin index is the largest integer b such that b * bin_width <= value.

Hint: Solve for b in the inequality b * bin_width <= value.

fn bin_index(value: f64, bin_width: f64) -> i64 {
    // TODO: return the bin index for `value`
    todo!()
}

fn main() {
    // Quick checks
    assert_eq!(bin_index(2.7, 1.0), 2);
    assert_eq!(bin_index(4.5, 2.0), 2);   // bin [4.0, 6.0)
    assert_eq!(bin_index(-0.3, 1.0), -1); // negatives round down
    println!("Step 1 OK");
}

Step 2: Count Values per Bin

Use a BTreeMap<i64, usize> and the Entry API to count how many values land in each bin. No if contains_key — one line per value.

use std::collections::BTreeMap;

fn bin_index(value: f64, bin_width: f64) -> i64 {
    // TODO: replace with your solution from Step 1
     todo!()
}

fn build_histogram(data: &[f64], bin_width: f64) -> BTreeMap<i64, usize> {
    let mut bins: BTreeMap<i64, usize> = BTreeMap::new();

    // TODO: use a for loop to iterate over each value in data, and
    // use bin_index to get the bin index for the value, and increment the count for that bin.
    // Hint: use bins.entry(..).or_insert(0) to get the count for the bin, and increment it.
    todo!()
    bins
}

fn main() {
    let data = vec![1.2, 2.5, 2.7, 3.1, 3.8, 4.2, 4.5, 5.0, 5.5];
    let hist = build_histogram(&data, 1.0);
    println!("{:?}", hist);
    // Expected (note sorted by key):
    // {1: 1, 2: 2, 3: 2, 4: 2, 5: 2}
}

Step 3: Pretty-Print the Histogram

Iterate the map and print one row per bin with an ASCII bar. Because BTreeMap iterates in sorted key order, rows come out smallest-bin first with no extra sorting step.

use std::collections::BTreeMap;

fn bin_index(value: f64, bin_width: f64) -> i64 { todo!() }
fn build_histogram(_data: &[f64], _bin_width: f64) -> BTreeMap<i64, usize> { todo!() }

fn print_histogram(hist: &BTreeMap<i64, usize>, bin_width: f64) {
    println!("Histogram (bin_width={:.1}):", bin_width);
    // TODO: for each (bin, count) in hist (in sorted order):
    //   - compute start = bin as f64 * bin_width
    //   - compute end   = start + bin_width
    //   - print "  [start, end): <bar> <count>"
    // Hint: "*".repeat(count) builds the bar.
    todo!()
}

fn main() {
    let data = vec![1.2, 2.5, 2.7, 3.1, 3.8, 4.2, 4.5, 5.0, 5.5];
    let hist = build_histogram(&data, 1.0);
    print_histogram(&hist, 1.0);
}

Bonus: Rerun your code with bin_width = 0.5. Do the labels and counts still make sense? What happens to an empty bin in the middle of the range — does it show up, and should it?

Part 3: BTreeSet for Ordered Sets

BTreeSet<T> is a set of unique values that are sorted.

Example from BTreeSet Documentation

#![allow(unused)]
fn main() {
use std::collections::BTreeSet;

// Type inference lets us omit an explicit type signature (which
// would be `BTreeSet<&str>` in this example).
let mut books = BTreeSet::new();

// Add some books.
books.insert("A Dance With Dragons");
books.insert("To Kill a Mockingbird");
books.insert("The Odyssey");
books.insert("The Great Gatsby");

// Check for a specific one.
if !books.contains("The Winds of Winter") {
    println!("We have {} books, but The Winds of Winter ain't one.",
             books.len());
}

// Remove a book.
books.remove("The Odyssey");

// Iterate over everything.
for book in &books {
    println!("{book}");
}
}

BTreeSet: Range Queries

  • Because a BTreeSet stores its elements in sorted order, we can ask for every element inside some interval in O(log n + k) time (where k is the number of items in the range).
  • One slick use: prefix autocomplete — every string starting with "ban" is exactly the range ["ban", "bao"), because "bao" is the next string after every possible "ban*" in alphabetical order.
#![allow(unused)]
fn main() {
use std::collections::BTreeSet;

// A tiny dictionary. BTreeSet keeps the words in alphabetical order.
let mut dict: BTreeSet<&str> = BTreeSet::new();
for w in ["apple", "app", "apricot", "banana", "band", "bandana",
          "cat", "caterpillar", "dog", "donut"] {
    dict.insert(w);
}

// --- Example 1: autocomplete via a prefix range query ---
// Build the exclusive upper bound by bumping the last character: "ban" -> "bao".
let start = "ban";
let end = "bao";  // bump the last character

println!("Completions for {start:?}:");
for word in dict.range(start..end) {
    println!("  {word}");
}

// --- Example 2: alphabetical "slice" between two words ---
println!("\nFrom \"app\" up to (but not including) \"cat\":");
for word in dict.range("app".."cat") {
    println!("  {word}");
}

// --- Example 3: the k smallest / k largest elements ---
println!("\n3 smallest: {:?}", dict.iter().take(3).collect::<Vec<_>>());
println!("3 largest:  {:?}", dict.iter().rev().take(3).collect::<Vec<_>>());
}

In-Class Exercise: BTreeSet

Task: You have a set of train stations along a line, each at an integer distance marker. You are given a list of rider distances from the start of the line, print the nearest station to each rider.

This is a natural fit for BTreeSet: walk one step left (the largest station <= q) and one step right (the smallest station >= q), then pick whichever is closer. With a HashSet you'd have to scan every station.

Hint: set.range(..=q).next_back() gives you the nearest predecessor, and set.range(q..).next() gives you the nearest successor.

use std::collections::BTreeSet;

fn nearest(stations: &BTreeSet<i32>, q: i32) -> Option<i32> {
    // TODO: find the closest station to q.
    // 1. Get the nearest predecessor (largest value <= q).
    // 2. Get the nearest successor   (smallest value >= q).
    // 3. Return whichever is closer; ties go to the predecessor.
    todo!()
}

fn main() {
    let stations: BTreeSet<i32> = BTreeSet::from([5, 12, 20, 33, 50, 75]);
    let rider_positions = [0, 10, 17, 40, 100];
    for q in rider_positions {
        println!("rider at {q:>3} -> nearest station {:?}", nearest(&stations, q));
    }
}
#![allow(unused)]
fn main() {
// Expected:
// rider at   0 -> nearest station Some(5)
// rider at  10 -> nearest station Some(12)
// rider at  17 -> nearest station Some(20)  // 17-12=5 vs 20-17=3, so 20 wins
// rider at  40 -> nearest station Some(33)
// rider at 100 -> nearest station Some(75)
}

Bonus: What does your function return for an empty BTreeSet? What changes if the rider is standing exactly at a station?

Part 4: BinaryHeap for Priority Queues

BinaryHeap<T> is a priority queue implemented as a binary heap.

A priority queue is a collection that always returns the largest element first.

Example from BinaryHeap Documentation

#![allow(unused)]
fn main() {
use std::collections::BinaryHeap;

let mut heap = BinaryHeap::new();

heap.push(3);
heap.push(1);
heap.push(2);

println!("{:?}", heap.peek());
}

BinaryHeap: Return Smallest with Reverse

  • Reverse<T> is a zero-cost wrapper from std::cmp that flips the ordering of its inner value: Reverse(a) < Reverse(b) exactly when a > b.
  • Since BinaryHeap always pops the largest element by its Ord implementation, pushing Reverse(x) turns it into a min-heap — smallest x comes out first — without writing a custom comparator.
#![allow(unused)]
fn main() {
use std::collections::BinaryHeap;
use std::cmp::Reverse;

let mut heap: BinaryHeap<Reverse<i32>> = BinaryHeap::new();

heap.push(Reverse(3));
heap.push(Reverse(1));
heap.push(Reverse(2));

println!("{:?}", heap.peek());
println!("{:?}", heap.pop());
println!("{:?}", heap.pop());
}

BinaryHeap Example: Top-k in a stream

  • Wrap values in Reverse<T> to turn a max-heap into a min-heap.
  • Keep only the largest k items seen so far in O(k) memory
  • Fill the heap with the first k items
  • For each subsequent item, if it is larger than the smallest item in the heap, replace the smallest item with the new item
  • At the end, the heap will contain the k largest items in the stream
#![allow(unused)]
fn main() {
use std::collections::BinaryHeap;
use std::cmp::Reverse;

let stream = [7, 2, 9, 4, 11, 1, 6, 8, 3, 10, 5];
let k = 3;

// Min-heap of size k via Reverse: root = smallest of the current top-k.
let mut top_k: BinaryHeap<Reverse<i32>> = BinaryHeap::with_capacity(k);

for &x in &stream {
    if top_k.len() < k {  // if the heap is not full, add the item to the heap
        top_k.push(Reverse(x));
    } else if x > top_k.peek().unwrap().0 {  // if the item is larger than the smallest item in the heap, replace the smallest item with the new item
        top_k.pop();               // evict the smallest of the top-k
        top_k.push(Reverse(x));
    }
}

// print the top k items in the heap in descending order

// first iterate over the heap and collect the items into a vector
let mut result: Vec<i32> = top_k.into_iter().map(|Reverse(x)| x).collect();

// but the vector is not sorted
println!("{:?}", result);

// then sort the vector in descending order
result.sort_by(|a, b| b.cmp(a));
println!("{:?}", result);
}

BinaryHeap In-Class Exercise

Task:

  • You're writing triage software for a hospital ER.
  • Each arriving patient has a severity score from 1 (minor) to 10 (critical).
  • Treat the most-critical patient first;
    • if two patients have the same severity, treat whoever arrived earlier.
  • Print patients in the order they should be treated.

This is a classic priority-queue problem. The twist is the tie-breaker: we want highest severity to come out first, but lowest arrival index. We get both behaviors from one max-heap by pushing a tuple (severity, Reverse(arrival)) — tuples compare lexicographically, so severity dominates, and Reverse flips only the arrival component.

Hint: heap.push((sev, Reverse(i), name)); and then while let Some((sev, Reverse(i), name)) = heap.pop() { ... }.

use std::collections::BinaryHeap;
use std::cmp::Reverse;

fn main() {
    let patients = [
        (3, "Alice"), (8, "Bob"),  (5, "Carol"),
        (8, "Dan"),   (2, "Eve"),  (9, "Frank"),
    ];

    let mut er: BinaryHeap<(u32, Reverse<usize>, &str)> = BinaryHeap::new();

    // TODO: push each patient as (severity, Reverse(arrival_index), name).
    // Hint: use patients.iter().enumerate() to get arrival_index.

    // TODO: pop patients one-by-one and print:
    //   "treat {name} (severity {sev}, arrival #{arrival})"
}

Expected output:

treat Frank (severity 9, arrival #5)
treat Bob (severity 8, arrival #1)
treat Dan (severity 8, arrival #3)
treat Carol (severity 5, arrival #2)
treat Alice (severity 3, arrival #0)
treat Eve (severity 2, arrival #4)

Bonus: What happens if you push (severity, arrival, name) without Reverse? Try it and explain the tie-breaking for Bob vs. Dan.

Rust's Standard Collection Library Recap

Rust’s collections can be grouped into four major categories:

  • Sequences: Vec, VecDeque, LinkedList
  • Maps: HashMap, BTreeMap
  • Sets: HashSet, BTreeSet
  • Misc: BinaryHeap

We've covered them all now.

When Should You Use Which Collection?

link

These are fairly high-level and quick break-downs of when each collection should be considered. Detailed discussions of strengths and weaknesses of individual collections can be found on their own documentation pages.

Use a Vec when:

  • You want to collect items up to be processed or sent elsewhere later, and don't care about any properties of the actual values being stored.
  • You want a sequence of elements in a particular order, and will only be appending to (or near) the end.
  • You want a stack.
  • You want a resizable array.
  • You want a heap-allocated array.

Use a VecDeque when:

  • You want a Vec that supports efficient insertion at both ends of the sequence.
  • You want a queue.
  • You want a double-ended queue (deque).

Use a LinkedList when:

  • You want a Vec or VecDeque of unknown size, and can't tolerate amortization.
  • You want to efficiently split and append lists.
  • You are absolutely certain you really, truly, want a doubly linked list.

Use a HashMap when:

  • You want to associate arbitrary keys with an arbitrary value.
  • You want a cache.
  • You want a map, with no extra functionality.

Use a BTreeMap when:

  • You want a map sorted by its keys.
  • You want to be able to get a range of entries on-demand.
  • You're interested in what the smallest or largest key-value pair is.
  • You want to find the largest or smallest key that is smaller or larger than something.

Use the Set variant of any of these Maps when:

  • You just want to remember which keys you've seen.
  • There is no meaningful value to associate with your keys.
  • You just want a set.

Use a BinaryHeap when:

  • You want to store a bunch of elements, but only ever want to process the "biggest" or "most important" one at any given time.
  • You want a priority queue.

Cost of Collection Operations

Link

get(i)insert(i)remove(i)
VecO(1)O(n-i)*O(n-i)
VecDequeO(1)O(min(i, n-i))*O(min(i, n-i))
LinkedListO(min(i, n-i))O(min(i, n-i))O(min(i, n-i))
HashMapO(1)~O(1)~*O(1)~
BTreeMapO(log(n))O(log(n))O(log(n))

Note that where ties occur, Vec is generally going to be faster than VecDeque, and VecDeque is generally going to be faster than LinkedList.

For Sets, all operations have the cost of the equivalent Map operation.

Summary

  • The Entry API is a powerful tool for efficiently updating collections
  • BTreeMap is a sorted map that is efficient for range queries and ordered iteration
  • BTreeSet is a sorted set that is efficient for range queries and ordered iteration
  • BinaryHeap is a priority queue that is efficient for finding the smallest or largest element