Building a Safe API for Vec Splitting in Rust

NOTE: This is an AI-generated post from the following interaction. I do not recommend reading this post because it's awful. A manual rewrite is pending.

How do you safely split a Rust Vec<T> into its initialized elements and uninitialized capacity? This seemingly simple question led to a fascinating exploration of Rust's memory model, type system boundaries, and the art of building sound APIs around unsafe code.

The Problem

When working with vectors in Rust, you sometimes need simultaneous access to both the initialized elements and the reserved but uninitialized capacity. This is particularly useful for high-performance scenarios where you want to extend a vector while maintaining references to existing data.

The challenge? Rust's Vec API doesn't expose this functionality directly, and implementing it requires navigating the boundary between safe and unsafe code. Get it wrong, and you've introduced undefined behavior. Get it right, and you've created a powerful abstraction that maintains Rust's safety guarantees.

The Journey Begins

The initial approach introduced a custom SplitExtend<T> trait with a split_extend() method:

trait SplitExtend<T> {
    fn split_extend(&mut self) -> (&mut [T], Spare<'_, T>);
}

This returns a tuple containing a mutable slice to initialized elements and a Spare<T> wrapper providing access to the reserved capacity. The implementation used a NonNull<Vec<T>> pointer with a SetLenOnDrop guard to track length changes as elements were added.

The problem? This relied on undocumented assumptions about how set_len interacts with existing slice pointers. Time to dig deeper.

Validation and Refinement

Running the code through Miri—Rust's interpreter for detecting undefined behavior—immediately caught issues. The first fix was straightforward: properly wrapping the vector pointer with NonNull::new(vec).unwrap() instead of using raw pointer casts.

But this validation phase sparked deeper questions about the API design. What if instead of storing a NonNull<Vec<T>>, we stored references to the underlying data more directly?

A Better Abstraction

The breakthrough came from thinking about what truly needs to be mutable. Two alternative designs emerged:

Design 1: Store &mut [MaybeUninit<T>] directly in the Spare struct.

struct Spare<'a, T> {
    capacity: &'a mut [MaybeUninit<T>],
    // ... length tracking
}

This avoids repeated pointer arithmetic and reduces the unsafe code surface area. With MaybeUninit::write(), most operations become safe.
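Here is a minimal sketch of what Design 1 could look like. It is not the implementation from the original discussion: the free function split_extend, the filled counter, and the push and filled methods are names I made up for illustration. The only unsafe code is the split itself; writing into the spare capacity goes through MaybeUninit::write, which is safe. In this sketch the caller still has to commit the new length with set_len afterwards.

```rust
use std::mem::MaybeUninit;
use std::slice;

/// Hypothetical Design 1 wrapper: owns a view of the spare capacity only.
pub struct Spare<'a, T> {
    capacity: &'a mut [MaybeUninit<T>],
    filled: usize, // how many leading slots have been written
}

impl<'a, T> Spare<'a, T> {
    /// Writes `value` into the next free slot, or hands it back if full.
    pub fn push(&mut self, value: T) -> Result<(), T> {
        match self.capacity.get_mut(self.filled) {
            Some(slot) => {
                slot.write(value); // safe: overwrites an uninitialized slot
                self.filled += 1;
                Ok(())
            }
            None => Err(value),
        }
    }

    /// Number of elements written so far.
    pub fn filled(&self) -> usize {
        self.filled
    }
}

/// Splits a Vec into its initialized elements and its spare capacity.
pub fn split_extend<T>(vec: &mut Vec<T>) -> (&mut [T], Spare<'_, T>) {
    let len = vec.len();
    let cap = vec.capacity();
    let ptr = vec.as_mut_ptr();
    // SAFETY: the first `len` elements are initialized, the remaining
    // `cap - len` slots are allocated but possibly uninitialized, and the
    // two ranges do not overlap. Both borrows are tied to `vec`.
    unsafe {
        let init = slice::from_raw_parts_mut(ptr, len);
        let spare = slice::from_raw_parts_mut(ptr.add(len).cast::<MaybeUninit<T>>(), cap - len);
        (init, Spare { capacity: spare, filled: 0 })
    }
}

/// Usage: read the existing elements while appending to the spare capacity.
pub fn example() -> Vec<i32> {
    let mut v = Vec::with_capacity(4);
    v.extend([1, 2]);
    let added = {
        let (init, mut spare) = split_extend(&mut v);
        let sum: i32 = init.iter().sum();
        spare.push(sum).expect("capacity was reserved above");
        spare.filled()
    };
    // SAFETY: exactly `added` slots past the old length were initialized.
    unsafe { v.set_len(v.len() + added) };
    v
}
```

The final set_len call is the weak spot of this variant: forget it and the pushed elements are leaked. That is the motivation for Design 2.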

Design 2: Store a &mut usize reference to the Vec's length field.

struct Spare<'a, T> {
    length: &'a mut usize,
    capacity: &'a mut [MaybeUninit<T>],
}

This eliminates concerns about set_len behavior entirely—only the initial construction requires unsafe code. But there's a catch: how do we find the length field's offset within Vec's representation?
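To make the "only construction is unsafe" claim concrete, here is a sketch of a push method for this struct, written entirely in safe code. The push method and the example function are my own illustration, not code from the original thread. The example also cheats on purpose: it builds the Spare from a local usize and a local array instead of a real Vec, because obtaining those two references from a Vec is exactly the unsafe construction step this section is about.

```rust
use std::mem::{self, MaybeUninit};

pub struct Spare<'a, T> {
    length: &'a mut usize,
    capacity: &'a mut [MaybeUninit<T>],
}

impl<'a, T> Spare<'a, T> {
    /// Writes `value` into the first free slot and bumps the length.
    /// Returns the value back if no capacity is left.
    pub fn push(&mut self, value: T) -> Result<(), T> {
        // Temporarily move the slice out so it can be split with its
        // full lifetime; an empty slice is left in its place.
        let slots = mem::take(&mut self.capacity);
        match slots.split_first_mut() {
            Some((slot, rest)) => {
                slot.write(value);
                *self.length += 1; // the owner sees the new length immediately
                self.capacity = rest;
                Ok(())
            }
            None => Err(value),
        }
    }
}

/// Stand-in demo: a plain usize and array play the role of the Vec's
/// length field and spare capacity (an assumption made for illustration).
pub fn example() -> (usize, bool) {
    let mut len = 0usize;
    let mut slots = [MaybeUninit::<u32>::uninit(), MaybeUninit::uninit()];
    let mut spare = Spare { length: &mut len, capacity: &mut slots };
    spare.push(10).unwrap();
    spare.push(20).unwrap();
    let third_rejected = spare.push(30).is_err();
    (len, third_rejected)
}
```

Because the length is updated on every push, there is no guard object and no trailing set_len call to forget. In a real API the fields would stay private so that only the unsafe constructor can pair a length reference with its matching capacity slice.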

The Offset Detection Challenge

Rust's Vec<T> internally contains three fields (pointer, capacity, and length), but their order is unspecified: Vec makes no layout guarantees, so the order can differ between compiler versions. To store a &mut usize to the length field, we need to determine its position at runtime.

The first attempt was naive:

// DON'T DO THIS - violates Vec invariants
let fake_vec: Vec<T> = std::mem::transmute([0usize, 1, 2]);
let len = fake_vec.len();

This violates Vec's invariants: the pointer must be valid, and length must not exceed capacity. Instant undefined behavior.

The ZST Insight

The breakthrough came from considering zero-sized types (ZSTs). For ZST vectors, there's no actual allocation—the pointer field is essentially meaningless. This opens up possibilities:

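use std::mem::{transmute_copy, ManuallyDrop};

struct ZstType;

// ManuallyDrop keeps the Vec from being dropped while its length is fake.
let mut vec = ManuallyDrop::new(Vec::<ZstType>::new());

// SAFETY: copies the Vec's three words without moving it. This assumes
// size_of::<Vec<ZstType>>() == 3 * size_of::<usize>().
let before: [usize; 3] = unsafe { transmute_copy(&*vec) };

// SAFETY: a ZST Vec reports capacity usize::MAX, and a ZstType value
// occupies no memory, so there is nothing to initialize.
unsafe { vec.set_len(1) };
let after: [usize; 3] = unsafe { transmute_copy(&*vec) };

// The length field is the only word that changed (0 -> 1).
let len_index = (0..3).find(|&i| before[i] != after[i]);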

By using ManuallyDrop and setting different lengths, we can observe which field changes without violating safety invariants.

Handling Non-ZST Types

For non-ZST types, we need a real allocation. The solution:

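use std::mem::transmute_copy;

let mut vec: Vec<T> = Vec::new();
vec.reserve(1); // Ensure we have capacity, so a length of 1 is in bounds

// A fresh Vec has length 0. Copy its three words without moving it.
// This assumes size_of::<Vec<T>>() == 3 * size_of::<usize>().
let before: [usize; 3] = unsafe { transmute_copy(&vec) };

// Caution: element 0 is uninitialized here. It is never read, and the
// length is restored below before the Vec can be dropped.
unsafe { vec.set_len(1) };
let after: [usize; 3] = unsafe { transmute_copy(&vec) };

// Restore the real length so that drop does not touch a garbage T.
unsafe { vec.set_len(0) };

// Find which field went from 0 to 1
let len_index = (0..3).find(|&i| before[i] == 0 && after[i] == 1);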

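This approach requires calling reserve(1) first, but that's acceptable because our API requires pre-allocated capacity anyway. One caveat: set_len(1) here covers an element that was never initialized, which set_len's documented safety contract does not allow, so the length has to go back to its real value before anything reads or drops the Vec.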

The Final Solution

The complete implementation combines both strategies:

fn detect_length_offset<T>() -> usize {
    if std::mem::size_of::<T>() == 0 {
        // ZST case: use ManuallyDrop
        let vec: Vec<T> = Vec::new();
        let mut vec = ManuallyDrop::new(vec);
        // ... detection logic
    } else {
        // Non-ZST case: use reserve
        let mut vec: Vec<T> = Vec::new();
        vec.reserve(1);
        // ... detection logic
    }
}
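The outline above leaves the detection logic elided. Below is one way it could be filled in, as a sketch under stated assumptions rather than the code from the original thread. It merges the two branches, since reserve(1) does nothing for a ZST Vec (its capacity is already usize::MAX), and it resets the length to 0 before the Vec is dropped instead of using ManuallyDrop. It assumes Vec<T> is exactly three usize-sized words, which the language does not guarantee, so that is checked with an assert. The example function is a hypothetical sanity check, not part of the API.

```rust
use std::mem::{size_of, transmute_copy};

/// Index (0, 1, or 2) of the usize-sized word in Vec<T> that holds the length.
pub fn detect_length_offset<T>() -> usize {
    // Assumption: Vec<T> is exactly three words. Not guaranteed by Rust.
    assert_eq!(size_of::<Vec<T>>(), 3 * size_of::<usize>());

    let mut vec: Vec<T> = Vec::new();
    // Allocates for sized T; a no-op for ZSTs.
    vec.reserve(1);

    // Copies the three words out without moving or modifying the Vec.
    let before: [usize; 3] = unsafe { transmute_copy(&vec) };
    // Caution: slot 0 was never initialized, which stretches set_len's
    // documented contract. The element is never read or dropped.
    unsafe { vec.set_len(1) };
    let after: [usize; 3] = unsafe { transmute_copy(&vec) };
    // Restore the real length so the normal drop only frees the buffer.
    unsafe { vec.set_len(0) };

    // The pointer is never 0 and the capacity did not change, so only
    // the length word goes from 0 to 1.
    (0..3)
        .find(|&i| before[i] == 0 && after[i] == 1)
        .expect("no word of Vec<T> changed from 0 to 1")
}

/// Sanity check: the detected word of a real Vec matches len().
pub fn example() -> bool {
    let idx = detect_length_offset::<u64>();
    let v = vec![1u64, 2, 3];
    let words: [usize; 3] = unsafe { transmute_copy(&v) };
    words[idx] == v.len()
}
```

Since the result depends only on T and the compiler, a real implementation would probably compute it once and cache it rather than allocating on every call.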

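With the offset known, a single unsafe step at construction time produces a &mut usize to the length field, and everything after that is built on safe abstractions. The type system prevents illegal operations: trying to do something invalid simply won't compile. Keep in mind that the whole scheme still rests on Vec's unspecified layout, so it is only as stable as that layout.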

Why This Matters

This journey exemplifies several important principles in Rust development:

Type Safety as Enforcement: By storing &mut [MaybeUninit<T>] instead of raw pointers, illegal operations become impossible without unsafe code. The type system does the heavy lifting.

Validation is Essential: Miri caught issues that might have lurked as rare bugs. Always validate unsafe code rigorously.

Abstraction Boundaries: Concentrating unsafe code in construction while keeping usage safe creates robust APIs. Module privacy ensures soundness.

Community Collaboration: None of this would have been possible without the Rust community's thoughtful feedback and iteration. Special thanks to quinedot, whose insights were instrumental in refining the design and identifying sound approaches.

The Ecosystem

Interestingly, this problem has been tackled before. The sharded-vec-writer crate provides similar functionality for multi-threaded scenarios, demonstrating that the underlying need is real and well-established.

Lessons Learned

Building this API reinforced several key lessons:

  1. Unsafe code is not inherently bad—it's a tool for building safe abstractions
  2. The type system is your ally—design APIs so the compiler enforces correctness
  3. Validation catches bugs—use Miri and other tools religiously
  4. Iteration improves designs—the first approach is rarely the best

The Rust community's emphasis on rigor and correctness creates an environment where these explorations can flourish. When you push against the boundaries of the type system, you often discover elegant solutions that would be impossible in languages with weaker guarantees.

Conclusion

What started as a simple question about splitting a vector evolved into a deep exploration of Rust's memory model, type system, and API design principles. The final implementation provides a safe, ergonomic API built on a foundation of carefully validated unsafe code.

The full discussion with all the implementation details and community insights is available in the original Rust users forum thread. I highly recommend reading through it to see the iterative refinement process in action.

If you're interested in exploring these concepts further or have your own approaches to solving this problem, I'd love to hear about them. Rust's safety guarantees make it an incredible playground for these kinds of low-level explorations—and the community makes it even better.