The Rust compiler is known to trade compile time for performance aggressively. I look at how type erasure can significantly reduce compile times by only generating one version of a datastructure and it’s methods.
You can find the full source here.
Why is compilation, especially optimization, so slow for Rust? A common cause is monomorphization of generics.
For each instantiation of some generic type, the compiler generates
code specialized to the type parameters: HashMap<u32, u32>
and HashMap<u64, u32>
each
have their functions generated as if you handwrote two different versions, which places
a heavy load on code generation. Furthermore, when optimizing, much of this code is inlined
and re-optimized at each call site. Doing so results in tremendous performance improvements,
but also significant compile time increases.
Idiomatic Rust heavily encourages the use of generics, especially the standard library collections. Whether you are using a vector in an extremely tight loop, or merely managing config data in rarely-called code, the optimizer dedicates vast amounts of effort to ensure the code is as fast as possible. For many use cases, we don’t want to waste this effort.
One way to solve this problem is with type erasure.
Erasing types
How can we get around this problem? One way is to have every single instantiation share a common instantiation of the datastructure. Imagine the BTreeMap, instead of holding truly generic types, just held trait objects that made indirect calls to the correct comparison functions:
// Semi-pseudocode
trait BTreeMember {
fn compare(&self, other: &dyn BTreeMember) -> Ordering;
}
struct BTreeKey {
key: Box<dyn BTreeMember>
}
impl Ord for BTreeKey {
fn cmp(&self, other: &BTreeKey) -> Ordering {
self.key.compare(&*other)
}
}
Along with wrapper structs to prevent inlining of any of the actual tree code while providing a type-safe wrapper around it
// PartialEq, PartialCmp, etc
struct InnerTree {
tree: std::collections::BTreeMap<BTreeKey, Box<dyn std::any::Any>>,
}
impl InnerTree {
#[inline(never)]
fn insert(/*...*/) {
/*...*/
}
// Other functions
}
struct ErasedBTree<K, V> {
inner: InnerTree,
}
impl<K, V> ErasedBTree<K, V> {
fn insert(key: K, val: V) {
/* convert key, val to appropriate inner types */
inner.insert(inner_key, inner_val)
}
}
You can see what I had to do for real right here. There were some practical issues with more straightforward approaches that I detailed in the source, but these are somewhat unrelated to the actual concept of type erasure, and there may be a better way.
Type erasure accomplishes two things:
- It makes it so that for any type, the code executed is identical. As far as the actual tree knows, it only ever deals with one set of types:
BTreeKey
andBox<std::any::Any>
- It wraps all of the tree calls with functions marked
inline(never)
, ensuring that the code should only get generated once (LTO might still interfere here).
As a result, the compiler only generates the marshaling code on a type-by-type and callsite-by-callsite bases, instead of everything for manipulating the tree. It would be possible to virtualize even more code with some more unsafe, but from the benchmarks, it’s clear that the majority of the code was in tree manipulation.
Benchmarks
I made a dummy compilation benchmark to test the use of the standard library BTreeMap vs. my ErasedBTreeMap.
It consists of 100 identical files, each implementing 10 modules with a unique local type. Each module defines a map containing the local type and a function to insert it into the map. Doing so ensures the optimizer has code to optimize.
This program is almost a worst-case for generics monomorphization. Every single instance and callsite is a unique type, so at some point during compilation, every callsite should theoretically have a unique representation. The compiler may recognize each version is the same, but in my experience and the benchmark, this is not happening.
To run the benchmark, I do a clean build of the repository in either Debug or Release mode, using either the type-erased BTree or the standard library BTree. The results on my laptop, measured in seconds, are:
Debug | Release | |
---|---|---|
Erased | 20 | 25 |
Standard library | 140 | 370 |
The magnitude in compile-time difference here is astounding - 6 minutes to 30 seconds for release. Further, the type-erased map is far less affected by release optimizations than the standard library map. This is also expected because there’s only one version getting build, but the magnitude is still surprising.
Overall, this is an excellent result. It shows that there’s promise in using this to reduce compile times in code heavily using standard library datastructures.
Can we make drop-in replacements for the standard library datastructures?
In principle, I don’t see any reason why one can’t write type-erased versions of the standard library datastructures by just wrapping the existing ones as I did for the BTree. There’s no change in the lifetime or functionality of any objects, just virtualizing all of the calls. I currently have to use a fair amount of unsafe code with a lot of implied invariants, but there might be better ways to reduce the amount and scope of unsafe.
I’ve only started trying to implement such wrappers, so there might be undiscovered problems along the way. I’ve already run into some difficulty with how the Hash trait requires the implementor to be generic across hashers. If anybody has already tried to do this, please tell me! I see compile times as one of Rust’s biggest problems, and having a good set of datastructures made with rapid compilation in mind could help to offset this issue.