My Algebra Is a Little Rusty
One of my main responsibilities in my job as a research software engineer is maintaining and developing a C++ library for computing with abstract algebraic objects called libalgebra (link). This is a mature and fairly complete library in terms of its capabilities, and it has excellent support for the tensor algebra and Lie algebras. However, it was mostly written a long time ago, and it doesn’t take advantage of more recent features of C++ (even those in C++11). Consider for a moment that C++ did not have move semantics until C++11, which could mean that there are a lot of unnecessary copies or allocations being made. (I can’t be totally certain that the compiler isn’t doing some work here to reduce copies or temporary allocations.) In any case, one of the main themes of my work is expanding the support for newer features within the library.
Now I’m a fairly recent adopter of C++. I started learning C++ about 6 months before I started my job here, and most of my programming experience prior to this came in the form of Python. There are some things about C++ that I find somewhat frustrating, mostly stemming from its relation to C and the lack of a good build system and dependency manager. (Sorry CMake, you just don’t cut the mustard.) That being said, the C++ template system is marvellous, and really is a great way to construct a sensible interface to mathematical objects. I’ll come back to this later.
Let’s talk about Rust. Rust is a modern systems programming language (much like C++) built with concurrency safety in mind. C++ was first developed at a time when processors had clock speeds measured in MHz (or perhaps even kHz, I’m not sure) and multi-core processors were perhaps not even a dream. Thus a lot of the support and features that help manage threads and multi-core execution were added to the language later, mostly in the C++11 revision. One of the major problems facing multithreaded programming is the data race condition (see this). Broadly speaking, this is where the execution of a program relies on operations on data shared between multiple threads happening in a predictable sequence, but the nature of multithreading makes this difficult to ensure. (One usually makes use of constructions such as mutex locks or atomic values, which guarantee predictable access to data, to avoid problems resulting from race conditions.) Rust was built with concurrency in mind, and utilises a borrow checker in its compiler to make strong guarantees about access to data at compile time. This does make Rust rather tricky to work with at times, but it is still easily my favourite language at the moment.
I decided some time ago, even before I started this job, that I should write a general purpose library for working with abstract mathematical structures - much like libalgebra - in Rust. This is going to look very different from libalgebra, because of the differences between C++ and Rust; for instance, Rust does not support object-oriented programming. This necessitates shifting the emphasis from combining data and behaviour to separating behaviour from the underlying data in a meaningful way. I really enjoy the forced change of focus that this entails, and it really makes me think about the design of the library. I thought I’d document and share the process of designing and implementing the library here. This is still very much a work in progress, but you can see the code in my GitHub here.
About the design
Libalgebra makes heavy use of the C++ template system to construct classes representing various mathematical structures. The template classes build upon one another, in a very similar way to how one builds on mathematical structures to add in new behaviour. For example, a vector template class would have two template arguments that provide the basis type and the scalar field type. This pair of parameters describes a vector space as the linear span of basis elements with coefficients in the scalar field. In C++, this would look something like this.
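Here is a minimal sketch of such a vector template. The names (example_basis, double_field) and the sparse std::map storage are illustrative assumptions for this post, not libalgebra’s actual interface.

```cpp
#include <map>

// The basis supplies a key type indexing basis elements, and the field
// supplies the scalar type of the coefficients.
struct example_basis {
    using key_type = int;
};

struct double_field {
    using scalar_type = double;
};

// A vector is a formal linear combination of basis keys with
// coefficients in the scalar field, stored here as a sparse map.
template <typename Basis, typename Field>
class vector {
public:
    using key_type = typename Basis::key_type;
    using scalar_type = typename Field::scalar_type;

    // Access (creating if absent) the coefficient of a basis element.
    scalar_type& operator[](const key_type& key) { return m_data[key]; }

    // Componentwise addition of two vectors.
    vector operator+(const vector& rhs) const {
        vector result(*this);
        for (const auto& item : rhs.m_data) {
            result.m_data[item.first] += item.second;
        }
        return result;
    }

private:
    std::map<key_type, scalar_type> m_data;
};
```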
We can then derive an algebra type on top of a vector, using class inheritance, by providing an additional template parameter that supplies a means of multiplying objects of the vector class. This might look something like this.
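A sketch of the idea follows, with a toy multiplication included so the example is self-contained. Again, the names and the key-based multiplication rule are my illustrative assumptions, not libalgebra’s actual code.

```cpp
#include <map>

// Minimal stand-ins so the sketch is self-contained.
struct example_basis { using key_type = int; };
struct double_field { using scalar_type = double; };

template <typename Basis, typename Field>
class vector {
public:
    using key_type = typename Basis::key_type;
    using scalar_type = typename Field::scalar_type;

    scalar_type& operator[](const key_type& key) { return m_data[key]; }

protected:
    std::map<key_type, scalar_type> m_data;
};

// The extra template parameter supplies the multiplication: here it is
// expected to provide a rule for multiplying two basis keys.
template <typename Basis, typename Field, typename Multiplication>
class algebra : public vector<Basis, Field> {
public:
    using base = vector<Basis, Field>;
    using typename base::key_type;
    using typename base::scalar_type;

    // Bilinear extension of the key-level multiplication.
    algebra operator*(const algebra& rhs) const {
        algebra result;
        for (const auto& lhs_item : this->m_data) {
            for (const auto& rhs_item : rhs.m_data) {
                key_type key =
                    Multiplication::multiply(lhs_item.first, rhs_item.first);
                result[key] += lhs_item.second * rhs_item.second;
            }
        }
        return result;
    }
};

// An example multiplication: keys multiply by adding, so basis elements
// behave like monomials x^i * x^j = x^(i+j) (a polynomial algebra).
struct additive_multiplication {
    static int multiply(int lhs, int rhs) { return lhs + rhs; }
};
```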
This process is very similar to the way that an algebra is constructed from a vector space by providing a multiplication operation that is compatible with the vector space operations. Of course, the fact that an abstract vector space admits a basis is a theorem. Defining the basis first and then using formal linear combinations to define the vector space might seem slightly unnatural to some, but it really isn’t. Moreover, from a programming perspective this is pretty much the only way to proceed, since there is no sensible way to check axioms as one would in the mathematical world.
Let’s briefly talk about the representation of structures here. A specific object created according to one of these template classes - a specific instantiation of the template class - represents an element of the vector space, so the class itself is a representation of the vector space. This is actually a very sensible way to represent these abstract structures. Using objects - rather than classes - to represent an abstract structure in its entirety often leads to problems, at least in my experience. As I mentioned above, checking axioms - exhaustively or otherwise - is often doomed to fail. Indeed, floating point numbers do not have associative operations, which is a requirement in basically every mathematical structure. However, this doesn’t make them irrelevant. A class is a recipe for creating objects, much like an abstract structure is a description of its elements. They serve the same purpose.
Unfortunately, the object-oriented paradigm is convenient in some cases, but it also has a rather large problem: it ties the behaviour strongly to the internal data representation. Let me explain. The point of object-oriented programming is that you combine the data with the behaviour one wishes to describe. Now one can use dynamic dispatch and other techniques to work around the constraints of the system, but ultimately this means that changing the data model necessitates completely re-implementing any part of the class that relies on the specific model. Building a hierarchy of classes that is somehow agnostic to the data model is rather tricky in these circumstances.
I used to love object-oriented programming, and I still do (to some extent). But I’ve come to realise that building hierarchies of behaviour separately from the data model is actually a much more reasonable way of building rich structures. Let me explain. The set of integers can be given two perfectly good commutative, associative operations, addition and multiplication, each with a neutral element. We can consider the set of integers (the data model) with either one of these two operations (the behaviour) separately, or we could consider them together. Of course, if we consider them together, provided that they satisfy the compatibility rules, the set forms a ring. How would I represent this in an object-oriented program? It might look something like the following:
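A minimal sketch of the object-oriented approach (the class names are illustrative): the base class holds the data model, and each derived class bolts one operation onto it.

```cpp
// Common base class holding the data model.
class integer {
public:
    explicit integer(int value) : m_value(value) {}
    int value() const { return m_value; }

protected:
    int m_value;
};

// The integers considered with addition.
class integer_add : public integer {
public:
    using integer::integer;

    integer_add operation(const integer_add& rhs) const {
        return integer_add(m_value + rhs.m_value);
    }
    static integer_add identity() { return integer_add(0); }
};

// The integers considered with multiplication.
class integer_mul : public integer {
public:
    using integer::integer;

    integer_mul operation(const integer_mul& rhs) const {
        return integer_mul(m_value * rhs.m_value);
    }
    static integer_mul identity() { return integer_mul(1); }
};
```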
So what exactly is the problem here?
Well, the classes integer_add and integer_mul are independent, except for their common base class, and it is not simple to move between these two types.
Moreover, what if I want to change the underlying data representation?
This would lead to another completely separate class hierarchy.
Let’s compare this to a similar implementation using the Rust trait system, which might look something like this.
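A sketch of the trait-based version. The Add and Mul traits here are local illustrations that merely shadow the standard library operator traits; the names are mine, not a fixed API.

```rust
// The data model is a plain struct; each operation lives in its own trait.
#[derive(Debug, Clone, PartialEq)]
struct Integer(i64);

trait Add: Sized {
    fn add(&self, other: &Self) -> Self;
    fn zero() -> Self;
}

trait Mul: Sized {
    fn mul(&self, other: &Self) -> Self;
    fn one() -> Self;
}

// Implementing Ring for a type is the programmer's assertion that its
// Add and Mul implementations are compatible (distributivity and so on).
trait Ring: Add + Mul {}

impl Add for Integer {
    fn add(&self, other: &Self) -> Self { Integer(self.0 + other.0) }
    fn zero() -> Self { Integer(0) }
}

impl Mul for Integer {
    fn mul(&self, other: &Self) -> Self { Integer(self.0 * other.0) }
    fn one() -> Self { Integer(1) }
}

impl Ring for Integer {}
```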
This code has several advantages.
Firstly, the data model is separate from the two different operations.
This means that the struct Integer can simultaneously satisfy Add and Mul.
These are not two classes sharing a common base; the Integer considered with addition is the same object as the Integer considered with multiplication, or indeed with both operations together.
Secondly, and perhaps more importantly, the relationship between all of these different behaviours is clear.
An object that implements both Add and Mul can be a ring - we could auto-implement this, but that would be counterproductive.
The implementation of the Ring trait for Integer is the assertion by the programmer that these two operations satisfy the compatibility requirements to form a ring.
I think this is much more explicit and descriptive, and I generally feel that this is a more sensible way to represent abstract structures from mathematics.
In any case, this forces one to completely reconsider the way that one thinks about designing a program, which is the interesting part for me at the moment.
My aims for rustyalgebra
I should be clear that this project is supposed to be a learning experience, and not to replace libalgebra. Rewriting the library in Rust forces me to rethink the way that these structures are built and represented. That being said, it will be interesting to compare the result against libalgebra to see if there are performance penalties or gains to be made by changing the implementation drastically. This may or may not be a fair comparison since I’ve spent a lot of time optimising libalgebra and I bring all of that experience with me to write this new library.
Now I’m going to have to re-imagine the basic structure of the library, which is one of the most interesting aspects for me, so it is a good idea to re-examine how things are structured and make some notes for the new library.
Our basic structures (mathematically) are vectors, which are going to require a trait for the behaviour, and various implementations for different storage models.
To start with, we’ll be looking to implement a vector with dense storage and a vector with sparse storage.
To keep things simple, the dense storage can just be a simple struct containing a Vec of scalar types.
The sparse vector can be a thin wrapper around a HashMap, much like it is in libalgebra.
The Vector trait will have to contain information about the basis and its corresponding key type, and the scalar field that the vector space is over.
Once I’m happy with the implementation of vectors I can start to think about expanding the library to include traits for algebras and various specific structures such as Lie algebras and tensor algebras. This is going to take a long time. At the time of writing I’ve already implemented dense vectors in a basic capacity, and started implementing sparse vectors. I have a lot of work to do to tune these implementations before I’m happy to fully implement algebras and other features. (I have started implementing some things like tensors, because it is easier to write tests using a full production basis rather than making something from scratch just for tests.)
Defining the traits
The first big challenge will be to design the traits that will form the foundation of everything that we do here. Since I want this to be easily extendable in the future, I want to define some generic traits for low-level concepts like scalar fields. I am aware that there is a crate (link) that provides a large number of traits for mathematical structures, including rings and fields, but I’m not sure I want to use these to start with. My main reason here is that I’d rather keep the library as self-contained as possible, and I can certainly add support for these traits later if necessary.
The field trait
A vector space is defined over a field, such as the real numbers or complex numbers, and as such we need a trait that describes the properties of a field.
This is really our first taste of a trait system to define a mathematical structure and, as you’ll see in a moment, it really is perfect for this job.
First let’s create a new module called coefficients.rs and start to populate the properties of a field.
A field is a set with two compatible operations, addition and multiplication, so that it forms a ring as described above, such that the multiplication is commutative (the order of the operands does not matter) and has an identity element (almost always denoted 1, for obvious reasons), and finally such that every non-zero element has a multiplicative inverse.
This final property, unique to a field, is what allows us to form reciprocals $1/x$ for any non-zero $x$ (note that $1/0$ is not, and should not be, defined).
Let’s start putting together a trait that captures these properties.
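Something like the following sketch, which is close to, but not exactly, the trait in the repository; the f64 implementation is included here just to make the example concrete.

```rust
trait CoefficientField: PartialEq + Sized + Clone {
    const ZERO: Self; // additive identity, 0
    const ONE: Self;  // multiplicative identity, 1
    const MONE: Self; // additive inverse of ONE, -1

    fn uminus(&self) -> Self; // x -> -x
    fn inv(&self) -> Self;    // x -> 1/x, for non-zero x

    fn add(&self, other: &Self) -> Self;
    fn sub(&self, other: &Self) -> Self;
    fn mul(&self, other: &Self) -> Self;
    fn div(&self, other: &Self) -> Self;
}

// Example implementation for double-precision floats.
impl CoefficientField for f64 {
    const ZERO: Self = 0.0;
    const ONE: Self = 1.0;
    const MONE: Self = -1.0;

    fn uminus(&self) -> Self { -*self }
    fn inv(&self) -> Self { 1.0 / *self }

    fn add(&self, other: &Self) -> Self { *self + *other }
    fn sub(&self, other: &Self) -> Self { *self - *other }
    fn mul(&self, other: &Self) -> Self { *self * *other }
    fn div(&self, other: &Self) -> Self { *self / *other }
}
```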
So this is a good first attempt, but there are some obvious omissions.
(This is similar to, but not exactly, the trait that is currently implemented in the repository.
Some changes will be explained below; others are new.)
You’ll notice that I’ve added the supertraits PartialEq, Sized, and Clone on top of the properties described above.
This is mostly to satisfy the Rust type system, but also to allow us to compare elements of a field in a meaningful way later.
The curious amongst you will probably be wondering why I haven’t used the Eq trait over PartialEq, and this is because we want to be able to implement this for floating point numbers, which don’t satisfy Eq.
(The short reason for this is that NaN never compares equal, even to itself.)
However, PartialEq should suffice for almost all the things that we might want to do.
This trait requires the definition of three constant values: ONE, for the multiplicative identity; ZERO, for the additive identity; and MONE, for the additive inverse of the multiplicative identity.
(Read these as 1, 0, and -1 if you like.)
The first two are required by a field, and the third is a useful constant to have.
Next, I have defined two methods, uminus and inv, to generate the additive and multiplicative inverses, respectively, from a given element.
These functions will take $x$ and return $-x$ or $1/x$ (for non-zero $x$), respectively.
(I haven’t included inv as a method in the GitHub version of the library, but I think it should be there.)
Finally, I’ve defined the four crucial methods: add, sub, mul, and div.
In the library that is in the GitHub repository, I’ve also added in-place versions of these four functions for completeness.
It has been quite a long time since I wrote the CoefficientField trait as it appears in the library, and since then I’ve noticed several drawbacks.
First is the fact that all four functions take references to the field type, which can be a bit of a pain when nesting calls, such as x.add(y.inv()).
The inner call to y.inv() returns a new scalar value, which then cannot be used as an argument to x.add because it is not a reference.
This can be fixed easily by inserting an & to turn the temporary object into a reference, but this kind of pattern happens often enough that having to do so is frustrating.
I could solve this in two ways.
The easiest solution would be to take the argument to each of our trait functions by value, which would move the value into the scope of the function - taking ownership of it - and then use Clone to generate new copies whenever I’m not passing in a temporary.
In practice, this might be worse than the existing problem, since instead of a single & I would now have to write .clone() in every case where I don’t want to move the value out of the containing scope.
Of course, for types that implement the Copy trait, the default semantics are copy rather than move, so this would be handled automatically; but in general we can’t make this assumption.
The second solution is to take a generic argument that implements the Borrow&lt;Self&gt; trait.
This adds an additional layer of indirection to the trait, but it would dramatically simplify things in the rest of the library.
For a type T, both T and &'_ T implement Borrow&lt;T&gt;, so we can pass by either reference or value without any concerns.
This feels more natural to me, and will certainly prevent some headaches later on when I’m writing code for specific implementations.
I don’t know whether this would have any performance penalties, but I’d hope that these kinds of scenarios would be perfect for compiler optimisation.
I am going to make this change in a future update to the library.
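To illustrate the second solution, here is a toy Scalar type (purely hypothetical, not part of the library) whose add accepts any argument implementing Borrow, so both a temporary and a reference work without extra syntax:

```rust
use std::borrow::Borrow;

#[derive(Debug, Clone, PartialEq)]
struct Scalar(f64);

impl Scalar {
    fn inv(&self) -> Scalar {
        Scalar(1.0 / self.0)
    }

    // Accepting any B: Borrow<Self> means both by-value and by-reference
    // arguments are fine: x.add(y.inv()) and x.add(&y) both compile.
    fn add<B: Borrow<Self>>(&self, other: B) -> Scalar {
        Scalar(self.0 + other.borrow().0)
    }
}
```

With the reference-only signature, x.add(y.inv()) would have needed an explicit &; here the Borrow bound absorbs the difference.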
Now, I want to be able to deal with more general structures than simple vector spaces in the future, which means that I want to form modules over rings, which could be non-commutative and in which not all non-zero elements are invertible. For this reason, I’m going to add an associated type for rational values, those members of the ring that have a multiplicative inverse (units, in the terminology of rings). This will allow me to adapt the code more easily in the future. Thus the final version of my coefficient field trait will look something like this:
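Concretely, a sketch combining the RationalType associated type with the Borrow-based arguments discussed above; the f64 implementation is illustrative rather than definitive.

```rust
use std::borrow::Borrow;

trait CoefficientField: PartialEq + Sized + Clone {
    // The invertible elements (units); for a genuine field this can be
    // the field type itself.
    type RationalType: Into<Self> + Clone;

    const ZERO: Self;
    const ONE: Self;
    const MONE: Self;

    fn uminus(&self) -> Self;
    fn inv(&self) -> Self::RationalType;

    // Taking B: Borrow<Self> lets callers pass either T or &T.
    fn add<B: Borrow<Self>>(&self, other: B) -> Self;
    fn sub<B: Borrow<Self>>(&self, other: B) -> Self;
    fn mul<B: Borrow<Self>>(&self, other: B) -> Self;
    fn div<B: Borrow<Self>>(&self, other: B) -> Self;
}

impl CoefficientField for f64 {
    type RationalType = f64; // every non-zero float is a unit

    const ZERO: Self = 0.0;
    const ONE: Self = 1.0;
    const MONE: Self = -1.0;

    fn uminus(&self) -> Self { -*self }
    fn inv(&self) -> Self::RationalType { 1.0 / *self }

    fn add<B: Borrow<Self>>(&self, other: B) -> Self { *self + other.borrow() }
    fn sub<B: Borrow<Self>>(&self, other: B) -> Self { *self - other.borrow() }
    fn mul<B: Borrow<Self>>(&self, other: B) -> Self { *self * other.borrow() }
    fn div<B: Borrow<Self>>(&self, other: B) -> Self { *self / other.borrow() }
}
```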
Something else that I will need to address is the ability to convert various integer types to CoefficientField types, and also to convert from a rational type to the parent type.
The former is a natural operation, since any infinite ring with identity contains a copy of the integers.
Indeed, if $R$ is a ring with identity then the distributivity property gives
$$
n x = (1 + (n-1))x = x + (n-1)x
$$
for all $x\in R$, so that (by induction) multiplication by $n$ is defined for all integers $n \geq 1$.
(From there it is a simple exercise in taking negatives to get the negative integers too.)
Taking $x = 1$ in the above gives a copy of $n$ in $R$.
In the current version of the library I added a FromDegreeType trait to convert from a DegreeType - currently an alias of u32 - into a scalar field type, and added this to the requirements of the CoefficientField type.
This solves my immediate needs, but it will need to be expanded in the future.
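For illustration, the conversion trait might be sketched like this; the exact signature is my assumption, not necessarily what the repository does.

```rust
// DegreeType is currently just an alias of u32 in the library.
type DegreeType = u32;

trait FromDegreeType {
    // Embed the degree n into the field as 1 + 1 + ... + 1 (n times),
    // which is well defined by the induction argument above.
    fn from_degree(degree: &DegreeType) -> Self;
}

impl FromDegreeType for f64 {
    fn from_degree(degree: &DegreeType) -> Self {
        // For floats the embedding is just the usual lossless conversion.
        f64::from(*degree)
    }
}
```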
For converting from RationalType into the field type, we can add Into&lt;Self&gt; as a constraint on the RationalType associated type.
This is natural because the invertible members of a ring should indeed be members of the ring.
Introducing Bases
The next thing we need is a trait that describes what a Basis is. This is simpler than defining a trait for a coefficient type, since any set can constitute a basis of a vector space. However, for this trait to make sense in the context of the library, we do have some work to do.
We’re going to treat the basis type associated with a vector space as a marker rather than a component, meaning we won’t construct or manipulate the basis object directly and instead use it to distinguish between two different vector types.
This differs from the coefficient field in that the coefficient field will be used in some way to construct a vector and the basis will not (except as PhantomData, to keep the type system happy).
To enumerate the elements of the vector itself, we will use an associated type called a key.
The requirements on a key are very simple.
We should be able to distinguish (strongly) between two key values, so they will need to implement the Eq trait, and we should be able to clone a key (the Clone trait).
Finally, we probably want to be able to print keys out for debugging purposes, so we’ll also add the Debug trait to make this possible.
A simple basis trait looks like this:
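Something like the following; the example IntegerBasis implementation is my addition, included only to show how the trait is used.

```rust
use std::fmt::Debug;

// The basis is a marker: we never manipulate a basis object directly.
// The associated key type is what enumerates the basis elements.
trait Basis {
    type KeyType: Eq + Clone + Debug;
}

// An example basis whose elements are indexed by machine-sized integers.
struct IntegerBasis;

impl Basis for IntegerBasis {
    type KeyType = usize;
}
```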
For now at least, this should be everything we need to define a trait for a vector type.
The vector trait
The last fundamental trait that we’re going to introduce now is the Vector trait, which describes the properties of an element from a vector space (defined as the linear span of a basis with coefficients taken from a given field).
The Vector trait must be sufficiently general to cover the wide variety of different data structures - the fundamental objects that hold the data that constitute the vector - but also capture the necessary properties of a vector.
Let’s take a moment to remind ourselves of the defining properties of a vector. A vector space is a set that is equipped with an addition operation that is commutative and associative, has a zero element, and has inverses (negatives), and also has a second operation of scalar multiplication with scalars coming from a field. (That is, it forms an abelian group with addition.) The scalar multiplication must “play well” with addition, and also have some other expected properties: multiplying by 0 should give the zero vector, and multiplying by 1 should not change the vector. We’ll refer to these operations of addition and scalar multiplication as the vector arithmetic operations.
Now, in terms of the code, we can’t really make any assertions as to the restrictions on the vector arithmetic operations, but we can use our trait to ensure that they exist.
Now, we could use the standard library operator traits (Add, Sub, and so on) but, as with the field trait, these aren’t quite as flexible as we would like.
In fact, my first attempt at writing an algebra library in Rust tried to make use of these operator traits, and defining a Vector trait turned out to be hideously complicated.
In addition to the arithmetic operations, we also need some methods for accessing the data in the vector.
A fundamental property of a vector is that one can retrieve the coefficient corresponding to a particular basis element.
This behaviour can be added to our trait by including a get method (and a corresponding get_mut for getting the value mutably).
Again, we could use the Index trait to allow us to access these elements using the usual bracket operator [key], but we do need to be careful.
If this is functionality we’d like, we can use a blanket implementation to provide bracket access for all vector types by simply calling the required get method.
Putting these basic properties together, we end up with the start of a reasonable Vector trait.
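A sketch along these lines; the toy dense f64 implementation is my own addition, included only to show the trait in action, and the method names are illustrative.

```rust
use std::fmt::Debug;

// Simplified stand-in for the basis trait from earlier.
trait Basis {
    type KeyType: Eq + Clone + Debug;
}

// The basis and scalar field appear as associated types, and the vector
// arithmetic is expressed as plain methods rather than via the standard
// operator traits.
trait Vector: Sized {
    type BasisType: Basis;
    type ScalarType: PartialEq + Clone;

    // Creation.
    fn zero() -> Self;
    fn from_key_scalar(
        key: <Self::BasisType as Basis>::KeyType,
        scalar: Self::ScalarType,
    ) -> Self;

    // Element access.
    fn get(&self, key: &<Self::BasisType as Basis>::KeyType)
        -> Option<&Self::ScalarType>;
    fn get_mut(&mut self, key: &<Self::BasisType as Basis>::KeyType)
        -> Option<&mut Self::ScalarType>;

    // Vector arithmetic.
    fn add(&self, other: &Self) -> Self;
    fn scalar_multiply(&self, scalar: &Self::ScalarType) -> Self;
}

// A toy dense implementation over f64, with keys as positions in a Vec.
struct IndexBasis;
impl Basis for IndexBasis {
    type KeyType = usize;
}

struct DenseVector(Vec<f64>);

impl Vector for DenseVector {
    type BasisType = IndexBasis;
    type ScalarType = f64;

    fn zero() -> Self {
        DenseVector(Vec::new())
    }

    fn from_key_scalar(key: usize, scalar: f64) -> Self {
        let mut data = vec![0.0; key + 1];
        data[key] = scalar;
        DenseVector(data)
    }

    fn get(&self, key: &usize) -> Option<&f64> {
        self.0.get(*key)
    }

    fn get_mut(&mut self, key: &usize) -> Option<&mut f64> {
        self.0.get_mut(*key)
    }

    fn add(&self, other: &Self) -> Self {
        let len = self.0.len().max(other.0.len());
        let data = (0..len)
            .map(|i| {
                self.0.get(i).copied().unwrap_or(0.0)
                    + other.0.get(i).copied().unwrap_or(0.0)
            })
            .collect();
        DenseVector(data)
    }

    fn scalar_multiply(&self, scalar: &f64) -> Self {
        DenseVector(self.0.iter().map(|x| x * scalar).collect())
    }
}
```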
We need to add some methods for creating new vectors, such as the zero vector, and from existing data.
I’m not entirely sure what methods I’ll need yet, but I have included methods for creating vectors from a single key and scalar value, and from iterators.
There are also a handful of methods for general housekeeping of vectors such as inserting and clearing elements in vectors.
This more or less rounds out the Vector trait as it stands at the moment.
You might notice that my actual trait in the GitHub repository does not look exactly like this.
This is because it is a work in progress and I haven’t fully decided what intermediate traits I will eventually introduce to deal with the more general settings that we might want to use vector-like objects.
Problems of iterators
One of the big things that I haven’t been able to figure out yet is iterators.
I want vectors to support iteration (by reference or mutable reference) over the data that they contain.
Unfortunately, the Rust borrow checker wants to know the lifetime associated with these references, which leads to some rather tricky reasoning about how long the data is valid for in the definition of the Vector trait.
The main problem is that Rust will not allow - at least in safe Rust - references that may become invalidated during their lifetime.
This is a good thing: it means that when you have a reference, you can be certain that it is valid and points to the data you expect it to.
Unfortunately, this also means that the programmer has to make an assertion as to the lifetime of the underlying data using lifetime annotations.
I think the problem is that there is no easy mechanism, as far as I know, to account for the lifetime of the data structure implementing Vector without introducing a lifetime parameter for the trait itself.
In the end, this might have to be something that I do to make iterators work.
This might not be the worst solution.
Indeed, in most circumstances, there will be some kind of sensible choice of lifetime to use when implementing the Vector trait for a data storage option.
For example, if we were to implement Vector for a slice of scalar values, say &'a mut [S], then the natural choice of lifetime to use in the trait implementation would be 'a.
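As a sketch of what the lifetime-parameterised approach could look like (fixing the scalar type to f64 and implementing for Vec&lt;f64&gt; purely for illustration; the trait and method names are hypothetical):

```rust
// The trait itself carries the lifetime that ties the iterator's item
// references to the borrow of the underlying storage.
trait VectorIterate<'a> {
    type Iter: Iterator<Item = &'a f64>;

    fn iter_coeffs(&'a self) -> Self::Iter;
}

// Implementing for Vec<f64>: the natural lifetime here is the borrow of
// the vector itself.
impl<'a> VectorIterate<'a> for Vec<f64> {
    type Iter = std::slice::Iter<'a, f64>;

    fn iter_coeffs(&'a self) -> Self::Iter {
        self.as_slice().iter()
    }
}
```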
However, not being an expert on Rust trait lifetimes and how these interact with one another, I cannot be sure that this is the “right” choice.
I need to do a whole lot of reading around to find out if this is a viable option, although I am reluctant to introduce generic parameters for the Vector trait.
Summary
This is a long-term project. The C++ library libalgebra contains an enormous amount of functionality, and redesigning this functionality to be more Rustic will be very challenging. I have a lot of ideas for how this can be done, but obviously this will take a lot of work. I’ve already implemented some things that I will write separate posts about.