What's inside the Klein bottle?

Rusty bit Sudoku solving

written by Sam on 2020-09-12

Recently I've been doing a lot of Sudoku puzzles, after discovering the Cracking the Cryptic YouTube channel. I've always enjoyed Sudoku, and puzzles have served to entertain me during long journeys for many years. However, I'd never encountered any of the newer variants of the original puzzle such as "chess Sudoku" or "thermo Sudoku", where the placement of a value is restricted based on the values within a standard chess move or the position on a "thermometer". These variations are a fresh take on the original puzzle and make for some really interesting and difficult problems. (I highly recommend the Cracking the Cryptic channel.)

I've wanted to write a solver for Sudoku puzzles for some time, but have never got around to starting. Since I've got some free time now and have been solving a lot of puzzles recently, I thought it might be a good time to start. Of course, this is a rather complex challenge with many different aspects. I'm going to focus on the design of the underlying objects here. I'll cover other aspects in later posts (as I figure these steps out).

Broad overview

There are two core problems that need to be addressed before we start to design the program itself:

  1. How do we reference positions in the grid; and
  2. How do we store the possible digits in each cell of the grid.

The first problem seems fairly easy to solve. We can simply store the row and column number associated with each cell. The row and column coordinate pair - each of which is a number between 1 and 9 inclusive - uniquely identifies a cell in the 9 by 9 grid. We can also use the row and column number to calculate the box in which the cell lies. The boxes are the smaller 3 by 3 grids of cells, labelled 1 to 9 starting in the top left and counting along each row. We will need to be able to access all three of these attributes, along with the value (or possible values) of the cell.

The second question is slightly more difficult to answer. There are various objects available, but none of the basic options seem to be particularly helpful. The most obvious method is to store a list on each cell holding the values that can be placed in that cell. This has the advantage of being fairly flexible, but grid-wide queries quickly become complex. Alternatively, we might store fixed values on the cell objects and then have some other method for describing the possible values. For example, we could store them in a multimap object indexed by value, where the entries are the cells in which that value can be placed. This approach has the advantage of being easy to inspect and analyse, since we can very easily find all possible locations for a given value. However, updating and maintaining this multimap will likely prove to be a rather complex task.

How we choose between these options depends on the type of operations we need to perform in solving a puzzle. For instance, we need to quickly determine the possible locations of a given value within a row, column, or box. We will also need to perform reductions - removing possibilities from a row, column, or box when a value is fixed in a cell. Already we can see that these operations might be easier in the multimap model. However, the multimap model will really struggle with inspecting particular rows, columns, or boxes. Here we would have to perform several value lookups for each row. I think this is a real disadvantage of trying to use a multimap, especially considering the extra complexity of constructing and maintaining this object.

Borrowing some inspiration

I've been watching a lot of videos on YouTube in the last few months - typically in the background whilst doing other things. (This is how I came across the Cracking the Cryptic channel.) I've watched a lot of videos on logic circuits and microprocessor architecture, which involves manipulating electrical signals on individual wires to set bits according to the architecture of the components. (I recommend the 8-bit breadboard computer series by Ben Eater on YouTube. It is a very interesting deep dive into how computers - microprocessors - work.) This gave me an idea about how I might go about storing the information about the cells in a Sudoku grid.

Rather than storing the row and column numbers each as a separate integer, I can store each number from 1 to 9 in just 4 bits. From a storage point of view, this is far more efficient than storing them as even the smallest integer type (8 bits/1 byte), since we can fit both numbers in a single byte rather than using 2 bytes. We can apply the same kind of thinking to keep track of the values that can be placed in a cell. We can assign a single bit to each of the digits 1-9, where a bit is set to 1 if the corresponding value can be placed in the cell and 0 otherwise. Since this already needs more bits than a single 8 bit integer can hold, we will have to use the next smallest size of integer, which is a 16 bit integer. This gives us 7 spare bits to store additional information such as the box number (4 bits) and a flag to indicate whether the value is fixed or not (1 bit).

Altogether, we need 22 bits to store all the information for each cell in the grid. We can store these in a combination of an 8 bit integer and a 16 bit integer - leaving two spare bits in case we need them later. This storage structure is most akin to storing a list of possible values for each cell, though it is a little more subtle than that. I didn't realise this when I decided that this was the way I wanted to implement the Sudoku grid.

Choosing a language

The next step was to choose a programming language. I probably could have chosen any language for this, but I decided on Rust because it is a nice low level programming language where I have control over the sizes of the integer types used and can easily manipulate bits, while still having a full suite of modern features. I really enjoy writing Rust code because it doesn't feel as restrictive and verbose as a language like C or C++. Of course, it is a relatively new language and still doesn't have some of the flexibility of more mature languages - generic types are not as well developed as C++ templates, for instance - but it is constantly improving.

For this project I won't really need any of the advanced features of the language. In fact, all we really need is a struct to hold our two integer types, and an array of these structs to represent the grid. The only other features we need are individual bit manipulations - bitwise and, or, shifts - and basic arithmetic, along with basic flow control. This really is an exercise in design rather than an exercise in using clever features of the programming language.

Designing the program to work at such a low level - as in, not using clever language features - means that I have to work much harder in the planning and organisation stages to make it work. This is exactly how I wanted this project to work. Unfortunately, it also means that the actual implementation is going to be rather more difficult to understand.

Implementing a Sudoku cell

A Sudoku cell in this program contains the row and column number, combined together into a single 8 bit integer, and the box, set flag, and possible values (and two spare bits) in a 16 bit integer. We'll use a Rust tuple struct to hold these two integers. (We're going to call this a SudokuSquare rather than a Cell.)

struct SudokuSquare(u8, u16);

We need to write an associated function to create a new default square in a given position. The usual practice is to call this function new. However, before we do this, we need to set up some constants that will help us to select the relevant pieces of information from these two integer values. For this we define some constants at the top of our file.

pub(crate) const ROW_MASK: u8 = 0xF0;
pub(crate) const COL_MASK: u8 = 0x0F;
pub(crate) const SET_BIT: u16 = 0x0200;
pub(crate) const DIGIT_MASK: u16 = 0x01FF;
pub(crate) const BOX_MASK: u16 = 0x7800;

These are bit masks that are used to extract certain bits from a value by performing a bitwise and of the mask and the data. For example, we can extract the first 4 bits - which encode the row number in binary - from the 8 bit integer by performing a bitwise and with the binary number 11110000, written in hexadecimal as 0xF0. These constants are made public to the whole crate using pub(crate), which makes them available to other parts of the program, but not from outside.

We can see from these masks how the data is arranged in each of the two integer types. The first 4 bits of the row-column data represent the row and the last 4 bits the column. The first bit of the second integer is not used. The next 4 bits of the second integer represent the box - since 0x7800 is 0111100000000000 in binary. The next bit is not used, and the bit that follows is the set flag. The final 9 bits of the integer - masked by 0x01FF, which is 0000000111111111 in binary - represent whether each of the values 1-9 is possible in this cell.
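To make the layout concrete, here is a sketch of how these masks recover each field from the packed data. (The accessor names here are my own, invented for illustration; they are not part of the final program, and the constants are repeated so the snippet stands alone.)

```rust
const ROW_MASK: u8 = 0xF0;
const COL_MASK: u8 = 0x0F;
const SET_BIT: u16 = 0x0200;
const DIGIT_MASK: u16 = 0x01FF;
const BOX_MASK: u16 = 0x7800;

// The row is stored in the high 4 bits of the position byte.
fn row_of(position: u8) -> u8 {
    (position & ROW_MASK) >> 4
}

// The column is stored in the low 4 bits.
fn col_of(position: u8) -> u8 {
    position & COL_MASK
}

// The box number occupies bits 11-14 of the data word.
fn box_of(data: u16) -> u16 {
    (data & BOX_MASK) >> 11
}

// Bit 9 flags whether the cell's value is fixed.
fn is_set(data: u16) -> bool {
    (data & SET_BIT) != 0
}

// Bit (d - 1) of the low 9 bits records whether digit d is possible.
fn is_candidate(data: u16, digit: u8) -> bool {
    (data & DIGIT_MASK) & (1u16 << (digit - 1)) != 0
}
```

For example, a fresh square in row 3, column 5 has position byte 0x35 and lies in box 2, so its data word would be (2 << 11) | 0x01FF: every digit is still a candidate and the set flag is clear.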

To convert a pair of numbers in the range 1 to 9 into the row-column data stored in a SudokuSquare struct, we take the binary representation of the row number, shift it 4 bits to the left, and add the bits for the column number. We're not going to do any checking at this stage to make sure the values are valid; we assume that the caller has performed these checks in advance. We also have to create a default for the second piece of data, which encodes the box, set flag, and possible digits. At the beginning, all digits are available, the value is not set, and the box is determined from the row and column data. This means our new function will look something like the following.

impl SudokuSquare {

    fn new(row: u8, col: u8) -> SudokuSquare {
        let box_id = SudokuSquare::get_box_index(row, col);

        let position: u8 = ((row & 0x0F) << 4) + (col & 0x0F);
        SudokuSquare(position, box_id | 0x01FF)
    }
}

Let's ignore the get_box_index call for the time being and focus on the last two lines of code. First we set the position by shifting the row digit to the left by 4 places and adding the column digit. (To prevent problems, we use a mask to keep only the lower 4 bits of each number.) The second value is given by the bitwise or of the box_id value, which is a 16 bit integer provided by the get_box_index function, and the value 0x01FF, which signifies that all values are available. The implementation of the get_box_index function is a simple matching exercise.

    fn get_box_index(row: u8, col: u8) -> u16 {
        let mut box_id: u16 = match (row, col) {
            (r, c) if r <= 3 && c <= 3 => 0x0001,
            (r, c) if r <= 3 && c <= 6 => 0x0002,
            (r, c) if r <= 3 && c <= 9 => 0x0003,
            (r, c) if r <= 6 && c <= 3 => 0x0004,
            (r, c) if r <= 6 && c <= 6 => 0x0005,
            (r, c) if r <= 6 && c <= 9 => 0x0006,
            (r, c) if r <= 9 && c <= 3 => 0x0007,
            (r, c) if r <= 9 && c <= 6 => 0x0008,
            (r, c) if r <= 9 && c <= 9 => 0x0009,
            _ => panic!("Invalid row/column configuration"),
        };
        box_id <<= 11;
        box_id
    }

This definition is also placed within the impl block for the SudokuSquare struct. Now we have a struct that represents a cell in a Sudoku grid and a means of creating "blank" squares ready for working.

I'm conscious of the length of this post, so I think this is a good point to stop for now. The code that I have written for this project is available on my GitHub https://github.com/inakleinbottle/bitsudoku.


New start with a different twist

written by Sam on 2020-08-15

I've been thinking recently about moving my blog away from being hosted as a Wordpress site and instead deploying a custom-made app using Python behind the scenes.

My first idea was to try to build a Django app completely from scratch, which made for some interesting experimentation. Unfortunately, I quickly realised that the limiting factor in building a new website from scratch was not my skill at writing Python code, but making the HTML templates that make the page look the way I want. (This is not to say that building a Django web app is difficult; it certainly isn't.) It was at this point that I got very busy with work, and my website project was shelved.

I didn't get another chance to think about the website for several months between September 2019 and June 2020. The chaos of the start of the academic year put the website to the back of my mind, and in the first few weeks I also had to think about writing applications for jobs for the next year. Towards the end of the Autumn teaching term I had my first (and last) interview of the year. Around the same time, I got a message out of the blue on LinkedIn inviting me to write a book about using Python for mathematics. I accepted this invitation, but this is a topic for another day.

Six months later I had mostly finished the book and was starting my new job as a research software engineer. This gave me some time to start thinking about other things, including my website. I returned to my original idea of building a website app from scratch using Django, and explored some of the content management systems built on Django (Wagtail and Django CMS in particular). However, the problem I experienced at the very beginning had not gone away; I'm still bad at designing websites and styling them using HTML and CSS. Since my website is unlikely to be updated very frequently, I decided it would be better to generate it statically as simple HTML.

Finally, almost a full year after I first considered changing my website I decided to use the static site content management system Lektor to build and deploy my website. Hopefully this will prove to be a good decision.


Formatting Matrices from Numpy to LaTeX

written by Sam on 2020-02-18

LaTeX is a great tool for producing high quality documents, but it can sometimes be rather cumbersome for producing things like matrices, which hold large amounts of information in an array with many rows and columns. This is made especially frustrating when the matrix you wish to format has been computed using Python and Numpy, and is right there on the PC. I thought that writing a small Python function that formats a Numpy array into LaTeX syntax would be a nice, easy exercise in the Python course (for first year mathematics students) that I teach. However, this turned out to be rather more complex than I had originally thought. I'm going to give a full description of how I might solve this problem using Python, and how to overcome some of the issues that arise.

Before we can do any coding, we need to understand the format that we are aiming to produce. A matrix is formatted in LaTeX within the pmatrix environment; each element of a row is separated by the alignment character &, and each row is terminated by the line break command \\. A simple 2 by 2 matrix might be formatted as follows.

\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}

This should be relatively easy to create using Python in theory, since it is just a string with the numbers inserted into a template of sorts. There are, however, some subtle details to work through here. Let's start by producing a naive implementation of the function that will do the actual formatting. It's going to take a single argument, the matrix to format as a Numpy array, and return a string containing the formatted LaTeX code for that matrix.

def format_matrix(matrix):
    """Format a matrix using LaTeX syntax"""
    rows, cols = matrix.shape
    lines = ["\\begin{pmatrix}"]
    for row in range(rows):
        line = ""
        for col in range(cols):
            sep = " & " if col > 0 else ""
            line = line + sep + str(matrix[row, col])
        lineend = "\\\\" if row < rows - 1 else ""
        line = line + lineend
        lines.append(line)
    lines.append("\\end{pmatrix}")
    matrix_formatted = "\n".join(lines)
    return matrix_formatted

This function will perform as we expect, but it is horribly inefficient and not particularly clean. I would describe this as a reasonable first pass following exactly the procedure of formatting by hand. Let's walk through this function definition step by step.

The first two lines are the standard declaration of a Python function, using the def keyword followed by the name of the function and the argument specification (signature), and then the documentation string for the function. On the next line, we retrieve the shape of the matrix, which describes the number of rows and columns of the matrix as a tuple of integers. We unpack the two integers into the variables rows and cols that we will use for the iteration.

Next comes the real body of the function, the part where we actually construct each line of the output as a string. In this implementation, we use two nested for loops to achieve this. Our strategy is to build up a list of strings that constitute the lines of the output, which we join together right at the end of the function using the string join method. Before we start looping, we create the list of lines, already containing the start of the pmatrix environment as a string:

lines = ["\\begin{pmatrix}"]

Now we can start the looping. The first loop is over the range of row indices of the matrix, generated using the Python range object range(rows). In each iteration of this loop we build up the string that will be added to the lines list, starting from a blank string. Then we start the inner loop, which iterates over each column index (just as we did for rows). Inside this loop we look up each element of the matrix by index and add it to the string we are building, prefixed by the separator & if it is not the first element in the row. Here we are using a conditional expression (Python's ternary operator)

sep = " & " if col > 0 else ""

We can't simply join a number to a string; we need to convert it into a string first. There are two ways that we can build up the string. The first is to call the str function to convert the number to a string explicitly. The alternative is to use the format method or an f-string. The latter method is probably better in many cases, but later we will replace this with an alternative anyway.
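As a small illustration of the two conversion approaches described above (the variable and values here are my own example):

```python
x = 42

# Explicit conversion with the str function...
explicit = "value: " + str(x)

# ...or interpolation with an f-string, which performs the conversion for us.
interpolated = f"value: {x}"

# Both produce the string "value: 42".
```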

Once we've built the string for the line inside the inner loop, we need to add the line separator \\ to all but the last line, and then append the line string to the lines list. Inside the outer loop, but not the inner loop, we again use a conditional expression to add the separator where needed, and then append the completed string to the list.

At the very end, we use the join method, as mentioned above, to join the strings in the lines list, and then return the completed string.

Fixing the obvious problems

As it stands, the function we wrote above is pretty basic. First, and probably most important, it is not very Pythonic. Roughly speaking, a piece of code is Pythonic if it (correctly) leverages the features of the Python language and follows the Zen of Python.

The first thing that jumps out at me when I look at the function we have written is the nested for loops. Generally speaking, this is a sign that the code we have written isn't going to perform well, and certainly could be refactored to make it easier to debug. (Of course, there are some circumstances where nested for loops are simply unavoidable, but these cases are certainly rare.) Let's take a closer look at the main body of the outer loop, and see if we can make some improvements.

line = ""
for col in range(cols):
    sep = " & " if col > 0 else ""
    line = line + sep + str(matrix[row, col])
lineend = "\\\\" if row < rows - 1 else ""
line = line + lineend

The purpose of this code is to build up each line of the matrix in LaTeX format. As we've discussed, we start with a blank string and build it up in the for loop that follows. Building up a string with a separator is a common task and, perhaps unsurprisingly, there is a fast and efficient way to do this in Python: the str.join method. However, we can't simply apply this method to the row of the matrix as follows.

line = " & ".join(matrix[row, :])

The problem here is that the join method expects an iterable of strings, not numbers. Instead we have to change each of the numbers to a string by applying the str function to each one. There are other ways of doing this, but in this context perhaps the easiest is to use the map function, which creates a new iterable by applying a function to every element of the old iterable. Now we can replace most of the body of the outer for loop with a single line. (We opted to use the str function before because it allows us to apply it using the map function here.)

line = " & ".join(map(str, matrix[row, :]))

This code is more dense, but much more descriptive of what is actually happening (reading from the inside out): we apply the str function to each number in the matrix row and then join these strings together with the separator " & ". What we can't change is the way that we apply the line ending to each line. (We'll come back to this later.) Now our code for the body of the outer loop will look something like this:

lineend = "\\\\" if row < rows-1 else ""
line = " & ".join(map(str, matrix[row, :])) + lineend

Now let's look at the outer loop. Here we are iterating over a range of indices generated by range(rows). This is not very Pythonic, and doesn't make use of the fact that Numpy arrays are themselves iterable. This means we can use a Numpy array directly in a for loop to iterate over the rows of the (two dimensional) array. (Iterating over a 1 dimensional array yields the elements.) This means we could replace the outer loop code with the following.

for row in matrix:
    line = " & ".join(map(str, row)) + "\\\\"
    lines.append(line)

Notice that we've replaced the row lookup in the body of the loop, matrix[row, :], with just the row variable coming from the loop. This row variable now contains a 1 dimensional Numpy array rather than an integer. Unfortunately, by doing this we've gained an extra LaTeX new line at the end of the matrix body. (This won't actually cause any problems in the LaTeX compilation, but it is worth fixing from a code style point of view.) Our full code now looks as follows.

def format_matrix(matrix):
    """Format a matrix using LaTeX syntax"""
    rows, cols = matrix.shape
    lines = ["\\begin{pmatrix}"]

    for row in matrix:
        line = " & ".join(map(str, row)) + "\\\\"
        lines.append(line)

    lines.append("\\end{pmatrix}")
    matrix_formatted = "\n".join(lines)
    return matrix_formatted

This is already a great improvement on the original function, but we still have some way to go to clean this function up properly. Since we've changed our method of iteration in the one remaining for loop, we no longer need to retrieve the number of rows and columns of the matrix in the first line. Second, we can still improve the way we construct the final string to return and, by doing so, make the loop simpler yet.

At present, we construct each line of the whole formatted matrix string and then join all these lines together to form the final string. However, we could instead reserve the join for the body of the matrix only, allowing us to simplify the loop. For this, we will replace the final few lines of the function with an f-string such as the following.

body = "\\\\\n".join(body_lines)
return f"\\begin{{pmatrix}}\n{body}\n\\end{{pmatrix}}"

Our task now is to define the body_lines list using only the lines that come from the matrix. The advantage of this over the code we had above is that it also restores the original behaviour, lost in the first rewrite, in which the final line of the matrix body does not carry an extra LaTeX line ending.

This method also allows us to remove the clunky for loop in favour of the more Pythonic, and easy to read, list comprehension. This means we can replace the loop and list initialisation with the following list comprehension.

body_lines = [" & ".join(map(str, row)) for row in matrix]

Now we have the start of a nice, well-written Python function that has a fraction of the number of lines that we started with. The following is the full code that we have so far.

def format_matrix(matrix):
    """Format a matrix using LaTeX syntax"""
    body_lines = [" & ".join(map(str, row)) for row in matrix]

    body = "\\\\\n".join(body_lines)
    return f"\\begin{{pmatrix}}\n{body}\n\\end{{pmatrix}}"
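As a quick check, the function at this stage reproduces the 2 by 2 example from the beginning. (The function is repeated here so the snippet runs on its own.)

```python
import numpy as np

def format_matrix(matrix):
    """Format a matrix using LaTeX syntax"""
    body_lines = [" & ".join(map(str, row)) for row in matrix]

    body = "\\\\\n".join(body_lines)
    return f"\\begin{{pmatrix}}\n{body}\n\\end{{pmatrix}}"

print(format_matrix(np.array([[1, 2], [3, 4]])))
# \begin{pmatrix}
# 1 & 2\\
# 3 & 4
# \end{pmatrix}
```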

Fixing some potential problems

There is still some considerable way to go to make this function "idiot proof". The first thing we should really do is write a better documentation string, but we won't do so here to save some space. For those who wish to know, there are official guidelines for writing documentation strings in PEP 257. There are other things we need to address too, such as checking the type and shape of the input array.

What I mean by this is that, at the moment, we could pass any variable we like into this function, even though we really only want this to work with 2 dimensional Numpy arrays. Of course, we will get an error at various points if the object we pass doesn't conform to certain conditions. For example, if we pass None into this function, we will get a TypeError since None is not iterable. Moreover, the error message that we get from the function, as it currently stands, will not be particularly helpful in diagnosing problems later down the line.

The best thing to do here is to insert a type checking statement at the top of the function that will raise a meaningful exception if the type of the argument is not a Numpy array. We can do this using the following lines of code.

if not isinstance(matrix, np.ndarray):
    raise TypeError("Function expects a Numpy array")

We also need to make sure the array is 2 dimensional, otherwise we will get another TypeError from the map function if the members of the array are numbers. We don't need to raise an exception if the array is 1 dimensional though, because we can perform a cheap reshape to turn a 1 dimensional array into a 2 dimensional array. This is a perfect opportunity to use the "walrus operator" (PEP 572) that is new in Python 3.8. If the array has more than 2 dimensions, we will need to raise an exception.

if len(shape := matrix.shape) == 1:
    matrix = matrix.reshape(1, shape[0])
elif len(shape) > 2:
    raise ValueError("Array must be 2 dimensional")

Adding these checks in gives the following "finished" code.

import numpy as np

def format_matrix(matrix):
    """Format a matrix using LaTeX syntax"""

    if not isinstance(matrix, np.ndarray):
        raise TypeError("Function expects a Numpy array")

    if len(shape := matrix.shape) == 1:
        matrix = matrix.reshape(1, shape[0])
    elif len(shape) > 2:
        raise ValueError("Array must be 2 dimensional")

    body_lines = [" & ".join(map(str, row)) for row in matrix]

    body = "\\\\\n".join(body_lines)
    return f"\\begin{{pmatrix}}\n{body}\n\\end{{pmatrix}}"

This function will now give us useful error messages if we provide an argument that isn't a Numpy array. Unfortunately this comes at a cost. Before we integrated our type checking, we could have called the function with nested lists, such as those that you might provide to np.array function to create a new Numpy array. For example, the following call will no longer work.

format_matrix([[1, 2], [3, 4]])

This is an important point about Python programming: embracing the absence of static type checking often leads to errors that can be difficult to diagnose, while implementing some type checking can make your code less flexible. We can recover some of the flexibility here by attempting to convert the argument to a Numpy array first, raising an exception if this conversion fails.

if not isinstance(matrix, np.ndarray):
    try:
        matrix = np.array(matrix)
    except Exception:
        raise TypeError("Could not convert to Numpy array")

This will mean that we can call this function with nested lists, as above, and it will work. In this case the run-time cost of converting to a Numpy array is relatively small, especially for matrices that we are likely to print into a LaTeX document. Hence our full function is now complete.

import numpy as np

def format_matrix(matrix):
    """Format a matrix using LaTeX syntax"""

    if not isinstance(matrix, np.ndarray):
        try:
            matrix = np.array(matrix)
        except Exception:
            raise TypeError("Could not convert to Numpy array")

    if len(shape := matrix.shape) == 1:
        matrix = matrix.reshape(1, shape[0])
    elif len(shape) > 2:
        raise ValueError("Array must be 2 dimensional")

    body_lines = [" & ".join(map(str, row)) for row in matrix]

    body = "\\\\\n".join(body_lines)
    return f"\\begin{{pmatrix}}\n{body}\n\\end{{pmatrix}}"

Going the extra mile

The function we have written is already functional and should be relatively easy to use, debug, and maintain in the future, even when we have forgotten how it works. Really the only thing we should have done is write a more complete documentation string. (As I mentioned earlier, we haven't done this for space.) However, there are some further improvements that we can make that will greatly improve the functionality.

A very simple improvement we can make is to allow for optionally changing the LaTeX matrix environment from pmatrix to another matrix environment such as bmatrix. We can do this by adding an optional argument to the signature of the function, and then incorporating this environment variable into the f-string at the end of the function.

def format_matrix(matrix, environment="pmatrix"):
    """Format a matrix using LaTeX syntax"""

    # -/- snip -/-

    return f"""\\begin{{{environment}}}
{body}
\\end{{{environment}}}"""

At the moment, if we call the function with a matrix containing fractions then we will get a rather bloated LaTeX formatted matrix. This is because the default behaviour for converting a floating point number to a string is to print all of its decimal places. Since we have used the str function to perform this conversion, we can adapt the function rather easily to accept custom formatters for printing the matrix elements. We can again include an optional argument to allow for this customisation.

def format_matrix(matrix, environment="pmatrix", formatter=str):
    """Format a matrix using LaTeX syntax"""

    # -/- snip -/-

    body_lines = [" & ".join(map(formatter, row)) for row in matrix]

    # -/- snip -/-

This means we can truncate the number of decimal places or perform any other operation we like by supplying a custom formatting function beyond the standard str function.
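To see the difference a custom formatter makes (the fraction and the three-decimal-place formatter here are my own example), compare the default conversion of a float with a fixed-precision one:

```python
# The default string conversion prints the float at full precision.
default = str(1 / 3)         # "0.3333333333333333"

# A custom formatter can truncate to a fixed number of decimal places.
def three_dp(x):
    return f"{x:.3f}"

truncated = three_dp(1 / 3)  # "0.333"
```

Passing three_dp as the formatter argument would then produce the compact version in the output matrix.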

All these improvements together give us the final, finished version of the code as follows.

import numpy as np

def format_matrix(matrix, environment="pmatrix", formatter=str):
    """Format a matrix using LaTeX syntax"""

    if not isinstance(matrix, np.ndarray):
        try:
            matrix = np.array(matrix)
        except Exception:
            raise TypeError("Could not convert to Numpy array")

    if len(shape := matrix.shape) == 1:
        matrix = matrix.reshape(1, shape[0])
    elif len(shape) > 2:
        raise ValueError("Array must be 2 dimensional")

    body_lines = [" & ".join(map(formatter, row)) for row in matrix]

    body = "\\\\\n".join(body_lines)
    return f"""\\begin{{{environment}}}
{body}
\\end{{{environment}}}"""
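As a quick demonstration, here is a pure-Python sketch of the same logic (dropping the NumPy conversion for brevity) showing how the two optional arguments change the output:

```python
def format_matrix(matrix, environment="pmatrix", formatter=str):
    """Pure-Python sketch of the function above (no NumPy handling)."""
    body = "\\\\\n".join(" & ".join(map(formatter, row)) for row in matrix)
    return f"\\begin{{{environment}}}\n{body}\n\\end{{{environment}}}"

# Square brackets and two decimal places via the optional arguments.
print(format_matrix([[1 / 3, 2 / 3]], environment="bmatrix",
                    formatter=lambda x: f"{x:.2f}"))
# → \begin{bmatrix}
#   0.33 & 0.67
#   \end{bmatrix}
```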

This exercise turned out to be a bit trickier than it first appears, with quite a few elements involved. Hopefully you have learned something by reading through the code, and understood my reasoning for each of these changes.


Grant applications and the EPSRC

written by Sam on 2019-04-02

I recently attended a workshop on the grant application procedure run by the Engineering and Physical Sciences Research Council (EPSRC). The EPSRC is one of several UK research councils and is responsible for funding research across engineering, science, and mathematics through a number of grant and fellowship schemes. I went to the workshop with many questions, and misconceptions, about the grant review process and how funds are allocated. I'm happy to say that I learned a lot, and most of my questions were answered.

The grant writing process is a bit of a mystery and I suspect that this is also true for many other early career academics. I hope that a better understanding of the review process for grant proposals will make it easier when I do eventually apply for a grant. I thought it wise to share what I learned from the workshop. Obviously this is specific to mathematics, but the procedure in other subjects is presumably similar.

Disclaimer: I do not claim to be an expert on grant proposal writing, and I do not represent the EPSRC in any way. I hope that this article is useful to people and that the information contained within is accurate. Please let me know if there are any errors so that they may be corrected.

Once a grant proposal has been submitted, it is first passed to one of a number of portfolio managers, each covering one or more subject areas within mathematics. The portfolio manager will find three reviewers, who will read and comment on the proposal. Two of these reviewers will be chosen by the portfolio manager from the EPSRC college of reviewers. The third will be one of three reviewers suggested by the applicant (assuming that these suggestions are appropriate and able to provide a review).

The reviewers will score the proposal and make comments to be passed back to the applicant. If the reviews are supportive of the proposal, it will be passed to the next stage; an unsupported proposal will be rejected. If the proposal progresses, the reviews will be passed back to the applicant, who will have the opportunity to respond to some of the points raised by the reviewers. It is important to note that this response will not be seen by the reviewers!

The response by the applicant should address any criticisms and concerns, and should be reasonably self-contained. The reviews and the applicant's response will be the main points considered in the next stage of the application procedure: the panel.

The grant review panel meets several times per year to consider proposals and decide which have sufficient merit to be funded. The panel consists of a number of academics, around 15 I believe, who sit at a table and discuss each proposal, its reviews, and the applicant's response to the criticism. The proposals are ranked, and a number of proposals from the top of this list are chosen to be funded. The number will depend on the value of each proposal and the funds available.

At present, the EPSRC offer a number of grant and fellowship schemes. For most, especially those in established positions who have received grants in the past, the standard grant scheme will be most appropriate. This scheme covers any proposals that fall within the EPSRC remit with no restrictions placed on the length or value of the proposal.

For early career academics who have not yet submitted a grant proposal, there is also the "new investigator award", which is only available for those who have not yet been the recipient of a significant grant (over the value of £100k). There are no restrictions on when a proposal can be submitted to this scheme - this is a recent change to allow more flexibility for those who have taken career breaks. Note that the applicant is expected to hold a permanent academic post in order to apply for this scheme.

In addition to these grant schemes are three fellowship schemes, which aim to pay for the applicant's time to further their research and typically last three to five years. Unlike the other awards, a fellowship is a personal award, which means that the funding is tied to the investigator and not a specific institution. There are limitations placed on the subject of a fellowship: it must align with one of the EPSRC priority areas. At present, the areas "intradisciplinary research" and "new connections from mathematical sciences" cover a large number of possibilities.

There are three levels of fellowships available: postdoctoral; early career; established career. These levels represent the different career stages to which they are available, although it is left to the applicant to apply for the scheme that they believe is most appropriate. The EPSRC provides person specifications that describe typical applicants at each stage. Postdoctoral fellowships have a shorter duration but offer the greatest flexibility in subject area, whilst established career fellowships have the longest duration but are more restrictive on subject area.

It seems that there is a large amount of flexibility in the fellowship schemes. However, I feel that fellowships are still extremely competitive and will likely require a large investment of time, with the knowledge that the proposal might not be funded. The standard grant schemes may be less competitive but also seem to be intended for people who already hold (permanent) lectureship positions. (Even if this is not strictly the case, it would be troublesome to apply and receive a grant on a temporary contract.)

It seems that the EPSRC have made a number of positive steps recently to provide what support they can to academics on "non-standard" career paths; in particular, those who have taken career breaks or breaks from research. This is important for those who intend to start families early in their careers. It also seems that they value input from early career researchers through their early career forum, of which I was not aware before the workshop. They also seem to encourage the participation of early career researchers in the review and panel processes.


The napkin ring problem

written by Sam on 2018-10-28

I heard an interesting thought experiment on a podcast some time ago, though I am not sure of the origins of the problem: the napkin ring problem. Despite hearing about it some time ago, I have only just found time to convince myself that it is true, surprising as it is. I thought I would share the solution, since it is a lovely application of multiple integration. The problem is as follows:

Take a tennis ball and drill a cylindrical hole exactly through the centre of the ball so as to leave a ring around the circumference whose height is 2cm. (This ring would resemble a napkin ring.) Now take a much larger ball, say the Earth (if this were a perfect sphere), and perform the same task, leaving a ring around the equator of height 2cm. Then these two rings have precisely the same volume.

On first hearing, this seems impossible. The Earth is much larger than a tennis ball, so even a thin ring around the equator must surely be much larger than the whole tennis ball. But then we draw a few pictures - not to scale, of course - and see that as the sphere becomes larger, the ring around the circumference gets much thinner.

napkin-ring-image

The napkin ring problem sketch. As the sphere gets larger, the ring around the circumference must become thinner.

If we fix a small length, say 1cm, then Pythagoras's theorem tells us that the distance from the centre of the sphere to the inner edge of the ring is the square root of the radius squared minus this length squared. (Assuming the length is smaller than the radius, this can be done.) When the length is relatively large compared to the radius, as in the case of the tennis ball, the thickness of the ring is also quite large. However, when the length is very small compared to the radius, as in the case of the Earth, the ring must be very thin.

In case this is not clear, let's put some numbers in to see how the thickness changes. Suppose that the radius of the tennis ball is 3cm and the radius of the Earth is 1000m - that is, 100000cm, a gross underestimate of the true radius. We take our small length to be 1cm. As in the argument above, the thickness of the ring for the tennis ball is 3cm minus the square root of 8, which is approximately 0.172cm. Comparatively, performing the same calculation for the Earth gives the thickness of the ring as 0.000005cm. (For the actual radius of the Earth, this quantity is even tinier!)
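These figures are easy to verify with a few lines of Python (the numbers here are those from the example above):

```python
import math

def ring_thickness(radius, length=1.0):
    """Thickness of the ring: R - sqrt(R^2 - l^2), with l = 1 cm."""
    return radius - math.sqrt(radius**2 - length**2)

tennis = ring_thickness(3.0)       # roughly 0.172 cm
earth = ring_thickness(100000.0)   # roughly 0.000005 cm
```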

The argument above does not show that the volumes in these two cases are equal. For this, we need to use a little calculus. What we actually need is to calculate an integral over the segment of the circle that is "cut off" to make the ring; the volume is then $2\pi$ multiplied by this integral. I should warn you here, the argument gets a little technical at this point.

To compute this quantity, we use the following double integral \[ A = \int_{-\ell}^{\ell}\int_{\sqrt{R^2 - \ell^2}}^{\sqrt{R^2 - z^2}} r \,\mathrm{d}r\,\mathrm{d}z = \frac{2\ell^3}{3}. \] Here $R$ is the radius of the sphere, $\ell$ is half the height of the ring (we are assuming, for simplicity, that the sphere is centred at the origin), and $z$ is the vertical distance from the horizontal diameter. The quantity $r$ denotes the distance from the axis of the cylindrical drilled-out section to a point in the sphere. (Technically, we are using cylindrical polar coordinates to evaluate this integral.) The limits of integration are obtained using some careful applications of Pythagoras's theorem.

Evaluating the integral, we see that $A$ is two-thirds of the length $\ell$ cubed; in particular, this is independent of the radius $R$. Thus the volume of the ring is \[ V = 2\pi A = \frac{4\pi\ell^3}{3}. \]

This is a lovely problem, where the mathematics gives a relatively surprising fact. The solution using multiple integrals does not necessarily provide any intuition as to why this is true, but it is a nice application of vector calculus to a fairly real-world problem.
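We can also sanity-check the independence of $R$ numerically, computing the ring volume instead as the sphere minus the drilled-out cylinder and the two spherical caps (this decomposition, using the standard cap-volume formula $\pi h^2(3R - h)/3$, is my own check rather than part of the argument above):

```python
import math

def ring_volume(R, l=1.0):
    """Sphere minus the central cylinder (radius sqrt(R^2 - l^2),
    height 2l) minus the two caps of height R - l above and below."""
    sphere = 4 / 3 * math.pi * R**3
    cylinder = 2 * l * math.pi * (R**2 - l**2)
    h = R - l
    caps = 2 * math.pi * h**2 * (3 * R - h) / 3
    return sphere - cylinder - caps

predicted = 4 * math.pi / 3  # 4*pi*l^3/3 with l = 1
# ring_volume(3.0) and ring_volume(100.0) both agree with predicted,
# up to floating point rounding.
```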


Off on a tangent

written by Sam on 2018-09-22

I've recently become somewhat interested in smooth manifolds and surfaces, as a result of preparation for various interviews amongst other reasons. The concept of a surface is very intuitive, and one that a student of mathematics is likely to encounter very early in their career, though a rigorous definition of a surface takes more effort. The result of this formalisation is a remarkably elegant theory, which eventually leads to the idea of smooth manifolds - arguably one of the best sounding mathematical objects - and generalised calculus.

It took quite some time before I had a "big picture" of what a surface is and how it fits into the grander theory. Undoubtedly I am still missing some of the major pieces of this puzzle, but the picture I have does show some of the elegance of the theory (at least for me).

Let us return to the relatively basic theory of differential calculus. Our first introduction to differentiation usually comes during A-level maths, where we are told various standard derivatives without much explanation, and then that these derivatives represent the gradient of a curve at a given point. (For those who may need to look at derivatives once again, I suggest the Wikipedia page on differentiation, which has some lovely illustrations that will help with this discussion.)

Geometrically, what is happening is that we are constructing a line that meets the curve only once at our selected point (considering only points that are relatively "close by"; a tangent line might cross the curve elsewhere, but this will be sufficiently "far away"). The gradient of this tangent line is equal to the value of the derivative at the chosen point.

Lines are very simple geometric figures that we can easily describe and manipulate. If we start at our point and move along the tangent line a small amount, then the point on the tangent line will be very close to the point on the curve a similar distance from our selected point. We might say, then, that a derivative gives us a means of approximating the value of a complicated curve using straight lines near a given point. (In fact, this is the fundamental idea that underpins a large number of numerical methods for solving differential equations.)
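As an illustrative aside, Euler's method for a differential equation $y' = f(y)$ does exactly this: it repeatedly follows the tangent line for a short distance.

```python
def euler(f, y0, t_end, steps):
    """Approximate y(t_end) for y' = f(y), y(0) = y0, by stepping
    a short distance h along the tangent line at each point."""
    h = t_end / steps
    y = y0
    for _ in range(steps):
        y += h * f(y)  # follow the tangent line for a length h
    return y

# y' = y with y(0) = 1 has exact solution e^t; the tangent-line
# approximation approaches e = 2.71828... as the steps shrink.
approx = euler(lambda y: y, 1.0, 1.0, 10000)
```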

In three dimensions, we have new possibilities to consider. We are no longer restricted to curves, and can instead investigate the properties of surfaces such as the unit sphere (the set of points at distance 1 from the origin). Here we can no longer approximate using a single line, since the sphere expands away from any given point in many directions. This is also reflected in the parametrisation of the sphere, which has two independent variables. This is typical of surfaces in three-dimensional space. (Think of a sheet that has peaks and valleys: you can move about on the sheet as if you were in the plane, although your movements also move you through the third, unseen dimension.)

Many surfaces can be realised as the zero set of some function of several variables. The partial derivatives of such a function give us information about how the function evolves in the direction of each of its variables, which are usually the x, y, z (and so on) directions. From these partial derivatives, we can find a directional derivative of our function in any direction by taking a weighted combination of the partial derivatives (a linear combination).
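As a concrete (assumed) example, for $f(x, y) = x^2 + y^2$ the partial derivatives are $2x$ and $2y$, and the directional derivative in a direction $(a, b)$ is the weighted combination $2xa + 2yb$. A finite-difference check confirms this:

```python
import math

def f(x, y):
    return x**2 + y**2

def directional_derivative(x, y, a, b):
    # Linear combination of the partial derivatives (2x, 2y),
    # weighted by the direction (a, b).
    return 2 * x * a + 2 * y * b

def finite_difference(x, y, a, b, h=1e-6):
    # Numerical approximation in the same direction, for comparison.
    return (f(x + h * a, y + h * b) - f(x, y)) / h

direction = (1 / math.sqrt(2), 1 / math.sqrt(2))
exact = directional_derivative(1.0, 2.0, *direction)
numeric = finite_difference(1.0, 2.0, *direction)
```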

The evolution of a surface, as we move away from a given point, can be approximated by the set of all possible weighted combinations of the partial derivatives at that point. This is the tangent space of the surface at the point. In the case of the unit sphere, the parametrisation has two variables, so the two partial derivatives determine a plane at each point.
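In code (a small illustrative sketch of my own), membership in the tangent space of the unit sphere can be tested by checking orthogonality to the gradient of $f(x, y, z) = x^2 + y^2 + z^2 - 1$, which is normal to the sphere:

```python
def gradient(p):
    # Gradient of f(x, y, z) = x^2 + y^2 + z^2 - 1 at the point p.
    x, y, z = p
    return (2 * x, 2 * y, 2 * z)

def in_tangent_space(p, v, tol=1e-9):
    # A direction v lies in the tangent space at p exactly when it
    # is orthogonal to the gradient (the normal to the sphere).
    return abs(sum(g * w for g, w in zip(gradient(p), v))) < tol

# At the "north pole" the tangent plane is horizontal: (1, 0, 0)
# and (0, 1, 0) are tangent directions, but (0, 0, 1) points
# straight out of the sphere.
north_pole = (0.0, 0.0, 1.0)
```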

Tangent spaces provide an essential tool in differential geometry - the study of smooth surfaces - because it is much easier to understand a plane than a complex surface.


The ArXiv and me (Part 1)

written by Sam on 2018-08-10

The ArXiv is a popular pre-print article server for physics, mathematics, and computer science (and other subjects) hosted by Cornell University. It is a fairly common practice for academics to upload a preliminary version of their articles (or other works) to the ArXiv to make them publicly available before they are formally published in a journal. (The process of publication is often lengthy, and many consider it best to make the article available in advance, even though it probably has not yet been peer-reviewed.)

At present, there are around 1.4 million articles hosted on the ArXiv, and more are added every day. (Should you wish to see a visual representation of the articles on the ArXiv, which I assume you do, you should visit paperscape.) In the sub-topics that I watch, there are approximately 4 to 10 new papers added per day, and these topics are not amongst the most active on the ArXiv. The problem, then, is to filter the daily uploads to find the papers that are likely to be of interest to me. Luckily, at about the same time that I started to think about an automated solution to this problem, I discovered that the ArXiv has some tools that can help.

My go-to language for automating things is Python, which has many tools for retrieving and processing web data. Building on an example provided in the ArXiv's API (Application Programming Interface) documentation, I decided to use the feedparser library to gather and process the ArXiv's daily RSS feed. The basic code is as follows.

import feedparser
url = 'http://arxiv.org/rss/math'
feed = feedparser.parse(url)

Once the feed has been retrieved, we must extract the data that we need. My original approach was to use Python's built-in namedtuple to store the data. These give a nice class-like object with named attributes (in this case authors, title, id, and abstract), but are still relatively light-weight data structures. The namedtuple comes from the 'collections' module in the Python standard library, where many useful data structures can be found.

from collections import namedtuple

ArxivEntry = namedtuple(
    'ArxivEntry',
    ('authors', 'title', 'id', 'abstract'))

entries = [
    ArxivEntry(entry.authors, entry.title, entry.id, entry.summary)
    for entry in feed.entries]

Now comes the tricky part of filtering out those entries that may be of interest. The method that I chose for this task was a simple keyword filter on the abstract of each entry. I set up a list of keywords from articles that I have read in the past, stored in a list called keywords and filtered entries by whether any of the keywords appeared in the abstract.

keywords = [ ... ]  # too many to list here.
accepted = [
    entry for entry in entries
    if any(kw in entry.abstract for kw in keywords)]

Now that I had the vital parts of the problem solved, I added some code to write all the accepted articles into a file for each day, and set the script to run on my Raspberry Pi at 6 am each day using a cron job.
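The file-writing step only takes a few lines. This is a sketch with a filename scheme of my own choosing, not the exact script:

```python
import datetime

def save_accepted(accepted, directory="."):
    # Write the accepted entries to a file named after today's date.
    path = f"{directory}/arxiv-{datetime.date.today().isoformat()}.txt"
    with open(path, "w") as f:
        for entry in accepted:
            f.write(f"{entry.title}\n{entry.id}\n\n")
    return path
```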

So far it has selected some 30 articles, though not all have been exactly to my taste. The script is also rather simplistic, and does not allow for easy modification of the filtering method. I have started working on a new and improved set of tools for filtering ArXiv entries, which will eventually allow me to customise and experiment with the filtering method without major rewrites of my code.


The Zombie Apocalypse

written by Sam on 2018-07-15

Zombies are a staple of modern popular culture, and appear in a variety of forms, including the traditional slow-moving, unintelligent zombie hordes and the less common fast-moving - and perhaps intelligent - zombies. The common theme in media featuring zombies is the zombie infection, which may affect either the living or the deceased, or both. It is generally agreed that this infection is passed from zombie to non-zombie by means of a scratch or bite, and infection always leads to a transformation into a zombie. (In some more modern interpretations, this transformation may be reversible.)

The spread of the zombie infection is an interesting problem to model mathematically. There are many factors to consider: the chance of a non-zombie becoming infected in an interaction with a zombie; the rate at which interactions between non-zombies and zombies occur; the spread of the zombie horde from place to place; and the "critical mass" of the zombie horde, at which point there is insufficient food for all zombies. We can model parts of this problem in isolation, with some simplifying assumptions.

For example, we might consider the following scenario. Suppose that the world is divided into nine "square" regions, numbered one to nine in the usual "keypad" arrangement, and that the zombie infection first arises in region five (in the centre of the arrangement). Then the zombie population across all nine regions can be modelled using a diffusion model.
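As a toy illustration of this setup (entirely my own sketch), a discrete diffusion step might let each region leak a fixed fraction of its zombies to its neighbouring regions in the keypad arrangement:

```python
# Neighbours of each region in the 3-by-3 "keypad" arrangement.
neighbours = {
    1: [2, 4], 2: [1, 3, 5], 3: [2, 6],
    4: [1, 5, 7], 5: [2, 4, 6, 8], 6: [3, 5, 9],
    7: [4, 8], 8: [5, 7, 9], 9: [6, 8],
}

def diffuse(z, rate=0.1):
    # Each region leaks a fraction of its zombies, shared equally
    # between its neighbours; the total population is conserved.
    new = dict(z)
    for region, population in z.items():
        outflow = rate * population
        new[region] -= outflow
        for n in neighbours[region]:
            new[n] += outflow / len(neighbours[region])
    return new

# The infection first arises in region five, then spreads outwards.
zombies = {region: 0.0 for region in range(1, 10)}
zombies[5] = 1000.0
for _ in range(50):
    zombies = diffuse(zombies)
```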

We might also examine spread of the zombie infection from a probabilistic point of view, where we instead model the spread by assigning a probability that a non-zombie will become infected during an encounter with a zombie. At each point in time there will be interactions between existing zombies and non-infected people, and at each of these interactions, there is a chance that the non-infected person will become infected. This can be pictured as a tree, where each branch represents a different zombie, and each branch splits in two if the zombie infects a person at a given time. Mathematically, this is a branching process.
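A minimal simulation of such a branching process (again, an illustrative sketch of my own) might look like this:

```python
import random

def next_generation(zombies, p_infect=0.3):
    # Each zombie independently infects one new person with
    # probability p_infect; infected people become zombies too.
    newly_infected = sum(1 for _ in range(zombies)
                         if random.random() < p_infect)
    return zombies + newly_infected

random.seed(1)  # reproducible run
population = 1  # patient zero
history = [population]
for _ in range(10):
    population = next_generation(population)
    history.append(population)
# history records the (non-decreasing) size of the horde over time.
```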

A zombie apocalypse is just one of many scenarios that can be modelled using a branching process, even if it is (most likely) fictitious. Indeed, epidemic modelling using branching processes, and more general stochastic processes, is an active field of research and provides a useful tool for predicting the evolution of an infection.

I have not studied probability formally since I was an undergraduate, and since then I have acquired a much more powerful arsenal of mathematical tools; many concepts that once seemed impenetrable are now much clearer. I decided that it was time to refresh my knowledge of probability theory with this new-found knowledge and endeavour to understand some of the nuances of the theory.

This refresher is motivated by the surprising appearance of probabilistic aspects in operator theory. This gives me an excellent excuse for spending time searching for mathematical papers on the zombie apocalypse, and "researching" their role in popular culture, although I feel the latter would have happened either way.