What's inside the Klein bottle?

Formatting Matrices from Numpy to LaTeX

written by Sam on 2020-02-18

LaTeX is a great tool for producing high quality documents, but it can sometimes be rather cumbersome for producing things like matrices, which hold large amounts of information in an array with many rows and columns. This is made especially frustrating when the matrix you wish to format has been computed using Python and Numpy, and is right there on the PC. I thought that writing a small Python function that formats a Numpy array into LaTeX syntax would be a nice, easy exercise in the Python course (for first year mathematics students) that I teach. However, this turned out to be rather more complex than I had originally thought. I'm going to give a full description of how I might solve this problem using Python, and how to overcome some of the issues that arise.

Before we can do any coding, we need to understand the format that we are aiming to produce. A matrix is formatted in LaTeX within the pmatrix environment; the elements of each row are separated by the alignment character &, and the rows are separated by the line break command \\. A simple 2 by 2 matrix might be formatted as follows.

\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}

This should be relatively easy to create using Python in theory, since it is just a string with the numbers inserted into a template of sorts. There are, however, some subtle details to work through here. Let's start by producing a naive implementation of the function that will do the actual formatting. It's going to take a single argument, the matrix to format as a Numpy array, and return a string containing the formatted LaTeX code for that matrix.

def format_matrix(matrix):
    """Format a matrix using LaTeX syntax"""
    rows, cols = matrix.shape
    lines = ["\\begin{pmatrix}"]
    for row in range(rows):
        line = ""
        for col in range(cols):
            sep = " & " if col > 0 else ""
            line = line + sep + str(matrix[row, col])
        lineend = "\\\\" if row < rows-1 else ""
        line = line + lineend
        lines.append(line)
    lines.append("\\end{pmatrix}")
    matrix_formatted = "\n".join(lines)
    return matrix_formatted

This function will perform as we expect, but it is horribly inefficient and not particularly clean. I would describe this as a reasonable first pass following exactly the procedure of formatting by hand. Let's walk through this function definition step by step.

The first two lines are the standard declaration of a Python function, using the def keyword followed by the name of the function and the argument specification (signature), and then the documentation string for the function. On the next line, we retrieve the shape of the matrix, which describes the number of rows and columns of the matrix as a tuple of integers. We unpack the two integers into the variables rows and cols that we will use for the iteration.

Next comes the real body of the function, the part where we actually construct each line of the output as a string. In this implementation, we use two nested for loops to achieve the output. Our strategy is to build up a list of strings that constitute the lines of the output, which we will join together right at the end of the function using the string join method. Before we start the looping, we first create the list of lines that we will populate, which initially contains the start of the pmatrix environment as a string:

lines = ["\\begin{pmatrix}"]

Now we can start the looping. The first loop is over the range of indices of each row in the matrix, generated using the Python range object range(rows). In each iteration of this loop we will build up the string that will be added to the lines list. Here we build this string sequentially, starting with a blank string. Now we start the inner loop, which iterates over each column index (just as we did for rows). Inside this loop we need to look up each element of the matrix by index and add it to the string that we are building. This involves adding the separator & if it is not the first element in the row, followed by the number. Here we are using the ternary assignment in Python:

sep = " & " if col > 0 else ""

We can't simply join a number to a string; we need to convert it into a string first. There are two ways that we can do this. The first is to convert the number explicitly by calling the str function. The alternative is to use the format method or an f-string. The latter approach is probably better in many cases, but later we will replace this with an alternative anyway.
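As a small sketch of the options just described, the snippet below converts a single number in each of the three ways. The variable names are made up for illustration and do not appear in the function itself.

```python
value = 3.5  # hypothetical matrix element

# Concatenating a number directly onto a string is an error
try:
    "x = " + value
except TypeError:
    concatenation_failed = True

via_str = str(value)             # explicit conversion with str
via_format = "{}".format(value)  # the str.format method
via_fstring = f"{value}"         # an f-string

# The format method and f-strings also accept a format specification
two_places = f"{value:.2f}"
```

All three conversions give the same string here. The format specification is the main reason the latter two are often preferred, since it controls things like the number of decimal places.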

Once we've built the string for the line inside the inner loop, we need to add the line separators \\ to all but the last line, and then append the line string to the lines list. Inside the outer loop, but not the inner loop, we again use the ternary assignment to conditionally add the line separator, and then append the completed string to the list.

At the very end, we use the join method, as mentioned above, to join the strings in the lines list, and then return the completed string.
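To check the walkthrough, here is a sketch that repeats the naive function (so that the snippet runs on its own) and calls it on a small example. The example matrix and the variable names outside the function are my own choices.

```python
import numpy as np


def format_matrix(matrix):
    """Format a matrix using LaTeX syntax (naive nested-loop version)"""
    rows, cols = matrix.shape
    lines = ["\\begin{pmatrix}"]
    for row in range(rows):
        line = ""
        for col in range(cols):
            # The separator goes before every element except the first
            sep = " & " if col > 0 else ""
            line = line + sep + str(matrix[row, col])
        # The line ending goes after every row except the last
        lineend = "\\\\" if row < rows - 1 else ""
        lines.append(line + lineend)
    lines.append("\\end{pmatrix}")
    return "\n".join(lines)


example = np.array([[1, 2], [3, 4]])  # hypothetical test matrix
result = format_matrix(example)
print(result)
# \begin{pmatrix}
# 1 & 2\\
# 3 & 4
# \end{pmatrix}
```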

Fixing the obvious problems

As it stands, the function we wrote above is pretty basic. The first, and probably most important, problem is that it is not very Pythonic. Roughly speaking, a piece of code is Pythonic if it (correctly) leverages the features of the Python language and follows the Zen of Python.

The first thing that jumps out at me when I look at the function we have written is the nested for loops. Generally speaking, this is a sign that the code we have written isn't going to perform well, and certainly could be refactored to make it easier to debug. (Of course, there are some circumstances where nested for loops are simply unavoidable, but these cases are certainly rare.) Let's take a closer look at the main body of the outer loop, and see if we can make some improvements.

line = ""
for col in range(cols):
    sep = " & " if col > 0 else ""
    line = line + sep + str(matrix[row, col])
lineend = "\\\\" if row < rows-1 else ""
line = line + lineend

The purpose of this code is to build up each line of the matrix in LaTeX format. As we've discussed, we start with a blank string, and build this up in the for loop that follows. Building up a string with a separator is a common task and, perhaps unsurprisingly, there is a fast and efficient way to do this in Python: the str.join method. However, we can't simply apply this method to a row of the matrix as follows.

line = " & ".join(matrix[row, :])

The problem here is that the join method expects an iterable of strings, not numbers. Instead we have to change each of the numbers to a string by applying the str function to each number. There are other ways of doing this, but in this context perhaps the easiest is to use the map function, which creates a new iterable by applying a function to all the elements of the old iterable. Now we can replace most of the body of the outer for loop with a single line. (We opted to use the str function before because it allows us to apply it using the map function here.)

line = " & ".join(map(str, matrix[row, :]))

This code is more dense, but is somehow much more descriptive as to what is actually happening (from the inside out): we apply the str function to each number in the matrix row and then join these strings together with the separator "&". What we can't change is the way that we apply the line ending to each line. (We'll come back to this later.) Now our code for the body of the outer loop will look something like this:

lineend = "\\\\" if row < rows-1 else ""
line = " & ".join(map(str, matrix[row, :])) + lineend
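The difference between the failing and the working join can be seen in a short sketch. A plain list of integers stands in for one row of the matrix here, and the variable names are only for illustration.

```python
row = [1, 2, 3]  # stand-in for one matrix row such as matrix[0, :]

# join refuses anything that is not a string
try:
    " & ".join(row)
except TypeError as exc:
    error = str(exc)

# Converting each element with map makes the join work
line = " & ".join(map(str, row))
print(line)  # 1 & 2 & 3

# A generator expression is an equivalent alternative to map
same_line = " & ".join(str(x) for x in row)
```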

Now let's look at the outer loop. Here we are iterating over a range of indices generated by range(rows). This is not very Pythonic, and doesn't make use of the fact that Numpy arrays are themselves iterable. This means we can use a Numpy array directly in a for loop to iterate over the rows of the (two dimensional) array. (Iterating over a 1 dimensional array yields the elements.) This means we could replace the outer loop code by the following.

for row in matrix:
    line = " & ".join(map(str, row)) + "\\\\"
    lines.append(line)

Notice that we've replaced the row lookup in the body of the loop, matrix[row, :], by just the row variable coming from the loop. This row variable now contains a 1 dimensional Numpy array rather than an integer. Unfortunately, by doing this we've gained an extra LaTeX new line at the end of the matrix body. (Actually this won't cause any problems in the LaTeX compilation, but it is not good from a code style point of view.) Our full code now looks as follows.

def format_matrix(matrix):
    """Format a matrix using LaTeX syntax"""
    rows, cols = matrix.shape
    lines = ["\\begin{pmatrix}"]

    for row in matrix:
        line = " & ".join(map(str, row)) + "\\\\"
        lines.append(line)

    lines.append("\\end{pmatrix}")
    matrix_formatted = "\n".join(lines)
    return matrix_formatted

This is already a great improvement on the original function, but we still have some way to go to clean this function up properly. First, since we've changed our method of iteration in the one remaining for loop, we no longer need to retrieve the number of rows and columns of the matrix in the first line. Second, we can still improve the way we construct the final string to return and, by doing so, make the loop simpler yet.

At present, we construct each line of the whole formatted matrix string and then join all these lines together to form the final string. However, we could instead reserve the join for the body of the matrix only, allowing us to simplify the loop. For this, we will replace the final few lines of the function with an f-string such as the following.

body = "\\\\\n".join(body_lines)
return f"\\begin{{pmatrix}}\n{body}\n\\end{{pmatrix}}"

Our task now is to define the body_lines list using only the lines that come from the matrix. The advantage of this over the code we had above is that we have also recovered the original functionality where the final line of the main matrix body did not have an extra LaTeX line end that was lost in the first pass rewrite.
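To see why this recovers the original behaviour, here is a minimal sketch of the two lines in isolation. The body_lines list is written out by hand as a placeholder for the rows that the function will produce.

```python
body_lines = ["1 & 2", "3 & 4"]  # hypothetical, already formatted rows

# "\\\\\n" is two literal backslashes followed by a newline; join puts it
# between the rows only, so the last row gets no trailing line ending
body = "\\\\\n".join(body_lines)

# Doubled braces {{ and }} are literal braces inside an f-string
result = f"\\begin{{pmatrix}}\n{body}\n\\end{{pmatrix}}"
print(result)
# \begin{pmatrix}
# 1 & 2\\
# 3 & 4
# \end{pmatrix}
```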

This method also allows us to remove the clunky for loop in favour of the more Pythonic, and easy to read, list comprehension. This means we can replace the loop and list initialisation with the following list comprehension.

body_lines = [" & ".join(map(str, row)) for row in matrix]

Now we have the start of a nice, well-written Python function that has a fraction of the number of lines that we started with. The following is the full code that we have so far.

def format_matrix(matrix):
    """Format a matrix using LaTeX syntax"""
    body_lines = [" & ".join(map(str, row)) for row in matrix]

    body = "\\\\\n".join(body_lines)
    return f"\\begin{{pmatrix}}\n{body}\n\\end{{pmatrix}}"
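As a quick check that the shorter function still does the job, the sketch below runs it on a matrix that is not square. The example array is my own choice.

```python
import numpy as np


def format_matrix(matrix):
    """Format a matrix using LaTeX syntax"""
    body_lines = [" & ".join(map(str, row)) for row in matrix]

    body = "\\\\\n".join(body_lines)
    return f"\\begin{{pmatrix}}\n{body}\n\\end{{pmatrix}}"


example = np.arange(6).reshape(2, 3)  # hypothetical 2 by 3 test matrix
result = format_matrix(example)
print(result)
# \begin{pmatrix}
# 0 & 1 & 2\\
# 3 & 4 & 5
# \end{pmatrix}
```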

Fixing some potential problems

There is still some considerable way to go to make this function "idiot proof". The first thing we should really do is add a better documentation string, but we won't extend it here in order to save some space. For those who wish to know, there are official guidelines for writing documentation strings in PEP 257. The other things we need to address are checking the type and the shape of the input array.

What I mean by this is that, at the moment, we could pass any variable we like into this function, even though we really only want this to work with 2 dimensional Numpy arrays. Of course, we will get an error at various points if the object we pass doesn't conform to certain conditions. For example, if we pass None into this function, we will get a TypeError since None is not iterable. Moreover, the error message that we get from the function, as it currently stands, will not be particularly helpful in diagnosing problems later down the line.
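The following sketch illustrates both halves of this problem with two inputs of my own choosing: None, which fails with a message that says nothing about matrices, and a plain string, which does not fail at all.

```python
def format_matrix(matrix):
    """Format a matrix using LaTeX syntax (no input checks yet)"""
    body_lines = [" & ".join(map(str, row)) for row in matrix]

    body = "\\\\\n".join(body_lines)
    return f"\\begin{{pmatrix}}\n{body}\n\\end{{pmatrix}}"


# None is not iterable, so the list comprehension raises a TypeError
try:
    format_matrix(None)
except TypeError as exc:
    none_error = str(exc)

# A string is iterable, so it is silently treated as a column of characters
silent = format_matrix("abc")
print(silent)
# \begin{pmatrix}
# a\\
# b\\
# c
# \end{pmatrix}
```

The second case is arguably the worse of the two, because nothing signals that the input was not a matrix.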

The best thing to do here is to insert a type checking statement at the top of the function that will raise a meaningful exception if the type of the argument is not a Numpy array. We can do this using the following lines of code.

if not isinstance(matrix, np.ndarray):
    raise TypeError("Function expects a Numpy array")

We also need to make sure the array is 2 dimensional, otherwise we will get another TypeError from the map function if the members of the array are numbers. We don't need to raise an exception if the array is 1 dimensional though, because we can perform a cheap reshape of the array to make a 1 dimensional array into a 2 dimensional array. This is a perfect opportunity to use the "walrus operator" (PEP 572) that is new in Python 3.8. If the array has more than 2 dimensions, we will need to throw an exception.

if len(shape := matrix.shape) == 1:
    matrix = matrix.reshape(1, shape[0])
elif len(shape) > 2:
    raise ValueError("Array must be 2 dimensional")
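To try the dimension check on its own, the sketch below wraps it in a small helper. The helper name as_2d is hypothetical and exists only for this illustration; it needs Python 3.8 or later because of the walrus operator.

```python
import numpy as np


def as_2d(matrix):
    """Hypothetical helper that isolates the dimension check"""
    if len(shape := matrix.shape) == 1:
        # A vector of length n becomes a matrix with one row and n columns
        matrix = matrix.reshape(1, shape[0])
    elif len(shape) > 2:
        raise ValueError("Array must be 2 dimensional")
    return matrix


row_vector = as_2d(np.array([1, 2, 3]))
print(row_vector.shape)  # (1, 3)

# Arrays with more than 2 dimensions are rejected
try:
    as_2d(np.zeros((2, 2, 2)))
except ValueError as exc:
    cube_error = str(exc)
```

One thing to be aware of is that a zero dimensional array, such as np.array(5), has an empty shape and so passes through both branches unchanged.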

Adding these checks in gives the following "finished" code.

import numpy as np

def format_matrix(matrix):
    """Format a matrix using LaTeX syntax"""

    if not isinstance(matrix, np.ndarray):
        raise TypeError("Function expects a Numpy array")

    if len(shape := matrix.shape) == 1:
        matrix = matrix.reshape(1, shape[0])
    elif len(shape) > 2:
        raise ValueError("Array must be 2 dimensional")

    body_lines = [" & ".join(map(str, row)) for row in matrix]

    body = "\\\\\n".join(body_lines)
    return f"\\begin{{pmatrix}}\n{body}\n\\end{{pmatrix}}"

This function will now give us useful error messages if we provide an argument that isn't a Numpy array. Unfortunately this comes at a cost. Before we integrated our type checking, we could have called the function with nested lists, such as those that you might provide to the np.array function to create a new Numpy array. For example, the following call will no longer work.

format_matrix([[1, 2], [3, 4]])

This is an important point about Python programming: embracing the lack of static type checking often leads to errors that can be difficult to diagnose, but implementing some type checking can make your code less flexible. We can recover some of the flexibility here by attempting to convert the argument to a Numpy array first, raising an exception if this conversion fails.

if not isinstance(matrix, np.ndarray):
    try:
        matrix = np.array(matrix)
    except Exception:
        raise TypeError("Could not convert to Numpy array")

This will mean that we can call this function with nested lists, as above, and it will work. In this case the run-time cost of converting to a Numpy array is relatively small, especially for matrices that we are likely to print into a LaTeX document. Hence our full function is now complete.

import numpy as np

def format_matrix(matrix):
    """Format a matrix using LaTeX syntax"""

    if not isinstance(matrix, np.ndarray):
        try:
            matrix = np.array(matrix)
        except Exception:
            raise TypeError("Could not convert to Numpy array")

    if len(shape := matrix.shape) == 1:
        matrix = matrix.reshape(1, shape[0])
    elif len(shape) > 2:
        raise ValueError("Array must be 2 dimensional")

    body_lines = [" & ".join(map(str, row)) for row in matrix]

    body = "\\\\\n".join(body_lines)
    return f"\\begin{{pmatrix}}\n{body}\n\\end{{pmatrix}}"
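The sketch below repeats the function so that it can be run on its own, and then tries it on a nested list and on a flat list. The example inputs are my own.

```python
import numpy as np


def format_matrix(matrix):
    """Format a matrix using LaTeX syntax"""

    if not isinstance(matrix, np.ndarray):
        try:
            matrix = np.array(matrix)
        except Exception:
            raise TypeError("Could not convert to Numpy array")

    if len(shape := matrix.shape) == 1:
        matrix = matrix.reshape(1, shape[0])
    elif len(shape) > 2:
        raise ValueError("Array must be 2 dimensional")

    body_lines = [" & ".join(map(str, row)) for row in matrix]

    body = "\\\\\n".join(body_lines)
    return f"\\begin{{pmatrix}}\n{body}\n\\end{{pmatrix}}"


# Nested lists are converted to an array before formatting
nested = format_matrix([[1, 2], [3, 4]])
print(nested)
# \begin{pmatrix}
# 1 & 2\\
# 3 & 4
# \end{pmatrix}

# A flat list becomes a matrix with a single row
flat = format_matrix([1, 2, 3])
print(flat)
# \begin{pmatrix}
# 1 & 2 & 3
# \end{pmatrix}
```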

Going the extra mile

The function we have written already is functional and should be relatively easy to use, debug, and maintain in the future, even when we have forgotten how it works. Really the only thing we should have done is written a more complete documentation string. (As I mentioned earlier, we haven't done this for space.) However, there are some further improvements that we can make that will greatly improve the functionality.

A very simple improvement we can make is to allow for optionally changing the LaTeX matrix environment from pmatrix to another matrix environment such as bmatrix. We can do this by adding an optional argument to the signature of the function, and then incorporating this environment variable into the f-string at the end of the function.

def format_matrix(matrix, environment="pmatrix"):
    """Format a matrix using LaTeX syntax"""

    # -/- snip -/-

    return f"""\\begin{{{environment}}}
{body}
\\end{{{environment}}}"""
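The triple braces in this f-string can look odd at first, so here is a small sketch of just that part. The values assigned to environment and body are placeholders.

```python
environment = "bmatrix"       # any LaTeX matrix environment name
body = "1 & 2\\\\\n3 & 4"     # hypothetical, already formatted body

# In "{{{environment}}}" the outer "{{" and "}}" are literal braces and
# the inner "{environment}" is replaced by the value of the variable
result = f"""\\begin{{{environment}}}
{body}
\\end{{{environment}}}"""
print(result)
# \begin{bmatrix}
# 1 & 2\\
# 3 & 4
# \end{bmatrix}
```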

At the moment, if we call the function with a matrix containing fractions then we will get a rather bloated LaTeX formatted matrix. This is because the default behaviour for converting a floating point number to a string is to print enough decimal places to reproduce the number exactly, which for a number like 1/3 means 16 of them. Since we have used the str function to perform this conversion, we can adapt the function rather easily to accept custom formatters for printing the matrix elements. We can again include an optional argument to allow for this customisation.

def format_matrix(matrix, environment="pmatrix", formatter=str):
    """Format a matrix using LaTeX syntax"""

    # -/- snip -/-

    body_lines = [" & ".join(map(formatter, row)) for row in matrix]

    # -/- snip -/-

This means we can truncate the number of decimal places or perform any other operation we like by supplying a custom formatting function beyond the standard str function.
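As an illustration, the sketch below uses a condensed copy of the function (without the input checks, purely to keep the example short) and passes the bound method "{:.2f}".format as the formatter. The example matrix is my own, and any callable that takes a number and returns a string should work in the same way.

```python
import numpy as np


def format_matrix(matrix, environment="pmatrix", formatter=str):
    """Condensed version without the input checks, for illustration only"""
    body_lines = [" & ".join(map(formatter, row)) for row in matrix]
    body = "\\\\\n".join(body_lines)
    return f"\\begin{{{environment}}}\n{body}\n\\end{{{environment}}}"


thirds = np.array([[1 / 3, 2 / 3], [1.0, 4 / 3]])  # hypothetical test matrix

# The default formatter prints every decimal place
default = format_matrix(thirds)

# A format specification bound with .format rounds to two decimal places
rounded = format_matrix(thirds, formatter="{:.2f}".format)
print(rounded)
# \begin{pmatrix}
# 0.33 & 0.67\\
# 1.00 & 1.33
# \end{pmatrix}
```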

All these improvements together give us the final, finished version of the code as follows.

import numpy as np

def format_matrix(matrix, environment="pmatrix", formatter=str):
    """Format a matrix using LaTeX syntax"""

    if not isinstance(matrix, np.ndarray):
        try:
            matrix = np.array(matrix)
        except Exception:
            raise TypeError("Could not convert to Numpy array")

    if len(shape := matrix.shape) == 1:
        matrix = matrix.reshape(1, shape[0])
    elif len(shape) > 2:
        raise ValueError("Array must be 2 dimensional")

    body_lines = [" & ".join(map(formatter, row)) for row in matrix]

    body = "\\\\\n".join(body_lines)
    return f"""\\begin{{{environment}}}
{body}
\\end{{{environment}}}"""

I still think this exercise might have been a bit tricky, since there are a lot of elements involved here. Hopefully you have learned something by reading the code I have written here, and understood my reasoning for making all of these changes.