Automated Testing and Automatic Documentation

Foreword

ACSE 1 Lecture Four 15th October 2020 - Version 3.0.3

# This cell sets the CSS styles for the rest of the notebook.
# Unless you are particularly interested in that kind of thing,
# you can run this once, then safely ignore it.

%run add_colours.py
css_styling()

Yesterday we covered:

  • Software environments.

In this lecture we will cover:

  • Software Code Testing

  • Automated testing frameworks

  • Automatic Documentation Generation

Contact details:
  • Dr. James Percival
  • 4.85 RSM building (but not much this term)
  • email: j.percival@imperial.ac.uk
  • Teams: @Percival, James R. in acse1 & General or DM me.

By the end of this lecture you should:

  • Understand the basics of code testing and ways to automate it.

  • Understand the different sorts of test.

  • Have seen how to use Sphinx to generate software documentation from docstrings

Code Testing

When writing software, especially scientific software, a key question is whether the code is correct, and provably correct.

How do we check something for correctness? We test.

What is a test?

The simplest kind of test is the ad hoc kind you run when hacking about with code. If I have created a new function add(x, y) which adds together two numbers and returns the result, then it might be a good idea to try something like

print(add(1, 0) == 1)
print(add(1, 1) == 2)

This can’t catch every possible mode of failure. For example, an add function defined like

def add(x, y):
    return x**2+y**2

would pass both the tests given above, despite not returning x+y.

The tests above are both examples of “testing to pass”, in that we have written a statement we expect to return true, and will be happy if it does so. It is also important to check that operations you expect to raise an exception actually do so (i.e. “testing to fail”). There are usually ways of rewriting a test to fail as a test to pass if needed. E.g. for a divide operator:

def divide(x, y):
    """return x divided by y."""
    return x/y


## Check we get an error when dividing by zero.
try:
    divide(1, 0)
    print(False)
except ZeroDivisionError:
    print(True)
True

By writing our tests like this we can use a statement like assert X to turn the test from one which raises an exception when the code does what we want into one which raises an exception when it doesn’t.
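
As a minimal sketch of that idea, the try/except check can be wrapped in a small helper so that both kinds of test become plain assert statements. The helper name raises_zero_division is made up for this illustration and is not part of any library.

```python
def divide(x, y):
    """Return x divided by y."""
    return x / y


def raises_zero_division(func, *args):
    """Return True if func(*args) raises ZeroDivisionError, else False.

    Hypothetical helper, written only for this example.
    """
    try:
        func(*args)
    except ZeroDivisionError:
        return True
    return False


# Testing to pass: the result should match a known value.
assert divide(1, 4) == 0.25

# Testing to fail, rewritten as a test to pass:
# the assert only raises if the expected exception does NOT occur.
assert raises_zero_division(divide, 1, 0)
assert not raises_zero_division(divide, 1, 1)
```

If any of the assert statements is wrong, Python raises an AssertionError and the script stops, which is exactly the signal we want from a failing test.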

Test Driven Development

The most extreme version of this philosophy is strict Test Driven Development (TDD), sometimes also called “Red-Green” testing. Here the idea is that when implementing new code you first write a test, ensuring that it fails (red), then write just enough code to pass the test you’ve written (green). Finally you refactor (i.e. rewrite) any code you believe can be improved (simplified or sped up), while ensuring that your existing tests pass or are fixed.

This is intended to be an iterative process, so once you are done with one test, you move on to the next. Let’s have an example, pushing things as far as they can go:

Problem: Write a function that returns the repeated elements in a list.

So, first, we write a test.

assert(f([0, 1, 1]) == 1)

Does it fail on a do nothing implementation?

def f(x):
    """Return the repeated elements in a list."""
    pass


print(f([0, 1, 1]) == 1)
False

Ok, our test is failing, let’s actually write some code. A quick way to catch repeated elements is to use a set.

def f(x):
    """Return the repeated elements in a list."""
    vals = set()
    for _ in x:
        if _ in vals:
            return _
        else:
            vals.add(_)


print(f([0, 1, 1]) == 1)
True

Our test is now passing. There’s no old code to improve, so we either write another test, or move on. Let’s try a list with a couple of repeated elements:

f([0,1,1,0]) == [0,1]
print(f([0, 1, 1]) == 1)
print(f([0,1,1,0]) == [0,1])
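True
False

The second test fails, since our function stops at the first repeat. We rewrite it to collect every repeated element:

def f(x):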
    """Return the repeated elements in a list."""
    vals = set()
    out = []
    for _ in x:
        if _ in vals:
            out.append(_)
        else:
            vals.add(_)
    return sorted(out)


print(f([0, 1, 1]) == 1)
print(f([0, 1, 1, 0]) == [0, 1])
False
True

We’ve broken the first test. But it’s actually the test which is bad, so we’ll fix it.

print(f([0, 1, 1]) == [1])
print(f([0, 1, 1, 0]) == [0,1])
True
True

Ok. Another iteration. What about if an element turns up 3 times?

print(f([0, 1, 1, 0, 0]) == [0, 1])
print(f([0, 1, 1, 0, 0]))
False
[0, 0, 1]

The repeated 0 is now reported twice. Returning a set rather than a list removes the duplicates:

def f(x):
    """Return the set of repeated elements in a list."""
    vals = set()
    out = set()
    for _ in x:
        if _ in vals:
            out.add(_)
        else:
            vals.add(_)
    return out


print(f([0, 1, 1]) == set([1]))
print(f([0, 1, 1, 0]) == set([0, 1]))
print(f([0, 1, 1, 0, 0]) == set([0, 1]))
True
True
True

And so on. At some point you run out of tests to set and the code is finished.
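
The refactoring step of the cycle has not been needed so far. As a sketch of what it might look like, the function could be rewritten using collections.Counter from the standard library (a choice made here for illustration, not the only reasonable one), with the three existing tests rerun unchanged to confirm that nothing has broken.

```python
from collections import Counter


def f(x):
    """Return the set of repeated elements in a list."""
    counts = Counter(x)  # maps each element to its number of occurrences
    return {value for value, count in counts.items() if count > 1}


# The existing tests act as a safety net for the refactored code.
assert f([0, 1, 1]) == {1}
assert f([0, 1, 1, 0]) == {0, 1}
assert f([0, 1, 1, 0, 0]) == {0, 1}
```

Because the tests were written first, the rewrite can be judged purely on whether they still pass.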

Exercise One: TDD

Try and write your own attempt at TDD for the following problems:

  1. Write an implementation of a fizz buzz function. Fizz Buzz is a children's game, where players count up from one around a circle.
    • However, any number divisible by 3 is replaced by the word "fizz".
    • Any number divisible by 5 is replaced by the word "buzz".
    • Numbers divisible by both become "fizz buzz".
    Players who say the wrong thing lose the game. Writing a program to generate the first n integers in fizz buzz encoding has sometimes been used as an interview question for programmers.

    Your function should accept an integer, $x$, and return either $x$, 'fizz', 'buzz' or 'fizz buzz' according to the rules above.


  2. Remember, in TDD first you write a test for an aspect of the function, then you write the code to pass it.

  3. Write a program to put the elements of an $n$ dimensional numpy array, $X$, into order of size.
    • The code should have output $Y$ with Y[i,...]<Y[j,...] for all i<j.
    • The code should have output $Y$ with Y[k,i,...]<Y[k,j,...] for all i<j and fixed k.
    • And so on.
    • Note that this means when written in the form

      [[..[x_1, .., x_n], [x_n+1, .., x_2n], .., x_N]]

      we have x_n< x_n+i for all i>0.
  4. Write a function to accept or reject a string as a candidate for a password based on the following criteria:
    • The string is between 8 and 1024 characters long.
    • The string contains a number
    • The string contains a capital letter
    • The string contains none of the following characters: @<>!

Write only enough new code to satisfy each test you write, and to fix your previous tests, before moving on to another test stage. Once the test passes, remember to have a look at your code to see if anything can be refactored.

The goal here is to concentrate on the TDD process, not on the code itself, but for completeness, model answers are available.

At this point you can hopefully see that

  1. TDD is a fine ideal for correct software engineering

  2. No sane person would write all their tests at that level of detail and in that order all the time.

However, there are still some very important ideas to pull out of the paradigm:

  1. Make sure your tests can fail if things go wrong.

  2. Consider both the usual and the (obvious) corner cases.

  3. Each bug you fix is a missing test you might need to add.

Ways to implement and run tests

Let’s have a slightly more concrete example. We’ll look for solutions to the quadratic equation $$ax^2 + bx + c = 0,$$ where $x$ is a real number. The formula for solutions is $$x=\frac{-b \pm \sqrt{b^2-4ac}}{2a}.$$ Now to write some code in a module file:

import numpy as np # for the sqrt function

def solve_quadratic(a,b,c):
    """Solve a quadratic equation ax^2+bx+c=0 in the reals"""
    if 0<b**2-4.0*a*c:
        # two solutions
        return ((-b-np.sqrt(b**2-4.0*a*c))/(2.0*a),(-b+np.sqrt(b**2-4.0*a*c))/(2.0*a))
    elif 0==b**2-4.0*a*c:
        # one solution
        return -b/(2.0*a)
    else:
        # no solutions
        return None

We can try some ad hoc tests, like those we might run while coding the function:

# solve x^2-1=0
solve_quadratic(1, 0, -1)==(-1.0, 1.0)
True
solve_quadratic(1, 0, 0)==(0.0)
True
solve_quadratic(1, 0, 1) is None
True

On the other hand, humans are lazy and forgetful, so we want to make running tests when the code changes as easy as possible. We could just roll the tests into a single function, and use the assert statement, so that it throws an exception if anything goes wrong.

def test_solve_quadratic():
    
    assert solve_quadratic(1,0,-1)==(-1.0,1.0)
    assert solve_quadratic(1,0,0)==(0.0)
    assert solve_quadratic(1,0,1) is None
    
    return 'Tests pass'
    
test_solve_quadratic()
'Tests pass'
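
The exact == comparisons above work because the roots involved are exactly representable as floating point numbers. In general, rounding means an exact comparison can fail even when the code is right, so tests on numerical code usually compare within a tolerance. The following is a minimal sketch using math.isclose from the standard library. It repeats the same logic as solve_quadratic, but uses math.sqrt instead of NumPy so that it is self-contained, and the tolerance of 1e-9 is an arbitrary choice for this example.

```python
import math


def solve_quadratic(a, b, c):
    """Solve ax^2+bx+c=0 in the reals (same logic as above, using math.sqrt)."""
    disc = b**2 - 4.0*a*c
    if disc > 0:
        # two solutions
        root = math.sqrt(disc)
        return ((-b - root)/(2.0*a), (-b + root)/(2.0*a))
    elif disc == 0:
        # one solution
        return -b/(2.0*a)
    # no solutions
    return None


def test_solve_quadratic_approx():
    # x^2 - 0.3x + 0.02 = 0 has exact roots 0.1 and 0.2,
    # neither of which is exactly representable in binary floating point.
    lower, upper = solve_quadratic(1, -0.3, 0.02)
    assert math.isclose(lower, 0.1, rel_tol=1e-9)
    assert math.isclose(upper, 0.2, rel_tol=1e-9)


test_solve_quadratic_approx()
```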

But we can do better. There are a number of ways of automating the testing process, so that it “just happens” without requiring the programmer to do anything.

The doctest module

The module doctest, from the standard Python library, provides a simple way to include code which is both a test and documentation of an example of the use of your code.

To write a test, one simply copies the input and output that one would see in the vanilla python interpreter pretty much identically into a docstring, whether for a function or module.

docstring_test.py:

import doctest

def mean(x):
    """Mean of a list of numbers.
    
    >>> mean([1, 5, 9])
    5.0
    
    """
    return sum(x)/len(x)

if __name__ == "__main__":
    import doctest
    doctest.testmod()

In this case we can run the test by calling the module as a script:

python3 -m docstring_test

If the test succeeds it silently returns a successful exit code. If the test fails (e.g. we replace the expected output of 5.0 in the example with 3.0) then an error message is printed, looking like the following:

**********************************************************************
File "docstring_test.py", line 6, in __main__.mean
Failed example:
    mean([1, 5, 9])
Expected:
    3.0
Got:
    5.0
**********************************************************************
1 items had failures:
   1 of   1 in __main__.mean
***Test Failed*** 1 failures.

Doctests can also be run in Python on plain text files as

import doctest
doctest.testfile("example.txt")

or from the command line as

python -m doctest example.txt

In fact, you can use the same command line syntax on a module file (e.g. python -m doctest docstring_test.py) to skip cluttering up your Python modules with the if __name__ == "__main__": block.
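
Doctests can also "test to fail". As a sketch, reusing the divide function from earlier, an expected exception is written as the traceback header, an indented ... standing in for the stack trace, and then the exception line itself. The message text shown is the one Python 3 gives for integer division by zero.

```python
import doctest


def divide(x, y):
    """Return x divided by y.

    >>> divide(1, 4)
    0.25
    >>> divide(1, 0)
    Traceback (most recent call last):
        ...
    ZeroDivisionError: division by zero
    """
    return x / y


if __name__ == "__main__":
    # Silent when every example in the docstrings passes.
    doctest.testmod()
```

The stack trace itself is ignored by doctest, so only the exception type and message need to match.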

Exercise Two: doctest

Write some tests using the doctest module inside your own module or script.

  1. First write some tests which pass.
  2. Next write some tests which do not pass.
  3. Swap your work with another student (why not use github?).
  4. Fix their failing tests, either by editing the code or changing the tests.

You can use some of the modules you wrote earlier in the course, or else write some new code.

The unittest module

It’s still useful to automate things a bit more. Python provides an inbuilt unittest module, which (with some work) can be used to build a test framework. It introduces the basic concept of the three stages of a test:

  1. Set up. We create anything which must already exist for a test to make sense.

  2. Running. The test itself operates.

  3. Tear down. We clean up anything which won’t get dealt with automatically.
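
The three stages map onto the setUp method, the test methods and the tearDown method of a unittest.TestCase. The sketch below is an invented example (the class name, the file contents and the use of a temporary file are all assumptions made for illustration): it creates a small data file before each test and deletes it afterwards.

```python
import os
import tempfile
import unittest


class TestReadData(unittest.TestCase):
    def setUp(self):
        # Set up: create a temporary file holding known data.
        handle, self.path = tempfile.mkstemp(suffix=".txt")
        with os.fdopen(handle, "w") as datafile:
            datafile.write("1\n2\n3\n")

    def test_sum_of_lines(self):
        # Running: the test itself reads the file and checks the result.
        with open(self.path) as datafile:
            total = sum(int(line) for line in datafile)
        self.assertEqual(total, 6)

    def tearDown(self):
        # Tear down: the file would not be removed automatically.
        os.remove(self.path)
```

Saved in a file, a test case like this can be run with python -m unittest followed by the file name. setUp and tearDown are called around every test method, so each test starts from a clean state.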

The file unittest_example.py contains the following code

import unittest

import numpy as np # for the sqrt function

def solve_quadratic(a,b,c):
    """Solve a quadratic equation ax^2+bx+c=0 in the reals"""
    if 0<b**2-4.0*a*c:
        # two solutions
        return ((-b-np.sqrt(b**2-4.0*a*c))/(2.0*a),(-b+np.sqrt(b**2-4.0*a*c))/(2.0*a))
    elif 0==b**2-4.0*a*c:
        # one solution
        return -b/(2.0*a)
    else:
        # no solutions
        return None

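class TestSolveQuadratic(unittest.TestCase):
    def test_solve_quadratic(self):

        # solve_quadratic is defined in this file, so no module prefix is needed
        self.assertEqual(solve_quadratic(1,0,-1), (-1.0,1.0))
        self.assertEqual(solve_quadratic(1,0,0), 0.0)
        self.assertIsNone(solve_quadratic(1,0,1))

# only run the tests when the file is executed as a script
if __name__ == "__main__":
    unittest.main()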

Running the test using the syntax

python3 unittest_example.py

gives the output

.
----------------------------------------------------------------------
Ran 1 test in 0.000s

OK

Exercise Three: unittest

Try this yourself by writing a unittest using the unittest module. You can start from some of the code you wrote for the introductory exercises, or earlier in the week if you want. Try breaking and then fixing the test.

Pytest

The pytest package (not included in a default Python installation) simplifies the actions of writing tests even further, as well as providing a more informative interface. Pytest can be installed with

pip install pytest

which also adds a tool of the same name which can be run from the command line. This tool can be used to run both doctests (add the --doctest-modules flag) and unit tests based on the unittest module (you can leave out the unittest.main() call), as well as tests in its own format.

We have created a GitHub repository in which the file pytest_example.py contains the following code


import foo  # the module containing solve_quadratic


def test_solve_quadratic():

    assert foo.solve_quadratic(1,0,-1)==(-1.0,1.0)
    assert foo.solve_quadratic(1,0,0)==0.0
    assert foo.solve_quadratic(1,0,1) is None
    

while the foo.py module contains our solve_quadratic function. We can run the pytest tests as well as the others using the following syntax:

python -m pytest --doctest-modules pytest_example.py docstring_test.py unittest_example.py
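
Pytest also provides helpers that shorten common testing patterns. The sketch below, which assumes pytest is installed, uses three of them on the divide function from earlier: pytest.mark.parametrize to run one test over several inputs, pytest.approx to compare floating point results within a tolerance, and pytest.raises to "test to fail". The particular input values are arbitrary choices for illustration.

```python
import pytest


def divide(x, y):
    """Return x divided by y."""
    return x / y


# Each tuple becomes a separate test case when run under pytest.
@pytest.mark.parametrize("x, y, expected",
                         [(1, 1, 1.0), (1, 4, 0.25), (-3, 2, -1.5)])
def test_divide(x, y, expected):
    assert divide(x, y) == pytest.approx(expected)


def test_divide_by_zero():
    # The test passes only if the block raises ZeroDivisionError.
    with pytest.raises(ZeroDivisionError):
        divide(1, 0)
```

Pytest collects functions whose names start with test_ from files named test_*.py or *_test.py, or from any file named explicitly on the command line as above.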

Exercise Four: Pytest

  • Clone the repository using git, install pytest using pip or conda and run the tests.
  • Try breaking some of the tests by editing the .py files.
  • Write your own pytest test for some of your code.

Code coverage.

In general, it is best practice to ensure that your tests exercise every line of your code at least once. This is especially true for interpreted languages like Python, which only checks the syntax of a file (i.e. its “grammar”) when it is first loaded, but doesn’t check that the meaning makes sense.

That means you can write code like the following in a module file and it will load without error

example.py:

def f(x):
    return y

but trying to actually run it will error out, in this case with a NameError.

import example
example.f(1)

fails with

NameError: name 'y' is not defined

There is a Python package, pytest-cov, which adds coverage information to pytest. Once installed with pip, coverage information can be collected for a package or module called mycoolproject by calling pytest as

python -m pytest --cov=mycoolproject .

This will generate output like

---------- coverage: platform darwin, python 3.6.8-final-0 -----------
Name                               Stmts   Miss  Cover
------------------------------------------------------
mycoolproject/__init__.py              4      0   100%
mycoolproject/analysis.py             33     24    27%
mycoolproject/live.py                 58     40    31%
mycoolproject/tests/test_example.py   11      1    91%
mycoolproject/validator.py            11      0   100%
------------------------------------------------------
TOTAL                                117     65    56%

Note that while 100% code coverage is useful, it only means that every line of the code has been run, not that every possible logic branch has been tested. It is a useful clue to healthy code, but neither necessary nor sufficient.
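
As a small invented illustration of the difference, the single test below executes every line of safe_ratio, so line coverage would be reported as 100%, yet the path where y is zero (and the body of the if is skipped) is never exercised. The function and test names are made up for this sketch.

```python
def safe_ratio(x, y):
    """Return x/y, or 0.0 when y is zero."""
    result = 0.0
    if y != 0:
        result = x / y
    return result


def test_safe_ratio():
    # Runs every line above, but only ever takes the "y != 0" branch.
    assert safe_ratio(1, 2) == 0.5
```

pytest-cov has a --cov-branch option which additionally reports branches that were never taken, and which should flag the missing y == 0 case here.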

The many sorts of test

When developing software, opportunities (and often the need) to test appear at many different levels, and these families have developed their own names.

Unit tests

Unit tests are small and (ideally) quick tests which verify the behaviour of a single programming “unit” (i.e. a module or function). Thus they ensure that the unit satisfies the “contract” it makes in isolation.

Since the function is tested by itself, rather than on the job, inputs must be “mocked up” (i.e. hardcoded at compile time, or replaced with trivial substitutes) to generate known outputs and external dependencies may be replaced with “test stubs” which short circuit to quickly give the information needed by the unit being tested. The unittest.mock module defines a useful class for this, called unittest.mock.Mock, along with some other helpful functions.
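
A minimal sketch of a test stub built with unittest.mock.Mock follows. The function mean_temperature and the idea of a sensor object with a read method are invented for this example. The Mock stands in for the real external dependency and returns a known value every time it is called.

```python
from unittest.mock import Mock


def mean_temperature(sensor, n):
    """Average n readings taken from sensor.read()."""
    return sum(sensor.read() for _ in range(n)) / n


def test_mean_temperature():
    # The fake sensor replaces real hardware with a known, instant answer.
    fake_sensor = Mock()
    fake_sensor.read.return_value = 20.0

    assert mean_temperature(fake_sensor, 4) == 20.0
    # Mock objects record how they were used, which can also be checked.
    assert fake_sensor.read.call_count == 4


test_mean_temperature()
```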

Integration tests

Tests which combine multiple program units together and confirm that the interaction proceeds as expected. This might mean chaining multiple functions you write together, or using the real external dependencies rather than the fakes you generated for your unit tests. Integration tests tend to exercise more code at once, but are slower to run and can be difficult to write effectively.

Feature/Functionality tests

Tests which confirm an entire feature is working successfully as a whole, effectively interacting with your software as a user would. For numerical codes, this often involves analytic solutions and/or using simple “toy” problems which are well understood and have nice solutions. For more general software, it might even involve testing human interaction to make sure that GUIs are clear and robust.
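
As a sketch of testing a numerical code against an analytic solution, the example below integrates sin(x) from 0 to pi with the trapezium rule and compares against the exact answer of 2. The function name, the number of intervals and the tolerance are all choices made for this illustration.

```python
import math


def trapezium(f, a, b, n):
    """Approximate the integral of f over [a, b] using n equal intervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return h * total


def test_trapezium_sine():
    # The exact integral of sin(x) over [0, pi] is 2.
    approx = trapezium(math.sin, 0.0, math.pi, 1000)
    assert math.isclose(approx, 2.0, rel_tol=1e-5)


test_trapezium_sine()
```

The tolerance has to allow for the expected discretisation error of the method, not just for floating point rounding.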

Regression tests

Tests which check that fixed bugs stay fixed. I.e. that introducing a new change does not break the existing functionality. The tests may be at the level of the unit, integration or features. If bug reports have a minimal example attached, then this can provide the basis for useful examples for regression testing (this is a form of “red-green” testing similar to test driven development).
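
As a hypothetical illustration, suppose a bug report showed that the mean function from the doctest example crashed with a ZeroDivisionError on an empty list, and that the agreed fix was to raise a ValueError with a clear message instead. Both the bug report and the fix are invented for this sketch. The minimal example from the report becomes a permanent test.

```python
def mean(x):
    """Mean of a list of numbers."""
    if len(x) == 0:
        # Hypothetical bug fix: an empty list used to raise ZeroDivisionError.
        raise ValueError("mean() requires at least one value")
    return sum(x)/len(x)


def test_mean_empty_list_regression():
    """Regression test built from the (hypothetical) bug report's example."""
    try:
        mean([])
    except ValueError:
        pass
    else:
        raise AssertionError("mean([]) should raise ValueError")


def test_mean_still_works():
    # The fix must not break the existing behaviour.
    assert mean([1, 5, 9]) == 5.0
```

Run against the old code, the first test fails (red). After the fix it passes (green) and stays in the test suite to catch the bug if it ever returns.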

Automatic documentation

Documentation is written for three broad groups of people:

  1. Users (“I don’t care how it does it, I care what it does.”)

  2. User-Developers (“I build what I need.”)

  3. Developers (“I make it, but I don’t use it.”)

These individuals have different needs from the documentation you provide to them. Users want to know what software can do, and how to run it for their own problems. Developers need to understand how code works, but not necessarily what it’s intended for in a wider sense. User-developers need to understand both sides. Commercial software is generally written by pure developers, but a lot of scientific software (and many other small projects) is written by user-developers.

For all groups, the most important piece of documentation in Python is the docstring, since it remains with the code it applies to, and connects to the Python online help. For developers & user-developers, additional comments in the body of the source text may also be useful, however users will probably never choose to look there.
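
As a sketch of what a fuller docstring might look like, here is the solve_quadratic function from earlier documented in the NumPy docstring style, one of the layouts understood by the sphinx.ext.napoleon extension used later in this lecture. The wording of the descriptions is illustrative, and math.sqrt is used in place of NumPy to keep the example self-contained.

```python
import math


def solve_quadratic(a, b, c):
    """Solve the quadratic equation ax^2+bx+c=0 in the reals.

    Parameters
    ----------
    a : float
        Coefficient of x^2. Must be non-zero.
    b : float
        Coefficient of x.
    c : float
        Constant term.

    Returns
    -------
    tuple of float, float or None
        Both roots in increasing order (for a > 0) if there are two,
        the repeated root if there is one, or None if there are
        no real solutions.

    Examples
    --------
    >>> solve_quadratic(1, 0, -1)
    (-1.0, 1.0)
    """
    disc = b**2 - 4.0*a*c
    if disc > 0:
        root = math.sqrt(disc)
        return ((-b - root)/(2.0*a), (-b + root)/(2.0*a))
    elif disc == 0:
        return -b/(2.0*a)
    return None
```

Calling help(solve_quadratic) in the interpreter prints this text, and the Examples section doubles as a doctest.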

Although it is now generally acknowledged that the best place for living documentation is in the source code, near where it applies, since this gives the best probability of developers updating it when the code changes, it is still useful to maintain a proper (electronic) manual or write-up, both for ease of reference, and to give a general project overview.

Several tools have been developed to close this gap by automatically collating comments and function call signatures from the source code and converting them into a human readable document to label as a “manual”. Perhaps the most famous open source, cross-language documentation tool is Doxygen. However we’ll look further at a tool called Sphinx, which originated in the Python community and is the tool of choice on several large Python projects including SciPy, Django and Python itself.

Sphinx

You can install the core Sphinx documentation generation tools with the command

pip install sphinx

Sphinx works by converting reStructuredText, whether inside docstrings or in special files, into HTML pages or PDF files.

The simplest use pattern is probably to use the automatic scanning tools to collect together all the docstrings into an index of APIs.

This requires creating two files, the first containing the configuration options for the Sphinx tools, and the second containing a skeleton reStructuredText file into which to slot the generated documentation:

docs/conf.py:

import sys
import os

## We're working in the ./docs directory, but need the package root in the path.
## This command inserts the directory one level up at the front of the path, in a cross-platform way.
sys.path.insert(0, os.path.abspath(os.sep.join((os.curdir, '..'))))

project = 'MyCoolProject'
extensions = ['sphinx.ext.autodoc',
              'sphinx.ext.napoleon',
              'sphinx.ext.mathjax']
source_suffix = '.rst'
master_doc = 'index'
exclude_patterns = ['_build']
autoclass_content = "both"

docs/index.rst:

#############
mycoolproject
#############


A heading
---------

This is just example text, perhaps with mathematics like :math:`x^2`, **bold text** and *italics*.
It might also include citations [1]_, inline references to functions like :func:`my_func` or even whole code blocks::

    def my_example(a, b):
        """Do something!"""
        return a**b+1

.. automodule:: mycoolproject
  :members:
  
  
.. rubric:: References

.. [1] http://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#citations

With this setup we can build a html version of the documentation with the command

sphinx-build -b html docs docs/html

Warning

Microsoft Windows uses \ as the separator symbol between levels in the directory tree. Meanwhile Linux and macOS use /. This makes it almost impossible for human-readable notes to give paths that work on both sets of computers (it’s a lot easier for Python code: just use os.sep from the os module). In this section I use the *nix standard of / in the write-up, to match the Sphinx documentation. Please remember to convert in your head on a Windows computer.

If this is successful, you should be able to open ./docs/html/index.html to see documentation automatically generated from the docstrings in your project. Sphinx also supports other output formats (for example LaTeX) with the -b flag. A recipe to generate a pdf manual on a suitably configured system is

sphinx-build -b latex docs .
pdflatex MyCoolProject.tex
pdflatex MyCoolProject.tex

which will generate MyCoolProject.pdf. We run LaTeX twice to ensure that references and citations (including the index) are set correctly.

Exercise Five: Autodocumenting your module

  • Use pip or conda to install sphinx on your computer.
  • Create a docs directory inside your module and add conf.py and index.rst files based on the ones given above.
  • Run sphinx-build to generate html documentation for your project.
  • Try editing the index.rst file to add more text.

In this lecture we learned

  • How to write a formal test
  • The basics of Test Driven Development
  • How to automate testing using testing frameworks and pytest
  • Documentation via Sphinx

Next week: The Cloud, Continuous Integration and Pandas

Further Reading