Automated Testing and Automatic Documentation
Foreword
ACSE 1 Lecture Four 15th October 2020 - Version 3.0.3¶
Yesterday we covered:
Software environments.
In this lecture we will cover:
Software Code Testing
Automated testing frameworks
Automatic Documentation Generation
Contact details:
- Dr. James Percival
- 4.85 RSM building (but not much this term)
- email: j.percival@imperial.ac.uk
- Teams: @Percival, James R. in acse1 & General, or DM me.
By the end of this lecture you should:
Understand the basics of code testing and ways to automate it.
Understand the different sorts of test.
Have seen how to use Sphinx to generate software documentation from docstrings
Code Testing
When writing software, especially scientific software, a key question is whether the code is correct, and provably correct.
How do we check something for correctness? We test.
What is a test?
The simplest kind of test is the ad hoc kind you run when hacking about with code. If I have created a new function add(x, y)
which adds together two numbers and returns the result, then it might be a good idea to try something like
print(add(1, 0) == 1)
print(add(1, 1) == 2)
This can’t catch every possible mode of failure. For example, an add function defined like
def add(x, y):
    return x**2+y**2
would pass both the tests given above, despite not returning x+y.
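One way to catch this particular impostor is to add a test whose inputs tell the two behaviours apart. The special values 0 and 1 cannot do this, because 0**2 == 0 and 1**2 == 1. Below is a minimal sketch. The name bad_add is our own label for the faulty version, not something from the lecture:

```python
def add(x, y):
    """Return the sum of x and y."""
    return x + y

def bad_add(x, y):
    """The faulty implementation from the text (hypothetical name)."""
    return x**2 + y**2

# The original two tests cannot tell the implementations apart...
for func in (add, bad_add):
    print(func.__name__, func(1, 0) == 1, func(1, 1) == 2)  # both print True True

# ...but a test with less special inputs can.
print(add(2, 3) == 5)      # True
print(bad_add(2, 3) == 5)  # False, since 2**2 + 3**2 == 13
```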
The examples above are both examples of “testing to pass”, in that we have written a statement we expect to return True, and will be happy if it does so. One important thing to check is that operations you expect to raise an exception actually do so (i.e. testing to fail). There are usually ways of writing a test to fail as a test to pass if needed. E.g. for a divide operator:
def divide(x, y):
    """return x divided by y."""
    return x/y
## Check we get an error when dividing by zero.
try:
    divide(1, 0)
    print(False)
except ZeroDivisionError:
    print(True)
True
Writing our tests like this, so that they produce True on success, lets us use a statement like assert X. This turns a test that raises an exception when the code does what we want into one that raises an exception only when it doesn’t.
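Here is a minimal sketch of this idea. It assumes a small helper called raises, which is our own name and not a standard-library function. The helper returns True only if the call raises the given exception:

```python
def divide(x, y):
    """Return x divided by y."""
    return x / y

def raises(exception, func, *args):
    """Hypothetical helper: True if func(*args) raises `exception`."""
    try:
        func(*args)
    except exception:
        return True
    return False

# A test to fail, rewritten as a test to pass: the assert is silent
# when divide(1, 0) raises ZeroDivisionError, and fails otherwise.
assert raises(ZeroDivisionError, divide, 1, 0)
# Dividing by a non-zero number should not raise.
assert not raises(ZeroDivisionError, divide, 1, 2)
```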
Test Driven Development
The most extreme version of this philosophy is strict Test Driven Development (TDD), sometimes also called “Red-Green” testing. Here the idea is that when implementing new code, you first write a test and make sure that it fails (red). Second, you write just enough code to pass the test you’ve written (green). Finally you refactor (i.e. rewrite) any code you believe can be improved (simplified or sped up), while ensuring that your existing tests still pass or are fixed.
This is intended to be an iterative process, so once you are done with one test, you move on to the next. Let’s have an example, pushing things as far as they can go:
Problem: Write a function that returns the repeated elements in a list.
So, first, we write a test.
assert(f([0, 1, 1]) == 1)
Does it fail on a do nothing implementation?
def f(x):
    """Return the repeated elements in a list."""
    pass
print(f([0, 1, 1]) == 1)
False
Ok, our test is failing, let’s actually write some code. A quick way to catch repeated elements is to use a set.
def f(x):
    """Return the repeated elements in a list."""
    vals = set()
    for _ in x:
        if _ in vals:
            return _
        else:
            vals.add(_)
print(f([0, 1, 1]) == 1)
True
Our test is now passing. There’s no old code to improve, so we either write another test, or move on. Let’s try having a couple of repeated elements:
f([0,1,1,0]) == [0,1]
print(f([0, 1, 1]) == 1)
print(f([0,1,1,0]) == [0,1])
True
False
def f(x):
    """Return the repeated elements in a list."""
    vals = set()
    out = []
    for _ in x:
        if _ in vals:
            out.append(_)
        else:
            vals.add(_)
    return sorted(out)
print(f([0, 1, 1]) == 1)
print(f([0, 1, 1, 0]) == [0, 1])
False
True
We’ve broken the first test. But it’s actually the test which is bad, so we’ll fix it.
print(f([0, 1, 1]) == [1])
print(f([0, 1, 1, 0]) == [0,1])
True
True
Ok. Another iteration. What about if an element turns up 3 times?
print(f([0, 1, 1, 0, 0]) == [0, 1])
print(f([0, 1, 1, 0, 0]))
False
[0, 0, 1]
def f(x):
    """Return the set of repeated elements in a list."""
    vals = set()
    out = set()
    for _ in x:
        if _ in vals:
            out.add(_)
        else:
            vals.add(_)
    return out
print(f([0, 1, 1]) == set([1]))
print(f([0, 1, 1, 0]) == set([0, 1]))
print(f([0, 1, 1, 0, 0]) == set([0, 1]))
True
True
True
And so on. At some point you run out of tests to write and the code is finished.
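The tests written during the cycle above need not be thrown away. Gathered into one test function, they become a suite that can be rerun whenever f changes. The sketch below repeats the final f so that it is self-contained. The two extra corner cases (an empty list and a list with no repeats) are our own additions, not part of the walkthrough:

```python
def f(x):
    """Return the set of repeated elements in a list."""
    vals = set()
    out = set()
    for item in x:
        if item in vals:
            out.add(item)
        else:
            vals.add(item)
    return out

def test_f():
    """Every test written during the TDD cycle, kept as a suite."""
    assert f([0, 1, 1]) == {1}
    assert f([0, 1, 1, 0]) == {0, 1}
    assert f([0, 1, 1, 0, 0]) == {0, 1}
    # Extra corner cases (our addition): no elements, no repeats.
    assert f([]) == set()
    assert f([0, 1, 2]) == set()
    return 'Tests pass'

print(test_f())  # Tests pass
```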
Exercise One: TDD
Try to write your own attempt at TDD for the following problems:
- Write an implementation of a fizz buzz function. Fizz Buzz is a children's game, where players count up from one around a circle.
  - However, any number divisible by 3 is replaced by the word "fizz".
  - Any number divisible by 5 is replaced by the word "buzz".
  - Numbers divisible by both become "fizz buzz".
  Printing n integers in fizz buzz encoding has sometimes been used as an interview question for programmers. Your function should accept an integer, $x$, and return either $x$, 'fizz', 'buzz' or 'fizz buzz' according to the rules above.
- Write a program to put the elements of an $n$ dimensional numpy array, $X$, into order of size.
  - The code should have output $Y$ with Y[i,...] < Y[j,...] for all i<j.
  - The code should have output $Y$ with Y[k,i,...] < Y[k,j,...] for all i<j and fixed k.
  - And so on.
  - Note that this means when written in the form [[..[x_1, .., x_n], [x_n+1, .., x_2n], .., x_N]] we have $x_n < x_{n+i}$ for all $i>0$.
- Write a function to accept or reject a string as a candidate for a password based on the following criteria:
  - The string is between 8 and 1024 characters long.
  - The string contains a number.
  - The string contains a capital letter.
  - The string contains none of the following characters: @<>!
Remember, in TDD:
Write only enough new code to satisfy each test you write, and to fix your previous tests, before moving on to another test stage. Once the test passes, remember to have a look at your code to see if anything can be refactored.
The goal here is to concentrate on the TDD process, not on the code itself, but for completeness, model answers are available.
At this point you can hopefully see that
TDD is a fine ideal for correct software engineering
No sane person would write all their tests at that level of detail and in that order all the time.
However, there are still some very important ideas to pull out of the paradigm:
Make sure your tests can fail if things go wrong.
Consider both the usual and the (obvious) corner cases.
Each bug you fix is a missing test you might need to add.
Ways to implement and run tests
Let’s have a slightly more concrete example. We’ll look for solutions to the quadratic equation
$$ax^2 + bx + c = 0,$$
where $x$ is a real number. The formula for the solutions is
$$x = \frac{-b \pm \sqrt{b^2-4ac}}{2a}.$$
Now to write some code in a module file:
import numpy as np # for the sqrt function
def solve_quadratic(a,b,c):
    """Solve a quadratic equation ax^2+bx+c=0 in the reals"""
    if 0<b**2-4.0*a*c:
        # two solutions
        return ((-b-np.sqrt(b**2-4.0*a*c))/(2.0*a),(-b+np.sqrt(b**2-4.0*a*c))/(2.0*a))
    elif 0==b**2-4.0*a*c:
        # one solution
        return -b/(2.0*a)
    else:
        # no solutions
        return None
We can try some ad hoc tests, like those we might run while coding the function:
# solve x^2-1=0
solve_quadratic(1, 0, -1)==(-1.0, 1.0)
True
solve_quadratic(1, 0, 0)==(0.0)
True
solve_quadratic(1, 0, 1) is None
True
On the other hand, humans are lazy and forgetful, so we want to make running tests when the code changes as easy as possible. We could just roll the tests into a single function, and use the assert
statement, so that it throws an exception if anything goes wrong.
def test_solve_quadratic():
    assert solve_quadratic(1,0,-1)==(-1.0,1.0)
    assert solve_quadratic(1,0,0)==(0.0)
    assert solve_quadratic(1,0,1) is None
    return 'Tests pass'
test_solve_quadratic()
'Tests pass'
But we can do better. There are a number of ways of automating the testing process, so that it “just happens” without the programmer having to remember to do anything.
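One caveat applies before automating: the tests above compare floating-point results with ==. That only works because the chosen roots are exactly representable. For roots that floats can only approximate, a tolerance is safer. The sketch below uses math.isclose from the standard library. It is our own addition and was not part of the original module. pytest offers a similar helper, pytest.approx:

```python
import math
import numpy as np

def solve_quadratic(a, b, c):
    """Solve a quadratic equation ax^2+bx+c=0 in the reals."""
    disc = b**2 - 4.0*a*c
    if disc > 0:
        return ((-b - np.sqrt(disc))/(2.0*a), (-b + np.sqrt(disc))/(2.0*a))
    elif disc == 0:
        return -b/(2.0*a)
    else:
        return None

def test_solve_quadratic_irrational():
    # x^2 - 2 = 0 has roots -sqrt(2) and +sqrt(2), which floats only approximate.
    lower, upper = solve_quadratic(1, 0, -2)
    assert math.isclose(lower, -math.sqrt(2), rel_tol=1e-12)
    assert math.isclose(upper, math.sqrt(2), rel_tol=1e-12)
    # Also check the defining property: each root (nearly) satisfies the equation.
    for x in (lower, upper):
        assert math.isclose(x**2 - 2, 0.0, abs_tol=1e-12)
    return 'Tests pass'

print(test_solve_quadratic_irrational())
```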
The doctest module
The module doctest, from the standard Python library, provides a simple way to include code which is both a test and documentation of an example of the use of your code.
To write a test, one simply copies the input and output that one would see in the vanilla python interpreter pretty much identically into a docstring, whether for a function or module.
docstring_test.py:
import doctest
def mean(x):
    """Mean of a list of numbers.
    >>> mean([1, 5, 9])
    5.0
    """
    return sum(x)/len(x)
if __name__ == "__main__":
    doctest.testmod()
In this case we can run the test by calling the module as a script:
python3 -m docstring_test
If the test succeeds it silently returns a successful exit code. If the test fails (e.g. if we replace the expected output 5.0 in the example with 3.0) then an error message is printed, looking like the following:
**********************************************************************
File "docstring_test.py", line 6, in __main__.mean
Failed example:
mean([1, 5, 9])
Expected:
3.0
Got:
5.0
**********************************************************************
1 items had failures:
1 of 1 in __main__.mean
***Test Failed*** 1 failures.
Doctests can also be run in Python on plain text files as
import doctest
doctest.testfile("example.txt")
or from the command line as
python -m doctest example.txt
In fact, you can use the same command-line syntax on a Python module (e.g. python -m doctest docstring_test.py) to avoid cluttering up your code with the if __name__ == "__main__": block.
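The sketch below writes a small plain-text file into a temporary directory and runs it with doctest.testfile. The file name example.txt and its contents are purely illustrative. Passing module_relative=False tells doctest to treat the argument as an ordinary filesystem path rather than one relative to the calling module:

```python
import doctest
import os
import tempfile

# A hypothetical plain-text doctest file: prose is ignored, >>> lines are run.
text = """\
A hypothetical plain-text doctest file.

>>> 1 + 1
2
>>> sorted({3, 1, 2})
[1, 2, 3]
"""

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")
    with open(path, "w") as fh:
        fh.write(text)
    # module_relative=False lets us pass an ordinary filesystem path.
    results = doctest.testfile(path, module_relative=False)

print(results.failed, results.attempted)
```

The returned results object records how many examples failed and how many were attempted, so a script can check it directly.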
Exercise Two: doctest
Write some tests using the doctest module inside your own module or script.
- First write some tests which pass.
- Next write some tests which do not pass.
- Swap your work with another student (why not use github?).
- Fix their failing tests, either by editing the code or changing the tests.
You can use some of the modules you wrote earlier in the course, or else write some new code.
The unittest module
It’s still useful to automate things a bit more, so that we can . Python provides an inbuilt unittest
module, which (with some work) can be used to build a test framework. It introduces the basic concept of the three stages of a test:
Set up. We create anything which must already exist for a test to make sense.
Running. The test itself is run.
Tear down. We clean up anything which won’t get dealt with automatically.
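The unittest example that follows only needs the running stage. To illustrate all three stages, here is a sketch using the unittest.TestCase hooks setUp and tearDown, which run before and after every test method. The count_lines function is a hypothetical unit invented for this purpose:

```python
import os
import shutil
import tempfile
import unittest

def count_lines(path):
    """Hypothetical unit under test: count the lines in a text file."""
    with open(path) as fh:
        return sum(1 for _ in fh)

class TestCountLines(unittest.TestCase):
    def setUp(self):
        # Set up: create a scratch directory and a small input file.
        self.tmpdir = tempfile.mkdtemp()
        self.path = os.path.join(self.tmpdir, "data.txt")
        with open(self.path, "w") as fh:
            fh.write("one\ntwo\nthree\n")

    def test_count_lines(self):
        # Running: the test itself.
        self.assertEqual(count_lines(self.path), 3)

    def tearDown(self):
        # Tear down: remove the files, which would otherwise be left behind.
        shutil.rmtree(self.tmpdir)
```

You could save this as test_count_lines.py (a hypothetical file name) and run it with python -m unittest test_count_lines. pytest would also discover it, since pytest collects unittest.TestCase subclasses.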
The file unittest_example.py
contains the following code
import unittest
import numpy as np # for the sqrt function
def solve_quadratic(a,b,c):
    """Solve a quadratic equation ax^2+bx+c=0 in the reals"""
    if 0<b**2-4.0*a*c:
        # two solutions
        return ((-b-np.sqrt(b**2-4.0*a*c))/(2.0*a),(-b+np.sqrt(b**2-4.0*a*c))/(2.0*a))
    elif 0==b**2-4.0*a*c:
        # one solution
        return -b/(2.0*a)
    else:
        # no solutions
        return None
class TestSolveQuadratic(unittest.TestCase):
    def test_solve_quadratic(self):
        # solve_quadratic is defined in this file, so no module prefix is needed
        self.assertEqual(solve_quadratic(1,0,-1), (-1.0,1.0))
        self.assertEqual(solve_quadratic(1,0,0), 0.0)
        self.assertIsNone(solve_quadratic(1,0,1))
# Only run the tests when executed as a script, so other runners can import this file
if __name__ == "__main__":
    unittest.main()
Running the test using the syntax
python3 unittest_example.py
gives the output
.
----------------------------------------------------------------------
Ran 1 test in 0.000s
OK
Exercise Three: unittest
Try this yourself by writing a unittest using the unittest
module. You can start from some of the code you wrote for the introductory exercises, or earlier in the week if you want. Try breaking and then fixing the test.
Pytest
The pytest
package (not included in a default Python installation) simplifies the actions of writing tests even further, as well as providing a more informative interface. Pytest can be installed with
pip install pytest
which also adds a tool of the same name that can be run from the command line. This tool can be used to run doctests (add the --doctest-modules flag) and tests based on the unittest module (just leave out the unittest.main() call, or guard it with if __name__ == "__main__":). It can also run tests in its own format.
We have created a GitHub repository in which the file pytest_example.py
contains the following code
import foo  # foo.py contains solve_quadratic
def test_solve_quadratic():
    assert foo.solve_quadratic(1,0,-1)==(-1.0,1.0)
    assert foo.solve_quadratic(1,0,0)==0.0
    assert foo.solve_quadratic(1,0,1) is None
while the foo.py
module contains our solve_quadratic
function. We can run the pytest tests as well as the others using the following syntax:
python -m pytest --doctest-modules pytest_example.py docstring_test.py unittest_example.py
Exercise Four: Pytest
- Clone the repository using git, install pytest using pip or conda, and run the tests.
- Try breaking some of the tests by editing the .py files.
- Write your own pytest test for some of your code.
Code coverage
In general, it is best practice to ensure that your tests exercise every line of your code at least once. This is especially true for interpreted languages like Python. Python only checks the syntax of a file (i.e. its “grammar”) when it is first loaded, and doesn’t check that the meaning makes sense.
That means you can write code like the following in a module file and it will load without error
example.py:
def f(x):
    return y
but trying to actually run it will error out, in this case with a NameError.
import example
example.f(1)
returns
NameError: name 'y' is not defined
There is a Python package, pytest-cov, which adds coverage information to pytest. Once it is installed with pip, coverage information can be collected for a package or module called mycoolproject by calling pytest as
python -m pytest --cov=mycoolproject .
This will generate output like
---------- coverage: platform darwin, python 3.6.8-final-0 -----------
Name Stmts Miss Cover
------------------------------------------------------
mycoolproject/__init__.py 4 0 100%
mycoolproject/analysis.py 33 24 27%
mycoolproject/live.py 58 40 31%
mycoolproject/tests/test_example.py 11 1 91%
mycoolproject/validator.py 11 0 100%
------------------------------------------------------
TOTAL 117 65 56%
Note that 100% code coverage is useful, but it only means that every line of the code has been run. It does not mean that every possible logical branch has been tested. Coverage is a useful clue to healthy code, but it is neither necessary nor sufficient.
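To see why, consider the following sketch, where describe is a hypothetical function. The single test executes every line, so line coverage reports 100%. Yet the path where the if condition is false is never taken, and it hides a bug. coverage.py, which pytest-cov builds on, can also measure branch coverage. pytest-cov exposes this through its --cov-branch option.

```python
def describe(n):
    """Hypothetical function with a bug on an untested branch."""
    if n > 0:
        label = "positive"
    return label  # bug: label is never set when n <= 0

# This one test runs every line of describe, giving 100% line coverage...
assert describe(5) == "positive"

# ...but the untaken branch (n <= 0) fails as soon as it is exercised.
try:
    describe(-1)
except UnboundLocalError as err:
    print("Untested branch fails:", err)
```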
The many sorts of test
When developing software, opportunities (and often the need) to test appear at many different levels, and these families have developed their own names.
Unit tests
Unit tests are small and (ideally) quick tests which verify the behaviour of a single programming “unit” (i.e. a module or function). Thus they ensure that the unit satisfies the “contract” it makes in isolation.
Since the function is tested by itself, rather than on the job, inputs must be “mocked up” (i.e. hardcoded into the test, or replaced with trivial substitutes) to generate known outputs. External dependencies may also be replaced with “test stubs”, which short circuit to quickly give the information needed by the unit being tested. The unittest.mock submodule of the standard library defines a useful class for this, called unittest.mock.Mock. It also provides some other helpful functions.
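Here is a minimal sketch of a test stub. It assumes a hypothetical unit, mean_temperature, which asks a remote client object for data. In the test, a unittest.mock.Mock stands in for the client and returns a canned value, so no network access is needed. The mock also records how it was called, which the test can check:

```python
from unittest import mock

def mean_temperature(client, station):
    """Hypothetical unit: average the readings a remote client returns."""
    readings = client.get_readings(station)
    return sum(readings) / len(readings)

# Stub out the (hypothetical) remote client so the unit runs in isolation.
fake_client = mock.Mock()
fake_client.get_readings.return_value = [10.0, 12.0, 14.0]

result = mean_temperature(fake_client, "RSM")
assert result == 12.0
# The mock remembers its calls, so we can also check how the unit used it.
fake_client.get_readings.assert_called_once_with("RSM")
```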
Integration tests
Tests which combine multiple program units and confirm that the interaction proceeds as expected. This might mean chaining together multiple functions you have written, or using the real external dependencies rather than the fakes you generated for your unit tests. Integration tests tend to exercise more code at once, but they are slower to run and can be difficult to write effectively.
Feature/Functionality tests
Tests which confirm an entire feature is working successfully as a whole, effectively interacting with your software as a user would. For numerical codes, this often involves analytic solutions and/or using simple “toy” problems which are well understood and have nice solutions. For more general software, it might even involve testing human interaction to make sure that GUIs are clear and robust.
Regression tests
Tests which check that fixed bugs stay fixed. I.e. that introducing a new change does not break the existing functionality. The tests may be at the level of the unit, integration or features. If bug reports have a minimal example attached, then this can provide the basis for useful examples for regression testing (this is a form of “red-green” testing similar to test driven development).
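Here is a sketch of the pattern, built around a hypothetical bug report (not one from the course). The report says that mean([]) crashed with an unhelpful ZeroDivisionError. The fix raises a clearer ValueError, and the minimal example from the report becomes a regression test. A second test checks that normal behaviour is unchanged:

```python
def mean(x):
    """Mean of a list of numbers.

    Raises ValueError for an empty list (the fix for a hypothetical bug
    report in which mean([]) crashed with ZeroDivisionError).
    """
    if len(x) == 0:
        raise ValueError("mean() of an empty list is undefined")
    return sum(x) / len(x)

def test_mean_empty_list_regression():
    """Minimal example from the (hypothetical) bug report, kept permanently."""
    try:
        mean([])
    except ValueError:
        pass
    else:
        raise AssertionError("mean([]) should raise ValueError")

def test_mean_still_works():
    """Check the fix did not break the existing behaviour."""
    assert mean([1, 5, 9]) == 5.0

test_mean_empty_list_regression()
test_mean_still_works()
```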
Automatic documentation
Documentation is written for three broad groups of people:
Users (“I don’t care how it does it, I care what it does.”)
User-Developers (“I build what I need.”)
Developers (“I make it, but I don’t use it.”)
These individuals have different needs from the documentation you provide to them. Users want to know what software can do, and how to run it for their own problems. Developers need to understand how code works, but not necessarily what it’s intended for in a wider sense. User-developers need to understand both sides. Commercial software is generally written by pure developers, but a lot of scientific software (and many other small projects) is written by user-developers.
For all groups, the most important piece of documentation in Python is the docstring, since it remains with the code it applies to and connects to the Python online help. For developers and user-developers, additional comments in the body of the source text may also be useful; however, users will probably never look there.
It is now generally acknowledged that the best place for living documentation is in the source code, near the code it describes, since this gives developers the best chance of updating it when the code changes. Even so, it is still useful to maintain a proper (electronic) manual or write-up, both for ease of reference and to give a general project overview.
Several tools have been developed to close this gap. They automatically collate comments and function call signatures from the source code and convert them into a human-readable document that can serve as a “manual”. Perhaps the most famous open source, cross-language documentation tool is Doxygen. However, we’ll look further at a tool called Sphinx, which originated in the Python community. It is the tool of choice on several large Python projects, including SciPy, Django and Python itself.
Sphinx
You can install the core Sphinx documentation generation tools with the command
pip install sphinx
Sphinx works by converting reStructuredText, whether inside docstrings or in separate source files, into HTML pages or PDF files.
The simplest use pattern is probably to use the automatic scanning tools to collect together all the docstrings into an index of APIs.
This requires creating two files. The first contains the configuration options for the Sphinx tools. The second is a skeleton reStructuredText file into which the generated documentation is slotted:
docs/conf.py:
import sys
import os
## We're working in the ./docs directory, but need the package root in the path
## This command appends the directory one level up, in a cross-platform way.
sys.path.insert(0, os.path.abspath(os.sep.join((os.curdir, '..'))))
project = 'MyCoolProject'
extensions = ['sphinx.ext.autodoc',
'sphinx.ext.napoleon',
'sphinx.ext.mathjax']
source_suffix = '.rst'
master_doc = 'index'
exclude_patterns = ['_build']
autoclass_content = "both"
docs/index.rst:
#############
mycoolproject
#############

A heading
---------

This is just example text, perhaps with mathematics like :math:`x^2`, **bold text** and *italics*.
It might also include citations [1]_, inline references to functions like :func:`my_func` or even whole code blocks::

    def my_example(a, b):
        """Do something!"""
        return a**b+1

.. automodule:: mycoolproject
   :members:

.. rubric:: References

.. [1] http://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#citations
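The automodule directive pulls in the docstrings of mycoolproject. Because conf.py enables sphinx.ext.napoleon, those docstrings may be written in the Google or NumPy style as well as in plain reStructuredText. As a sketch, a hypothetical mycoolproject/__init__.py documented in the Google style might contain:

```python
"""mycoolproject: a hypothetical package used to illustrate autodoc."""


def mean(x):
    """Return the arithmetic mean of a list of numbers.

    The section headers below (Args, Returns, Raises, Example) are
    Google-style; sphinx.ext.napoleon converts them to reStructuredText.

    Args:
        x (list of float): The numbers to average. Must be non-empty.

    Returns:
        float: The mean value, ``sum(x) / len(x)``.

    Raises:
        ZeroDivisionError: If ``x`` is empty.

    Example:
        >>> mean([1, 5, 9])
        5.0
    """
    return sum(x) / len(x)
```

The Example section doubles as a doctest, so the same docstring is both rendered by Sphinx and checked by pytest --doctest-modules.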
With this setup we can build an HTML version of the documentation with the command
sphinx-build -b html docs docs/html
Warning
Microsoft Windows uses \ as the separator symbol between levels in the directory tree, while Linux and Mac OS X use /. This makes it almost impossible for human-readable notes to give paths that work on both sets of computers. (It’s a lot easier for Python code: just use os.sep from the os module.) In this section I use the *nix standard of / in the write-up, to match the Sphinx documentation. Please remember to convert in your head on a Windows computer.
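Here is a small sketch of the Python side of this advice. os.path.join inserts os.sep between path components. pathlib.Path does the same with the / operator. Note that pathlib is a standard-library alternative we have swapped in; the lecture itself only mentions os.sep.

```python
import os
from pathlib import Path

# Build the path docs/html/index.html with the current OS's separator.
p1 = os.path.join("docs", "html", "index.html")
p2 = Path("docs") / "html" / "index.html"   # pathlib alternative

print(p1)             # docs/html/index.html on Linux/macOS, docs\html\index.html on Windows
print(p1 == str(p2))  # True on both
```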
If this is successful, you should be able to open ./docs/html/index.html
to see documentation automatically generated from the docstrings in your project. Sphinx also supports other output formats (for example LaTeX) with the -b
flag. A recipe to generate a pdf
manual on a suitably configured system is
sphinx-build -b latex docs .
pdflatex MyCoolProject.tex
pdflatex MyCoolProject.tex
which will generate MyCoolProject.pdf
. We run LaTeX twice to ensure that references and citations (including the index) are set correctly.
Exercise Five: Autodocumenting your module
- Use pip or conda to install sphinx on your computer.
- Create a docs directory inside your module and add conf.py and index.rst files based on the ones given above.
- Run sphinx-build to generate html documentation for your project.
- Try editing the index.rst file to add more text.
In this lecture we learned
- How to write a formal test
- The basics of Test Driven Development
- How to automate testing with testing frameworks such as pytest
- Documentation via Sphinx
Next week: The Cloud, Continuous Integration and Pandas
Further Reading
The Pytest documentation.
A software carpentry lecture on testing.
The Python documentation page on venv
A much fuller Sphinx tutorial