Continuous Integration
Aims:
Become familiar with the basic concepts of continuous integration (CI)
Gain basic knowledge of various existing CI frameworks
Become familiar with the basic use of Github Actions, including:
Setting up basic testing workflows
Utilizing actions within a workflow
Contact details:
Dr. Rhodri Nelson
Room 4.85 RSM building
email: rnelson@ic.ac.uk
Teams: @Nelson, Rhodri B in #ACSE1 or #General, or DM me.
What is Continuous integration?
In software development, Continuous Integration (CI) is the practice of developers integrating code into a shared repository several times a day. Each ‘push’ to the repository is then verified by automated testing tools, allowing any problems to be detected early.
And, from Wikipedia:
In software engineering, continuous integration (CI) is the practice of merging all developers’ working copies to a shared mainline several times a day. Grady Booch first proposed the term CI in his 1991 method, although he did not advocate integrating several times a day. Extreme programming (XP) adopted the concept of CI and did advocate integrating more than once per day – perhaps as many as tens of times per day.
Automated Continuous integration
Once you have tests, you want to make sure that any public changes to the code don’t break them. For code held under version control, this means that you ideally want to rerun your test suite on all new public merges, to ensure that the code installs and runs successfully on a clean system. Continuous Integration (CI) and Continuous Delivery (CD) frameworks tie together version control repositories and test suites so that this happens automatically whenever a commit to production code (i.e. your master branch) is proposed, or even whenever any change is committed anywhere in the repository.
The first goal is to ensure that the production branch always works, so that end users can trust that they will always be able to use it. Meanwhile, developers can feel safe repeatedly merging changes to and from the master version, knowing whether their code works or not. Taken to its furthest extent, under the doctrine of Extreme Programming (XP), this might mean work being merged multiple times a day by every developer in a moderately sized team.
XP workflows stress the following points:
Merge back to master frequently (i.e. keep feature changes small and atomic)
Automate the software build
Automate the testing framework
Some CI solutions
As with test frameworks, a large number of CI/CD solutions exist, e.g.
Travis CI is a hosted continuous integration service used to build and test software projects hosted at GitHub and Bitbucket.
Jenkins, a free, open source CI tool which you can run on your own machine. Popular with Java developers.
Buildbot - Another self-run CI tool, written and configured with Python.
Azure Pipelines - A cloud system like Travis, this time run by Microsoft
and of course Github Actions, the main CI pipeline we will learn about today, which allows building continuous integration and continuous deployment pipelines for testing, releasing and deploying software without the use of third-party websites/platforms
All the above CI solutions require their own setup files, which share many of the same concepts but use different syntax. For example, Azure Pipelines uses a file called azure-pipelines.yml, which looks like

```yaml
trigger:
- master

pool:
  vmImage: 'ubuntu-latest'

steps:
# set the python version to use
- task: UsePythonVersion@0
  displayName: 'Use Python 3.6'
  inputs:
    versionSpec: '3.6'

# Script sections run bash scripts
- script: |
    python -m pip install --upgrade pip setuptools wheel
    pip install pytest pytest-cov
    pip install -r requirements.txt
  displayName: 'Install dependencies'

# here's the actual test
- script: PYTHONPATH=$(System.DefaultWorkingDirectory) pytest quaternions_acse --junitxml=junit/test-results.xml --cov=quaternions_acse --cov-report=xml --cov-report=html
  displayName: 'Run pytest'
```

As you will see, this is very similar to the Github Actions .yml files we will explore shortly, although the syntax and available functionality will of course vary.
In general, which CI framework to use is rarely a true choice. Businesses and existing projects will already have picked one, offer support for it and expect you to play nicely with it. Meanwhile, for personal projects people want minimum-effort, minimum-cost solutions. On these two metrics Github Actions scores very highly.
It is also worth noting that Github Actions is a relatively new CI solution, having been added to Github in 2019. It is therefore undergoing continuous development and new features are being added and existing ones enhanced regularly.
Github Actions
To begin familiarizing ourselves with Actions, let us dive straight into an example. For this we will make use of a dummy repository located at https://github.com/rhodrin/ci_acse1. As in the environments lecture last week, start by forking this repository so that you have your own copy within your Github user space. Following this, make a clone of the forked repository:
```bash
git clone https://github.com/<my username>/ci_acse1.git
```
Now, let’s set up a ‘Conda’ environment for this simple_functions package and install it:

```bash
cd ci_acse1
# Create the 'ci' environment
conda env create -f environment.yml
# Activate the environment
conda activate ci
# Install the package
pip install -e .
```
Let’s quickly review the contents of the base folder:
environment.yml - The Anaconda environment file.
LICENSE - The license file, here the MIT license.
README.md
requirements.txt
setup.py
simple_functions/ - Folder containing the ‘main’ code of this package.
tests/ - Folder containing the tests.
Notice also the folder .github/workflows/, the default location for Github Actions workflows. It currently contains two .yml files, flake8.yml and pytest-unit-tests.yml.
Github provides a range of virtual machines (such as those you’ve encountered during the Azure lectures) to execute workflows automatically. As stated on their webpages:
GitHub offers hosted virtual machines to run workflows. The virtual machine contains an environment of tools, packages, and settings available for GitHub Actions to use.
and
A GitHub-hosted runner is a virtual machine hosted by GitHub with the GitHub Actions runner application installed. GitHub offers runners with Linux, Windows, and macOS operating systems.
On the webpage of our Github repository, notice that if we click on the Actions tab we see something like the following:
This tab provides us with a summary of which workflows are being executed (here Flake8 and CI-unit-tests), when they were last executed and whether they produced a pass (green tick) or a fail (red cross).
Before we explore how these are created and what they are doing, let’s take a quick look at our simple_functions package and the related tests.
Currently, simple_functions/functions1.py contains the following code:

```python
__all__ = ['my_sum']


def my_sum(iterable):
    tot = 0
    for i in iterable:
        tot += i
    return tot
```

that is, just one trivial function. In the tests folder there is currently just one file, test_simple_functions.py, containing the following code:

```python
import pytest

from simple_functions import my_sum


class TestSimpleFunctions(object):
    '''Class to test our simple functions are working correctly'''

    @pytest.mark.parametrize('iterable, expected', [
        ([8, 7, 5], 20),
        ((10, -2, 5, -10, 1), 4)
    ])
    def test_my_add(self, iterable, expected):
        '''Test our add function'''
        isum = my_sum(iterable)
        assert isum == expected
```
Throughout the course of this lecture, we will expand on these functions and tests, and in the process expand our CI by adding further Github Actions workflows.
So now, let’s look at our Github Actions workflows and what they are doing. Although mentioned above, it is worth re-emphasizing that in order to execute Actions workflows for any Github repository, all that is required is for one or more valid workflows to be placed in the folder .github/workflows/ (and then pushed to Github).
If we look at the contents of flake8.yml, it reads:
```yaml
name: Flake8

on:
  # Trigger the workflow on push or pull request,
  # but only for the main branch
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

jobs:
  flake8:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v2

      - name: Set up Python 3.7
        uses: actions/setup-python@v1
        with:
          python-version: 3.7

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install flake8

      - name: Lint with flake8
        run: |
          flake8 --builtins=ArgumentError .
```
Let’s analyse each segment of the script.

```yaml
name: Flake8
```

This is simply the name of the workflow (recall that this name appeared when we looked at the Actions tab of the repository’s Github webpage).
```yaml
on:
  # Trigger the workflow on push or pull request,
  # but only for the main branch
  push:
    branches:
      - main
  pull_request:
    branches:
      - main
```

The on: segment of the workflow defines when it will be triggered. Here, we’re instructing the workflow to trigger when 1) we push directly to the main branch, or 2) a pull request targeting main is opened or updated (e.g. when we push to a branch with an open pull request into main). A simpler on: statement could be on: push, which would instruct the workflow to execute whenever a push is made to any branch. A more complete range of triggers can be found in the Github Actions documentation (‘Events that trigger workflows’).
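To make these trigger rules concrete, here is a small Python model of how an on: section decides whether a workflow runs. It is a simplified sketch, not Github’s implementation: workflow_triggers is a hypothetical helper that only compares exact branch names (no glob patterns, tags, paths or activity types). For push events, branch is the branch pushed to; for pull_request events, it is the pull request’s base (target) branch.

```python
def workflow_triggers(on, event, branch):
    """Simplified model: would a workflow with this `on:` section run?

    `on` mirrors the parsed YAML: a string (on: push), a list
    (on: [push, pull_request]) or a dict mapping event -> filters.
    """
    if isinstance(on, str):
        return event == on
    if isinstance(on, list):
        return event in on
    if event not in on:
        return False
    # `push:` with no filters parses to None, meaning "any branch"
    filters = on[event] or {}
    branches = filters.get("branches")
    return branches is None or branch in branches


# The `on:` section of flake8.yml, written as a Python dict
flake8_on = {
    "push": {"branches": ["main"]},
    "pull_request": {"branches": ["main"]},
}

print(workflow_triggers(flake8_on, "push", "main"))          # True
print(workflow_triggers(flake8_on, "push", "feature"))       # False
print(workflow_triggers(flake8_on, "pull_request", "main"))  # True: PR into main
print(workflow_triggers("push", "push", "feature"))          # True
```

Treating a missing filter as ‘any branch’ is what makes the plain on: push form fire for every branch.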
```yaml
jobs:
  flake8:
```

This is where the ‘meat’ of the script begins. A workflow run is made up of one or more jobs, which run in parallel by default. For example, a workflow of the form

```yaml
jobs:
  job1:
    ...
  job2:
    ...
```

would spawn two jobs, job1 and job2, that run in parallel. If we want jobs to run sequentially, we can define dependencies via the needs keyword, e.g.

```yaml
jobs:
  job1:
    ...
  job2:
    needs: job1
    ...
```

means that job1 must complete successfully before job2 is started.
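A small Python sketch can make the effect of needs concrete. Assuming each job’s needs entry has been collected into a dict (job_waves is a hypothetical helper, not part of Github Actions), the standard library’s graphlib module (Python 3.9+) groups jobs into ‘waves’: every job in a wave could start together, and a wave starts only once the jobs it depends on have finished. In reality Github starts each job as soon as its own dependencies are done, so the waves are an approximation.

```python
from graphlib import TopologicalSorter


def job_waves(needs):
    """Group jobs into waves that could run in parallel.

    `needs` maps each job name to the jobs listed in its `needs:` key
    (an empty list for jobs without one).
    """
    sorter = TopologicalSorter(needs)
    sorter.prepare()  # raises graphlib.CycleError for circular `needs`
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


print(job_waves({"job1": [], "job2": []}))        # [['job1', 'job2']]
print(job_waves({"job1": [], "job2": ["job1"]}))  # [['job1'], ['job2']]
```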
In our workflow, we have a single job, which we have called flake8.

```yaml
runs-on: ubuntu-latest
```

This defines the operating system of the runner on which the job will be executed, e.g. Ubuntu Linux, Windows or macOS, or a custom ‘self-hosted’ machine. Possible options are detailed in the Github Actions documentation.
Within each job is a series of steps that are executed sequentially. A new step is started via, e.g., - uses or - name; note the importance of the leading -, which marks the beginning of a new step. If no name is defined for a step, Github derives a default one (e.g. from the action or command it runs).
Step 1:
The first step of our job is

```yaml
- uses: actions/checkout@v2
```

The uses keyword specifies a so-called ‘action’ to use (which is where this CI framework derives its name). In Github’s own words:
“Actions are the building blocks that power your workflow. A workflow can contain actions created by the community, or you can create your own actions directly within your application’s repository.”
A large range of community-developed actions are freely available on the Github Marketplace (which is where all actions used today come from).
The checkout action used in this first step simply clones the repository of interest, ci_acse1, onto the Github runner.
Step 2:

```yaml
- name: Set up Python 3.7
  uses: actions/setup-python@v1
  with:
    python-version: 3.7
```

Notice that this step begins with name. This simply means that in the Actions logs this step will appear with the given name (instead of one derived by default). This step makes use of the setup-python action which, as the name suggests, sets up the given version of python on the runner (here v3.7). Note that runners will generally have some version of python installed by default, but this will be the version ‘shipped’ with the runner’s operating system image and possibly not the version you wish to use during testing.
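Steps 3 and 4:
Step 3 (Install dependencies) upgrades pip and installs flake8 using the run keyword, which is explained below. The fourth and final step is

```yaml
- name: Lint with flake8
  run: |
    flake8 --builtins=ArgumentError .
```

Here we make use of run, which executes the given command-line input in the operating system’s default shell (e.g. bash on Linux and macOS runners).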
Single commands can be executed via, e.g.

```yaml
run: pwd
```

Multi-line commands make use of |, e.g.

```yaml
run: |
  cd myproject
  python my_sweet_script.py
  echo "my sweet script has run"
```
In our actual test, we’re simply checking that our code is PEP 8 compliant.
Now, let’s take a quick look at pytest-unit-tests.yml, located in the same folder as flake8.yml.
You’ll see that most of the content is very similar to that of the .yml we have just studied. Some new additions to point out are the following:
```yaml
- name: Set test directory
  run: |
    echo "::set-env name=TESTS::tests/"
```

The syntax used above says “set the environment variable TESTS to the value tests/”. We will then be able to make use of this environment variable in subsequent steps. (Note: Github has since disabled the ::set-env command for security reasons; the current equivalent is echo "TESTS=tests/" >> $GITHUB_ENV.)
```yaml
- name: Install dependencies
  run: |
    sudo apt-get install python3-setuptools -y
    pip3 install --upgrade pip
    pip3 install -e .
    pip3 install -r requirements.txt
```

This step is executing some shell commands to install the package and install the required dependencies.
```yaml
- name: Test with pytest
  run: |
    $RUN pytest $TESTS
```

The final step in the job executes pytest on the location provided by the environment variable TESTS. The variable RUN (not defined in the snippets shown here) holds python3 -m, so the exact command being executed is python3 -m pytest tests/. The python3 -m prefix is necessary here owing to the configuration of the Github-hosted runners.
Adding more ‘simple functions’: how does numpy compute \(\pi\), sin(x) etc.?
Software users often make use of basic constants and functions such as \(\pi\), sin, cos and tan without thinking about the underlying algorithm used to actually produce the result.
An important aspect of this course is that you start thinking about algorithms at a more fundamental level and that you’re able to develop and produce such algorithms (and not just be a user).
Let’s consider how we would write a function to compute \(\pi\). Srinivasa Ramanujan produced many rapidly convergent series for approximating \(\pi\). One of them is as follows:
\begin{equation}
\frac{1}{\pi}=\frac{2\sqrt{2}}{9801}\sum_{k=0}^{\infty}\frac{(4k)!\,(1103+26390k)}{(k!)^4\,396^{4k}}.
\end{equation}
Note: I believe many modern-day algorithms are based on a formula published by the Chudnovsky brothers in 1988.
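Before writing our own factorial, a quick standard-library sketch shows how fast this series converges (ramanujan_pi is a hypothetical helper name, and math.factorial stands in for the factorial function we write below). Each extra term adds roughly eight correct digits, so two terms already reach double precision. Here n_terms counts the terms k = 0, ..., n_terms - 1.

```python
import math


def ramanujan_pi(n_terms):
    """Approximate pi from the first n_terms terms (k = 0 .. n_terms-1)
    of Ramanujan's series for 1/pi."""
    total = 0.0
    for k in range(n_terms):
        total += (math.factorial(4 * k) * (1103 + 26390 * k)
                  / (math.factorial(k) ** 4 * 396 ** (4 * k)))
    return 1.0 / (2.0 * math.sqrt(2.0) / 9801.0 * total)


for n in (1, 2):
    approx = ramanujan_pi(n)
    print(n, approx, abs(approx - math.pi))
```

Python’s integers are arbitrary precision, so the factorials and powers are exact until the single division converts each term to a float.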
First, in order to implement this formula we’ll need to write a factorial function. (We could also implement a square root function, but for now we’ll just use numpy.sqrt.) In functions1.py, let’s add:
```python
from functools import lru_cache


@lru_cache(maxsize=None)  # Note: -> @cache in python >= 3.9
def factorial(n):
    # Recursive definition, with n! = 1 for n = 0
    return n * factorial(n-1) if n else 1
```
lru_cache should be imported from functools, and let’s remember to add 'factorial' to __all__.
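One design point worth noting: because factorial is recursive, a first call with a large n (in the thousands, given CPython’s default recursion limit of 1000) can raise RecursionError, even with the cache. A minimal iterative alternative, sketched here under the hypothetical name factorial_iterative, avoids this:

```python
import math


def factorial_iterative(n):
    """Compute n! with a simple loop (no recursion, so no depth limit)."""
    if n < 0:
        raise ValueError("factorial is not defined for negative integers")
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result


print(factorial_iterative(5))                             # 120
print(factorial_iterative(5000) == math.factorial(5000))  # True
```

The recursive version remains fine for the small arguments needed by Ramanujan’s series; the loop only matters if larger inputs are expected.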
Then, let’s add a test for this function to the TestSimpleFunctions class (remember to import factorial!), e.g.

```python
    @pytest.mark.parametrize('number, expected', [
        (5, 120),
        (3, 6),
        (1, 1)
    ])
    def test_factorial(self, number, expected):
        '''Test our factorial function'''
        answer = factorial(number)
        assert answer == expected
```
After ensuring our factorial function is working, let’s move on to our computation of \(\pi\).
Let’s add a file called constants.py to the simple_functions folder and implement Ramanujan’s equation. Our new file could look like the following:
```python
from functools import lru_cache

from numpy import sqrt

from simple_functions.functions1 import factorial

__all__ = ['pi']


def pi(terms=1):
    # rsum(terms) sums the series terms k = 0, 1, ..., terms
    return 1./(2.*sqrt(2.)/9801.*rsum(terms))


@lru_cache(maxsize=None)  # Note: -> @cache in python >= 3.9
def rsum(n):
    # n-th term of Ramanujan's series plus all the preceding ones
    t = factorial(4*n)*(1103+26390*n)/(factorial(n)**4*396**(4*n))
    return t + rsum(n-1) if n else t
```
Then, let’s add a new test file called test_constants.py. (For from simple_functions import pi to work, pi must also be exposed by simple_functions/__init__.py, in the same way my_sum already is.) This should look something like the following:

```python
import numpy as np

from simple_functions import pi


class TestPi(object):
    '''Class to test our constants are computed correctly'''

    def test_pi(self):
        '''Test computation of pi'''
        my_pi = pi(2)
        assert np.isclose(my_pi, np.pi, atol=1e-12)
```
With that done, let’s commit, push and check that our workflows are correctly executing these tests!
Exercise:
Add a new function to compute sin(x) to functions1.py (or to a new file, e.g., trig_functions.py).
Add a related parameterised (i.e. testing a few different inputs automatically) test to test_simple_functions.py.
Ensure the test is passing locally.
Finally, commit the changes, push them to Github and check that they are now indeed being executed by the pytest-unit-tests.yml workflow.
Notes:
How trigonometric functions are actually computed by software such as numpy is quite interesting and not entirely trivial (see this discussion for example). For this basic exercise you can assume that the input is real and between 0 and \(2\pi\) radians.
A quick and easy implementation is via a Taylor series expansion; if you’re not familiar with it, the expansion is \(\sin x=\sum_{n=0}^{\infty}\frac{(-1)^n x^{2n+1}}{(2n+1)!}\).
You should test your result against, e.g., numpy’s result. For your tests you’ll find numpy’s isclose and allclose functions handy.
Think about when this test is not sufficient and how to improve the algorithm, e.g. what if the input is \(>2\pi\)? Does it work if the input is complex?
Some other useful Github Actions workflow features
cron jobs
It is often useful to have workflows trigger on a schedule. Such a schedule can be defined using POSIX cron syntax. Take for example the following snippet:

```yaml
on:
  schedule:
    - cron: '0 3 11,25 * *'
```

This states that the workflow should be executed at minute 0 of hour 3 (i.e. 03:00, in UTC) on the 11th and 25th day of every month.
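To check a schedule like this before committing it, a tiny Python sketch can list the next few run times. next_runs and parse_field are hypothetical helpers: they only understand * and comma-separated lists of numbers (no ranges, steps or names), reject a restricted day-of-week field, and step through time one minute at a time for simplicity. Read the resulting times as UTC.

```python
from datetime import datetime, timedelta


def parse_field(field, lo, hi):
    """Expand a cron field made of '*' or comma-separated numbers."""
    if field == "*":
        return set(range(lo, hi + 1))
    return {int(part) for part in field.split(",")}


def next_runs(cron, start, count=3):
    """Return the next `count` times strictly after `start` matching `cron`."""
    minute, hour, dom, month, dow = cron.split()
    if dow != "*":
        raise NotImplementedError("day-of-week field not supported here")
    minutes = parse_field(minute, 0, 59)
    hours = parse_field(hour, 0, 23)
    days = parse_field(dom, 1, 31)
    months = parse_field(month, 1, 12)

    runs = []
    t = start.replace(second=0, microsecond=0) + timedelta(minutes=1)
    while len(runs) < count:  # assumes the schedule matches eventually
        if (t.minute in minutes and t.hour in hours
                and t.day in days and t.month in months):
            runs.append(t)
        t += timedelta(minutes=1)
    return runs


for run in next_runs("0 3 11,25 * *", datetime(2021, 1, 1)):
    print(run)
# 2021-01-11 03:00:00
# 2021-01-25 03:00:00
# 2021-02-11 03:00:00
```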
workflow_dispatch
Sometimes it’s handy to have a ‘push button’ to run your tests. This is available via the workflow_dispatch event, an example of which is given in the following snippet:

```yaml
on:
  workflow_dispatch:
    inputs:
      tags:
        description: 'Run this workflow'
```

This introduces a ‘Run workflow’ button on the Actions tab under the relevant workflow, as shown in the image below.
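strategy: fail-fast
By default, if one of the jobs generated from a matrix (see Matrices below) fails, Github cancels all the other in-progress and queued jobs of that matrix. Sometimes this is not the behaviour required, and it can be altered by setting the fail-fast flag to false. Note that fail-fast does not control whether a job runs after a job it needs has failed; that requires an if: always() condition:

```yaml
my-job:
  needs: my-last-job
  # Run even if my-last-job failed
  if: always()
  runs-on: ubuntu-latest
  strategy:
    # Let the remaining matrix jobs finish even if one of them fails
    fail-fast: false
    matrix:
      python-version: [3.7, 3.8]
  steps:
    ...
```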
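Step outputs
A value from one step can be used in a subsequent step (if we don’t wish to use environment variables), e.g. to generate some python argument, via:

```yaml
- name: Make some value
  id: msv
  run: |
    # ... commands that generate some_value ...
    echo "::set-output name=some_output::some_value"
```

and then

```yaml
- name: Use some value
  run: python my_file.py --${{ steps.msv.outputs.some_output }}
```

The id (here msv) is what allows later steps to refer to this step’s outputs. (Newer Github documentation deprecates ::set-output in favour of appending some_output=some_value to the file named by $GITHUB_OUTPUT.)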
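When the value is produced by a Python script rather than by shell commands, the script can write the step output itself. The following is a sketch assuming the $GITHUB_OUTPUT file mechanism; set_step_output is a hypothetical helper name, and it only handles single-line values (multi-line values need Github’s delimiter syntax). Run from a step with id: msv, the value is then available to later steps as ${{ steps.msv.outputs.some_output }}.

```python
import os


def set_step_output(name, value):
    """Expose name=value as an output of the current Github Actions step.

    Appends to the file named by $GITHUB_OUTPUT; when run outside
    Github Actions (variable unset) it just prints the line instead.
    """
    line = f"{name}={value}\n"
    path = os.environ.get("GITHUB_OUTPUT")
    if path:
        with open(path, "a", encoding="utf-8") as fh:
            fh.write(line)
    else:
        print(line, end="")


if __name__ == "__main__":
    some_value = "verbose"  # stand-in for a value computed by the script
    set_step_output("some_output", some_value)
```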
if statements
In many workflows we will want some steps to run under one condition and other steps to run under a different one. This can be achieved via if conditionals. An example is shown below:

```yaml
- name: conditional step
  if: env.MY_ENVIRONMENT_VARIABLE == 'some_value'
  run: |
    echo "do something"
```

Hence the above step would only run if the value of the environment variable MY_ENVIRONMENT_VARIABLE equals some_value. Note that if uses Github’s expression syntax (env.MY_ENVIRONMENT_VARIABLE) rather than shell syntax ($MY_ENVIRONMENT_VARIABLE).
Matrices¶
Matrices are another immensely useful feature, which can be used to achieve all kinds of automation. Let’s say we want a job to run on several different operating systems and/or with several different versions of python. This could be achieved via the following:

```yaml
jobs:
  pytest:
    name: ${{ matrix.name }}
    runs-on: "${{ matrix.os }}"

    env:
      PYTHON_VERSION: "${{ matrix.python-version }}"
      TESTS: "tests/"

    strategy:
      # Prevent all builds from stopping if a single one fails
      fail-fast: false

      matrix:
        name: [
          python36-ubuntu1804,
          python38-ubuntu2004,
          python37-macos
        ]
        include:
          - name: python36-ubuntu1804
            python-version: 3.6
            os: ubuntu-18.04

          - name: python38-ubuntu2004
            python-version: 3.8
            os: ubuntu-20.04

          - name: python37-macos
            python-version: 3.7
            os: macos-latest

    steps:
      ...
```

Note that each name under include must exactly match one in the name list; otherwise include creates a separate combination instead of extending the existing one.
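To see how include interacts with the name list, here is a simplified Python model of matrix expansion (expand_matrix is a hypothetical helper, not Github’s code, and it assumes at least one key besides include). Following Github’s documented rule, an include entry is merged into every combination whose original values it does not contradict; if it fits none, it becomes an extra combination of its own.

```python
from itertools import product


def expand_matrix(matrix):
    """Simplified model of Github Actions matrix expansion with `include`."""
    axes = {key: values for key, values in matrix.items() if key != "include"}
    # Cartesian product of the plain matrix keys
    originals = [dict(zip(axes, combo)) for combo in product(*axes.values())]
    jobs = [dict(combo) for combo in originals]
    extra = []
    for entry in matrix.get("include", []):
        matched = False
        for original, job in zip(originals, jobs):
            # An entry may extend a combination only if it does not
            # change any of that combination's original matrix values
            if all(original.get(k, v) == v for k, v in entry.items()):
                job.update(entry)
                matched = True
        if not matched:
            extra.append(dict(entry))
    return jobs + extra


matrix = {
    "name": ["python36-ubuntu1804", "python38-ubuntu2004", "python37-macos"],
    "include": [
        {"name": "python36-ubuntu1804", "python-version": "3.6", "os": "ubuntu-18.04"},
        {"name": "python38-ubuntu2004", "python-version": "3.8", "os": "ubuntu-20.04"},
        {"name": "python37-macos", "python-version": "3.7", "os": "macos-latest"},
    ],
}
for job in expand_matrix(matrix):
    print(job)
```

Changing the last include name to, say, python37-macOs (a one-letter case mismatch) makes expand_matrix return four combinations, one of which has no os at all; the corresponding Github job would then have no runner to run on.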
Self-hosted runners
Instead of running on the default runners provided, you can also set jobs to run on self-hosted runners. These could be, e.g., a Microsoft Azure VM or some bare-metal machine of your own. Once the runner is configured, jobs can be sent to this machine via:

```yaml
runs-on: self-hosted
```

Note: if you have many different kinds of self-hosted runners, you can use custom labels (‘tags’) to send different jobs to different runners, e.g.

```yaml
runs-on: [self-hosted, gpu]
```

or

```yaml
runs-on: [self-hosted, mpi, some_other_tag_if_needed]
```
And many more…
We’ve covered a fair amount today, but we’ve only scratched the surface of what can be achieved with Github Actions. As you delve into more advanced development projects, you may need to start exploring some of the more creative ways to make use of actions!
Github Secrets!
Sometimes a workflow may require the use of sensitive information (e.g. ssh login credentials). Clearly, you don’t want such information displayed in a public repository, but if you want contributors to be able to make use of your workflows, simply leaving it out is also not an option. In such situations Github Secrets come in handy.
A secret can be set via the repository’s Settings tab, in its Secrets section. Secrets can then be used in a workflow via ${{ secrets.my_secret }}. The stored value is a string, so it can hold a number, text, a combination of the two or even a snippet of code!
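Inside a Python script run by the workflow, a common pattern is to read the secret from an environment variable and fail clearly if it is missing, without ever printing it. This sketch assumes the step maps the secret into its environment, e.g. an env: entry MY_SECRET: ${{ secrets.my_secret }}; MY_SECRET and read_secret are hypothetical names. Github masks secret values in the logs, but it is still best not to echo them.

```python
import os


def read_secret(name):
    """Return the secret stored in environment variable `name`.

    Raises an error that names the variable but never reveals a value
    if the workflow did not pass the secret in.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; map it from secrets in the step's env: section"
        )
    return value
```

A script would then call, e.g., token = read_secret("MY_SECRET") and pass token on to whatever needs it.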
Some useful actions
We’ve already seen the checkout action in use. Below is an extremely non-exhaustive list of some other useful actions:
Codecov action:
Codecov, which was discussed previously, provides a nice action for uploading and advertising the coverage of your repository:
```yaml
- name: Upload coverage to Codecov
  if: matrix.name != 'pytest-docker-py36-gcc-omp'
  uses: codecov/codecov-action@v1.0.6
  with:
    token: ${{ secrets.CODECOV_TOKEN }}
    name: ${{ matrix.name }}
```
github-push-action
Action to push any changes made by your CI to a designated repository:

```yaml
- name: Push new configurations
  uses: ad-m/github-push-action@master
  # Step outputs are strings, hence the comparison with 'true'
  if: steps.new-configs.outputs.new_configurations == 'true'
  with:
    github_token: ${{ secrets.GITHUB_TOKEN }}
```
ssh-action
An action that makes logging into a machine via ssh and executing commands/scripts nice and easy:

```yaml
- name: start actions runner app
  uses: fifsky/ssh-action@master
  with:
    command: |
      #!/bin/bash
      nohup actions-runner/run.sh >/dev/null 2>&1 &
    host: ${{ steps.host.outputs.action_host }}
    user: ${{ secrets.ADMIN_LOGIN }}
    pass: ${{ secrets.ADMIN_PASSWORD }}
    args: "-tt"
```
upload-artifact and download-artifact
“Artifacts allow you to share data between jobs in a workflow and store data once that workflow has completed.”
This is incredibly useful when you want to collect and check, e.g., a set of results files from runs on various runners. Here are some example snippets of an upload and then a download:

```yaml
- name: Upload result
  uses: actions/upload-artifact@v2
  with:
    name: ${{ matrix.runner }}_${{ matrix.name }}
    path: ${{ steps.fetch-results.outputs.results_file }}
```

```yaml
- uses: actions/download-artifact@v2
  with:
    path: results
```
Exercise:
Add a new simple function of your choice which requires a new dependency, together with some appropriate tests.
Create a new workflow, or expand an existing one, and utilise matrices to run all tests on Ubuntu 18.04, Ubuntu 20.04, macOS and (if you like) Windows. Notice how the use of matrices parallelizes the jobs! TIP: Only add one matrix entry at a time!
Some final discussion points
Github Actions is an extremely powerful tool.
Whilst fairly new (< 1 year old at the time of this lecture), its set of features is expanding all the time.
CI solution of the future?
In tandem with other technologies it allows for the automation of some fairly exotic tasks. Let’s skim over a workflow of TheMatrix project found here.