Continuous Integration

Aims:

  • Become familiar with the basic concepts of continuous integration (CI)

  • Gain basic knowledge of various existing CI frameworks

  • Become familiar with the basic use of Github Actions, including:

    • Setting up basic testing workflows

    • Utilizing actions within a workflow

Contact details:

  • Dr. Rhodri Nelson

  • Room 4.85 RSM building

  • email: rnelson@ic.ac.uk

  • Teams: @Nelson, Rhodri B in #ACSE1 or #General, or DM me.

What is Continuous Integration?

In software development, Continuous Integration (CI) is the practice of developers integrating code into a shared repository several times a day. Each ‘push’ to the repository is then verified by automated testing tools, allowing any problems to be detected early.

And, from Wikipedia:

In software engineering, continuous integration (CI) is the practice of merging all developers’ working copies to a shared mainline several times a day. Grady Booch first proposed the term CI in his 1991 method, although he did not advocate integrating several times a day. Extreme programming (XP) adopted the concept of CI and did advocate integrating more than once per day – perhaps as many as tens of times per day.

Automated Continuous Integration

Once you have tests, you want to make sure that any public changes to the code don’t break them. For code held under version control, this means that you ideally want to rerun your test suite on all new public merges, to ensure that the code installs and runs successfully on a clean system. Continuous Integration (CI) and Continuous Delivery (CD) frameworks tie together version control repositories and test suites so that this happens automatically whenever a commit to production code (i.e. your master branch) is proposed, or even whenever any change is committed anywhere in the repository.

The goal is to first ensure that the production branch always works, so that users can trust that they will always be able to use it securely. Meanwhile, developers can feel safe repeatedly merging changes to and from the master version, knowing whether or not their code works. Taken to its extreme, under the doctrine of Extreme Programming (XP), this might mean work is merged multiple times a day by every developer in a moderately sized team.

XP workflows stress the following points:

  • Merge back to master frequently (i.e. keep feature changes small and atomic)

  • Automate the software build

  • Automate the testing framework

Some CI solutions

As with test frameworks, a large number of CI/CD solutions exist, e.g.

  • Travis CI is a hosted continuous integration service used to build and test software projects hosted at GitHub and Bitbucket.

  • Jenkins, a free, open source CI tool which you can run on your own machine. Popular with Java developers.

  • Buildbot - Another self-run CI tool, written and configured with Python.

  • Azure Pipelines - A cloud system like Travis, this time run by Microsoft.

and of course, the main CI pipeline we will learn about today:

  • Github Actions, which allows building continuous integration and continuous deployment pipelines for testing, releasing and deploying software without the use of third-party websites/platforms

All the above CI solutions require their own setup files, which have many of the same concepts, but different syntax. For example, Azure Pipelines uses a file called azure-pipelines.yml, which looks like

trigger:
- master

pool:
  vmImage: 'ubuntu-latest'

steps:
# set the python version to use
- task: UsePythonVersion@0
  displayName: 'Set Python version'
  inputs:
    versionSpec: '3.6'

# Script sections run bash scripts 
- script: |
    python -m pip install --upgrade pip setuptools wheel
    pip install pytest pytest-cov
    pip install -r requirements.txt
  displayName: 'Install dependencies'


# here's the actual test
- script: PYTHONPATH=$(System.DefaultWorkingDirectory) pytest quaternions_acse --junitxml=junit/test-results.xml --cov=quaternions_acse --cov-report=xml --cov-report=html
  displayName: 'Run pytest'

As you will see, this is very similar to the Github Actions .yml files we will explore shortly, although the syntax and available functionality will of course vary.

In general, which CI framework to use is rarely a true choice. Businesses and existing projects will have picked and offer support for one, and expect you to play nicely with it. Meanwhile for personal projects people want minimum effort, minimum cost solutions. On these two metrics Github Actions scores very highly.

It is also worth noting that Github Actions is a relatively new CI solution, having been added to Github in 2019. It is therefore undergoing continuous development and new features are being added and existing ones enhanced regularly.

Github Actions

To begin familiarizing ourselves with Actions, let us dive straight into an example. For this we will make use of a dummy repository located at https://github.com/rhodrin/ci_acse1. As in the environments lecture last week, start by forking this repository so that you have your own copy within your Github user space. Following this, make a clone of the forked repository:

git clone https://github.com/<my username>/ci_acse1.git

Now, let’s set up a ‘Conda’ environment for this simple_functions package and install it:

cd ci_acse1
# Create the 'ci' environment
conda env create -f environment.yml
# Activate the environment
conda activate ci
# Install the package
pip install -e .

Let’s quickly review the contents of the base folder:

  • environment.yml - The Anaconda environment file.

  • LICENSE - The license file, here the MIT license.

  • README.md

  • requirements.txt

  • setup.py

  • simple_functions/ - Folder containing the ‘main’ code of this package.

  • tests/ - Folder containing the tests.

Notice also the folder(s):

  • .github/workflows/ - Default folder for placing Github Actions workflows. The folder currently contains two .yml files, flake8.yml and pytest-unit-tests.yml.

Github provides a range of virtual machines (such as those you’ve encountered during the Azure lectures) to execute workflows automatically. As stated on their webpages:

GitHub offers hosted virtual machines to run workflows. The virtual machine contains an environment of tools, packages, and settings available for GitHub Actions to use.

and

A GitHub-hosted runner is a virtual machine hosted by GitHub with the GitHub Actions runner application installed. GitHub offers runners with Linux, Windows, and macOS operating systems.

On the webpage of our github repository, notice that if we click on the Actions tab we see something like the following:

This tab provides us with a summary of which workflows are being executed (here Flake8 and CI-unit-tests), when they were last executed and whether they produced a pass (green tick) or a fail (red cross).

Before we explore how these are created and what they are doing, let’s take a quick look at our simple_functions and the related tests.

Currently, within simple_functions/functions1.py is the following code


__all__ = ['my_sum']


def my_sum(iterable):
    tot = 0
    for i in iterable:
        tot += i
    return tot

that is, just one trivial function. In the tests folder there is currently just one file, test_simple_functions.py, containing the following code:

import pytest

from simple_functions import my_sum


class TestSimpleFunctions(object):
    '''Class to test our simple functions are working correctly'''

    @pytest.mark.parametrize('iterable, expected', [
        ([8, 7, 5], 20),
        ((10, -2, 5, -10, 1), 4)
    ])
    def test_my_add(self, iterable, expected):
        '''Test our add function'''
        isum = my_sum(iterable)
        assert isum == expected

Throughout the course of this lecture, we will expand on these functions and tests, and in the process expand our CI by adding further Github Actions workflows.

So now, let’s look at our Github Actions workflows and what they are doing. Although mentioned above, it is worth re-emphasizing that in order to execute Actions workflows for any Github repository, all that is required is for one or more valid workflows to be placed in the folder .github/workflows/ (and then pushed to Github).

If we look at the contents of flake8.yml it reads:

name: Flake8

on:
   # Trigger the workflow on push or pull request,
   # but only for the main branch
   push:
     branches:
       - main
   pull_request:
     branches:
       - main

jobs:
  flake8:

    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v2
    - name: Set up Python 3.7
      uses: actions/setup-python@v1
      with:
        python-version: 3.7
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install flake8
    - name: Lint with flake8
      run: |
        flake8 --builtins=ArgumentError .

Let’s analyse each segment of the script.

name: Flake8 

This is simply the name of the action (recall that this name appeared when we looked at the Actions tab of the repository’s Github webpage).

on:
   # Trigger the workflow on push or pull request,
   # but only for the main branch
   push:
     branches:
       - main
   pull_request:
     branches:
       - main

The on: segment of the workflow defines when it will be triggered. Here, we’re instructing the workflow to trigger when 1) we push directly to the main branch, or 2) we push to a branch with an open pull request. A simple on: statement could be on: push - this would instruct the workflow to execute on any branch when any push to that branch is made. A more complete range of triggers can be found here.
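Triggers can also be filtered by which files a push touches. As a hypothetical variant (the path patterns below are illustrative, not from the repository), the workflow could be restricted to changes in Python sources or the workflows themselves:

```yaml
on:
  push:
    # only trigger when Python source files or the workflows themselves change
    paths:
      - '**.py'
      - '.github/workflows/**'
```

This kind of filter helps avoid spending runner minutes on, e.g., documentation-only commits.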

jobs:
  flake8:

This is where the ‘meat’ of the script begins. A workflow run is made up of one or more jobs. Jobs run in parallel by default. For example, a workflow of the form

jobs:
  job1:
    ...
  job2:
    ...

would spawn two jobs, job1 and job2, that run in parallel. If we want jobs to run sequentially we can define dependencies via a needs tag, e.g.

jobs:
  job1:
    ...
  job2:
    needs: job1
    ...

means that job1 must complete before job2 is started.

In our workflow, we have a single job that we have here called flake8.

runs-on: ubuntu-latest

This defines the operating system on which the job will be run, e.g. Ubuntu Linux, Windows, macOS, or a custom ‘self-hosted’ machine. Possible options are detailed here.

Within each job is a series of steps that are executed sequentially. A new step is introduced via, e.g., - uses or - name, where the importance of the - should be noted: it defines the beginning of a new step. If no name is defined for a step, a default naming convention will be used.

Step 1:

The first step of our job is

- uses: actions/checkout@v2

This ‘uses’ defines a so-called ‘action’ to use (and is where this CI framework derives its name). In Github’s own words:

“Actions are the building blocks that power your workflow. A workflow can contain actions created by the community, or you can create your own actions directly within your application’s repository.”

A large range of community-developed actions are freely available at the Github marketplace (which is where all actions used today will come from).

The checkout action being used in this first step is simply an action to clone the repository of interest, ci_acse1, onto the Github runner.

Step 2:

- name: Set up Python 3.7
  uses: actions/setup-python@v1
  with:
    python-version: 3.7

Notice that this step begins with name. This simply means that in the Actions logs this step will appear with the given name (instead of one derived by default). This step makes use of the setup-python action which, as the name suggests, sets up the given version of Python on the runner (here 3.7). Note that runners will generally have some version of Python installed by default, but this will be the version ‘shipped’ with the operating system of choice and possibly not the version you wish to use during testing.

Step 3:

- name: Lint with flake8
  run: |
    flake8 --builtins=ArgumentError .

In the final step we’re making use of run. The run command executes the given command-line input in the operating system’s default shell (e.g. bash on unix systems).

Single commands can be executed via, e.g.

    run: pwd

Multi-line commands make use of |, e.g.

    run: |
      cd myproject
      python my_sweet_script.py
      echo "my sweet script has run"

In our actual test, we’re simply checking that our code is PEP 8 compliant.

Now, let’s take a quick look at pytest-unit-tests.yml, located in the same folder as flake8.yml.

You’ll see that most of the content is very similar to that of the .yml we have just studied. Some new additions to point out are the following:

    - name: Set test directory
      run : |
        echo "::set-env name=TESTS::tests/"

The syntax being used above says “we are setting the environment variable TESTS to have the value tests/”. We will then be able to make use of this environment variable in subsequent steps.
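A later step can then read the variable like any ordinary shell variable. For example, a hypothetical follow-up step might be:

```yaml
    - name: List the test files
      run: |
        ls $TESTS
```

(Note that the ::set-env workflow command has since been deprecated; on current runners the equivalent is echo "TESTS=tests/" >> $GITHUB_ENV.)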

    - name: Install dependencies
      run: |
        sudo apt-get install python3-setuptools -y
        pip3 install --upgrade pip
        pip3 install -e .
        pip3 install -r requirements.txt

This step is executing some shell commands to install the package and install the required dependencies.

    - name: Test with pytest
      run: |
        $RUN pytest $TESTS

The final step in the job is executing pytest within the location provided by the environment variable TESTS. Note that the exact command being executed here is python3 -m pytest tests/ - the addition of python3 -m is necessary here owing to the configuration of the Github hosted runners.

Adding more ‘simple functions’: how does numpy compute \(\pi\), sin(x) etc.?

Software users often make use of basic functions such as \(\pi\), sin, cos and tan etc. without thinking about the underlying algorithm used to actually produce the result.

An important aspect of this course is that you start thinking about algorithms at a more fundamental level and that you’re able to develop and produce such algorithms (and not just be a user).

Let’s consider how we would write a function to compute \(\pi\). Srinivasa Ramanujan produced many rapidly convergent series for approximating \(\pi\). One of them is as follows: \begin{equation} \frac{1}{\pi}=\frac{2\sqrt{2}}{9801}\sum_{k=0}^{\infty}\frac{(4k)!(1103+26390k)}{k!^4(396^{4k})}. \end{equation} Note: many modern-day algorithms are based on a related formula derived by the Chudnovsky brothers in the late 1980s.
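To get a feel for just how rapidly this series converges, here is a quick self-contained check (independent of the package we are building) of the \(k = 0\) term alone:

```python
from math import factorial, sqrt, pi

# k = 0 term of Ramanujan's series for 1/pi
k = 0
term = factorial(4 * k) * (1103 + 26390 * k) / (factorial(k)**4 * 396**(4 * k))
approx = 1.0 / (2.0 * sqrt(2.0) / 9801.0 * term)

# a single term already agrees with pi to roughly seven decimal places
print(abs(approx - pi) < 1e-6)  # True
```

Each additional term adds roughly eight further correct digits.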

First, in order to implement this formula we’ll need to write a factorial function. (We could also implement a square root function, but for that we’ll just use numpy.sqrt for now). In functions1.py lets add:

from functools import lru_cache


@lru_cache(maxsize=None)  # Note: -> @cache in python >= 3.9
def factorial(n):
    return n * factorial(n-1) if n else 1

Remember that lru_cache must be imported from functools, and let’s also remember to update __all__.

Then, lets add a test for this function (remember to import factorial!), e.g.

    @pytest.mark.parametrize('number, expected', [
        (5, 120),
        (3, 6),
        (1, 1)
    ])
    def test_factorial(self, number, expected):
        '''Test our factorial function'''
        answer = factorial(number)
        assert answer == expected

After ensuring our factorial function is working, let’s move on to our computation of \(\pi\).

Let’s add a file called constants.py and implement Ramanujan’s equation. Our new file should contain code along the following lines:

from numpy import sqrt
from simple_functions.functions1 import factorial
from functools import lru_cache

__all__ = ['pi']


def pi(terms=1):
    return 1./(2.*sqrt(2.)/9801.*rsum(terms))


@lru_cache(maxsize=None)  # Note: -> @cache in python >= 3.9
def rsum(n):
    t = factorial(4*n)*(1103+26390*n)/(factorial(n)**4*396**(4*n))
    return t + rsum(n-1) if n else t

Then, let’s add a new test file called test_constants.py. This should look something like the following:

import numpy as np

from simple_functions import pi


class TestPi(object):
    '''Class to test our constants are computed correctly'''

    def test_pi(self):
        '''Test computation of pi'''
        my_pi = pi(2)
        assert np.isclose(my_pi, np.pi, atol=1e-12)

With that done, let’s commit, push and check that our workflows are correctly executing these tests!

Exercise:

  • Add a new function to compute sin(x) to functions1.py (or to a new, e.g., trig_functions.py file)

  • Add a related parameterised (i.e. test a few different inputs automatically) test to test_simple_functions.py

  • Ensure the test is passing locally

  • Finally, commit the changes and push them to Github and check that they are now indeed being executed by the pytest-unit-tests.yml workflow

Notes:

  • How trigonometric functions are actually computed by software such as numpy is quite interesting and not entirely trivial (see this discussion for example). For this basic exercise you can assume that the input is real and between 0 and \(2\pi\) radians.

  • A quick and easy implementation is via a Taylor series expansion - if you’re not familiar with it, the expansion can be found within this webpage.

  • You should test your result against, e.g., numpy’s result. For your tests you’ll find the isclose and allclose functions handy.

  • Think about when this test is not good and how to improve the algorithm, e.g. what if the input is \(>2\pi\)? Does it work if the input is complex?
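As a hint of the shape such a function might take (a deliberately minimal sketch, only sensible for modest real inputs, and not the only possible approach), a Maclaurin-series implementation could look like:

```python
from math import factorial, pi


def taylor_sin(x, terms=10):
    """Approximate sin(x) via its Maclaurin series:
    sin(x) = sum_{k=0}^{inf} (-1)^k x^(2k+1) / (2k+1)!"""
    return sum((-1)**k * x**(2 * k + 1) / factorial(2 * k + 1)
               for k in range(terms))


# sanity check against a known value
print(abs(taylor_sin(pi / 2) - 1.0) < 1e-12)  # True
```

Your own version should also decide how many terms are needed, and be tested against numpy as described above.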

Some other useful Github Actions workflow features

cron jobs

It is often useful to have workflows trigger on a schedule. Such a schedule can be defined using POSIX cron syntax. Take for example the following snippet:

on:
  schedule:
    - cron:  '0 3 11,25 * *'

This states that the workflow should be executed at 03:00 (times are UTC) on the 11th and 25th day of every month.
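The five fields are minute, hour, day of month, month and day of week. As another hypothetical example, a nightly run at 02:30 would be:

```yaml
on:
  schedule:
    # minute hour day-of-month month day-of-week
    - cron: '30 2 * * *'
```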

workflow_dispatch

Sometimes it’s handy to have a ‘push button’ to run your tests. This is available via a workflow_dispatch trigger, an example of which is given in the following snippet:

on:
  workflow_dispatch:
    inputs:
      tags:
        description: 'Run this workflow'

This introduces a ‘Run Workflow’ button on the Actions tab under the relevant workflow as shown in the image below

strategy: fail-fast

By default, if a job fails the workflow will not progress beyond this job. Sometimes this is not the desired behavior. This can be altered by setting the fail-fast flag to false:

  my-job:
    needs: my-last-job
    runs-on: ubuntu-latest

    strategy:
      # Job will run even if its dependency failed
      fail-fast: false

    steps:
      ...

Step outputs

A value from one step can be used in a subsequent step (if we don’t wish to use environment variables), e.g. to generate some Python argument:

    - name: Make some value
      run: |
        ...
        generate some_value
        ...
        echo ::set-output name=some_output::some_value
      id: msv

and then

    - name: Use some value
      run:
        python --${{ steps.msv.outputs.some_output }} my_file.py

if statements

In many workflows we will want some steps to run under one condition and other steps to run under a different one. This can be achieved via if statements. An example is shown below:

    - name: conditional step
      if: env.MY_ENVIRONMENT_VARIABLE == 'some_value'
      run: |
        echo "do something"

Hence the above step would only run if the value of MY_ENVIRONMENT_VARIABLE is equal to the chosen some_value.

Matrices

Another immensely useful feature that can be utilized to achieve all kinds of automation. Let’s say we want a job to run on several different operating systems and/or with several different versions of Python. This could be achieved via the following:

jobs:
  pytest:
    name: ${{ matrix.name }}
    runs-on: "${{ matrix.os }}"

    env:
      PYTHON_VERSION: "${{ matrix.python-version }}"
      TESTS: "tests/"

    strategy:
      # Prevent all builds from being stopped if a single one fails
      fail-fast: false

      matrix:
        name: [
           python36-ubuntu1804,
           python38-ubuntu2004,
           python37-macOS
        ]
        include:
        - name: python36-ubuntu1804
          python-version: 3.6
          os: ubuntu-18.04

        - name: python38-ubuntu2004
          python-version: 3.8
          os: ubuntu-20.04

        - name: python37-macOS
          python-version: 3.7
          os: macos-latest

    steps:
      ...

Self-hosted runners

Instead of running on the default runners provided, you can also set jobs to be run on self-hosted runners. These could be, e.g., a Microsoft Azure VM or some other bare metal machine. Once the runner is configured, jobs can be sent to this machine via:

runs-on: self-hosted

Note: If you have many different forms of self-hosted runners you can utilize custom ‘tags’ to send different jobs to different runners, e.g.

runs-on: [self-hosted, gpu]

or

runs-on: [self-hosted, mpi, some_other_tag_if_needed]

And many more…

We’ve covered a fair amount today, but we’ve only scratched the surface of what can be achieved with Github Actions. As you delve into more advanced development projects, you may need to start exploring some of the more creative ways to make use of actions!

Github Secrets!

Sometimes a workflow may require the use of sensitive information (e.g. ssh login credentials). Clearly, you don’t want such information displayed in a public repository, but if you want contributors to be able to make use of your workflows, ‘hiding’ these is also not an option. In such situations Github Secrets come in handy.

A secret can be set by clicking on the repository’s Settings tab and then, under options, choosing the secrets tab. Secrets can then be utilized in a workflow via ${{ secrets.my_secret }} - the value stored can be numerical, a string of characters, a combination, or even a snippet of code!
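For instance, a hypothetical step passing a secret to a script via an environment variable (the secret name MY_API_TOKEN and the script deploy.sh are placeholders, not part of our repository) might look like:

```yaml
    - name: Deploy with a secret token
      env:
        # the secret is exposed to this step only, and masked in the logs
        API_TOKEN: ${{ secrets.MY_API_TOKEN }}
      run: |
        ./deploy.sh "$API_TOKEN"
```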

Some useful actions

We’ve already seen the checkout action in use. Below is an extremely non-exhaustive list of some other useful actions:

Codecov action:

Codecov was discussed previously, and they provide a nice action for uploading and advertising the coverage of your repository:

    - name: Upload coverage to Codecov
      if: matrix.name != 'pytest-docker-py36-gcc-omp'
      uses: codecov/codecov-action@v1.0.6
      with:
        token: ${{ secrets.CODECOV_TOKEN }}
        name: ${{ matrix.name }}

github-push-action

Action to push any changes made by your CI to a designated repository:

    - name: Push new configurations
      uses: ad-m/github-push-action@master
      if: ${{ steps.new-configs.outputs.new_configurations }} == true
      with:
        github_token: ${{ secrets.GITHUB_TOKEN }}

ssh-action

An action to make logging into a machine via ssh and executing some commands/scripts nice and easy:

    - name: start actions runner app
      uses: fifsky/ssh-action@master
      with:
        command: |
          #!/bin/bash
          nohup actions-runner/run.sh >/dev/null 2>&1 &
        host: ${{ steps.host.outputs.action_host }}
        user: ${{ secrets.ADMIN_LOGIN }}
        pass: ${{ secrets.ADMIN_PASSWORD }}
        args: "-tt"

upload-artifact and download-artifact

“Artifacts allow you to share data between jobs in a workflow and store data once that workflow has completed.”

This is incredibly useful when you want to collect and check, e.g., a set of results files from runs on various runners. Here are some example snippets of an upload and then a download:


    - name: Upload result
      uses: actions/upload-artifact@v2
      with:
        name: ${{ matrix.runner }}_${{ matrix.name }}
        path: ${{ steps.fetch-results.outputs.results_file }}

    - uses: actions/download-artifact@v2
      with:
        path: results

Exercise:

  • Add a new simple function of your choice, which requires a new dependency, together with some appropriate tests.

  • Create a new, or expand an old, workflow and utilise matrices to run all tests on Ubuntu 18.04, Ubuntu 20.04, macOS and (if you like) Windows. Notice how the use of matrices parallelizes the jobs! TIP: Only add one matrix entry at a time!

Some final discussion points

  • Github Actions is an extremely powerful tool.

  • Whilst fairly new (< 1 year old at the time of this lecture), its set of features is expanding all the time.

  • CI solution of the future?

  • In tandem with other technologies, it allows for the automation of some fairly exotic tasks. Let’s skim over a workflow of the TheMatrix project found here.