Sandboxing & virtual environments¶
Yesterday you learned how to wrap Python code up into a package with its own name and version number. There are several situations in which it can be useful to “sandbox” code into its own space so that other package installations cannot interfere with it, and so that it cannot interfere with them.
Aims¶
Introduce the basic idea of environment variables
Introduce various virtual environment and container solutions including
Venv
Anaconda
Docker
Explore in further detail how to work with Anaconda
Contact details:
Dr. Rhodri Nelson
Room 4.85 RSM building
email: rhodri.nelson@imperial.ac.uk
Teams:
@Nelson, Rhodri B
in #acse1 or #General, or DM me.
Environment variables¶
From Wikipedia (https://en.wikipedia.org/wiki/Environment_variable):
An environment variable is a dynamic-named value that can affect the way running processes will behave on a computer. They are part of the environment in which a process runs. For example, a running process can query the value of the TEMP environment variable to discover a suitable location to store temporary files, or the HOME or USERPROFILE variable to find the directory structure owned by the user running the process.
As an example, one important environment variable is the PATH variable. To view the value of this variable, on a Unix machine you can type
echo $PATH
or on a Windows machine
echo %PATH%
The variable is a list of directory paths. When the user types a command without providing the full path, this list is checked to see whether it contains a path that leads to the command.
In summary, the values of environment variables govern how certain aspects of your environment function, e.g. which executables and libraries will be called/accessed by default, or which options will be used when executing certain commands.
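As a quick illustration, environment variables can be read and set from within Python via os.environ (a minimal sketch; the variable name MY_TEMP_DIR is made up for this example):

```python
import os

# Read an environment variable; PATH should exist on most systems
path_value = os.environ.get("PATH", "")

# PATH is a list of directories separated by os.pathsep (':' on Unix, ';' on Windows)
search_dirs = path_value.split(os.pathsep)
print(f"{len(search_dirs)} directories on PATH")

# Set a variable for the current process (and any child processes it spawns)
os.environ["MY_TEMP_DIR"] = "/tmp/myapp"
print(os.environ["MY_TEMP_DIR"])
```

Note that changes made this way only affect the current process and its children, not the shell that launched Python.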
Why do we need a virtual environment?¶
You’re also now familiar with the Python
package manager pip
. Consider the following two ‘dummy’ packages and their requirements:
Package A, requires the following packages:
a, version >= 1.0
b, version 1.2
c, version >= 2.2
d, version >= 5.0
Package B, requires:
a, version >= 1.0
b, version >= 1.3
e, version 1.0
f, version >= 7.0.
Reviewing the above, we can see there is a conflict for package b. Clearly, using pip
to switch between two versions of package b
every time we want to use A or B is not a good solution. Furthermore, in reality, when working on larger development projects such dependency conflicts may arise for several, or even dozens(!), of packages. Clearly, a better solution is to have both versions of the software installed and an easy way to switch between the appropriate environment variables when using either A or B. This is where virtual environments come in handy.
A virtual environment is a tool that helps to keep dependencies required by different projects separate by creating isolated Python virtual environments for them. This is one of the most important tools that most Python developers use.
When and where to use a virtual environment?¶
By default, every project on your system will use the same directories (defined via environment variables) to store and retrieve site packages (third party libraries). Why does this matter? In the above example of two projects, you have two versions of package b
. This is a real problem for Python since it can’t differentiate between versions in the “site-packages” directory. So both v1.2 and v1.3 would reside in the same directory with the same name. This is where virtual environments come into play. To solve this problem, we just need to create two separate virtual environments, one for each project. The great thing about this is that there are no limits to the number of environments you can have since they’re just directories containing a few scripts.
Along with the above example, we may also want to make use of virtual environments because
Sometimes packages have the same name, but do different things, creating a namespace clash.
Sometimes you need a clean environment to test your package dependencies in order to write your
requirements.txt
file (we will talk more about such files later).
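To make the A/B example above concrete, here is a sketch of the directory-per-environment idea (the environment names are illustrative, and package b is a placeholder, not a real package):

```shell
# One isolated environment per project; names are illustrative
python3 -m venv projA-env
python3 -m venv projB-env

# Each environment has its own pip and its own site-packages directory,
# so conflicting versions could coexist, e.g. (placeholder package 'b'):
#   projA-env/bin/pip install 'b==1.2'
#   projB-env/bin/pip install 'b>=1.3'
ls -d projA-env projB-env
```

Since each environment is just a directory, switching projects is a matter of activating the corresponding environment.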
venv¶
Python comes with an inbuilt library called venv
which can be used to create so-called “virtual environments”. Inside a virtual environment, only the packages and tools you explicitly choose to copy across are available, and only at the version numbers you request. This gives quick, easy access to a “clean” system, be it for testing, or to run mutually incompatible software.
To create a new venv
environment you can run a command like
python -m venv foo
or, on systems with both Python 2 and Python 3 available,
python3 -m venv foo
This will create a new directory ./foo/
containing the files relevant to that virtual environment. To start the environment on Windows run
foo\Scripts\activate.bat
or on unix shell like systems
source foo/bin/activate
To disable the environment, on Windows systems run
.\foo\Scripts\deactivate.bat
or, in most unix based shells
deactivate
Building a requirements.txt
file using venv
¶
Switching to a venv
environment is one way to build up a short list of required packages for a new project. You can start up a blank environment and then try to build and run your code. If a dependency is missing, this should fail with an ImportError
message, something along the lines of
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-ukwnrb23-build/setup.py", line 5, in <module>
from numpy import get_include
ImportError: No module named 'numpy'
Ideally you will then be able to recognise the missing dependency (in this case numpy
) and fix it by running a command like pip install numpy
. After repeating as needed to fix any further requirements you can generate a requirements.txt
compatible list for your Python environment (with the command pip freeze
, this also lists the currently installed version).
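For example, from inside the activated environment (a sketch; the output depends on what you have installed):

```shell
# List installed packages with pinned versions in requirements.txt format
python3 -m pip freeze

# Redirect the output to create a requirements file for the project
python3 -m pip freeze > requirements.txt
cat requirements.txt
```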
Exercise: Make a venv¶
Create your own venv
environment, giving it a name of your choice. Activate it. Note the difference it makes to your command prompt.
Double check the installed package list using pip list
. Install a package into the virtual environment (such as matplotlib
) using pip
. Check that the list of installed packages inside the environment changes.
Some other tools (before we talk about Anaconda)¶
virtualenv is a popular tool for creating isolated Python environments for Python libraries. It functions by installing various files in a directory (e.g. env/
), and then modifying the PATH
environment variable to prefix it with a custom bin directory (e.g. env/bin/
). An exact copy of the python or python3 binary is placed in this directory, but Python is programmed to look for libraries relative to its path first, in the environment directory. It’s not part of Python’s standard library, but is officially blessed by the PyPA (Python Packaging Authority). Once activated, you can install packages in the virtual environment using pip
.
A virtualenv only encapsulates Python dependencies. A Docker container encapsulates an entire operating system (OS). With a Python virtualenv, you can easily switch between Python versions and dependencies, but you’re stuck with your host OS. With a Docker image, you can swap out the entire OS - install and run Python on Ubuntu, Debian, Alpine, even Windows Server Core. There are Docker images out there with every combination of OS and Python versions you can think of, ready to pull down and use on any system with Docker installed.
Tools such as Docker are excellent for testing software packages and cross operating system/hardware compatibility. For software development, tools such as Anaconda are generally more convenient.
What is Anaconda?¶
An open-source distribution of Python that simplifies package management. It comes with applications such as Jupyter Notebook, the conda environment manager, and pip for package installation and management.
Anaconda also comes with hundreds of Python packages such as matplotlib, NumPy, SymPy and so forth.
It eliminates the need to install common applications separately and will (generally) make installing Python on your computer much easier.
Note that a large range of helpful Anaconda tutorials can be found online.
To learn more about the usage of Anaconda, let us together work through an exercise.
For this we will fork, then clone and play around with a dummy package made for this purpose.
Go to the address https://github.com/rhodrin/environments_acse1 and click on the fork
button as shown in the image below (make sure you’re logged into your GitHub account before doing this):
Then clone the forked package (make sure you’re in an appropriate folder before performing the clone). In a terminal, type
git clone https://github.com/<my github name>/environments_acse1.git
and then checkout v1 of the package via
git checkout tags/v1 -b v1
The package can also be cloned via the Visual Studio Code GUI.
In the base folder notice the presence of both an environment.yml
file and a requirements.txt
file. The environment.yml
file defines the Anaconda virtual environment we wish to create. If we look at its contents, we see (importantly) name: envtest
, specifying the name of the environment we’re going to create, and some dependencies. We’ll talk more about them later, but for now let us create a conda environment. In the cloned directory, type
conda env create -f environment.yml
When that command is complete, type
conda activate envtest
and following this (making sure that your terminal prompt has now been modified such that (envtest)
is appearing) type
pip install -e .
to install the envtest
package (recall that the operations performed by this command are governed by the contents of setup.py
). Once that is done, let us view the contents of requirements.txt
. Currently, we see that only one dependency is listed, that of numpy
(version > 1.16). Let’s install this dependency via
pip install -r requirements.txt
Now, to check everything is correctly set up, from within the base directory run
python scripts/random_number_array.py
You should see output along the lines of
[[0.22330655 0.07439368 0.69014812]
[0.90354345 0.06734495 0.13096386]
[0.22487417 0.6394524 0.41603555]]
(noting that the actual numbers you see will be slightly different since the routine is generating a 3x3 array of random numbers between 0 and 1).
Also, let’s now look at the result of echo $PATH
(or echo %PATH%
) again. Notice the modified value within our environment.
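The active environment can also be identified from within Python via sys.prefix, which points at the environment’s directory when one is active (a minimal sketch; the output depends on your setup):

```python
import os
import sys

# sys.prefix points at the active virtual/conda environment's directory
# (or at the base Python installation when no environment is active)
print(sys.prefix)

# Activation works by prepending the environment's bin/Scripts directory
# to PATH, so it usually appears first in the list
print(os.environ.get("PATH", "").split(os.pathsep)[0])
```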
Now, let’s add a few further functions, dependencies and scripts to our repository.
In the file envtest/builtins.py
:
add the following two lines right after the existing import
from scipy.ndimage import gaussian_filter
from scipy import misc
Modify
__all__ = ['rand_array']
to __all__ = ['rand_array', 'smooth_image']
Add the following function:
def smooth_image(a, sigma=1):
return gaussian_filter(a, sigma=sigma)
Then, modify the file scripts/smooth_image.py
so that it reads (i.e. remove any existing text):
from envtest import smooth_image
from scipy import misc
import matplotlib.pyplot as plt
image = misc.ascent()
sigma = 5
smoothed_image = smooth_image(image, sigma)
f = plt.figure()
f.add_subplot(1, 2, 1)
plt.imshow(image)
f.add_subplot(1, 2, 2)
plt.imshow(smoothed_image)
plt.show(block=True)
Currently, if we try running this script, e.g. python scripts/smooth_image.py
, we’ll see an error of the following form:
Traceback (most recent call last):
File "smooth_image.py", line 1, in <module>
from envtest import smooth_image
File "/data/programs/environments_acse1/envtest/__init__.py", line 1, in <module>
from envtest.builtins import *
File "/data/programs/environments_acse1/envtest/builtins.py", line 2, in <module>
from scipy.ndimage import gaussian_filter
ModuleNotFoundError: No module named 'scipy'
This is of course because we have not yet installed the ‘new’ required dependencies. These are scipy
and matplotlib
, so let’s add them to our requirements.txt
file and install them. That is, modify requirements.txt
so that is now reads:
numpy>1.16
scipy
matplotlib
and then type
pip install -r requirements.txt
again. (Note that a pip install scipy
etc. would also do the job, but since we want to keep our requirements file up to date it doesn’t hurt to install from it directly.)
Following this, after running the script we should see a plot with the original image on the left and the smoothed image on the right.
Let’s go through this exercise once more. To builtins.py
add the following function:
def my_mat_solve(A, b):
return A.inv()*b
Remember that we need to make this function visible within the package and hence must modify the __all__ = [...]
line appropriately.
Then, let’s add a new script to make use of it. In the scripts
folder create a new file called solve_matrix_equation.py
and within it paste the following text
from envtest import my_mat_solve
from sympy.matrices import Matrix, MatrixSymbol
# Call function to solve the linear equation A*x=b symbolically
A = Matrix([[2, 1, 3], [4, 7, 1], [2, 6, 8]])
b = Matrix(MatrixSymbol('b', 3, 1))
x = my_mat_solve(A, b)
print(x)
Our new dependency is SymPy
. Hence, lets also add that to requirements.txt
(simply add sympy
to the end of the file) and then repeat the install command we used previously.
If everything is set up correctly, upon executing the newly added script we should see the following output:
Matrix([[b[0, 0]/2 + b[1, 0]/10 - b[2, 0]/5], [-3*b[0, 0]/10 + b[1, 0]/10 + b[2, 0]/10], [b[0, 0]/10 - b[1, 0]/10 + b[2, 0]/10]])
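As a quick sanity check on that output, we can substitute the symbolic solution back into the equation; A*x should recover b exactly (a sketch using the same matrices as the script):

```python
from sympy import Matrix, MatrixSymbol, zeros

A = Matrix([[2, 1, 3], [4, 7, 1], [2, 6, 8]])
b = Matrix(MatrixSymbol('b', 3, 1))

# Same operation as my_mat_solve(A, b)
x = A.inv() * b

# Substituting back: A*x - b should expand to the zero vector,
# since the inverse is computed in exact rational arithmetic
assert (A * x - b).expand() == zeros(3, 1)
print(x)
```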
Exercise: Finalizing our repository¶
We’ll now update our environment.yml
file and then rebuild our environment to ensure that within our updated environment everything works correctly ‘out of the box’.
First, commit the changes we’ve made via
git commit -a -m "<my commit message>"
Following this, let’s check out the master branch (note that the changes we’ve made above have simply brought us up to date with the
master
branch)
git checkout master
Then, add a further function of your choice to
builtins.py
and an accompanying script to utilize this function. Ensure that this new function requires at least one new additional dependency (remember to modify requirements.txt
etc. appropriately). If you’re not sure what new package to use, how about a quick Pandas example? (You’ll learn more about Pandas later in this course.) When this is done, and you’ve confirmed that the new script is working as intended, modify the
environment.yml
and add all new dependencies to it. That is, it should now look along the lines of the following, but with your additional dependencies also added:
name: envtest
channels:
- defaults
- conda-forge
dependencies:
- python>=3.6
- pip
- numpy>1.16
- scipy
- matplotlib
- sympy
When this is done, commit all these changes to the repository, remembering to add any new files first - e.g.
git add scripts/my_new_script.py
followed by
git commit -a -m "<my commit message>"
Push these changes to GitHub
git push
IMPORTANT: Next, ensure your GitHub repository has updated correctly - you can check this by inspecting some files in your web browser.
Now, as a test, we’ll deactivate and delete our environment and remake it using our updated environment.yml
file.
The required commands are as follows:
conda deactivate
conda remove --name envtest --all
conda env create -f environment.yml
Then, once the environment has been created, activate it again via conda activate envtest
. Note that if we now look at pip list
we will see that the full list of required packages, along with their dependencies, has already been installed.
(NOTE: In practice we’d generally create a new environment with a different name to test everything is working, but since this is a ‘dummy’ package and to avoid ‘clutter’ we’ll do it this way for the time being).
As a final note, think about why it is important to have both environment.yml
and requirements.txt
files.
The
environment.yml
was used only when creating our environment. Remember it was useful to have the requirements.txt
file to install the required packages when developing our environment. (Although we could have also continuously updated our environment file and then updated our environment via conda env update -f environment.yml
.) In any case, we generally want both for people making use of our package who are not using
Anaconda
. As we saw earlier, we could use such a requirements file in venv
. Additionally, what if we want some packages not to be installed automatically when creating our environment? For various reasons, we may wish to have, e.g., a
requirements-optional.txt
file present (generally the packages listed in environment.yml
and requirements.txt
should be in sync unless there’s a good reason for them not to be). Any such optional requirements can be installed via the pip install -r ... .txt
command once again.
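A hypothetical requirements-optional.txt might look along these lines (the package choices here are only illustrative; requirements files support # comments):

```
# Optional extras, deliberately left out of environment.yml
pandas      # only needed by an optional data-analysis script
pytest      # only needed to run the test suite
```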