Python Dev - Scripts, Modules and Packages¶
Foreword¶
ACSE 1 Lecture One - 12th October 2020 - Version 3.0.5¶
Pre-Sessional material: Introduction to `git`, Python & the `bash` shell
In this lecture: Python as a development platform
- The Python Interpreter :- running code (20 minutes)
- Notebooks
- The command line interpreter
- The IPython console
- Python Scripts :- reusable code (40 minutes)
- ways to run a script, VS Code, shebangs
- Text encoding
- PEP8 and pylint - code linters
- Options parsers
- `matplotlib` in scripts
- Python Modules :- shareable code (45 minutes)
- Python docstrings, PEP 257 & numpydoc
- APIs
- `import`, `sys.path` & `$PYTHONPATH`
- Extension Modules
- Python Packages :- distributable code (1 hour)
- Directory Structure
- `setup.py` & `setuptools`
- `pip` & `conda` installation
Contact details:
Dr. James Percival
Room 4.85 RSM building (but not much this term)
email: j.percival@imperial.ac.uk
Teams: @Percival, James R in acse1 or General, or DM me.
By the end of this lecture you should:¶
Be able to write Python using Visual Studio Code.
Understand the similarities and differences between:
Python scripts
Python modules
Python packages
Know about Python coding standards, PEP8 and linters
Be able to make & install your own Python package.
A note on colours in this notebook:¶
In general, ordinary text looks like this. Much of this will cover the same material presented in the spoken lecture portions. Assessment in this module and the following ones (particularly the miniprojects) will expect you to be familiar with these subjects.
# This cell sets the css styles for the rest of the notebook.
# Unless you are particularly interested in that kind of thing,
# you can run this once, then safely ignore it
%run add_colours.py
css_styling()
Danger boxes
These boxes contain important warnings. Ignoring information in them might be dangerous, or lead to unexpected behaviour from your computer.
Danger boxes will thankfully be rare.
Information boxes
These boxes contain information which is important, but not vital, and which can safely be ignored if you are running short of time.
Exercise: An example Exercise
These boxes contain exercises to test your knowledge and practice important conceptual ideas.
The many ways to use Python¶
Python code gets used in at least five different ways:
Jupyter notebooks
Hacking in the interpreter/ipython console
Small, frequently modified scripts
Module files, grouping together useful code for reuse
Large, stable project packages
Please follow along as we look at some of these methods.
Jupyter notebooks¶
These should need no further introduction, since you’re currently reading one. Jupyter notebooks combine data permanence, editable code and text comments in the same place.
When a cell is marked as a code cell, and a python kernel is running, it becomes an editable coding environment.
# we can write and run code here
Jupyter has good points and bad points. On average, data scientists like it a little bit more than computational scientists do.
The Python interpreter¶
On Windows Anaconda you can type
python
from the Anaconda command prompt to start a basic, no frills python interpreter session. On linux/Mac you may sometimes need to use the command python3
instead.
Python 3.6.5 |Anaconda, Inc.| (default, Mar 29 2018, 13:14:23)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
This is probably the least user friendly way possible to run an interactive Python session, although it is the best supported. Many Mac and Linux systems come with Python as a default installation (although sometimes quite an old one), so it has a very high probability of being installed on machines you are asked to connect to using `ssh`. The easiest way to quit is to call `exit()`. On Mac/Linux you can also use Ctrl+D or on Windows Ctrl+Z then Return.
Warning!¶
If the first line starts with Python 2.X.Y like
Python 2.7.12 (default, Dec 4 2017, 14:50:18)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
rather than Python 3.X.Y as shown above, then you’re running a very old interpreter. You can try typing python3
instead to see if Python 3 is installed on the system. If that still gives you Python 2.7, then something has gone wrong with your machine.
The IPython console¶
IPython (or Interactive Python) provides a much more “batteries included” Python experience, with a built in history editor, tab completions and inline matplotlib support. Anaconda provides a version, QtConsole, which runs in its own Qt window, so that the user experience on Windows, Mac and Linux is virtually identical.
At start-up an IPython interpreter session looks like
Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.4.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]:
Unlike the vanilla python interpreter (what you get by typing python
in a terminal/command prompt), it contains useful features like tab autocompletions, a richer browsable history (using the arrow keys), additional access to the inbuilt documentation system and the easy ability to call out to the underlying operating system.
Many of the features available to you should be familiar even to those of you who have only used Jupyter notebooks before, since they are also available inside Jupyter notebook code cells. In fact “under the hood” Jupyter is running an IPython console (the Python “kernel”) to process Python3 code.
Exercise One: Running Python Code¶
Run the following commands
def square_and_cube(n):
    return sorted([i**2 for i in range(n)]+[i**3 for i in range(n)])
print(square_and_cube(3))
print(square_and_cube(10))
and
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 6*np.pi)
plt.plot(x, np.sin(x))
in a notebook, in a vanilla Python interpreter and in a QtConsole/IPython console.
In each case, try modifying the `square_and_cube` function to also include the 4th powers. The `sorted` function returns a new sorted list from an iterable. Try accessing the Python online help in the ordinary interpreter to invert the order of the list. Note you’ll need to use the `help` function, since the `sorted?` syntax is in IPython/Jupyter only.
Tip: for the IPython console, you may find the ipython magic command %paste
useful.
Python Scripts¶
A Python script file is just a regular plain text file containing only valid python code and comments (i.e. lines starting with the hash/pound character, #
), which the Python interpreter transforms into instructions for the computer to perform. Script files are written in the same way you would write Python code in an interactive interpreter.
An Example¶
An example script, rot13.py
might look like
#!/usr/bin/env python3
# -*- coding: ascii -*-
import codecs
import sys
print(codecs.encode(sys.argv[1], 'rot13'))
This file reads a string from the command line and applies the ROT-13 cypher, which cycles letters in the Latin alphabet through to the one 13 places forward/backward (i.e. maps A => N, N => A, g => t and so on). This cypher is its own inverse.
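The claim that the cypher is its own inverse can be checked directly. The following is a minimal sketch using the same `codecs.encode` call as the script; the variable names are made up for illustration.

```python
import codecs

message = "Hello everybody!"

# Encoding once moves every letter 13 places; other characters are untouched.
scrambled = codecs.encode(message, "rot13")

# Encoding again moves each letter a further 13 places, 26 in total,
# which brings it back to where it started.
restored = codecs.encode(scrambled, "rot13")

print(scrambled)
print(restored)
```

Since the alphabet has 26 letters, the same function serves for both encoding and decoding.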
Warning¶
ROT-13 is useful to make text hard to read casually, but is not remotely cryptographically secret, let alone secure. Never use it in a situation where it wouldn’t be acceptable to use plain text.
#inside a notebook, the ! allows calls out to the OS shell
!curl https://msc-acse.github.io/ACSE-1/lectures/rot13.py -o rot13.py
!python rot13.py "Uryyb rirelobql!"
The above commands will only work if the script is already in the same directory as the notebook, or your computer is connected to the internet. Inside the IPython console and in notebooks, we can also use the `%run` magic command:
Warning¶
The !
command lets Jupyter notebooks run commands in the operating system with the same privileges that the user (i.e. you) have, and similar tricks can be played with %%sh
, %%cmd
and similar cell magics. Don’t just run random notebooks off the internet unless you understand what they’re doing, or fully trust the person who you got them from.
%run rot13.py "Uryyb rirelobql!"
Now that we can run a file, let’s have another look at the contents.
#!/usr/bin/env python3
# -*- coding: ascii -*-
import codecs
import sys
print(codecs.encode(sys.argv[1], 'rot13'))
Reviewing the contents of a script¶
Shebangs and executable files¶
The “shebang line”, #!/usr/bin/env python3
tells Linux/MacOSX systems that this script should be run with Python 3. If it is present, then on those systems we can also turn the script into an executable file and run it straight off:
$ chmod 755 rot13.py
$ ./rot13.py "This works on Linux/Mac systems"
Warning
Note that the shebang line refers to Python 3 explicitly as python3. This is typical behaviour on computer systems with both Python 3 & Python 2 installed, where python
may still run Python 2. For those of you running Anaconda on Windows, python
means Python 3 there, and the python3
executable may not exist.
Text encoding¶
The next line # -*- coding: ascii -*-
tells python (and possibly your text editor) that the script uses the ASCII (American Standard Code for Information Interchange) text encoding. Text encodings map the numbers that computers are able to store onto the characters that humans can read. If a file is opened using the wrong encoding, then it will either read as nonsense, or contain many blank “unknown” characters.
Table above by Tom Gibara CC-BY-SA.
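As a rough illustration of what happens when text is read with the wrong encoding, the sketch below stores a short string as UTF-8 bytes and then decodes it three ways. The example word is arbitrary.

```python
text = "café"

# The bytes as they would be stored on disk in UTF-8.
raw = text.encode("utf-8")
print(raw)  # the accented letter occupies two bytes

# Decoding with the matching encoding recovers the original text.
right = raw.decode("utf-8")

# Decoding with a different single-byte encoding "works", but reads as nonsense.
wrong = raw.decode("latin-1")
print(right, wrong)

# ASCII has no meaning for bytes above 127, so decoding fails outright.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError as err:
    ascii_ok = False
    print("ASCII cannot decode this:", err)
```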
The file doesn’t have to be in ASCII. In fact the Python3 default is to use Unicode encoding (utf-8
) if no explicit encoding is given. This gives access to characters from most world languages. You can even use letter-like symbols from the Unicode standard as well as the more usual Latin characters in the names of functions and objects. For example, let’s write a more international “Hello World function”.
def 你好(x):
    print('Hello', x)
你好('World!')
Similarly with the default utf-8
encoding you can use any Unicode characters from the standard you like in comments and strings.
def sorry():
    """😊"""
    return "不好意思, 我不会说中文."
print(sorry())
print(sorry.__doc__)
Fortunately, you can’t actually use emoji in function names, so code like
def 😊(x):
    return "This doesn't work"
will raise a SyntaxError exception.
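One way to check which names Python will accept, without writing a function for each, is the string method `str.isidentifier`. The sketch below is illustrative only; the candidate names are made up.

```python
candidates = ["hello", "你好", "😊", "2fast"]

# str.isidentifier reports whether a string is a legal Python name.
valid = {name: name.isidentifier() for name in candidates}
print(valid)

# compile() parses source code without running it, so it is a safe way
# to confirm that the parser rejects the emoji function name.
try:
    compile("def 😊(x):\n    return x\n", "<example>", "exec")
    emoji_compiles = True
except SyntaxError:
    emoji_compiles = False
print(emoji_compiles)
```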
Writing a Python script¶
Since a python script is just a text file, you just need a text editor to write them. Indeed, providing you save it as Plain Text
, you could even write it in Microsoft Word (please, please don’t). Your lecture on the shell introduced some console text editors which can be used on remote systems, but this course will use Visual Studio Code, a cross platform lightweight code editor (debatably an IDE, or integrated development environment) distributed by Microsoft, which makes writing, running and understanding Python scripts easier.
Warning¶
There are many reasons not to write code in Microsoft Word, including the autocorrect tool, which has an annoying tendency to “fix” code keywords like `elif` in a way which tends to break code. However the most insidious feature (which also affects many code listings on the web) is “smart quotes”. Using pretty Unicode punctuation like “ and ” or ‘ and ’ instead of the unidirectional ASCII versions `"` and `'` turns Python strings into nonsense.
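To see the smart quotes problem without opening Word, the sketch below asks Python to parse the same line twice, once with plain quotes and once with curly ones (written here as Unicode escapes so that they survive copy and paste).

```python
good = 'print("hello")'
bad = "print(\u201chello\u201d)"  # the same line with curly quotes

# compile() only parses the text; nothing is executed.
compile(good, "<example>", "exec")

try:
    compile(bad, "<example>", "exec")
    smart_quotes_ok = True
except SyntaxError as err:
    smart_quotes_ok = False
    print("SyntaxError:", err.msg)
```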
Some other IDEs/code editors (multilanguage):
Spyder another IDE which comes bundled with Anaconda Python installations.
Visual Studio (Mostly Windows) Visual Studio Code’s big brother. The package also contains Windows compilers for various languages.
Xcode (Mac only) Apple’s IDE equivalent of Visual Studio.
Eclipse A cross-platform open source IDE.
(python only):
PyCharm A Python IDE similar to Spyder.
and many others…
Generic text editors with code syntax highlighting include:
Jupyter - as well as notebooks, it can edit plain text files.
Emacs (cross platform) Console/Windowed text editor.
Nano (cross platform) Console text editor.
Notepad++ (Windows only) GUI text editor
Vim (cross platform) Console text editor.
Your choice of editing platform is personal, and each individual should find out what works for them. Don’t be afraid to experiment, but if you have already spent a lot of time writing code using a tool which supports Python, then we recommend you carry on using it.
In your lecture you will open up VS Code and be given a quick introduction to the interface.
If you haven’t already downloaded the Python VS Code extension, it can be found (for free) here on the Visual Studio Code marketplace.
Information¶
As with many other IDEs, VS Code has a large community producing extensions, covering a wide range of programming and markup languages. Some other ones you might be interested in:
C/C++ support
Latex support for writing up mathematical content.
Support for Github pull requests
Option parsing in Python¶
Reading from the command line with sys.argv
¶
The sys.argv
variable is a list of the string arguments given when executing the script, with the first element (sys.argv[0]
) being the name of the script itself. We can use this to communicate with the script from the command line, so that one file can do many things without needing to edit it. For example, the following script counts the number of uses of the letter ‘e’ in a file:
import sys
e_count = 0
with open(sys.argv[1], 'r') as infile:
    # iterate over the file one line at a time
    for line in infile:
        e_count += line.count('e')
print("There were %d letter e's" % e_count)
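To see exactly what ends up in `sys.argv`, the sketch below launches a second Python process with two made-up arguments (`data.txt` and `--fast`) and prints the list it receives. It assumes Python 3.7 or later for the `capture_output` option.

```python
import subprocess
import sys

# A one-line "script" that prints its own argument list.
# The arguments data.txt and --fast are hypothetical.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv)", "data.txt", "--fast"],
    capture_output=True, text=True, check=True,
)
argv_seen = result.stdout.strip()
print(argv_seen)
```

Every entry arrives as a string, so numbers must be converted with `int` or `float` before use. Because the code here is passed with `-c` rather than as a file, the first entry is `-c` instead of a script name.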
Exercise Two: Find the primes¶
Using VS Code (or your own preferred data entry method), write a Python script to output the first 20 prime numbers. If you answered lecture 2 in the introductory exercises, you can start from the code you wrote there, or start from fresh.
Tips:
- One way of doing this uses an outer loop counting how many primes you have, and then code to find the next prime number.
- Note that a number cannot be prime if it is divisible by a smaller prime number and that 1 is not prime.
- If a number is not prime, it must have at least one factor (other than 1) no larger than its own square root. This can be used to improve the efficiency of your search.
- If `b` divides `a` exactly, then `a%b==0`, which gives a quick test.
When testing your code, you should expect the output for the first 5 primes to be [2, 3, 5, 7, 11]
.
Try to convert the script into a routine to calculate all prime numbers smaller than an input, \(n\) using `sys.argv`.
A model solution for the script is available.
argparse
and options parsing¶
To pass more complicated options to a script, there is the argparse
module, part of the standard python library. This module gives python scripts the (relatively) simple ability to take flags and process (or parse) other complicated inputs.
For full details, you should read the documentation linked to above, but as a short example, we can write a program which downloads current tube statuses from Transport for London.
status.py:
import argparse
import json
from urllib.request import urlopen

parser = argparse.ArgumentParser()
parser.add_argument("mode", nargs='*',
                    help="transport modes to consider: eg. tube, bus or dlr.",
                    default=("tube", "overground"))
parser.add_argument("-l", "--lines", nargs='+',
                    help="specific lines/bus routes to list: eg. Circle, 73.")
args = parser.parse_args()

if args.lines:
    url = "https://api.tfl.gov.uk/line/%s/status" % ','.join(args.lines)
else:
    url = "https://api.tfl.gov.uk/line/mode/%s/status" % ','.join(args.mode)

# urlopen returns a file-like response which json.load can read directly
status = json.load(urlopen(url))
short_status = {s['name']: s['lineStatuses'][0]['statusSeverityDescription']
                for s in status}
for item in short_status.items():
    print('%s: %s' % item)
This code uses the argparse module to accept multiple positional arguments for modes of transport, e.g.
python status.py tube bus national-rail
as well as flag based options for individual lines
python status.py -l central
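The parsing step can be tried without any network access by handing `parse_args` an explicit list in place of the real command line. This is a cut-down sketch of the parser above (help strings omitted, a list used for the default), not the full script.

```python
import argparse

parser = argparse.ArgumentParser(description="Toy version of the status.py parser.")
parser.add_argument("mode", nargs="*", default=["tube", "overground"])
parser.add_argument("-l", "--lines", nargs="+")

# Passing a list to parse_args stands in for sys.argv[1:].
no_args = parser.parse_args([])
modes = parser.parse_args(["tube", "bus", "national-rail"])
lines = parser.parse_args(["-l", "central", "circle"])

print(no_args)
print(modes)
print(lines)
```

Here `nargs="*"` collects zero or more positional values into a list, while `nargs="+"` requires at least one value after the flag.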
Exercise Three: Find the mean
Write a script to calculate the mean of a sequence of numbers. As an extension, try to make it take extra options (using the `argparse` module) `-b`, `-o` and `-x` to work with binary (i.e. base 2, with 101 == 5 decimal), octal (i.e. base 8, with 31 == 25 decimal) and hexadecimal (i.e. base 16, with 2A == 42 decimal) numbers.
Test your basic script on the following sequences: 1 (mean 1); 1 5 7 13 8 (mean 6.8); and 2.5, 4, -3.2, 9.3 (mean 3.15).
Also try feeding it no input.
Tips:
For the longer version you can use the 2 argument version of the int
function to change the base of numbers. For example int('11',2)==3
and int('3A', 16)==58
.
Model answers are provided for the short exercise and for the long version which takes options flags.
interlude
Let’s take a break from talking about Python scripts to point out a weird way that python behaves that can sometimes catch people out when writing code.
If you haven’t seen this before, try and guess the output produced by repeatedly calling these functions in the cells below.
def f(tmp=[]):
    """Try to default to have tmp as an empty list."""
    for i in range(4):
        tmp.append(i)
    return tmp
def g(tmp=None):
    """Doing the same thing explicitly."""
    tmp = tmp or []
    for i in range(4):
        tmp.append(i)
    return tmp
print('f()', f())
print('g()', g())
print('f()', f())
print('g()', g())
So, what’s going on here?
In the first case Python creates the empty list variable once when the function is first defined, and then reuses it on every subsequent call, since tmp
is initialised to refer to it whenever we reenter the function. In the second case tmp
is pointed at None
each time, then changed to point at a new empty list each time the tmp = tmp or []
line is called.
The `x = y or z` syntax means `x` is set to `y` if `y` is “truthful”, or to `z` if it isn’t (`None`, `0` and empty containers like lists are all not truthful).
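The shared default can be inspected directly, since Python stores default values on the function object in `__defaults__`. The sketch below also shows one consequence of the `or` idiom: because an empty list is not truthful, a caller's own empty list is quietly swapped for a new one. The function `h`, which tests `is None` instead, is a hypothetical variant added for comparison.

```python
def f(tmp=[]):
    """Append to a default list that is created once, at definition time."""
    tmp.append(len(tmp))
    return tmp

f()
f()
# The default object lives on the function itself and has been mutated.
print(f.__defaults__)

def g(tmp=None):
    """Use the `or` idiom from the text."""
    tmp = tmp or []
    tmp.append(len(tmp))
    return tmp

def h(tmp=None):
    """Hypothetical variant which only replaces a missing argument."""
    if tmp is None:
        tmp = []
    tmp.append(len(tmp))
    return tmp

mine_g, mine_h = [], []
g(mine_g)  # the empty list is not truthful, so g works on a fresh list
h(mine_h)  # h appends to the list the caller passed in
print(mine_g, mine_h)
```

Which behaviour is wanted depends on whether the function is meant to modify its argument in place.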
End of interlude¶
A reminder on using matplotlib
in scripts¶
In scripts which are run in the terminal, rather than in a notebook or an IPython console, `matplotlib` may not automatically put interactive plots on screen. In this case, you will need to call `matplotlib.pyplot.show()` (`plt.show()` with the usual `import matplotlib.pyplot as plt`) to see your figures.
Alternatively, as you learnt last week, you can use a command like `plt.savefig('mycoolplot.png')` to write the images to disk without any human interaction. The output format is guessed from the filename that you give it (e.g. `.png`, `.jpg`, `.pdf`).
Exercise Four: Plots in scripts
Write a script to plot the functions $y=\sin(x)$, $y=\cos(x)$ and $y=\tan(x)$ to screen over the range [0,$2\pi$] and then run it in a terminal/prompt.
Make sure to include labels on your axes.
Change the script to output a .png
file to disk.
Next do the same to write a .pdf
.
Model answers are available.
PEP8 - The Python style guide¶
Although as you saw earlier non-ASCII function names and comments are allowed in the Python 3 standards, you are strongly discouraged from using them in code which other people are going to see (including the assignments on this Masters course). That is actually one of the recommendations of the Python Style Guide, known as PEP8.
Python Enhancement Proposals (PEPs) are the mechanism through which Python expands and improves, with suggestions discussed and debated before being implemented or rejected. PEP8 describes suggestions for good style in Python code, based on Guido van Rossum (the original Python creator) noting that (with the exception of throw-away scripts) most code is read more often than it is written. As such, the single most important aspect of code is readability, by you and by others.
Note that PEP8 does not cover every single decision necessary in generating Python code in a consistent style. As such, there are many more detailed guides, either at the project level, or for entire organizations. For an example of the former, see the discussion of `numpy` later in this lecture. For an example of the latter, see the Google Python Style Guide. When choosing what to do on your own projects, you are the boss, but PEP8 is a useful minimum (and will gain/lose you marks during the assessed exercises in this course) and it is useful to consider the thinking in the choices other projects make.
Code linters, and static code analysis¶
For Python, as with many other languages, there exist automated tools which check your code against an encoding of a style guide and point out mistakes. These are known as code linters, by analogy with clothes lint and the lint rollers used in laundries. Like the cleaning tool they remove mess and ‘fluff’ from your code to leave things looking neat and tidy.
By Frank C. Müller, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=636140
There are many tools to perform code linting with python, including the lightweight pycodestyle
(formerly known as pep8
) package, which simply checks for conformity with the basic PEP8 guidelines. Some tools, such as pyflakes
and pylint
also perform static code analysis. That is, they parse and examine your code, without actually running it, looking for bad “code smells”, or for syntax which is guaranteed to fail.
An extension to run the pylint
tool is offered as an option when using VS Code to edit Python code. You can also elect to use automatic pep8
corrections as you type, as well as running a full pep8
check each time a document is saved, by installing the relevant python packages and turning on the relevant options in the extensions settings.
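As a toy illustration of what examining code without running it means, the sketch below uses the standard library `ast` module to parse a small piece of source text and report functions which have no docstring. Real tools such as `pylint` perform many more checks than this; the source text and names here are invented for the example.

```python
import ast

source = '''
def documented(x):
    """Return x unchanged."""
    return x

def undocumented(x):
    return x
'''

# ast.parse builds a syntax tree; nothing inside `source` is executed.
tree = ast.parse(source)

# Walk the tree and collect functions which lack a docstring.
missing = [node.name for node in ast.walk(tree)
           if isinstance(node, ast.FunctionDef)
           and ast.get_docstring(node) is None]
print(missing)
```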
Other hints for writing good Python scripts:¶
Explicit is better than implicit.
Don’t duplicate code, use functions.
Try to keep things compact enough to read in one go.
Make variable names meaningful if used on more than one line.
Simple is often better than clever.
Practise the principle of least astonishment.
Add comments when they add meaning.
For further discussion, see resources such as Google’s Python Style Guide, the Hitchhiker’s Guide to Python and style guides for large open source python projects such as Django which define, discuss and give verdicts for a number of open questions not covered by PEP8 (or where they disagree).
Finally, remember once again that much about code style is a social issue. You certainly don’t have to decide to behave the way any guide tells you if it affects no-one else, and nobody else ever interacts with your code. You should behave the way you and your team mates agree (or how your boss tells you!)
Exercise Five: Fix the script¶
Copy the following script into your editor/IDE and run the static analysis tool pylint
on it. Fix the errors and warnings that it gives you.
value={1:'Ace',11:'Jack',12:'Queen',13:'King'};
for _ in range(2,11):
    value[_]=_
suit={0:'Spades',1:'Hearts',2:'Diamonds',3:'Clubs'}
def the_name_of_your_card(v,s = 0,*args, **kwargs):
    """Name of card as a string.
    """
    if (v < 1 or v > 13 or s not in (0,1,2,3)):
        raise ValueError
    return """I have read your mind and predict your card is the %s of %s."""%(value[v], suit[ s])
print( the_name_of_your_card(2, s= 3))
Interlude¶
If you haven’t seen this before, try and guess the output produced by the functions in the cells below. Can you explain what’s going on?
a = [_**2 for _ in range(5)]
for i, k in enumerate(a):
    print('%s: %s'%(i, k))
print('sum:', sum(a))
b = (_**2 for _ in range(5))
for i, k in enumerate(b):
    print('%s: %s'%(i, k))
print('sum:', sum(b))
c = (_**2 for _ in range(5))
print('sum:', sum(c))
So what’s going on here?
While `[a for a in b]` is a list comprehension, making up a list (an iterable) from the elements you ask for, the generator syntax `(a for a in b)` doesn’t make up a tuple (despite looking like it should), but a pure iterator. That means that it creates its elements only when asked for them, but can only be cycled through once.
Iterators and generators can be useful to save system memory when dealing with very large sequences. For example, `range(10**6)` returns a lazy object covering the numbers from 0 to 999,999 which takes just 48 bytes of memory, while `list(range(10**6))` fills upwards of 8 MB.
In general, generators and comprehensions can be very useful ways to code in Python, and are often faster to run than the equivalent `for` loop construction would be (though not usually as fast as using `numpy` for numerical operations where that’s possible).
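The memory saving and the "once only" behaviour can both be measured. The sketch below compares a list comprehension and a generator expression over the same values using `sys.getsizeof`; the exact sizes printed depend on the Python version, so only their relative magnitude matters.

```python
import sys

squares_list = [i**2 for i in range(1000)]
squares_gen = (i**2 for i in range(1000))

# The list stores every element; the generator stores only its current state.
list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)
print(list_size, gen_size)

first_pass = sum(squares_gen)
second_pass = sum(squares_gen)  # the generator is now exhausted
print(first_pass, second_pass)

# A range object is lazy too, but unlike a generator it can be reused.
r = range(5)
range_sums = (sum(r), sum(r))
print(range_sums)
```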
Python Modules¶
Python module files denote code which you can use with an `import` command in your own scripts and programs. That is to say, a module is an external file from which you are using (or reusing) content. In other languages, a very similar concept might be called a library file. A pure Python module file has the same format as a script, except it expects to be `import`ed into other files, or into the interpreter directly, usually without visually interacting with the user. This means that a typical module file contains definitions for functions and classes, but doesn’t produce any output (or try to do any significant extra work) by itself.
The code for a module, code_mod.py
:
"""Wrapper for rot13 encoding."""
import codecs
def rot13(input):
    """Return the rot13 encoding of an input string."""
    return codecs.encode(str(input), 'rot13')
import code_mod
code_mod.rot13("Uryyb rirelobql!")
The import
command¶
The import
search path¶
After looking in the current directory, Python uses the other directories inside the sys.path
variable, in order, when asked to find files via an import command.
import sys
print(sys.path)
This means that this variable can be changed within a Python script itself, or can be influenced when the Python session starts through the PYTHONPATH
environment variable.
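As a sketch of changing the search path from inside Python, the code below writes a tiny module into a temporary directory, adds that directory to `sys.path`, and then imports it. The module name `tiny_mod` and its contents are made up for the example.

```python
import importlib
import pathlib
import sys
import tempfile

# Hypothetical module name and location, invented for this sketch.
module_dir = tempfile.mkdtemp()
pathlib.Path(module_dir, "tiny_mod.py").write_text("ANSWER = 42\n")

# Without this line the import below would fail, since the
# temporary directory is not one of the places Python searches.
sys.path.append(module_dir)
importlib.invalidate_caches()  # make sure the new file is noticed

import tiny_mod

print(tiny_mod.ANSWER)
print(tiny_mod.__file__)
```

Setting `PYTHONPATH` before starting Python also adds directories to `sys.path`, without needing any change to the code.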
The importlib.reload
and %reset
commands.¶
The python command reload
in the importlib
module tells the interpreter to update its record of the contents of an individual module. This can be useful during an interactive interpreter session if you update the code in a module or package, whether automatically, or by editing the file by hand.
The IPython/Jupyter magic command %reset
clears the variables, functions and imports that you have defined during the interactive session, resetting the namespace back to its original blank state.
warning
The reload
command only updates the contents of the module passed as an argument, not necessarily the contents of modules that are imported inside it. If in doubt, it’s safest to exit the interpreter and restart.
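The effect of `importlib.reload` can be seen by editing a module file from code. This is a minimal sketch with a made-up module called `reload_demo`; bytecode caching is switched off so that the quick rewrite of the file is certain to be picked up.

```python
import importlib
import pathlib
import sys
import tempfile

sys.dont_write_bytecode = True  # avoid stale cached bytecode in this quick demo

module_dir = tempfile.mkdtemp()
module_file = pathlib.Path(module_dir, "reload_demo.py")  # hypothetical module
module_file.write_text("VALUE = 1\n")
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

import reload_demo
before = reload_demo.VALUE

# Simulate editing the file by hand while the session is still running.
module_file.write_text("VALUE = 'edited'\n")
stale = reload_demo.VALUE  # the interpreter still holds the old contents

importlib.reload(reload_demo)
after = reload_demo.VALUE
print(before, stale, after)
```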
x=7
print(x)
# by itself, %reset asks the user for confirmation.
# %reset -f forces it to proceed.
%reset -f
try:
    print(x)
except NameError as e:
    print(e)
Python docstrings for scripts, modules and packages.¶
Documentation where it’s needed¶
As you were told in the introduction to Python course, the text between the `"""` markers is called a docstring. It should appear at the top of scripts & module files (or just below the file encoding, if one is needed) and as the first text lines inside `class` or function `def` blocks. Python uses it to generate help information if asked. This information is stored in the object’s `__doc__` attribute.
import code_mod
code_mod.rot13?
There is actually a PEP, PEP 257, which gives suggestions for a good docstring. In particular it suggests:
One line docstrings should look like
def mod5(a):
    """Return the value of a number modulo 5."""
    return a%5
I.e. the docstring is a full sentence, ending in a period, describing the effect as a command (“Do this”, “Return that”).
Multiline docstrings should start with a one line summary with similar syntax and have the terminating `"""` on its own line.
The docstring of a pure script should be a “usage” message.
The docstring for a module should list the classes and functions (and any other objects) exported by the module, with a one-line summary of each.
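The sketch below puts these suggestions side by side: a one line docstring, a multiline docstring with a one line summary, and two ways of reading them back. The function `clip` is invented for the example; `inspect.getdoc` is the standard library helper which tidies up docstring indentation.

```python
import inspect

def mod5(a):
    """Return the value of a number modulo 5."""
    return a % 5

def clip(x, lower, upper):
    """Return x limited to the closed interval [lower, upper].

    Values below lower are raised to lower, and values above
    upper are reduced to upper.
    """
    return max(lower, min(x, upper))

# The raw docstring is stored on the function object.
print(mod5.__doc__)

# inspect.getdoc strips the indentation; the first line is the summary.
summary = inspect.getdoc(clip).splitlines()[0]
print(summary)
```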
The numpydoc
standard¶
The numpy
package has its own standards, which are well suited to numerical code, especially code which interfaces with numpy
package itself, e.g. by using numpy
multidimensional arrays. You have already seen examples of the numpydoc
style in previous lectures, but let’s give another one
%matplotlib inline
import numpy as np
import matplotlib.pyplot as pyplot
def mandelbrot(c, a=2.0, n=20):
    """
    Approximate the local Mandelbrot period of a point.

    Parameters
    ----------
    c : complex
        Point in the complex plane
    a : float
        A positive bounding length on the horizon of the point z_n
    n : int
        Maximum number of iterations.

    Returns
    -------
    int
        i such that |z_i|>a if i < n, NaN otherwise.
    """
    z = c
    for _ in range(n):
        if abs(z)>a:
            return _
        z = z**2 + c
    return np.nan
dx = np.linspace(-2, 1, 300)
dy = np.linspace(-1.5, 1.5, 300)
x, y= np.meshgrid(dx, dy)
z = np.empty(x.shape)
for i in range(len(dx)):
    for j in range(len(dy)):
        z[i, j] = mandelbrot(x[i, j]+1j*y[i, j],100)
pyplot.pcolormesh(x, y, z)
pyplot.xlabel('$x$')
pyplot.ylabel('$y$')
pyplot.get_cmap().set_bad('black')
In the `numpydoc` style, the `Parameters` and `Returns` sections prescribe the data types (`int`, `float`, `complex`, `str` etc.) of the inputs and outputs of the method. This uses the syntax of a text markup language called reStructuredText. We will revisit this tomorrow when we introduce the documentation generator, `sphinx`.
A note on types¶
By default Python practises a form of dynamic typing called “duck typing”, where “as long as it looks like a duck and quacks like a duck, it’s a duck”. This can sometimes cause surprising problems when the names of functions clash.
class Duck(object):
    def quack(self):
        print ("Quack!")
        return self
    def fly(self):
        print("Flap, flap, flap")
        return self
class Bugs(object):
    def spider(self):
        print("8 legs")
        return None
    def fly(self):
        print("6 legs")
        return None
def takeoff(x):
    return x.fly()
#This works
duck = Duck()
takeoff(duck).quack()
try:
    # this won't
    bugs = Bugs()
    takeoff(bugs).quack()
except AttributeError:
    import traceback
    traceback.print_exc()
In a statically typed language like C, this kind of error would be caught when the code was compiled. In Python, errors can often only be caught when the branch which holds them is run.
Design by contract¶
Those of you used to statically typed languages like C will find the numpydoc
specification familiar. The numpydoc
docstrings are also a weak example of a wider code design philosophy called design by contract, or programming by contract. In that system, the developer explicitly lists all the assumptions that a function makes about its inputs, as well as the guarantees that it makes about its outputs.
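As a loose illustration of the idea (not a full design by contract framework), the sketch below writes the assumptions about the input and the guarantee about the output as explicit checks. The function `normalise` is invented for this example.

```python
import math

def normalise(weights):
    """Return the weights rescaled so that they sum to one."""
    # Preconditions: assumptions which the caller must satisfy.
    if len(weights) == 0:
        raise ValueError("weights must not be empty")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not all be zero")

    result = [w / total for w in weights]

    # Postcondition: a guarantee about the output.
    assert math.isclose(sum(result), 1.0)
    return result

print(normalise([1, 1, 2]))
```

A docstring can only describe such a contract, whereas checks like these enforce it each time the function runs.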
Exercise Six: Complex square root¶
Write a function which accepts a real number and returns the complex square roots of that number.
Your function should include a docstring conforming to the numpydoc
standard.
Tips:
- You can use the `sqrt` function in the `math` module to obtain the square root of a positive real number.
- Python uses the notation `1j` for a unit length imaginary number (which a mathematician would typically denote $i$), where (loosely) $\sqrt{-1}=\pm 1j$.
Questions: how many complex square roots does each real number have? Is it the same for every real number?
A model answer is available.
interlude: Code Quality
Code quality is often a balance between three things:
- Maintainability: The code is easy to read and to understand.
- Performance: The code is as fast and secure to run as we can make it.
- Resources: This is both the size of the machine and the developer time available to address the problem
This is frequently a case of “which two do you want?” As such, there are compromises necessary when designing code. However, it’s important that they are recognised, and only made when appropriate. To quote Donald Knuth
Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.
The moment that code is going to be read a second time (including by you in two months’ time) then it becomes unacceptable to write it as though it is disposable. Functions need docstrings, and variables should have names which make sense (and not just to you personally right now).
Similarly, when you’ve tested your code, and you know that a specific function takes 90% of the runtime, it may make sense to rewrite it in a faster way, even if that is harder to maintain (more `numpy`, using `numba`, writing your own `C` extension modules, and so on).
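A rough sketch of the "measure first, then optimise" workflow using the standard library `timeit` module. The two functions are hypothetical stand-ins: here a closed-form formula plays the role of the faster but less obvious rewrite, rather than the `numpy` or `numba` routes mentioned above.

```python
import timeit


def sum_of_squares_loop(n):
    """Sum the squares 0**2 + 1**2 + ... + (n-1)**2 with an explicit loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def sum_of_squares_formula(n):
    """Same result from the closed form (n-1)n(2n-1)/6; faster, less obvious."""
    return (n - 1) * n * (2 * n - 1) // 6


# First check that the "optimised" version still gives the same answer ...
assert sum_of_squares_loop(1000) == sum_of_squares_formula(1000)

# ... and only then measure whether the rewrite was worth it.
for func in (sum_of_squares_loop, sum_of_squares_formula):
    seconds = timeit.timeit(lambda: func(1000), number=200)
    print(f"{func.__name__}: {seconds:.4f} s for 200 calls")
```

The timings depend on the machine, so no expected output is shown; the point is that the decision to trade readability for speed is backed by a measurement.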
Combined files¶
Mixed scripts & modules¶
A file can be both a script and a module, provided you use a special `if` test to check how it is being used (this avoids being antisocial by doing all the calculating and printing your script is set up to do whenever someone merely imports it):
`rot13m.py`:
import codecs

# module definitions
def rot13(text):
    """Return the rot13 encoding of the input text."""
    return codecs.encode(str(text), 'rot13')

if __name__ == "__main__":
    # Code in this block runs only as a script,
    # not as an import
    import sys
    print(rot13(sys.argv[1]))
Although it started as something of a hack, the `if __name__ == "__main__"` idiom is now accepted as fully Pythonic, and is something you will see often in modules which also have sensible script-like behaviour.
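To see why the test works, the following sketch writes a throwaway module (the name `whoami_demo` is made up for this illustration) which just records the value of `__name__`, then loads it both ways using the standard library `importlib` and `runpy` modules:

```python
import importlib
import pathlib
import runpy
import sys
import tempfile

# A one-line module that simply records the value of __name__ it sees.
SOURCE = "NAME_SEEN = __name__\n"

with tempfile.TemporaryDirectory() as tmp:
    pathlib.Path(tmp, "whoami_demo.py").write_text(SOURCE)
    sys.path.insert(0, tmp)            # let Python find the new file
    importlib.invalidate_caches()
    try:
        # Case 1: an ordinary import, as another module would do it.
        imported = importlib.import_module("whoami_demo")
        # Case 2: run the same file the way `python -m whoami_demo` would.
        script_globals = runpy.run_module("whoami_demo", run_name="__main__")
    finally:
        sys.path.remove(tmp)

print(imported.NAME_SEEN)            # whoami_demo
print(script_globals["NAME_SEEN"])   # __main__
```

The same file sees a different `__name__` depending on how it was started, which is all the `if` test relies on.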
Exercise Seven: A primes module¶
Make a copy of your script to calculate prime numbers and:
- add the ability to read the number of primes to output from the command line,
- turn it into a version which can also be used as a module,
- test this by running a copy of the interpreter and `import`ing it, then calling your routine.
- Try running it from the terminal/Anaconda command prompt using the following syntax:
python -m rot13m "this runs a python module"
See what happens if you change directories.
A model answer is available
Python Packages¶
An example package¶
Python packages bundle multiple modules into one place, to make installing and uninstalling them easier and to simplify usage. A simple python package just consists of python files inside a directory tree.
A typical template for a fairly basic python package called `mycoolproject` might look like:
mycoolproject
├── __init__.py
├── cool_module.py
├── another_cool_module.py
└── extras
    ├── __init__.py
    ├── __main__.py
    └── extra_stuff.py
requirements.txt
setup.py
LICENSE
README.md
The `__init__.py` file is slightly special (as is common in python with double underscore names, or dunders), in that it gets read when you run `import mycoolproject` (or whatever the name of the directory is). The other files can be imported by themselves as `mycoolproject.cool_module`, `mycoolproject.another_cool_module`, etc. Similarly, a `__main__.py` file acts like a package version of the `if __name__ == '__main__':` block for modules, in that it is run when the package containing it is executed with `python -m`, e.g. `python -m mycoolproject.extras` for the layout above.
In a typical package the `__init__.py` file mostly consists of `import` commands to load functions and classes from the other modules in the package into the main namespace, as well as possibly defining a few special variables for itself.
`mycoolproject/__init__.py`:
from .cool_module import my_cool_function, my_cool_class
from .another_cool_module import *
Since, as the author of the package, you are in control of everything, this may be the only time the `from modX import *` idiom is appropriate.
When `import`ing modules, remember that levels of directories are separated using the `.` symbol, so a function in the `extras` subdirectory is reached with:
from mycoolproject.extras.extra_stuff import super_cool_function
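The behaviour described above can be tried out without touching your own files. This sketch builds a cut-down copy of the layout in a temporary directory (the package name `demo_pkg` and the trivial function bodies are invented for the illustration) and then imports from it:

```python
import importlib
import pathlib
import sys
import tempfile

# File contents for a tiny package; keys are paths relative to the temp dir.
FILES = {
    "demo_pkg/__init__.py": "from .cool_module import my_cool_function\n",
    "demo_pkg/cool_module.py": "def my_cool_function():\n    return 'cool'\n",
    "demo_pkg/extras/__init__.py": "",
    "demo_pkg/extras/extra_stuff.py":
        "def super_cool_function():\n    return 'super cool'\n",
}

with tempfile.TemporaryDirectory() as tmp:
    for relative_path, source in FILES.items():
        target = pathlib.Path(tmp, relative_path)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(source)

    sys.path.insert(0, tmp)          # make the temporary directory importable
    importlib.invalidate_caches()
    try:
        import demo_pkg
        from demo_pkg.extras.extra_stuff import super_cool_function

        top_level_result = demo_pkg.my_cool_function()  # re-exported by __init__.py
        nested_result = super_cool_function()           # reached via the dotted path
    finally:
        sys.path.remove(tmp)

print(top_level_result)   # cool
print(nested_result)      # super cool
```

Because `__init__.py` imports `my_cool_function`, users of the package never need to know which file it actually lives in.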
Exercise Eight: A primes package¶
Turn your “find the primes” module file into a package called `primes` by creating a suitable directory structure and an `__init__.py` so that you can access a function to give you the first \(n\) primes as well as all primes smaller than \(n\).
Try `import`ing your new package from the IPython console. Check that you can call your function.
If you have time, add a function to the package to give you a list of the prime factors of an integer (i.e. the prime numbers which divide it with no remainder).
A model answer is available.
Information¶
You can also use ‘.’ for relative `import` statements, with a syntax similar to the unix shell. So, in the example above the file `mycoolproject/extras/__init__.py` can write:
# one . for the current directory
from . import extra_stuff
# two .s for its parent
from .. import another_cool_module
# this also works
from ..cool_module import my_cool_function
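If you are unsure what a relative name refers to, the standard library function `importlib.util.resolve_name` will translate it into an absolute one. A short sketch, using the package names from the example layout (nothing is actually imported, so the package does not need to exist):

```python
from importlib.util import resolve_name

# The package that mycoolproject/extras/__init__.py belongs to.
package = "mycoolproject.extras"

print(resolve_name(".extra_stuff", package))
# mycoolproject.extras.extra_stuff
print(resolve_name("..another_cool_module", package))
# mycoolproject.another_cool_module

# Climbing above the top-level package is an error.
try:
    resolve_name("...too_far", package)
except (ImportError, ValueError) as error:   # ValueError before Python 3.9
    print("cannot resolve:", error)
```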
Licensing¶
Warning
I am not a lawyer! More specifically, I am not your lawyer. Lawyers spend a lot of money on insurance, so that they are safe to give specific legal advice without the fear of liability. While I will try to be as accurate as possible in the information provided here, don’t plan on using these notes as a defence in court.
Licences grant permissions¶
As a copyright holder, you can always grant others the ability to use, copy and distribute your software. The easiest and simplest way to do this is to publish a licence (that’s the UK spelling) together with your code. As a user & developer, ensuring that software you use has a licence with terms compatible with what you intend to do with it prevents long, costly and embarrassing legal action further down the line. The best advice is thus to store a licence file in any repository people can see and copy from, and possibly even add it to the header of your source code.
Although in theory you could always write your own licence, few scientists are also lawyers. Because legal text has legal meaning, it is always safer to use one of the well known and well understood existing copyright licences. If in doubt, the MIT License (that’s how the Americans spell it) is popular and well understood. If you feel strongly that your work must always remain in circulation, use the latest version of the GPL.
Note that the legal concept of licensing is almost entirely separate from the academic concept of plagiarism. A licence can give you the legal right to reuse or modify someone else’s work, but you cannot be given the moral right to falsely claim it as your own work, and you should identify the original author in an appropriate manner.
Copyright¶
The author, or commissioner (for work done “for hire” for an employer) of software code has certain rights (called copyrights) to control the ability of other people to copy and distribute their work, just as the authors of a book or the producers of a film do.
country | UK | EU | USA | China | India |
---|---|---|---|---|---|
copyright period | life+70 | life+70 | life+70 | life+50 | life+60 |
There are some exceptions to these time periods. In the UK, “where a work is made by Her Majesty or by an officer or servant of the Crown in the course of his duties” it is placed under Crown Copyright. New Crown copyright material that is unpublished has copyright protection for 125 years from date of creation. Published Crown copyright material has protection for 50 years from date of publication. Meanwhile the right to royalties from the play Peter Pan (whose copyright the author J. M. Barrie gifted to the Great Ormond Street Children’s Hospital) is specifically legislated to last forever.
Copyright is automatic¶
Although various methods exist to register the date at which works were created, there is now generally no need to do anything to copyright your work. Your rights exist automatically from the moment of creation (i.e. when you first wrote the code), and continue to exist unless you explicitly give them up, or until the legally mandated time has passed. In fact, in some jurisdictions (specifically some parts of the EU) authors are unable to opt out of their moral rights over their work.
Software Copyright¶
For computer software specifically (a “literary work”), UK copyright laws allow creators to control the acts of:
- copying,
- adapting,
- issuing (i.e. distributing),
- renting and lending copies to the public.
The specific relevant legislation is the UK Copyright, Designs and Patents Act 1988 (CDPA 1988) and the EU Directive 91/250/EEC (the Software Directive).
Since they aren’t paid by the Universities, students in the UK (even Ph.D students) are not employees, and always own the copyright on the code they write by default. On the other hand, work done as part of their job by University staff officially belongs to the University.
When writing software for an employer while also writing your own code in your free time, it’s important to separate the two activities. There have been legal cases (particularly in the USA) over copyright when people used work-owned resources (e.g. computers) or even worked on the same topic while developing their own code.
Free/Libre Open Source Software (FLOSS)¶
The word “free” in English has two main meanings:
- Without cost: “Buy one, get one free!”
- Unrestrained: “They set the prisoners free.”
The free software movement is aimed at encouraging software to be distributed under terms matching the second meaning.
Stallman’s four freedoms:¶
Freedom 0 The freedom to run the program, for any purpose.
Freedom 1 The freedom to study how the program works, and change it so it does your computing as you wish. (Access to the source code is a precondition for this.)
Freedom 2 The freedom to redistribute copies so you can help your neighbor.
Freedom 3 The freedom to distribute copies of your modified versions to others. By doing this you can give the whole community a chance to benefit from your changes. (Access to the source code is a precondition for this.)
The public domain¶
The “most free” thing you may be able to do with code (depending on the local legal system) is to release it into the public domain. This is the same state that literary works are left in after the legally mandated time has expired. At this point, anyone is free to use or reapply the material in any way they see fit.
However, some legal systems (particularly the civil law practised in much of the EU) can make it practically impossible for authors to give up their “moral rights”, so a full public domain release is not possible everywhere.
Permissive licences versus “copyleft”¶
Many licences, while retaining copyright over the work and not releasing it into the public domain, otherwise give users relatively unrestricted rights to copy, modify and distribute the code. In particular, they allow the code to be used (often with attribution) as part of greater works released under more restrictive licences (for example, ones which prohibit distributing your own copy of the source for the larger project, or the modified version of the existing code). These are often called “permissive” licences.
On the other hand, a set of licences modelled after the GNU General Public License are intended to ensure that once software is released as “free software”, it stays as “free software”. As such, they place restrictions on the immediate recipient of the work, in order to ensure that people later down the chain retain their version of the four freedoms:
the freedom to use the software for any purpose,
the freedom to change the software to suit your needs,
the freedom to share the software with your friends and neighbors, and
the freedom to share the changes you make.
Specifically, the various versions of the GPL all require that when modified versions of GPL’d projects are distributed, the new version is placed under a GPL licence (e.g. the distributors must also release the source code on demand, and allow other users the right to modify and distribute it). This “carry forward” operation has caused such licences to be called “copyleft” (a play on words from “copyright”).
Strong versus weak copyleft.¶
Various bodies, including the Free Software Foundation, the organization behind GNU, have recognised that software is seldom used in isolation. One component interacts with another component, which calls a third component etc. With a “strong” copyleft licence such as the GPL, this requires every piece of code in the ecosystem to also be copyleft. In most practical environments, this is impossible to ensure past a given size, since some components (e.g. the “binary blob” provided to run your graphics card) are liable to be provided under a permissive open source or proprietary commercial licence.
As such, a second class of “weak” copyleft licences, such as the GNU Lesser General Public License (LGPL), allow their code to be linked to (i.e. called in an automated sense) in derivative works by code not under an (L)GPL licence. Specifically, if the code is called or used as a library then no restriction is implied, but if the code of the library itself is modified then the standard restrictions still apply. The word “lesser” is used in terms of the rights of a theoretical third party user, who may no longer be guaranteed the right to modify the code that links to the original library.
Licence compatibility¶
Because “copyleft” licences require derivative works to also be released under suitable “copyleft” licences, it is impossible to release packages containing GPL components entirely under more permissive licences such as BSD.
Licence | BSD | LGPL | GPL |
---|---|---|---|
BSD | Yes | No | No |
LGPL | Yes | Yes | No |
GPL | Yes | Yes | Yes |
Commercial rights¶
Some licences make a distinction between “commercial” and “non-commercial” uses. In particular the work may be freely licensed for non-commercial use, with the right reserved to charge a fee for commercial use. In general “commercial use” can be interpreted fairly broadly as related to income-generating use of any kind, whether direct or indirect. This means that for code under a non-commercial licence, not only should you not sell the work itself, you probably shouldn’t use it in a way that earns money.
Fortunately, academic study and pure research uses are frequently specifically excluded as non-commercial activities, avoiding the awkward question of “the code lets me do my research, for which a funding body pays me, is that commercial?”. However, this can be an issue when the intellectual property (IP) produced at the end of a project contractually becomes the property of an industrial partner. Many organisations (including Imperial College) have lawyers on retainer to deal with this kind of question.
Installation and distribution¶
`setup.py`, `distutils` and `setuptools`¶
The `setup.py` file is a standard name for an install script for Python packages (written in Python itself). Python has long come with a module in its standard library, `distutils`, to automate this as much as possible (note that it is deprecated from Python 3.10 and removed in Python 3.12). We will use an enhanced version called `setuptools`, compatible with the Python package manager, `pip`. For a simple package in plain Python, the `setup.py` file might look like the following:
from setuptools import setup

setup(
    name='mycoolproject',  # Name of package, required
    version='1.0.0',  # Version number, required
    packages=['mycoolproject'],  # directories to install, required
    # One-line description or tagline of what your project does
    description='A sample implementation of quaternions.',  # Optional
    url='https://www.mycoolproject.com',  # Optional
    author='James Percival',  # Optional
    author_email='j.percival@imperial.ac.uk',  # Optional
)
This script can be called in several modes. For pure Python packages, the most useful is probably
python setup.py install
or
python setup.py install --user
These both copy the files in the package into a directory in the standard search path. The first installs for everybody (and might need admin rights); the second installs just for the current user.
Version Numbers¶
The `version` keyword in the `setup.py` file allows you to specify a version number. There are many formats for version numbers used in software development. These range from the absurdly simple (e.g. build 1, build 2, build 3 …) to the complicated (e.g. the Linux kernel has versions like 4.15.0-36-generic), to the unusual (e.g. the TeX typesetting system is currently on version 3.14159265, with a successive digit of \(\pi\) added with each new version). As is often the case, there is even a PEP about it (PEP 440).
Unless you have a good reason to do something different, “semantic versioning” is a convenient standard to stick with. This is just an ordered set of three integers, separated by dots, e.g. `0.2.3` or `13.4.2`. The structure is (major version).(minor version).(patch version), where a major version increment (e.g. from 10.2.3 to 11.0.0) implies big changes in the code, which are likely to break things designed for previous versions, while a minor increment means smaller changes, typically new features, which are intended to stay backwards compatible but might still cause problems. Incrementing the patch version implies only bug fixes, with no change to external APIs.
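The bumping rules can be sketched in a few lines of Python. The helper names `parse_version` and `bump` are made up for this illustration, and the sketch only handles plain `major.minor.patch` strings, not the fuller grammar allowed by PEP 440:

```python
def parse_version(text):
    """Split 'major.minor.patch' into a tuple of three integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def bump(text, level):
    """Return the next version string after a 'major', 'minor' or 'patch' change."""
    major, minor, patch = parse_version(text)
    if level == "major":
        return f"{major + 1}.0.0"          # lower fields reset to zero
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    if level == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown level: {level!r}")


print(bump("10.2.3", "major"))   # 11.0.0
print(bump("10.2.3", "minor"))   # 10.3.0
print(bump("10.2.3", "patch"))   # 10.2.4
```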
Because the differences between major versions can prevent people upgrading, it’s common to “backport” fixes and features from the mainline “trunk” of development back to new minor versions of the previous generation of code. A good example is Python itself, where version 2.7.0 was released on July 3rd, 2010 (with patch releases continuing up to 2.7.18 in 2020), whereas Python 3.0 was released on December 3rd, 2008.
Some communities (e.g. the Linux kernel developers) add on additional meaning to the semantic numbers. For example a common scheme is that odd minor versions are “development” or “unstable”, whereas even numbers are for general release, or “stable”. That means there are more likely to be bugs (and thus more patches) in the unstable versions of releases of the code base, but new features appear there first.
The `pip` and `conda` package managers¶
Although you can install packages yourself by hand, it is simpler to use a tool, called a “package manager”, to control things. This allows for easier installs, uninstalls and for sandboxing environments to run specific software (this is described in Wednesday’s lecture).
Your Anaconda installation comes with two inbuilt package managers: `conda`, specially written for Anaconda itself, and `pip`, which is more widely available on non-Anaconda installs. Since `conda` understands about `pip`, we will describe that tool in more detail here.
Dependencies¶
An individual Python package typically has its own Python dependencies (i.e. other non-system packages which this package itself imports). A common way to document these is a `requirements.txt` file consisting of a list of package names (one per line), possibly also indicating a minimum or exact version number to be installed.
requirements.txt
jupyter
numpy >= 1.13.0
scipy == 1.0.0
mpltools
The lines with just the name allow any version, the lines with `>=` demand a version which is “greater than or equal to” that specified (where e.g. 2.0.0 > 1.9.1 and 1.2.0 > 1.1.9) and the lines with `==` demand a specific version. The packages listed in the `requirements.txt` file, or at least suitable versions of them, can then be `pip` installed in one go, via the compact command:
pip install -r requirements.txt
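The version comparisons are numeric, field by field, rather than alphabetical. The following is only an illustrative sketch of that idea: the helper names are invented, the package name `demo` is hypothetical, and only the `>=` and `==` operators from the text are handled (`pip` itself supports a much richer syntax).

```python
def parse_requirement(line):
    """Split a line such as 'scipy == 1.0.0' into (name, operator, version)."""
    for operator in (">=", "=="):
        if operator in line:
            name, version = line.split(operator)
            return name.strip(), operator, version.strip()
    return line.strip(), None, None      # bare name: any version will do


def version_key(version):
    """Turn '1.10.0' into (1, 10, 0) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def is_satisfied(line, installed_version):
    """Check an installed version against one requirements.txt line."""
    _, operator, wanted = parse_requirement(line)
    if operator is None:
        return True
    if operator == "==":
        return version_key(installed_version) == version_key(wanted)
    return version_key(installed_version) >= version_key(wanted)


print(is_satisfied("scipy == 1.0.0", "1.0.1"))          # False
print(is_satisfied("demo >= 1.1.9", "1.2.0"))           # True
# Comparing the raw strings gets multi-digit fields wrong:
print("1.10.0" >= "1.9.0")                              # False
print(version_key("1.10.0") >= version_key("1.9.0"))    # True
```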
The `conda` manager accepts similar files in a format called `.yml` or `.yaml` (short for “yet another markup language”, or possibly “YAML ain’t markup language”). YAML formatted files are normally used for software configuration, where data elements mostly consist of named strings and lists. A `conda` `environment.yml` file looks like:
environment.yml
name: acse
dependencies:
  - jupyter
  - numpy
  - scipy
  - pip:
    - mpltools
Here the `pip` subsection in the dependencies lists packages for which there isn’t a full `conda` package produced, and where a straight `pip` install must be used instead.
As yet another route, you can also include your dependencies in your `setup.py` file using the `install_requires` keyword in your `setup` function. While each method works, we recommend the `pip` and `requirements.txt` route as typically more common, repeatable and robust.
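If you do want both, one possible approach (a sketch only, not a recommendation from these notes) is to build the `install_requires` list from the same text as `requirements.txt`, so the two cannot drift apart. The helper `parse_requirements` is a made-up name, and the file contents are inlined here so that the example runs on its own:

```python
# Hypothetical contents of a requirements.txt file, inlined so the sketch
# runs on its own; a real setup.py would read the file from disk instead.
REQUIREMENTS_TXT = """\
jupyter
# comments and blank lines are skipped

scipy == 1.0.0
mpltools
"""


def parse_requirements(text):
    """Return the requirement lines, ignoring blanks and comments."""
    stripped = (line.strip() for line in text.splitlines())
    return [line for line in stripped if line and not line.startswith("#")]


# Keyword arguments for setuptools.setup(); install_requires is the keyword
# named in the text, the other values are copied from the earlier example.
setup_arguments = dict(
    name="mycoolproject",
    version="1.0.0",
    packages=["mycoolproject"],
    install_requires=parse_requirements(REQUIREMENTS_TXT),
)

print(setup_arguments["install_requires"])
# ['jupyter', 'scipy == 1.0.0', 'mpltools']

# In an actual setup.py the final lines would instead be:
#     from setuptools import setup
#     setup(**setup_arguments)
```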
Exercise Nine¶
Make a `setup.py` script for your module and try `install`ing and `uninstall`ing it using `pip`. In the directory containing the `setup.py` file run
pip install .
and
pip uninstall <the name of your module>
From an interpreter console started in another directory, see when you can and can’t import your new module.
PyPI, the Python Package Index¶
When not given a local `setup.py` file to work from, `pip` defaults to scanning PyPI, the Python Package Index. This is a very large repository of python software, and is a very good place to check before naming your projects. It also has a useful tutorial on the packaging process, and points to https://choosealicense.com/ as a resource for picking licenses.
If you would like to test upload packages yourself, then there is an option to register for their test server, then follow the tutorial instructions to archive and upload your packages there.
With that step completed, you now know everything you need to write, package and distribute open source software to the world. All you need to add is your time and creativity.
interlude: Coupling Python and other languages¶
As you may have realised, although powerful, Python code is not always particularly fast to execute. One way around this (as followed by packages such as `numpy`) is to write small, frequently called sections in a compiled language such as C. Since the usual Python implementation is itself written as C code, there is a well documented path to do so, called the Python C API.
As a concrete example, consider the following C file:
example.c:
#define PY_SSIZE_T_CLEAN
#include <Python.h>

/* The function we are exposing to Python as example.my_fun(a, b) */
static PyObject *
my_fun(PyObject *self, PyObject *args)
{
    long l1, l2;
    /* "ll" means: expect two arguments convertible to C longs */
    if (!PyArg_ParseTuple(args, "ll", &l1, &l2))
        return NULL;
    return PyLong_FromLong(2*l1+l2);
}

static PyMethodDef exampleMethods[] = {
    {"my_fun", my_fun, METH_VARARGS,
     "my_fun(a, b)\n--\n\n Return 2*a+b."}, /* function documentation */
    {NULL, NULL, 0, NULL} /* Sentinel indicating end of module methods */
};

static struct PyModuleDef examplemodule = {
    PyModuleDef_HEAD_INIT,
    "example", /* name of module */
    "C based example extension", /* module documentation, may be NULL */
    -1, /* size of per-interpreter state of the module,
           or -1 if the module keeps state in global variables. */
    exampleMethods
};

/* The init function must be named PyInit_<module name> */
PyMODINIT_FUNC
PyInit_example(void)
{
    return PyModule_Create(&examplemodule);
}
Although this looks complicated, as with most C code it mostly follows a standard template. The most important part is the definition of the function `my_fun`, which we are turning into a Python module method.
To actually build the code (on a system with a suitable C compiler), we can use a slightly different form of `setup.py` file:
setup.py:
#!/usr/bin/env python
from setuptools import setup, Extension

mod1 = Extension('example',
                 sources=["example.c"])

setup(name='Example',
      version='1.0',
      description='An example template',
      author='James Percival',
      author_email='j.percival@imperial.ac.uk',
      ext_modules=[mod1]
      )
Now we can (hopefully) just run `python3 setup.py build_ext --inplace`.
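Once built, the extension is imported like any other module. The sketch below assumes the build produced a module called `example` in the current directory; if it is not there, it falls back to a pure Python stand-in with the same behaviour, which is a pattern sometimes used to keep a package working on machines without a C compiler:

```python
try:
    # Assumes example.c has been built in the current directory.
    from example import my_fun
except ImportError:
    def my_fun(a, b):
        """Pure Python stand-in with the same behaviour: return 2*a+b."""
        return 2 * a + b

print(my_fun(3, 4))   # 10
```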
There exist tools such as Cython (for the Python=>C side) and SWIG (for C=>Python) which somewhat simplify these workflows.
More programming exercises.¶
The website Project Euler contains a large number of computational mathematics problems which can be used as exercises in any programming language to practise thinking algorithmically (warning, some of them use complicated mathematics). We will list a few here:
Exercise: Project Euler Problem 1¶
If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.
Find the sum of all the multiples of 3 or 5 below 1000.
Exercise: Project Euler Problem 5¶
2520 is the smallest number that can be divided by each of the numbers from 1 to 10 without any remainder.
What is the smallest positive number that is evenly divisible by all of the numbers from 1 to 20?
Exercise : Project Euler Problem 8¶
Consider the 1000 digit number
73167176531330624919225119674426574742355349194934
96983520312774506326239578318016984801869478851843
85861560789112949495459501737958331952853208805511
12540698747158523863050715693290963295227443043557
66896648950445244523161731856403098711121722383113
62229893423380308135336276614282806444486645238749
30358907296290491560440772390713810515859307960866
70172427121883998797908792274921901699720888093776
65727333001053367881220235421809751254540594752243
52584907711670556013604839586446706324415722155397
53697817977846174064955149290862569321978468622482
83972241375657056057490261407972968652414535100474
82166370484403199890008895243450658541227588666881
16427171479924442928230863465674813919123162824586
17866458359124566529476545682848912883142607690042
24219022671055626321111109370544217506941658960408
07198403850962455444362981230987879927244284909188
84580156166097919133875499200524063689912560717606
05886116467109405077541002256983155200055935729725
71636269561882670428252483600823257530420752963450
The four adjacent digits in this number that have the greatest product are 9 × 9 × 8 × 9 = 5832. Find the thirteen adjacent digits in the 1000-digit number that have the greatest product. What is the value of this product?
Summary¶
In this lecture we learned:
- The difference between Python scripts, modules and packages.
- Code standards and code linters.
- To make & install your own Python package.
Tomorrow:
- More on scientific Python packages: scipy, sympy etc.
Further Reading:¶
PEP8, the Python style guide
The Google Python Style Guide
A tutorial for the Visual Studio Code IDE
The python documentation page on modules & packages.
PEP257 - docstring conventions.
The `numpydoc` docstring standard
Writing C extensions for Python