The Cloud & Microsoft Azure, Part II¶
Front Matter¶
October 20th 2020 - Version 1.0.0¶
Contact Details¶
Dr. James Percival
Room 4.85 RSM building
email: j.percival@imperial.ac.uk
Teams: @Percival, James R in #ACSE1 or #General, or DM me.
Learning Objectives¶
By the end of this lecture you should:¶
Understand the basic concepts of HTTP communication and RESTful APIs.
Be able to code a simple app in Flask.
Be able to serve that app from Azure.
Azure Web services & Web Apps¶
One of the key questions with cloud services is which protocol to use to access them. For Azure services there are three major options:
Remote Desktop Protocol (RDP), to access Windows (and some Linux) virtual machines and to use them in the same manner as a desktop.
Secure Shell (SSH), to access a terminal on VMs (or apps on Linux through X forwarding).
Hypertext Transfer Protocol (HTTP/HTTPS), to access services via the web, whether through a browser or another application.
Let's look further at how each of these can work:
RDP¶
RDP should be familiar, either from your time with an Azure lab, or from the exercises yesterday. It allows a user to connect to a GUI on a remote machine.
SSH¶
SSH (the secure shell) should again be familiar to most of you from the exercises yesterday.
HTTP/HTML¶
HTTP & HTTPS (i.e. secure HTTP) addresses will be familiar to you from your experience on the web. They are examples of a uniform resource locator (URL), which takes the form
https://user:password@www.imperial.ac.uk:8000/example/example/example.html?val1=abc&val2=123.4
This address can be broken down into several sections
Protocol¶
The leftmost part defines the protocol (think of it as an agreed language) being used. With HTTPS, encryption must be agreed between the user agent (e.g. the browser) and the server.
Authentication¶
Usernames and passwords can be provided in the URL, although this is not frequently done with HTTP (due to the relative lack of security in that service).
Server¶
The server name provides a human-readable mnemonic for the IP address of the remote server being connected to. This is looked up, working from right to left, by contacting helper “name servers” (aka DNS servers) to find the machine you request (so in this case, a global server will be contacted to find a .uk DNS server, which points us, via .ac, at the imperial server, which points us at the webserver dealing with requests to www).
Port number¶
Ports can be thought of as individual communication addresses on a single machine. Only one type of communication can happen on one port at one time, although multiple users can be served. Some protocols have standard ports which they default to if no specific port number is given (for example, 80 for HTTP, 443 for HTTPS and 22 for SSH).
End point¶
The portion from the end of the server name to the beginning of the parameters is passed to the remote application connected to the port to decide what response to give. For a simple static http server this might be a directory path to a specific file. For a dynamic server, this might be a more complicated incantation.
Parameters¶
The text beyond the ? consists of a set of parameters, encoded in a key=value format and separated by &, which is again passed on to the server application to control its output.
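As a small illustration of the breakdown above, the standard library module urllib.parse can split the example URL into the same sections. This is only a sketch; the variable names are chosen for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# The example URL from the text, split over two lines for readability
url = ("https://user:password@www.imperial.ac.uk:8000"
       "/example/example/example.html?val1=abc&val2=123.4")

parts = urlsplit(url)

print("Protocol:  ", parts.scheme)
print("User:      ", parts.username)
print("Password:  ", parts.password)
print("Server:    ", parts.hostname)
print("Port:      ", parts.port)
print("End point: ", parts.path)

# parse_qs turns the key=value pairs into a dictionary of lists,
# since the same key may legally appear more than once
params = parse_qs(parts.query)
print("Parameters:", params)
```

Note that every parameter value arrives as text, so it is up to the server application to convert 123.4 into a number if it needs one.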
RESTful APIs¶
You may remember our script to look up TFL train line statuses on the tube which we introduced last week:
status.py:
from urllib.request import urlopen
import json
import argparse

# Parse the command line: transport modes and/or specific lines
parser = argparse.ArgumentParser()
parser.add_argument("mode", nargs='*',
                    help="transport modes to consider: eg. tube, bus or dlr.",
                    default=("tube", "overground"))
parser.add_argument("-l", "--lines", nargs='+',
                    help="specific lines/bus routes to list: eg. Circle, 73.")
args = parser.parse_args()

# Build the request URL for the TfL Unified API
if args.lines:
    url = "https://api.tfl.gov.uk/line/%s/status" % ','.join(args.lines)
else:
    url = "https://api.tfl.gov.uk/line/mode/%s/status" % ','.join(args.mode)

# Fetch the response and decode the JSON body
status = json.loads(str(urlopen(url).read(), 'ascii'))
short_status = {s['name']: s['lineStatuses'][0]['statusSeverityDescription']
                for s in status}

for name, description in short_status.items():
    print('%s: %s' % (name, description))
The information provided by the Transport For London Unified API is an example of a RESTful API. In general, these work by providing a set of http-based URLs (i.e., web addresses) which respond to requests by returning relevant database information. Fuller documentation is available at
For example, sending an http GET request to https://api.tfl.gov.uk/Occupancy/BikePoints/BikePoints_187 (e.g. by trying to open it in your browser) will receive a response like
[{"$type":"Tfl.Api.Presentation.Entities.BikePointOccupancy, Tfl.Api.Presentation.Entities","id":"BikePoints_187","name":"Queen's Gate (South), South Kensington","bikesCount":3,"emptyDocks":21,"totalDocks":25}]
This is an example of JSON, a data format derived from JavaScript, which looks very similar to the way lists and dictionaries are written in Python. Other, slightly less common formats include XML, YAML and CSV.
The Python standard library module urllib.request can be used to handle transmitting and receiving the requests, although third-party packages with more features are also available, and usually recommended when accessible (the most famous is probably the requests package). The json, xml and csv modules can also be used to provide basic data processing on responses although, for large data sets, a package such as pandas may be more appropriate.
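As a minimal sketch of that processing step, the BikePoint response quoted above can be decoded with the json module. To keep the example runnable offline, the response text is pasted in as a string rather than fetched; in real use it would come from urlopen or requests.

```python
import json

# The response body quoted in the text, stored as a string
response_text = (
    '[{"$type":"Tfl.Api.Presentation.Entities.BikePointOccupancy, '
    'Tfl.Api.Presentation.Entities","id":"BikePoints_187",'
    '"name":"Queen\'s Gate (South), South Kensington",'
    '"bikesCount":3,"emptyDocks":21,"totalDocks":25}]'
)

# The outer JSON array becomes a list, each JSON object a dictionary
occupancy = json.loads(response_text)

for point in occupancy:
    summary = "%s: %d bikes, %d empty docks" % (
        point["name"], point["bikesCount"], point["emptyDocks"])
    print(summary)
```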
Connect to some other RESTful APIs. Some examples include the UK National Archive, but there are many more out there.
More on JSON¶
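JSON data is very similar to Python script, with only a few key differences, along with some minor variations in terminology (for example, JSON writes true, false and null where Python writes True, False and None, and calls a dictionary an “object” and a list an “array”). The Python built-in json module can be used for automatic translation, and many third-party packages (such as the data processing package Pandas you’ll learn about later in the week) can also read and write JSON directly.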
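A short sketch of that translation in both directions is given below. The data values are invented purely for illustration.

```python
import json

# JSON text: note the lowercase true/false and the use of null
text = ('{"open": true, "closed": false, "notes": null, '
        '"lines": ["Circle", 73], "fare": 2.5}')

# JSON -> Python: objects become dicts, arrays become lists
data = json.loads(text)
print(data)

# Python -> JSON: tuples are written as arrays and non-string
# keys are converted to strings, so a round trip is not always exact
round_trip = json.dumps({"point": (1, 2), 1: "integer key"})
print(round_trip)
```

The second half shows why JSON should be thought of as a data exchange format rather than a way of saving arbitrary Python objects.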
Flask - Python Web Apps¶
There are now many frameworks for creating web services, ranging from the simple and lightweight to the complicated but powerful. We will introduce a Python framework called Flask, originally created as an April Fool’s joke, which is on the lightweight end of the spectrum, but makes it very easy to create one-file web apps driven by form processing.
A “Hello World” Flask program¶
As a Python package, we can install Flask using pip or conda using a command like:
pip install flask
With Flask installed, we can write a short example program and give it the “magic” name app.py.
app.py
from flask import Flask

app = Flask(__name__)

@app.route("/hello")
def root():
    return "<b>Hello</b> World!"
In this program we create a Flask application, and write a short function root, which we assign using a Python decorator to be called whenever an HTTP request is made to the end point /hello.
We can test run this on our local system with the command
flask run
inside the directory with the app.py file, which starts a web server on the local host on port 5000. We can then point our browser at the full URL http://localhost:5000/hello to see the final result.
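Since URL parameters are passed on to the application, a Flask function can read them through the request object. The sketch below assumes a hypothetical end point /greet and a hypothetical parameter called name (neither appears in the lecture code), and uses Flask's built-in test client so that it can be tried without starting a server.

```python
from flask import Flask, request
from markupsafe import escape  # installed alongside Flask

app = Flask(__name__)


@app.route("/greet")  # hypothetical end point, for illustration only
def greet():
    # Read ?name=... from the URL, falling back to a default value
    name = request.args.get("name", "World")
    # Escape user input before placing it inside HTML
    return f"<b>Hello</b> {escape(name)}!"


# The test client sends requests to the app without a running server
with app.test_client() as client:
    response = client.get("/greet?name=Ada")
    print(response.status_code, response.get_data(as_text=True))
```

With flask run, the same response would be served at http://localhost:5000/greet?name=Ada.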
Using Azure App Services to serve apps¶
Azure App Service can deliver HTTP-based apps (including Flask-based ones) directly from a GitHub repository.
Exercise¶
Log in to the Azure portal and create a web app from some of your Flask code stored on GitHub.
Local Python GUIs¶
There exist a number of GUI toolkits compatible with Python, including Tk, GTK+ and Qt5. We’ll give an example of the use of the last one, since it interacts well with Anaconda.
The following requires the qtpy package.
from qtpy import QtWidgets, QtCore
import sys


class MainWindow(QtWidgets.QMainWindow):

    def __init__(self, parent=None):
        super().__init__(parent)
        self.setWindowTitle("Hello world!")

        # A QMainWindow needs a central widget to hold the layout
        widget = QtWidgets.QWidget()
        self.setCentralWidget(widget)
        layout = QtWidgets.QVBoxLayout(widget)

        self.label = QtWidgets.QLabel("A qt GUI", self)
        self.label.setAlignment(QtCore.Qt.AlignCenter)
        layout.addWidget(self.label)

        self.greet_button = QtWidgets.QPushButton("Greet", self)
        self.greet_button.clicked.connect(self.greet)
        layout.addWidget(self.greet_button)

        self.close_button = QtWidgets.QPushButton("Close", self)
        self.close_button.clicked.connect(self.close)
        layout.addWidget(self.close_button)

    def greet(self, checked=False):
        # The clicked signal passes the button's checked state, unused here
        print("Greetings!")


app = QtWidgets.QApplication(sys.argv)
win = MainWindow()
win.show()

if __name__ == "__main__":
    sys.exit(app.exec_())
When run, this script creates a basic window with two buttons. The “Greet” button directs a greeting to your console, the “Close” button closes the window. Although small, this toy example demonstrates the use of Python to generate and control a widget, and can easily be extended.
Note that this code is written to work locally in a terminal. If you are attempting to run it in a Jupyter session then:
The session will have to be running on a local system or one which you connect to via a windowing system (e.g. RDP, or with a suitable SSH connection with X forwarding).
You will need to use the %gui qt IPython magic (or whichever is appropriate for your choice of GUI toolkit).
Security and the Cloud¶
Firewalls¶
In general, computers and services connected to the internet for a significant time should expect to be attacked by malicious users, whether to gain illicit access to the system and suborn it to their own purposes, or to deny it to others via denial of service attacks launched from a single location or from a distributed network. One protection against this is to use firewalls to limit access to systems, so that requests are only accepted from approved IP addresses.
Azure in particular provides controls on network interfaces to limit the ports and services which are available over the network. The default option (and the safest one) usually denies access unless it is specifically permitted.
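The “deny unless specifically permitted” idea can be sketched in a few lines with the standard library ipaddress module. This is only an illustration of the logic, not of how Azure implements its rules; the permitted networks below are reserved documentation address ranges chosen as placeholders.

```python
from ipaddress import ip_address, ip_network

# Hypothetical allow list: one whole subnet and one single machine
ALLOWED_NETWORKS = [
    ip_network("192.0.2.0/24"),
    ip_network("203.0.113.7/32"),
]


def is_allowed(client_ip, allowed=ALLOWED_NETWORKS):
    """Return True only if client_ip falls inside a permitted network."""
    address = ip_address(client_ip)
    # Default deny: with no matching rule, any() returns False
    return any(address in network for network in allowed)


for candidate in ("192.0.2.15", "203.0.113.7", "198.51.100.1"):
    print(candidate, "allowed" if is_allowed(candidate) else "denied")
```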
Authentication & Authorization¶
Single Sign On (SSO)¶
Understanding of how to deal with passwords has improved over the years, but it is still very easy to make a mistake. On the other hand, as a technically trained person it’s possible that it’s something you will one day be asked to organize (or manage). Current best practice is at or above the following protocols:
1. Use HTTPS for your initial communication.
2. When a user picks a password, add a “salt” to it, and then apply a cryptographic hashing algorithm.
3. Store the salt & hashed password along with your immutable user key (not necessarily username) as your password database. Forget the clear text password as soon as possible.
4. When a user logs in (sending the clear text password), apply the same algorithm as in step 2, using the stored salt, and then compare the results.
5. Regardless, secure your database and only grant access on a need to know basis.
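The salting, hashing and comparison steps above can be sketched using only the standard library. The choice of PBKDF2 with SHA-256, the iteration count, the in-memory dictionary standing in for a database and the user key are all assumptions made for illustration; a production system would normally rely on a maintained password hashing library.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # assumed work factor, for illustration only


def hash_password(password, salt=None):
    """Return (salt, digest); a fresh random salt is made if none is given."""
    if salt is None:
        salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                 salt, ITERATIONS)
    return salt, digest


def check_password(password, salt, stored_digest):
    """Re-run the hash with the stored salt and compare the results."""
    _, candidate = hash_password(password, salt)
    # compare_digest avoids leaking information through timing
    return hmac.compare_digest(candidate, stored_digest)


# Hypothetical password "database": user key -> (salt, digest)
user_db = {"user-0001": hash_password("correct horse battery staple")}

print(check_password("correct horse battery staple", *user_db["user-0001"]))
print(check_password("password123", *user_db["user-0001"]))
```

Because each user gets a different salt, two users with the same password end up with different stored digests.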
In terms of password strength
All this is complicated, both for you and the user, and it would often be easier to make it someone else’s problem. Single Sign On (SSO) makes this possible by redirecting authentication requests to a single large provider, who then responds with short lived “tokens” which assert the user’s identity to the third party website. The full path of communication is shown in the image below.
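To make the idea of a short lived token concrete, here is a deliberately simplified sketch in which a provider signs a small statement about the user together with an expiry time, and the statement is later checked. It is not OAuth2 and not what any real provider does: real services typically issue tokens signed with a private key, which third parties verify with the matching public key, whereas this sketch uses a single shared key. All names here are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import secrets
import time

PROVIDER_KEY = secrets.token_bytes(32)  # hypothetical signing key


def issue_token(username, lifetime=300, now=None):
    """Return a signed token asserting username until now + lifetime."""
    now = time.time() if now is None else now
    payload = json.dumps({"sub": username, "exp": now + lifetime}).encode()
    signature = hmac.new(PROVIDER_KEY, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode() + "."
            + base64.urlsafe_b64encode(signature).decode())


def check_token(token, now=None):
    """Return the username if the token is genuine and unexpired, else None."""
    now = time.time() if now is None else now
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:
        return None  # malformed token
    expected = hmac.new(PROVIDER_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return None  # payload was altered or signed with another key
    claims = json.loads(payload)
    if claims["exp"] < now:
        return None  # token has expired
    return claims["sub"]


token = issue_token("jrp", lifetime=300, now=1000.0)
print(check_token(token, now=1100.0))  # inside the lifetime
print(check_token(token, now=2000.0))  # after expiry
```

The website never sees the user's password, only a time-limited statement that it can check.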
There are many providers of SSO services, including famous names such as Google, Facebook, Twitter & Weibo. Many of these use a common framework called Open Authorization version 2 (also known as OAuth2).
A variety of SSO helper packages exist for Python. For Azure & Microsoft Active Directory, the relevant package is called msal. An example use case, leveraging another package called flask-login, looks something like the following:
login.py:
import os
import secrets

import msal
from flask import Flask, request, flash, redirect,\
    url_for, render_template, session
from flask_login import LoginManager, current_user, UserMixin,\
    login_user, logout_user, login_required

app = Flask(__name__)

__all__ = ['login', 'logout']

login_manager = LoginManager()
login_manager.login_view = 'login'
login_manager.init_app(app)

# Secrets are read from the environment, never stored in the code
client_id = os.environ.get('CLIENT_ID', None)
client_secret = os.environ.get('CLIENT_SECRET', None)
tenant_id = os.environ.get('TENANT_ID', None)

csrf_token = secrets.token_urlsafe()

authority = f'https://login.microsoftonline.com/{tenant_id}'
aad = msal.ConfidentialClientApplication(client_id,
                                         client_secret,
                                         authority)


class User(UserMixin):

    def __init__(self, user_id):
        self.id = user_id
        print('account', aad.token_cache._cache)

    @property
    def username(self):
        return self.id.split('@')[0]

    @property
    def is_authenticated(self):
        account = aad.get_accounts(self.id)
        print('is_authenticated', account)
        if account:
            # acquire_token_silent returns None if no token is cached
            token = aad.acquire_token_silent([], account[0])
            return bool(token) and 'access_token' in token
        return False


@login_manager.user_loader
def load_user(user_id):
    print(user_id)
    return User(user_id)


@app.route('/login')
def login():
    if current_user.is_authenticated:
        return redirect(url_for('index'))
    code = request.args.get('code')
    if code:
        # We have been redirected back from the provider with a code
        if request.args.get('state') != csrf_token:
            flash('CSRF error!')
            return redirect(url_for('login'))
        response = aad.acquire_token_by_authorization_code(code,
                                                           [])
        if response and 'access_token' in response:
            user = User(response['id_token_claims']['preferred_username'])
            login_user(user)
            flash('Logged in successfully via AAD.')
            return redirect(url_for('index'))
    # No (valid) code yet, so send the user to the provider to sign in
    return redirect(aad.get_authorization_request_url([], state=csrf_token))


@app.route('/logout')
def logout():
    account = aad.get_accounts(current_user.get_id())
    if account:
        aad.remove_account(account[0])
    logout_user()
    ms_uri = 'https://login.microsoftonline.com/common/oauth2/v2.0/logout'
    site = 'https://localhost:5050'
    return redirect(ms_uri+f'?post_logout_redirect_uri={site}'+url_for('index'))
To use this pattern we must create an application secret inside the Active Directory blade in the Azure portal, as well as looking up the relevant Tenant ID (the hash which identifies which user directory we are going to be using). These are read from the local environment using the os.environ object. This is a very common pattern to use for secret data which should never be stored inside code repositories.
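One refinement of this pattern, sketched below, is to check for the required variables once at start-up and fail with a clear message if any are missing, rather than carrying None values into later calls. The function name load_settings is hypothetical, and the values in the usage example are placeholders rather than real credentials.

```python
import os

REQUIRED = ("CLIENT_ID", "CLIENT_SECRET", "TENANT_ID")


def load_settings(environ=os.environ):
    """Return the required settings, or raise if any are missing or empty."""
    missing = [name for name in REQUIRED if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: "
                           + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED}


# Usage with a stand-in mapping; normally the real environment is used
fake_environ = {"CLIENT_ID": "example-id",
                "CLIENT_SECRET": "example-secret",
                "TENANT_ID": "example-tenant"}
settings = load_settings(fake_environ)
print(sorted(settings))
```

On a local machine the variables would be set in the shell before running flask run; on Azure they can be supplied through the web app's configuration settings.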
Multifactor Authentication (MFA)¶
Currently the gold standard for authentication involves two-factor authentication (or more). Under this philosophy, a user needs to present at least two responses, drawn from at least two different categories out of:
Something you know (e.g. a password)
Something you have (e.g. your phone)
Something you are (e.g. your fingerprint).
The idea is that a bad actor needs to steal several things from you in order to obtain unauthorised access. The most common implementation on the web uses a passcode system sent via text message. On cost and convenience grounds it is frequently only used when additional security is required (for dangerous behaviour or when permanently modifying profiles).
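A passcode system of this kind can be sketched as follows. The six digit format, the five minute lifetime and the in-memory store are assumptions for illustration, and the step that actually sends the text message is left out.

```python
import hmac
import secrets
import time

CODE_LIFETIME = 300  # seconds a code stays valid; assumed value
_pending = {}        # user key -> (code, expiry time)


def issue_code(user_key, now=None):
    """Create a random six digit code to be sent to the user's phone."""
    now = time.time() if now is None else now
    code = f"{secrets.randbelow(10**6):06d}"
    _pending[user_key] = (code, now + CODE_LIFETIME)
    return code  # in practice handed to an SMS service, not the browser


def check_code(user_key, submitted, now=None):
    """Check a submitted code; each issued code allows only one attempt."""
    now = time.time() if now is None else now
    code, expiry = _pending.pop(user_key, (None, 0))
    if code is None or now > expiry:
        return False
    return hmac.compare_digest(code.encode(), submitted.encode())


code = issue_code("user-0001", now=0)
print(check_code("user-0001", code, now=10))  # first use succeeds
print(check_code("user-0001", code, now=11))  # reuse is rejected
```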
The GDPR and other legal requirements¶
The UK and the EU countries have all passed similar data protection laws, normally called the General Data Protection Regulation (GDPR), which protect the personal data of those living in the European Economic Area. The law allows individuals to access and correct their identifiable personal data when it is stored in easily searchable forms such as on a computer. It also places constraints on the forms in which this information should be stored, whether it can be transferred outside the EEA and the rules over who can access it.
Although computational science is less affected than, for example, medicine, it is still possible that these regulations (or their successors) will one day apply to you. Although the core Articles are relatively complex, they boil down to the idea that identifiable personal information (individual records linked to names, addresses, phone numbers etc., or to personal descriptions) should only be kept for as long as strictly necessary, and should only be accessible by those who need to access it for the reason it was originally collected.
Individuals have the right to request a copy of the records held on them by companies (or other bodies engaged in “economic activity”), and to correct any wrong information which is being stored.
Additional Services¶
Azure Functions¶
Azure Functions is a service which allows a Python function to be accessed directly from the web via parameters passed through a URL. An example will be shown in the lecture.
Data¶
Azure has several systems available to store data, depending on its format. This might be unstructured binary data, structured databases or something in between.
Blob Storage¶
To quote Microsoft, blob storage is designed for:
Serving images or documents directly to a browser.
Storing files for distributed access.
Streaming video and audio.
Writing to log files.
Storing data for backup and restore, disaster recovery, and archiving.
Storing data for analysis by an on-premises or Azure-hosted service.
The data is accessed via a network interface, with charges depending on how frequent access is expected to be and the volume of data transferred. In general a URL is assigned to each item, which can be used in multiple ways, including those listed above, to access the blob object.
SQL¶
Azure provides a number of ways to access data in databases. Most of them are built around the SQL database language. SQL, which dates back to 1974, follows a hierarchical approach, with a database server holding databases, each of which can hold multiple tables holding records, each of which has multiple values in multiple columns. A useful mental reference is to multiple spreadsheet files (e.g. Excel), each containing multiple sheets with rows with data in multiple columns. However, as so often with scriptable text interfaces, access is more powerful, although more difficult for newcomers.
Python comes with inbuilt support for SQL in SQLite format, in which individual databases are stored in local files, via the built-in package sqlite3. To use a full fat SQL server on Azure, appropriate additional software should be downloaded, e.g. the MySQL connector. However, the basic syntax to connect to, read and update individual databases remains similar.
import sqlite3

# Connect to/create db file
conn = sqlite3.connect('my_db.sqlite')
cur = conn.cursor()

try:
    cur.execute("CREATE TABLE fruit(id INTEGER PRIMARY KEY AUTOINCREMENT, name VARCHAR(50), price INTEGER)")
    print("Table created")
except sqlite3.OperationalError:
    print("Table exists")

# Write some data
cur.execute("INSERT INTO fruit (name, price) VALUES (?,?);", ("apple", 300))

# Read some data
cur.execute("SELECT * FROM fruit;")
rows = cur.fetchall()
for row in rows:
    print(row)

# Parameters are passed as a tuple, even when there is only one
cur.execute("SELECT price FROM fruit WHERE id=?;", (1,))
row = cur.fetchone()
print('Price:', row)

# Save the changes and tidy up
conn.commit()
cur.close()
conn.close()
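The script above creates, inserts and reads; updating a record follows the same pattern. The sketch below uses an in-memory database (so it leaves no file behind) and invented prices, and relies on the connection's context manager to commit each change.

```python
import sqlite3

# ":memory:" creates a temporary database which vanishes on close
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allow access to columns by name

conn.execute("CREATE TABLE fruit(id INTEGER PRIMARY KEY AUTOINCREMENT, "
             "name VARCHAR(50), price INTEGER)")

# "with conn" commits on success and rolls back if an error is raised
with conn:
    conn.executemany("INSERT INTO fruit (name, price) VALUES (?, ?);",
                     [("apple", 300), ("pear", 250)])

with conn:
    conn.execute("UPDATE fruit SET price = ? WHERE name = ?;",
                 (320, "apple"))

prices = {row["name"]: row["price"]
          for row in conn.execute("SELECT name, price FROM fruit;")}
print(prices)

conn.close()
```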
For complicated interactions, packages such as Pandas or SQLAlchemy, which map Python types onto SQL more closely, may be more useful.
Summary¶
You should now:
Understand the basic concepts of HTTP communication and RESTful APIs.
Be able to code a simple app in Flask.
Be able to serve that app from Azure.
Further Reading¶
The Flask documentation.
The Flask Login documentation.
The msal documentation.
More information on SQL and Python.