CM3010 Topic 04: Using Databases: Libraries and Templating

Main Info

Title: Using Databases: Libraries and Templating
Teachers: David Lewis
Semester Taken: April 2022
Parent Module: cm3010: Databases and Advanced Data Techniques

Description

In this topic, you will be introduced to ways of connecting to a database from within programming languages.

Key Reading

Lab Summaries

4.005 is a SQL exercise with a 3 table movie database. It practices joins, aggregate functions, and data cleaning.

4.103 involves modifying an Express app sitting on an astronomy database to present different views (using mustache templates).

4.203 involved modifying a form which sends a POST request to the server for updating dates of birth and gender in the Actors table.

4.305 has you connect to MySQL with python and plot some data in matplotlib. Matplotlib didn’t install for me, not bothering to debug.

Lecture Summaries

4.002: The Core Vocabulary

A review of the core sql commands, in a mysql flavour:

SELECT

Gets information in the form of a table from one or more tables. eg:

SELECT *
FROM Planets
WHERE diameter > 5;

You can do approximate matching with the LIKE keword, eg SELECT ... FROM ... WHERE name LIKE '%er';

UPDATE

For changing data that exists in the table (change an existing row).

eg:

UPDATE Planets
SET name='Mars'
WHERE name='Mers';

INSERT

For adding new rows.

INSERT INTO Planets
  (name, diameter)
VALUES
  ('Athshe', 8210),
  ('Gethen', 9483);

columns are optional, if missing it assumes we’re specifying all columns.

We can add as many new rows as we want by separating the bracketed rows (tuples) by commas.

We can insert the results of a query into a table like this:

INSERT INTO Planets
    (name)
  SELECT DISTINCT name
  FROM Moons;

Useful when normalizing tables.

DELETE

Deletes from a table. So immediately takes the table as input.

DELETE FROM Planets
WHERE diameter < 2000;

There are also a set of commands about changing the structure of the database itself:

DROP

eg DROP TABLE Planets;

Deletes the entire table

This is distinct from:

TRUNCATE

eg TRUNCATE TABLE Planets;

This removes all entries from the table but leaves the structure intact.

How is this different from DELETE FROM Planets WHERE *;? Delete triggers cascades, but truncate doesn’t trigger that logic, it’s just a straight removal.

CREATE

This has a lot of different forms, used to create db, create table etc.

CREATE TABLE Planets (
  PlanetName CHAR(8),
  DayLength INT,
  YearLength INT,
  PRIMARY KEY
    (PlanetName)
);

ALTER

Finally ALTER is used to modify tables

ALTER TABLE Planets
ADD COLUMN
  Diameter INT;

Getting Data In and Out

To export data as a text file or csv file you can do this:

SELECT *
FROM Planets
INTO OUTFILE 'Planets.txt';

To import data from a file into a table you can do this:

LOAD DATA INFILE 'Planets.txt'
INTO TABLE Planets;

4.004 MySQL

Before jumping into programming language interfaces, shows direct interface with the MySQL REPL.

Then turns to Aggregate Functions, our means to group and analyse data.

Uses the example of a shopping trip where you buy a number of items for different people. You might start with a table like this:

Item	BoughtFor	Price
—	—	—-
Milk	Mum	1.00

You then want to sum the price for each person. You can do this in SQL first by grouping the rows by ‘BoughtFor’ and then applying a sum function, like this:

SELECT BoughtFor,
       SUM(Price)
FROM Shopping
GROUP BY BoughtFor;

We have to tell it how to summarise the grouped data where there will be multiple values.

There are other aggregate functions like AVG STD and VARIANCE.

There are also MAX and MIN - which can be used on dates (chronological value) and other data types.

COUNT just counts the number of items that we’ve aggregated.

COUNT(DISTINCT <Column>) note the brackets. This just removes duplicates from the count.

GROUP_CONCAT works on strings, it takes all the strings that have been grouped and concatenates them, with an optional separator.

4.007 Connecting to a DBMS

Typically we don’t interact with a command line REPL, we connect through libraries in programming languages.

The libraries all do much the same thing:

Create a (persistent) connection
Send commands
Receive (and structure) response

Creating the Connection

We’ll do one of two things. Either we’ll define a connection object and then call a connect method, something like:

conn <- newConnection(host, username, password, database)
conn.connect()

or we do that in one go like this:

conn <- connect(host, username, password, database)

Sending commands

This is almost always string based, you make a query string and then call an execute or query method on the connection. Something like:

resource <- conn.execute(query)
resource.fetchData()

result <- conn.query(query)

Either you’ll get results or some kind of resource object where you need to run fetch me the data. The library has a choice, either to grab the data in one go, or retrieve it a bit of a time and let the server buffer it. Generally for the server it’s easier to get rid of the data in one go. Often the libraries give a choice.

Receive Response

Generally the response will be an iterable of some kind. Each row of data could be an object where we interact via column names as object keys or arrays where we interact numerically. The server always provides the column names.

4.1 Connecting for Web Page Serving (Node and PHP)

4.101 Using Node and Mustache

Why use JS? Most likely we’re building a web app with a db backend. The video shows using Express and Mustache to create web pages from a database.

Walks through express basics (recapping DNW module). Usually your port 80/443 will be an Apache or NGINX server, and your app will listen on another port. It will be up to NGINX or whatever to direct relevant traffic to the app.

Runs through Mustache. Then the mysql node package:

const mysql = require('mysql');

const db = mysql.createConnection({
    host: "???",
    user: "???",
    password: "???",
    database: "???"
});

db.connect();

const queryString = "SELECT....";

function resCallback(err, res) {
    //result callback
}

db.query(queryString, resCallback);

Note that the declaration of the connection won’t actually connect.

The query involves a callback, that will run when the result is returned. The callback needs an error, results, and the fieldnames as arguments.

4.104 PHP

Library shown is called mysqli

Start a connection by initializing the object, if a connection already exists it uses that connection, otherwise it starts a new one.

$db = new mysqli(host, user, pwd, db);

Executing the command is just a matter of passing the query to the method and getting the results back:

$result = $db->query("SELECT * FROM Planets");

Returns a mysqli_result object.

We can treat that as an iterable and get objects:

foreach ($result as $moon){
  echo $moon['radius'];
}

//or get all of them as a plain nested array

$moons = $result->fetch_all();
echo $moons[5][2]

//or get all of them as an array of objects
$moons = $result->fetch_all(MYSQLI_ASSOC);
echo $moons[5]['radius']

4.2 Read/Write Applications

So far we’ve just been using libraries to read from the db, but other types of query work much the same.

There are security considerations of course. And we might need to keep track of IDs.

SQL injection is possible for any user or HTTP-supplied data.

We can control user privileges, and escape user input.

Libraries have syntax to help, eg here is an example from Node:

const addActor = `
  INSERT into Actors
  Values
    (?, ?, ?);
`;

connect.query(addActor,
  [actor.name, actor.gender, actor.birthDate]);

If the query string contains question mark characters, the next argument to the query command can be an array of values that will then be escaped and inserted into the command in the question mark locations.

On the DBMS side we can use stored procedures.

To do this we first have to tell the DBMS that our delimiter will not be the usual semi-colon (since we’ll have one of those in our procedure). Then we can create the procedure:

delimiter //
CREATE PROCEDURE addActor
  (IN name, gender, dob)
BEGIN
  INSERT INTO Actors
    VALUES (name, gender, dob);
END //
delimiter ;

We define the procedure’s inputs (here name, gender and dob), then we define the procedure’s steps within the BEGIN END block.

Note we set the delimiter back to the semi-colon after we’re done.

Then from the library side we can call a procedure as follows:

connect.query(`
  CALL addActor(
    "Richard Gere", "male", "1949-08-31");
`);

What’s good about this? Now we can just give the web user the privilege to execute this procedure and nothing else like this:

GRANT EXECUTE ON addActor TO webUser;

Finally we need to bear in mind IDs, if we insert an item in the db we’ll typically want to know its id. INSERT commands result in a new ID being generated. There’s a command SELECT last_insert_id() that will return the most recent id.

There are library specific ways of getting the id of an item you’ve inserted eg in Node you can pass a callback and the response object will have an insertId property like this:

conn.query(insertCommand,
  (err, res, cols) => res.insertId);

4.3 Python

JS and PHP are popular web app server technologies, so why Python? Python’s very popular among data scientists, so it’s common to want to get data into your Python environment.

Python’s used for graphing and data science, so heavy data requirements. And web scraping.

Python’s sql library is similar to those we’ve seen, connect as follows conn = connect(host, user, pw)

Then we take the connection and get a cursor as follows: cursor = conn.cursor(). The cursor is the object that will make the queries.

So we execute a query on the cursor like this cursor.execute(query).

The result will then be available on the cursor object directly. The cursor is an iterable so you can do this:

for row in cursor:
  print(row)
  print(row[0])

Each row is subscriptable.

We can also fetch all the results like this: result = cursor.fetchall() if we want the whole results without iteration.

If we want the rows as dictionaries rather than lists we can specify as follows: cursor = conn.cursor(dictionary="true") Then we can access the column names on the row dictionary:

for row in cursor:
  print(row['diameter'])

Note that auto-commit is off by default. So everything you do is regarded as being part of a transaction. You can call cursor.execute multiple times and it will be part of the same transaction.

The transaction is only committed by calling cursor.commit()

Alex's Notes