Alex's Notes

CM3035 Topic 07: Working with External APIs

Main Info

Description

API integrations and OpenAPI

Key Reading

Other Reading

Lecture Summaries

7.1 Background

An API, or Application Programming Interface, is an interface between two running processes, defining how two components of some larger system can communicate with one another.

Typically we have a calling process, and a process that will execute and return a response to the call.

This provides a way of abstracting the implementation of each component - neither component needs to understand anything about the other’s implementation.

An API is conceptually made up of two components - its definition (what methods are available, what the input messages must look like, what the return messages will look like), and its implementation (the code that actually implements the interface, which could be any language/platform).

APIs have one of three access modes:

  • Private - internal to some organisation or system

  • Partner - available only to trusted partners

  • Public - available to all

We might make certain operations public (like read) and others private (like delete).

APIs are found throughout computer science:

  • OO class specifications are APIs, defining public, private, and friend methods.

  • OS/kernel interfaces are APIs: apps running on the computer can request operations from the kernel.

  • Remote APIs, like RPCs.

  • Web APIs.
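
As a small illustration of a class specification acting as an API, here is a sketch (the Counter class and its members are invented for this example, they're not from the lecture). Callers depend only on the public methods, not on how the count is stored.

```javascript
// Hypothetical example class: the public methods are the "API",
// the #count field is the hidden implementation.
class Counter {
  #count = 0; // private: not reachable from outside the class

  // Public operation: anyone holding a Counter may call this.
  increment() {
    this.#count += 1;
    return this.#count;
  }

  // Public read-only view of the state.
  get value() {
    return this.#count;
  }
}

const counter = new Counter();
counter.increment();
counter.increment();
console.log(counter.value); // 2
```

We could swap the private field for any other storage without callers noticing, which is the abstraction point made above.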

7.104 History of data APIs

Prehistory: Since computers have existed there has been a desire to connect them. The ARPANET was up and running by 1969.

The forerunners to web APIs were:

  • Remote Procedure Calls (RPC): request a remote computer to run a procedure. Practical versions by 1982.

  • Common Object Request Broker Architecture (CORBA): Remote Method Invocation (RMI), available in 1991. Followed the rise of object orientation.

Pre-2000 rise of Web data APIs involved a mishmash of html hacking:

  • Automated HTTP clients

  • spiders

  • form ‘hacking’

  • Common Gateway Interface (CGI)

  • big download files on FTP

Spiders are processes that can read html pages and follow links. They can gather data that you’re interested in. You can use this as a pseudo-API for data that is only published on html pages.
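
The link-following step of a spider could be sketched roughly like this. The function name and sample page are made up, and a regex is only used to keep it short; a real spider would more likely use a proper HTML parser.

```javascript
// Extract absolute link targets from an HTML string.
// Only handles double-quoted href attributes on <a> tags (a simplification).
function extractLinks(html, baseUrl) {
  const links = [];
  const hrefPattern = /<a\s[^>]*href="([^"]*)"/gi;
  for (const match of html.matchAll(hrefPattern)) {
    // Resolve relative links against the URL of the page they came from.
    links.push(new URL(match[1], baseUrl).href);
  }
  return links;
}

// Hypothetical page content for the demonstration.
const page =
  '<p><a href="/about">About</a> <a class="x" href="https://example.org/data.csv">Data</a></p>';
console.log(extractLinks(page, "https://example.com/index.html"));
```

A spider would then fetch each returned URL and repeat, usually with a depth limit.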

Form ‘hacking’ is based on the premise that we use html forms to build ‘queries’: the form captures our query and typically returns a new html page with information addressing the query. There’s no need to use the form if we can format the http request that corresponds with submitting it.
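
A minimal sketch of that idea, assuming the form submits with GET. The endpoint and field names here are hypothetical, not from any real service.

```javascript
// Build the GET request URL that submitting a form would produce.
function buildFormUrl(action, fields) {
  const url = new URL(action);
  // URLSearchParams applies the usual form encoding (spaces become "+").
  url.search = new URLSearchParams(fields).toString();
  return url.href;
}

const queryUrl = buildFormUrl("https://www.example.com/search", {
  term: "homo sapiens",
  db: "taxonomy",
});
console.log(queryUrl);
// https://www.example.com/search?term=homo+sapiens&db=taxonomy
```

Passing queryUrl to fetch would then send the same request the form would, without going through the page.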

Common Gateway Interface (CGI) was another common method to offer computational methods to users. CGI is an interface specification for web servers that lets servers execute code, not just serve pages. URLs terminate at programs (like https://www.example.com/script.cgi?input=myinput), which typically return web pages, though they may also return data. CGI was often used to return dynamic web pages before the rise of modern frameworks, and is the forerunner of web app frameworks.

Custom APIs built on this to return XML data. An early example is the US NIH one.

Here’s a more recent history of more modern APIs:

| Method | Date | Description | Data encoding |
|---|---|---|---|
| XML-RPC | 1998 | Remote Procedure Invocation over HTTP | XML |
| SOAP | 1999 | Transport Agnostic (HTTP or TCP or SMTP) messaging protocol | XML |
| REST | 2000 | Interface to manipulate web resources | JSON |
| JSON-RPC | 2005 | Remote Procedure Invocation over HTTP | JSON |
| WebSockets | 2011 | HTTP compatible bi-directional communication | Any (JSON common) |
| gRPC | 2015 | Modernised framework for RPC | Any |

XML-RPC

Update of RPC for the web.

Calls ‘remote procedures’ not just for fetching data.

Business-to-business targeted, industry applications.

HTTP protocol, XML encoding.

SOAP

Update of XML-RPC. Has W3C standardization.

Exchange protocol agnostic, can use HTTP, SMTP, whatever.

Data encoding is XML.

Verbose, not human-friendly, interaction model must be customized.

Quickly gave way to REST.

REST

Most common web service standard today.

Protocol is HTTP.

Focused on data exchange.

Data encoding can be anything, but typically JSON.

Well-defined consistent interaction model.

JSON-RPC

Another adaptation of XML-RPC prompted by the rise of JSON.

Again procedures not just data fetching.

Uses JSON rather than XML.

Lightweight - less complex.
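
To make the ‘procedures, not just data’ point concrete, here’s a rough sketch of a JSON-RPC exchange with both sides simulated in one script. It uses the message envelope from the later 2.0 revision of the spec. The add procedure and helper names are invented, and no HTTP transport is shown.

```javascript
// Hypothetical procedures exposed by a server.
const procedures = new Map([["add", (a, b) => a + b]]);

// Client side: wrap a procedure call in a JSON-RPC 2.0 request envelope.
function makeRequest(method, params, id) {
  return JSON.stringify({ jsonrpc: "2.0", method, params, id });
}

// Server side: decode the request, run the procedure, encode the response.
function handleRequest(requestText) {
  const { method, params, id } = JSON.parse(requestText);
  if (!procedures.has(method)) {
    // -32601 is the spec's "Method not found" error code.
    const error = { code: -32601, message: "Method not found" };
    return JSON.stringify({ jsonrpc: "2.0", error, id });
  }
  const result = procedures.get(method)(...params);
  return JSON.stringify({ jsonrpc: "2.0", result, id });
}

const reply = JSON.parse(handleRequest(makeRequest("add", [2, 3], 1)));
console.log(reply.result); // 5
```

In practice the request text would be the body of an HTTP POST, and the id lets the client match replies to calls.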

WebSockets

2-way data communication.

HTTP compatible.

Data-encoding agnostic.

gRPC

Modern reworking of RPC.

Scalable, clusters, data centres.

Multi-language support.

7.109 Data Exchange Formats

The two most common data exchange formats are XML and JSON.

XML is designed to be both human and machine readable. A standard was in place by 1998.

XML allows you to define your own datatypes and elements. Schemas accompany documents to define this data model.

The drawbacks of XML are that it can become complex and verbose very easily (ie loses the human-readable selling point).

Its tree structure is not always appropriate for your data.

There’s no natural mapping from XML tags to variable types in code.

JavaScript Object Notation (JSON) is also intended to be human and machine readable.

Its aim is to be a lightweight markup approach that serializes JS objects (or equivalent structures in other languages).

Drawbacks of JSON:

  • We lose attributes (can no longer annotate our tags).

  • Not great for merging data from different systems.

  • No support for arbitrary types.
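
The ‘no arbitrary types’ drawback is easy to see with a round trip through JSON. This is just an illustrative sketch with a made-up record.

```javascript
// A record containing types that JSON has no representation for.
const record = {
  name: "sample",
  created: new Date("2020-01-02T03:04:05.000Z"),
  tags: new Set(["a", "b"]),
  note: undefined,
};

const roundTripped = JSON.parse(JSON.stringify(record));

console.log(typeof roundTripped.created); // "string": the Date type is lost
console.log(roundTripped.tags); // {}: a Set has no JSON form
console.log("note" in roundTripped); // false: undefined values are dropped
```

Both sides therefore need to agree out of band on how to encode things like dates.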

7.2 Working with APIs

7.202 Consuming APIs in JS

Covers the old method of consuming APIs with the XMLHttpRequest class.

We instantiate an object of that class and assign a callback function to the object’s onreadystatechange property, which will be called when the response is received.

Then we open a connection with the open method, passing the HTTP method and the endpoint. Then we send the request.
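
Those steps might look something like this. The getJson name and the XhrClass parameter are my own additions; the parameter is only there so the sketch can be exercised outside a browser, and in a page it defaults to the built-in XMLHttpRequest.

```javascript
// Fetch a URL with XMLHttpRequest and pass the parsed JSON body to onData.
function getJson(url, onData, XhrClass = XMLHttpRequest) {
  const xhr = new XhrClass();
  // Called every time the request changes state.
  xhr.onreadystatechange = function () {
    // readyState 4 (DONE) means the full response has arrived.
    if (xhr.readyState === 4 && xhr.status === 200) {
      onData(JSON.parse(xhr.responseText));
    }
  };
  xhr.open("GET", url); // HTTP method and endpoint
  xhr.send();
}
```

In a browser we would call it as getJson(url, (data) => console.log(data)) for some endpoint that returns JSON.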

Nowadays though we’re more likely to use the fetch function. It might look like this:

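async function getData() {
  // fetch resolves once the response headers have arrived
  let response = await fetch("http://example.com/");
  // read the response body and parse it as JSON
  let data = await response.json();
  return data;
}

getData().then((data) => {
  // do something with data
});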

You will also come across the jQuery approach, which looks like this:

$.ajax({
  url: "http://example.com",
  type: "GET",
  success: function (result) {
    // do something with result
  },
  error: function (error) {
    // do something with error
  },
});

Finally there’s the websocket approach we saw in the last topic:

let socket = new WebSocket("wss://example.com");

socket.onopen = (e) => socket.send("My name is John");

socket.onmessage = (e) => console.log(e);

socket.onclose = (e) => {
  if (e.wasClean) {
    // cleanly closed
  } else {
    // do something when dropped
  }
};

socket.onerror = (error) => console.log(error);

7.203 Accessing APIs on the command line

Introduces curl and wget

For wget we just say wget <url> and it fetches the resource.

You can use wget in recursive mode with the -r control flag. It will then follow links and retrieve pages at the end of them. We can specify the depth of the spidering. The result will be a directory with pages that we’ve gathered.

We can also get data using this method.

We can specify the output file with the -O flag.

Curl is very similar, but it writes the response to standard output. We can use the shell’s > redirection (or curl’s own -o flag) to send the output to a file.

With curl we don’t have to add parameters to the end of the URL: the -d flag sets them, eg -d "db=taxonomy&id=9606". Note that -d on its own sends the parameters as the body of a POST request; adding -G makes curl append them to the URL as a GET query string instead.

7.205 Script for API interaction

Writes a script for interacting with a statistical service API, short-polling to get the results.
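
The lecture’s script isn’t reproduced here, but the short-polling part could be sketched as below. The { done, result } status shape and the function names are assumptions for this sketch, not the real service’s format.

```javascript
// Repeatedly call checkStatus until the job reports completion.
// checkStatus is any async function resolving to { done, result }.
async function pollUntilDone(checkStatus, { intervalMs = 1000, maxAttempts = 30 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await checkStatus();
    if (status.done) {
      return status.result;
    }
    // Wait before asking again so we don't hammer the server.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Job not finished after ${maxAttempts} attempts`);
}
```

For a real service, checkStatus would typically wrap a fetch of the job’s status URL and parse the response.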

7.207 Taverna

Workflow management systems are an alternative way to work with APIs.

They are designed to execute ‘workflows’, a series of data transformation steps. Repeatable and stable.

In the context of web APIs these can connect together remote data services/APIs.

We can start with a piece of data, send it to one API, get the results, send those results to another API, get those results, and so on.
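
The chaining idea can be sketched in a few lines. The two steps here are local stand-ins for remote API calls (hypothetical services), so nothing actually goes over the network.

```javascript
// Run a list of async steps in order, feeding each result into the next.
async function runWorkflow(steps, input) {
  let data = input;
  for (const step of steps) {
    data = await step(data);
  }
  return data;
}

// Stand-ins for two remote services.
const lookupId = async (name) => ({ name, id: 9606 });
const fetchSummary = async (record) => `${record.name} has taxonomy id ${record.id}`;

runWorkflow([lookupId, fetchSummary], "Homo sapiens").then(console.log);
// Homo sapiens has taxonomy id 9606
```

A WMS adds the things this leaves out, like retries, logging, and a record of which steps ran.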

Taverna is one example of this.

It’s a client-side WMS, emerging from computational biology to become an Apache Foundation project.

You can use it from the command line or through a GUI. It seems to be retired though, with no recent releases.

Alternatives include Apache ODE, Camunda BPM, ProActive, Common Workflow Language (CWL).

7.3 OpenAPI

7.301 Intro

The OpenAPI spec was originally known as the Swagger specification. It began development in 2010, with the desire to standardise REST interfaces so that they could be generated and handled more automatically.

It was standardised and adopted by the Linux Foundation.

REST APIs used to be ad hoc, many still are. OpenAPI provides a specification format for machine-readable interfaces, that standardises creating, describing, and consuming APIs.

Benefits include easier collaboration on design, autogenerated code (which reduces errors), and autogenerated documentation. It’s also easy to publish standard specs for everyone to understand.

7.305 The Spec

The aims of the spec are that descriptions of APIs should be standardised, language-agnostic, human and computer readable, and enable users to discover service capabilities without reading code.

An OpenAPI document is itself just a JSON (or YAML) document; the JSON types are mapped to API types.

You can nest documents: a $ref entry refers to a definition elsewhere in the same document or in another document.

There are certain required header fields:

| Field | JSON Type | Description |
|---|---|---|
| openapi | string | version of the spec |
| info | nested info object | API metadata |
| paths | nested paths object | Available paths/URIs for the API. Contains path item objects |

The info object includes title, version, contact info etc.

Server info lives under the top-level servers key, the value of which is an array of server objects, each giving the URL of a server endpoint.

Introduces the paths object, which contains the paths, the methods available on each path, and the responses for each method.
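
Putting those pieces together, a minimal OpenAPI 3.0 document might look like the object below. The API it describes (a /genes endpoint on api.example.com) is invented for illustration, and the helper assumes each path item contains only HTTP methods.

```javascript
// A minimal OpenAPI 3.0 document expressed as a JS object.
const spec = {
  openapi: "3.0.3",
  info: { title: "Example API", version: "1.0.0" },
  servers: [{ url: "https://api.example.com/v1" }],
  paths: {
    "/genes": {
      get: {
        summary: "List genes",
        responses: { "200": { description: "A JSON array of genes" } },
      },
    },
  },
};

// Walk the paths object and list each method/path pair it declares.
// Simplification: real path items may also hold non-method keys such as "parameters".
function listOperations(document) {
  const operations = [];
  for (const [path, pathItem] of Object.entries(document.paths)) {
    for (const method of Object.keys(pathItem)) {
      operations.push(`${method.toUpperCase()} ${path}`);
    }
  }
  return operations;
}

console.log(listOperations(spec)); // [ 'GET /genes' ]
```

Because the document is plain data, tools can walk it like this to discover what the service offers without reading its code.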

7.308 Schema Generation

We can generate schemas in Django (with DRF installed) using the python manage.py generateschema --format openapi-json command.

Maybe we want to generate it automatically, so we don’t have to remember to change it every time.

We can do that with the get_schema_view function in DRF, it generates it on the fly.

For more on that see the DRF docs

7.310 Documentation Generation

Now we can generate documentation too. Walks through the automated swagger documentation from the project website.

For more see the DRF docs

7.4 OpenAPI Clients

Moves to the client side of using OpenAPI.

We can use a tool called swagger-codegen (available via apt) to programmatically interact with a schema.

We give it a schema document, and specify a programming language, and it will generate a client library in that language for interacting with the API.

EG swagger-codegen generate -i <schema doc> -l python -o <client output path>

Then walks through the process of using the client library from another script.