Alex's Notes

CM3035 Topic 04: Build a CRUD and RESTful API

Main Info

Description

CRUD API. Server side aspects including testing.

Not covered in the course is specifying and designing the API through OpenAPI. This is noted in OpenAPI (and Swagger).

Key Reading

Other Reading

Lecture Summaries

4.1 CRUD and REST

CRUD (Create, Read, Update, Delete) names the four basic operations that any persistent storage medium must support. They underpin the behaviour of nearly every piece of software, from address books to games, word processors and social media.

Here’s the mapping in the lecture adapted from the linked Wikipedia entry:

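| CRUD   | SQL           | REST (formal) | REST (typical) |
|--------|---------------|---------------|----------------|
| Create | INSERT        | PUT           | POST/PUT       |
| Read   | SELECT        | GET           | GET            |
| Update | UPDATE/INSERT | PATCH         | PATCH/POST/PUT |
| Delete | DELETE/DROP   | DELETE        | DELETE         |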

These operations aren’t specific to databases; file systems and other storage methods support them too. But they are at the heart of modern web service design.
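To make the SQL column of the mapping concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The books table and its columns are made up for the example; they are not from the lecture.

```python
import sqlite3

# In-memory database; the "books" table is a made-up example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT)")

# Create -> INSERT
cur = conn.execute("INSERT INTO books (title) VALUES (?)", ("Dune",))
book_id = cur.lastrowid

# Read -> SELECT
title_after_create = conn.execute(
    "SELECT title FROM books WHERE id = ?", (book_id,)
).fetchone()[0]

# Update -> UPDATE
conn.execute("UPDATE books SET title = ? WHERE id = ?", ("Dune Messiah", book_id))
title_after_update = conn.execute(
    "SELECT title FROM books WHERE id = ?", (book_id,)
).fetchone()[0]

# Delete -> DELETE
conn.execute("DELETE FROM books WHERE id = ?", (book_id,))
remaining = conn.execute("SELECT COUNT(*) FROM books").fetchone()[0]
```

The same four steps would map onto HTTP methods when the storage sits behind a web API.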

4.105 REST

REpresentational State Transfer (REST) is a software architectural style and server design pattern for creating web services. Services adhering to the pattern are called RESTful services.

It’s a product of Roy Fielding’s thesis published in 2000 (available here).

As we moved from the 90s to the 00s, the web came to be used for more than serving HTML: servers delivered arbitrary files and executed code on request. This led HTTP to develop support for a myriad of file types.

As servers became generic, we needed a more sophisticated way of thinking about server architecture. This is the problem addressed by REST.

There are six features of REST (Fielding calls them constraints) that, if followed, present a logically consistent interface to clients and user applications. Clients should be able to navigate the application through URIs and HTTP operations. With each successful request, the client receives a response indicating the state of the resource, and can continue to apply operations.

On the server side the features are:

  1. Client-server architecture - the separation of the UI from data storage concerns. This increases portability of the UI and scalability by simplifying the server task. Client and server components can develop independently.

  2. Statelessness - the server stores no session information about the client between requests, and a request must not depend on anything the server remembers about earlier requests. Each request carries everything needed to process it. “Session state is… kept entirely on the client” (Fielding).

    The architecture has trade-offs - on the positive side it improves visibility (because you only need to see a single request to know the full state of the request), reliability (easy recovery from failure) and scalability (servers can quickly free resources).

    On the negative side, it increases the repetitive data sent across a series of requests, because that data can’t be left on the server. It also reduces the server’s control, since the server depends on the client managing state correctly.

  3. Cacheable - clients and intermediaries need to know when resources can be cached. A server must explicitly or implicitly label whether a resource is cacheable, and if it is, the client has the right to reuse the cached response.

  4. Layered System - e.g. if the server uses CDNs, load balancers etc., they must be invisible. The server implementation details should not be visible to the client. It should appear as a single destination.

  5. Uniform Interface - the most complex requirement: a uniform interface regardless of media type, with links that let the client navigate the application without knowing its structure in advance (“Hypermedia as the engine of application state”).

  6. Code-on-demand (optional) - the server can supply blocks of executable code (e.g. scripts) for the client to download and run. This means the client code does not have to be fully fixed when it ships, and can be updated on the fly.

Applications that use these patterns are RESTful, and their APIs are RESTful APIs.
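A small sketch of the statelessness feature, assuming a made-up paginated listing: the client sends the page number with every request, so the handler needs no per-client memory and any server instance could answer. The function name, parameters and URL shape are hypothetical.

```python
ITEMS = [f"item{n}" for n in range(1, 11)]  # made-up data: item1..item10


def list_items(page: int, page_size: int = 3) -> dict:
    """Stateless handler: everything needed is in the request arguments."""
    start = (page - 1) * page_size
    chunk = ITEMS[start:start + page_size]
    has_next = start + page_size < len(ITEMS)
    return {
        "items": chunk,
        # The link to the next state travels in the response,
        # not in a server-side session.
        "next": f"/items?page={page + 1}" if has_next else None,
    }


first = list_items(page=1)
last = list_items(page=4)
```

Because the response carries the link to the next page, this also hints at the hypermedia idea in the uniform interface feature.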

There are three common features for clients:

  1. The application location is defined by a base URI

  2. They use only standard HTTP methods (GET, POST, PUT, DELETE, etc.)

  3. The API should define the media type that’s being sent to and from the server, e.g. application/xml, application/json or text/xml, text/html etc.
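The three client-side features can be sketched with the standard library's urllib, building a request without sending it. The base URI and the books/1 path are assumptions chosen for illustration.

```python
import json
from urllib.parse import urljoin
from urllib.request import Request

BASE_URI = "https://api.test.com/"  # hypothetical base URI


def build_request(method: str, path: str, payload=None) -> Request:
    """Build (but do not send) a request against the API."""
    body = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Accept": "application/json"}            # media type we want back
    if body is not None:
        headers["Content-Type"] = "application/json"    # media type we send
    return Request(urljoin(BASE_URI, path), data=body, headers=headers, method=method)


req = build_request("PUT", "books/1", {"title": "Dune"})
```

Here req.full_url combines the base URI with the resource path, req.get_method() is the standard HTTP method, and the headers declare the media types in both directions.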

The data elements from REST are:

| Data Element            | Modern Web Examples                                     |
|-------------------------|---------------------------------------------------------|
| resource                | the intended conceptual target of a hypertext reference |
| resource identifier     | URL, URN                                                |
| representation          | HTML document, JPEG image                               |
| representation metadata | media type, last-modified time                          |
| resource metadata       | source link, alternates, vary                           |
| control data            | if-modified-since, cache-control                        |
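To show how representation metadata and control data interact, here is a rough sketch of a conditional GET using only the standard library. The resource body, its timestamp and the get_book function are made up; a real server or framework would normally handle this for you.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Made-up resource state for the example.
BODY = b'{"title": "Dune"}'
LAST_MODIFIED = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)


def get_book(request_headers: dict) -> tuple:
    """Return (status, headers, body) for a conditional GET."""
    response_headers = {
        "Content-Type": "application/json",                            # representation metadata
        "Last-Modified": format_datetime(LAST_MODIFIED, usegmt=True),  # representation metadata
        "Cache-Control": "max-age=60",                                 # control data
    }
    since = request_headers.get("If-Modified-Since")                   # control data
    if since is not None and LAST_MODIFIED <= parsedate_to_datetime(since):
        return 304, response_headers, b""  # the client's cached copy is still valid
    return 200, response_headers, BODY


status, headers, body = get_book({})
status_again, _, body_again = get_book({"If-Modified-Since": headers["Last-Modified"]})
```

The first call returns the full representation; the second echoes the Last-Modified value back and gets an empty 304 response, so the client reuses its cache.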

Resources and Resource Identifiers (Fielding)

The key abstraction of information in REST is a resource. Any information that can be named can be a resource: a document or image, a temporal service (e.g. “today’s weather in Los Angeles”), a collection of other resources, a non-virtual object (e.g. a person), and so on. In other words, any concept that might be the target of an author’s hypertext reference must fit within the definition of a resource. A resource is a conceptual mapping to a set of entities, not the entity that corresponds to the mapping at any particular point in time.

More precisely, a resource R is a temporally varying membership function M_R(t), which for time t maps to a set of entities, or values, which are equivalent. The values in the set may be resource representations and/or resource identifiers. A resource can map to the empty set, which allows references to be made to a concept before any realization of that concept exists – a notion that was foreign to most hypertext systems prior to the Web [61]. Some resources are static in the sense that, when examined at any time after their creation, they always correspond to the same value set. Others have a high degree of variance in their value over time. The only thing that is required to be static for a resource is the semantics of the mapping, since the semantics is what distinguishes one resource from another.

For example, the “authors’ preferred version” of an academic paper is a mapping whose value changes over time, whereas a mapping to “the paper published in the proceedings of conference X” is static. These are two distinct resources, even if they both map to the same value at some point in time. The distinction is necessary so that both resources can be identified and referenced independently. A similar example from software engineering is the separate identification of a version-controlled source code file when referring to the “latest revision”, “revision number 1.2.7”, or “revision included with the Orange release.”

This abstract definition of a resource enables key features of the Web architecture. First, it provides generality by encompassing many sources of information without artificially distinguishing them by type or implementation. Second, it allows late binding of the reference to a representation, enabling content negotiation to take place based on characteristics of the request. Finally, it allows an author to reference the concept rather than some singular representation of that concept, thus removing the need to change all existing links whenever the representation changes (assuming the author used the right identifier).

REST uses a resource identifier to identify the particular resource involved in an interaction between components. REST connectors provide a generic interface for accessing and manipulating the value set of a resource, regardless of how the membership function is defined or the type of software that is handling the request. The naming authority that assigned the resource identifier, making it possible to reference the resource, is responsible for maintaining the semantic validity of the mapping over time (i.e., ensuring that the membership function does not change).

Traditional hypertext systems, which typically operate in a closed or local environment, use unique node or document identifiers that change every time the information changes, relying on link servers to maintain references separately from the content. Since centralized link servers are an anathema to the immense scale and multi-organizational domain requirements of the Web, REST relies instead on the author choosing a resource identifier that best fits the nature of the concept being identified. Naturally, the quality of an identifier is often proportional to the amount of money spent to retain its validity, which leads to broken links as ephemeral (or poorly supported) information moves or disappears over time.

Representations (Fielding)

REST components perform actions on a resource by using a representation to capture the current or intended state of that resource and transferring that representation between components. A representation is a sequence of bytes, plus representation metadata to describe those bytes. Other commonly used but less precise names for a representation include: document, file, and HTTP message entity, instance, or variant.

A representation consists of data, metadata describing the data, and, on occasion, metadata to describe the metadata (usually for the purpose of verifying message integrity). Metadata is in the form of name-value pairs, where the name corresponds to a standard that defines the value’s structure and semantics. Response messages may include both representation metadata and resource metadata: information about the resource that is not specific to the supplied representation.
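As a rough illustration of the resource/representation split, the sketch below renders one resource as two representations, each a sequence of bytes plus metadata describing those bytes. The book dict and the represent function are hypothetical stand-ins.

```python
import json
import xml.etree.ElementTree as ET

# The resource is the concept "book 1"; this dict stands in for its current state.
book = {"id": 1, "title": "Dune"}


def represent(resource: dict, media_type: str) -> tuple:
    """Return (metadata, body bytes) for the requested media type."""
    if media_type == "application/json":
        body = json.dumps(resource).encode("utf-8")
    elif media_type == "application/xml":
        root = ET.Element("book")
        for key, value in resource.items():
            ET.SubElement(root, key).text = str(value)
        body = ET.tostring(root)
    else:
        raise ValueError(f"unsupported media type: {media_type}")
    # Representation metadata: name-value pairs describing the bytes.
    return {"Content-Type": media_type, "Content-Length": str(len(body))}, body


json_meta, json_body = represent(book, "application/json")
xml_meta, xml_body = represent(book, "application/xml")
```

Both byte strings describe the same resource; only the representation chosen for the exchange differs.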

4.108 Web Clients and REST

We’ve looked mainly at the server side, what about the client? We can see two broad categories of URI that the client might use:

| URI Type          | Example                                |
|-------------------|----------------------------------------|
| Data Request      | https://api.test.com/books             |
| Operation Request | https://api.test.com/calculate_sales   |

We might have data operations (fetching or updating data on the server) and computational operations (asking the server to calculate something).

These correspond formally to the HTTP methods:

| HTTP Method | Intended Resource Behaviour     |
|-------------|---------------------------------|
| GET         | Retrieve a Resource (data)      |
| PUT         | Send data and create a resource |
| POST        | Request an operation            |
| PATCH       | Send data and update a resource |
| DELETE      | Delete a resource               |

In practice people often use a reduced set of these, such as only GET and POST, or use PUT and POST interchangeably.
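The intended behaviours in the table can be sketched as a tiny dispatcher over an in-memory store. Everything here (the books dict, the handle function, the status codes chosen) is an assumption for illustration, not the lecture's code.

```python
from typing import Optional

books = {}  # in-memory stand-in for persistent storage


def handle(method: str, book_id: int, data: Optional[dict] = None) -> tuple:
    """Return (status_code, body) for a request on /books/<book_id>."""
    if method == "PUT":                       # send data and create (or replace)
        created = book_id not in books
        books[book_id] = dict(data or {})
        return (201 if created else 200), dict(books[book_id])
    if book_id not in books:
        return 404, None
    if method == "GET":                       # retrieve
        return 200, dict(books[book_id])
    if method == "PATCH":                     # send data and update
        books[book_id].update(data or {})
        return 200, dict(books[book_id])
    if method == "DELETE":                    # delete
        del books[book_id]
        return 204, None
    return 405, None                          # method not supported here


created = handle("PUT", 1, {"title": "Dune", "year": 1965})
patched = handle("PATCH", 1, {"year": 1966})
deleted = handle("DELETE", 1)
missing = handle("GET", 1)
```

PATCH only touches the fields it is given, whereas PUT replaces the whole stored record.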

Clean and Messy URIs

In the early days of the web (and still) you’d often see long, messy URIs that captured all sorts of state inside them.

Now we try to stick to clean (tidy) URIs.

From Wikipedia:

A URL will often comprise a path, script name, and query string. The query string parameters dictate the content to show on the page, and frequently include information opaque or irrelevant to users—such as internal numeric identifiers for values in a database, illegibly encoded data, session IDs, implementation details, and so on. Clean URLs, by contrast, contain only the path of a resource, in a hierarchy that reflects some logical structure that users can easily interpret and manipulate.

| Original URL                                         | Clean URL                               |
|------------------------------------------------------|-----------------------------------------|
| http://example.com/about.html                        | http://example.com/about                |
| http://example.com/user.php?id=1                     | http://example.com/user/1               |
| http://example.com/index.php?page=name               | http://example.com/name                 |
| http://example.com/kb/index.php?cat=1&id=23          | http://example.com/kb/1/23              |
| http://en.wikipedia.org/w/index.php?title=Clean_URL  | http://en.wikipedia.org/wiki/Clean_URL  |
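On the server, clean URLs are usually handled by a routing layer that pulls the parameters out of the path. Below is a minimal sketch with regular expressions; the route table and handler names are hypothetical, and web frameworks provide their own routing for this.

```python
import re

# Hypothetical route table: clean URL pattern -> handler name.
ROUTES = [
    (re.compile(r"^/user/(?P<id>\d+)$"), "user_detail"),
    (re.compile(r"^/kb/(?P<cat>\d+)/(?P<id>\d+)$"), "kb_article"),
]


def resolve(path: str):
    """Return (handler_name, params) for a clean URL path, or None."""
    for pattern, name in ROUTES:
        match = pattern.match(path)
        if match:
            # Path segments replace the old query-string parameters.
            return name, {key: int(value) for key, value in match.groupdict().items()}
    return None


kb = resolve("/kb/1/23")
user = resolve("/user/1")
```

The path /kb/1/23 carries the same information as ?cat=1&id=23, but in a hierarchy a user can read and edit.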

4.2 Django Rest Framework

4.205 introduces serializer syntax. You create a class that inherits from one of the base serializer classes. You can either manually declare each serializer field that corresponds to a field on your model, or declare a Meta class inside the serializer and use it to specify the base model and the model fields that you want the user to be able to read and write. The latter looks like this:

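from rest_framework import serializers

# Gene is the Django model, imported from the app's models module.
class GeneSerializer(serializers.ModelSerializer):
    class Meta:
        model = Gene
        fields = ['gene_id', 'entity']  # ...plus the remaining fields to expose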
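Conceptually, the fields list acts as a whitelist between the model and the outside world. The sketch below imitates that idea in plain Python with a dataclass; it is not how DRF is implemented, and the Gene stand-in, internal_note field and helper functions are all made up.

```python
from dataclasses import dataclass


@dataclass
class Gene:                      # stand-in for the Django model
    gene_id: str
    entity: str
    internal_note: str           # hypothetical field we do not want to expose


FIELDS = ["gene_id", "entity"]   # plays the role of Meta.fields


def serialize(obj, fields=FIELDS) -> dict:
    """Keep only the whitelisted attributes, as plain Python data."""
    return {name: getattr(obj, name) for name in fields}


def deserialize(data: dict, fields=FIELDS) -> dict:
    """Check incoming data: every listed field must be present."""
    missing = [name for name in fields if name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return {name: data[name] for name in fields}


public = serialize(Gene("gene1", "Plasmid", "do not publish"))
```

The real serializer classes add type conversion and validation on top of this basic filtering.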

Then you can implement API controllers like this:

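from django.http import HttpResponse, JsonResponse
from django.views.decorators.csrf import csrf_exempt

@csrf_exempt
def gene_detail(request, pk):
    # Look up a single gene; an unknown primary key is a 404.
    try:
        gene = Gene.objects.get(pk=pk)
    except Gene.DoesNotExist:
        return HttpResponse(status=404)
    if request.method == 'GET':
        serializer = GeneSerializer(gene)
        return JsonResponse(serializer.data)
    # Any other method is not handled by this view.
    return HttpResponse(status=405)

@csrf_exempt
def genes_list(request):
    try:
        genes = Gene.objects.all()
    except Exception:
        # Unexpected server-side failure.
        return HttpResponse(status=500)
    if request.method == 'GET':
        serializer = GeneSerializer(genes, many=True)
        # safe=False allows a list (rather than a dict) as the top-level JSON.
        return JsonResponse(serializer.data, safe=False)
    return HttpResponse(status=405)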

You end up writing a lot of boilerplate with these functions. DRF also provides class-based views to simplify this.

A basic one would look like this:

from rest_framework import generics, mixins

class GeneDetail(mixins.CreateModelMixin,
                 mixins.RetrieveModelMixin,
                 mixins.UpdateModelMixin,
                 mixins.DestroyModelMixin,
                 generics.GenericAPIView):

    queryset = Gene.objects.all()
    serializer_class = GeneSerializer

    # Each HTTP method delegates to the action supplied by a mixin.
    def post(self, request, *args, **kwargs):
        return self.create(request, *args, **kwargs)

    def get(self, request, *args, **kwargs):
        return self.retrieve(request, *args, **kwargs)

    def put(self, request, *args, **kwargs):
        return self.update(request, *args, **kwargs)

    def delete(self, request, *args, **kwargs):
        return self.destroy(request, *args, **kwargs)

4.3 API Testing

We want to be sure that our API code returns the data we want.

Code testing is a way that we can be sure our code works as intended and meets the specifications.

Code tests are functions that run our code and check that its outputs are as expected. It’s good practice to write unit tests, though with a large existing code base it’s not usually worth going back to test everything you can already see working.
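As a minimal sketch of that idea using the standard library's unittest, the example below tests a made-up gene_length function. The function, its inclusive-coordinates rule and the test names are assumptions for illustration only.

```python
import unittest


def gene_length(start: int, stop: int) -> int:
    """Hypothetical function under test: inclusive length of a gene."""
    if stop < start:
        raise ValueError("stop must not be before start")
    return stop - start + 1


class GeneLengthTest(unittest.TestCase):
    def setUp(self):
        # Runs before every test method, so each test starts from known data.
        self.start, self.stop = 10, 19

    def test_length_counts_both_ends(self):
        self.assertEqual(gene_length(self.start, self.stop), 10)

    def test_reversed_coordinates_raise(self):
        with self.assertRaises(ValueError):
            gene_length(self.stop, self.start)


def run_tests() -> unittest.TestResult:
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(GeneLengthTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling run_tests() executes both tests; in a Django project the test runner would discover and run such classes for you.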

We’ll use factory_boy (installed via pip) to generate the data for our tests.

Our test setup will be in two main parts: one that creates the data for the test database, which is instantiated afresh for each test run, and one that holds the test logic.

A fixture is a consistent, repeatable piece of setup (code or data) that can be reused across the test suite. Here our fixtures will be the data that we add to the test db.

We can create them like this:

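import factory
from random import choice, randint

# Sequencing, Gene and ECFactory come from elsewhere in the project (not shown here).

class SequencingFactory(factory.django.DjangoModelFactory):
    sequencing_factory = "Sanger"
    factory_location = "UK"

    class Meta:
        model = Sequencing

class GeneFactory(factory.django.DjangoModelFactory):
    gene_id = factory.Sequence(lambda n: f'gene{n}')
    # Lazy declarations are evaluated for each object; a bare choice() or
    # randint() here would run only once, when the class is defined.
    entity = factory.LazyFunction(lambda: choice(['Plasmid', 'Chromosome']))
    start = factory.LazyFunction(lambda: randint(1, 100000))
    stop = factory.LazyAttribute(lambda gene: gene.start + randint(1, 100000))
    sense = "+"
    start_codon = "M"

    sequencing = factory.SubFactory(SequencingFactory)
    ec = factory.SubFactory(ECFactory)

    class Meta:
        model = Gene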

Check out factory.Faker for random text generation if you need it.
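One Python detail worth keeping in mind when declaring factory attributes: an expression written directly in a class body is evaluated once, when the class is defined, not once per object. The sketch below uses plain classes (no factory_boy) and made-up names to contrast a value fixed at class definition with values computed per instance, which is the behaviour that declarations such as factory.Sequence give you.

```python
from itertools import count
from random import Random

rng = Random(0)  # seeded only so the example is repeatable


class FixedAtDefinition:
    start = rng.randint(1, 100000)   # evaluated once, shared by all instances


class PerInstance:
    _ids = count(1)                  # shared counter, similar in spirit to a sequence

    def __init__(self):
        self.gene_id = f"gene{next(self._ids)}"   # fresh value for each object
        self.start = rng.randint(1, 100000)       # drawn again for each object


a, b = FixedAtDefinition(), FixedAtDefinition()
c, d = PerInstance(), PerInstance()
```

Here a.start and b.start are always identical, while c and d get their own gene_id and their own random draw.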

Now we can write some tests. Every function or class we create should be covered by at least one test. Tests should build up to prove the application is working correctly.

There are some helper functions we can use.

The django.urls module has a reverse function that takes the name of a URL pattern from the urls file (give them a name!) and reconstructs the URL that the client would use to hit that endpoint.

The rest_framework.test package has an APITestCase class we can subclass for our API tests. This gives us:

import json

from django.urls import reverse
from rest_framework.test import APITestCase

class GeneTest(APITestCase):

    gene1 = None
    gene2 = None
    gene3 = None
    good_url = ''
    bad_url = ''
    delete_url = ''

    def setUp(self):
        # Pin entity for gene1 so the assertion on it below is deterministic.
        self.gene1 = GeneFactory.create(pk=1, gene_id="gene1", entity="Plasmid")
        self.gene2 = GeneFactory.create(pk=2, gene_id="gene2")
        self.gene3 = GeneFactory.create(pk=3, gene_id="gene3")

        self.good_url = reverse('gene_api', kwargs={'pk': 1})
        self.bad_url = '/api/gene/H/'
        self.delete_url = reverse('gene_api', kwargs={'pk': 3})

    def tearDown(self):
        # Clear the data and reset the factory sequences between tests.
        EC.objects.all().delete()
        Sequencing.objects.all().delete()
        Gene.objects.all().delete()
        ECFactory.reset_sequence(0)
        SequencingFactory.reset_sequence(0)
        GeneFactory.reset_sequence(0)

    def test_geneDetailReturnSuccess(self):
        response = self.client.get(self.good_url)
        response.render()
        self.assertEqual(response.status_code, 200)

        data = json.loads(response.content)
        self.assertIn('entity', data)
        self.assertEqual(data['entity'], 'Plasmid')

    def test_geneDetailReturnFailOnBadPK(self):
        response = self.client.get(self.bad_url)
        self.assertEqual(response.status_code, 404)

    def test_geneDetailDeleteIsSuccessful(self):
        response = self.client.delete(self.delete_url, format='json')
        self.assertEqual(response.status_code, 204)