CM3035 Topic 06: Asynchronous Web Services
Main Info
Title: Asynchronous Web Services
Teachers: Daniel Buchan
Semester Taken: April 2022
Parent Module: cm3035: Advanced Web Development
Description
Celery and Websockets
Key Reading
Other Reading
Historic bidirectional HTTP approaches
An IETF memo covering long polling and HTTP streaming approaches and their issues.
Lecture Summaries
6.1 Synchronous and Asynchronous Web Services
In software, synchronous design is a pattern where all components stay in step with one another: the 'message' sender waits for a response before continuing.
In the context of synchronous servers, the client sends an HTTP request, the web server routes it to the web app, the web app creates a response and passes it back to the server, and the web server sends the response to the client.
Asynchronous design is based on non-blocking code execution. Here the client and server execute code in parallel. There are different approaches to achieving this.
One common pattern is non-blocking IO in client-side JavaScript: we use callbacks for web requests. The server design remains as in the synchronous model; the client-side code executes when the response arrives.
Another approach is server-side asynchrony. Here the server might need to respond to user requests while running long-running computations. It might offload a computation to another system if it takes longer than, say, a second.
Python celery is one such parallel execution system.
A third approach is websockets - here the client and server open a permanent connection.
6.104 Task Queues
When we make web servers, we might have the issue that some computations are long - compressing audio, rescaling images, sending emails etc.
We want the server to be responsive to requests while these computations are running.
One solution for this is the use of task queues - here we offload computation to a parallel system. A series of ‘workers’ monitor a queue for units of work to complete.
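The queue/worker pattern can be sketched in plain Python, using the standard library's queue module and a thread as stand-ins for a real broker and worker process:

```python
import queue
import threading

task_queue = queue.Queue()  # stand-in for a real broker like Redis or RabbitMQ
results = []

def worker():
    # a worker loops forever, pulling units of work off the queue
    while True:
        job = task_queue.get()
        if job is None:          # sentinel value: shut the worker down
            break
        results.append(job * 2)  # 'complete' the unit of work
        task_queue.task_done()

t = threading.Thread(target=worker)
t.start()

# the web server would enqueue work and return to the client immediately
for n in [1, 2, 3]:
    task_queue.put(n)

task_queue.put(None)  # tell the worker to stop
t.join()
print(results)  # [2, 4, 6]
```

In a real deployment the queue lives in a separate broker process, so the workers can run on entirely different machines from the web server.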
Typical queues/brokers include RabbitMQ, an open-source message broker, and Redis, an in-memory key-value datastore often used as a cache or message broker.
There are task queue frameworks like Celery, Dramatiq, and RQ (Python), or Sidekiq and Delayed::Job (Ruby), which build on brokers like RabbitMQ.
The benefits are that the computations can be moved to other servers. Now we can separate concerns, we can have separate servers for our web server, our queue broker, and our workers.
We’ll be looking at Celery. It’s a task queue system, clients submit units of work to a queue broker. Workers watch the queue broker and complete work as it appears. Multiple sets of brokers or workers are supported. Implementations exist in Python, Node, and PHP. Good integration with Django.
6.111 Celery Intro
We get celery up and running by installing dependencies and then writing some basic setup:
from celery import Celery
app = Celery('simple_example', broker='redis://localhost/', backend='redis://localhost/')
@app.task
def my_func():
    return 'hello world'
Then you can use these workers by importing that module and saying:
result = my_func.delay() # returns the Async result object.
result.state # returns the state of the result.
result.get() # returns the actual result.
The remaining lectures in week 11 walk through an example of setting up an image processing server using celery and pillow.
6.3 Asynchronous Servers
6.301 Websockets
The first version of websockets appeared in 2008. They addressed a need to standardise a loose set of emerging practices in real-time web communication. They supplanted the 'comet' web app model, which hacked together a set of prior technologies in awkward ways.
Comet was based on the XMLHttpRequest AJAX model.
We’d use callbacks like this:
const req = new XMLHttpRequest();
req.onreadystatechange = function(){};
req.open("GET", "/api/example");
req.send();
But what if the data on the server is changing regularly? How do I update a website when I know the data will change but it’s a long running calculation and I don’t know when?
We can use short polling:
setInterval(function() {
    const req = new XMLHttpRequest();
    req.onreadystatechange = function(){};
    req.open("GET", "/api/example");
    req.send();
}, 5000);
There are problems with this. How often should I poll? What if many users poll the server? It adds server load, potentially hundreds of requests per page.
An alternative is long polling: the client opens the request, and the server does not respond until the data is ready. On receiving the response, the client immediately sends a new request. This is core to the 'comet' model.
This reduces the server hits per page, emulating a ‘server push’ client-side.
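The long-polling loop can be sketched as follows; `fetch_update` is a stubbed stand-in for the blocking HTTP request a real client would make:

```python
import time

def fetch_update(timeout):
    """Stand-in for a blocking HTTP request: a real long poll would hold
    the connection open until the server has data (or the timeout expires)."""
    time.sleep(0.01)  # pretend the server held the request briefly
    return {"data": "update"}

def long_poll(handle, rounds):
    # the long-polling loop: each response triggers an immediate new request,
    # so from the client's perspective data appears to be 'pushed'
    for _ in range(rounds):
        response = fetch_update(timeout=30)
        handle(response)

received = []
long_poll(received.append, rounds=3)
print(len(received))  # 3
```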
Another approach is HTTP streaming. Here the client opens the request and the server begins responding before the complete request has been received; the client does not terminate the request.
So the client sends the header and enough of the body for the server to start work, but withholds the terminating EOF part of the request. The client can send further data, or wait for responses from the server. This hacks the traditional request-response model to provide ongoing, two-way communication for as long as the client needs it.
Eventually the client sends the EOF part of the body and the connection is closed.
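The client side of streaming can be sketched with a generator standing in for the open connection; the point is that each chunk is processed as it arrives, without waiting for the response to terminate:

```python
def chunked_response():
    """Stand-in for a server writing partial responses over an open
    connection; a real HTTP stream would yield chunks as they arrive."""
    yield "event 1\n"
    yield "event 2\n"
    yield "event 3\n"

# the client handles each chunk immediately, before the response completes
events = [chunk.strip() for chunk in chunked_response()]
print(events)  # ['event 1', 'event 2', 'event 3']
```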
These are all clever but hacky, they repurpose tools that aren’t designed for the purpose.
For long polling, requests were intended to be fulfilled on arrival, not deferred indefinitely.
For streaming, it's hard to manage connections, as there's no standard and so no reliable way to open and close them.
Both of these require a lot of bespoke code.
Websockets were a response to the issues with these methods: a dedicated design pattern for long-running connections, intended to replace the unreliable workarounds. It is 'compatible' with HTTP rather than a hack of it.
6.305 Websocket Protocol over TCP
The protocol developed out of initial conversations on the W3C mailing list; it was released in 2008 and browser support arrived by 2010.
It aims to provide a ‘thin’ transport layer or protocol over TCP/IP.
It’s an attempt to provide pure TCP data exchange within a web app.
It integrates with HTTP.
It's a very lightweight protocol. It specifies how to establish a connection, but not the data type or message format; data encapsulation is handled by agreeing a 'subprotocol' (JSON, XML etc.).
It’s designed to be flexible, specifying as little as possible to enable diverse applications to be built on it.
A websocket connection starts with a handshake, a standard HTTP GET request.
GET /index.html HTTP/1.1
Host: www.example.com
Connection: Upgrade
Upgrade: websocket
Sec-WebSocket-Key: ajiojj2oi12l1kj1l121321lj
Sec-WebSocket-Protocol: json
The header includes a request to upgrade the connection, and to upgrade the connection type to websocket.
If the server can fulfil the request, it responds as follows:
HTTP/1.1 101 Switching Protocols
Date: Mon, 03 Oct 2022 10:05:34 GMT
Connection: Upgrade
Upgrade: WebSocket
Sec-WebSocket-Accept: asdazoioup2q2424poiu21240008u7a
It sends back a socket protocol handshake to confirm that the upgraded connection has been created. Client and server can now exchange data bidirectionally over TCP until one or the other terminates the connection.
On the client side, we can establish the connection fairly trivially with the WebSocket class and event listeners:
var socket = new WebSocket("wss://www.example.com/", "json");
socket.send("some example data");
socket.onmessage = function(event) {
    console.log(event.data);
};
We just specify the server address and the subprotocol. If the request is successful we can then send arbitrary data and use the socket's onmessage handler to process incoming data.
Messaging
Websockets don't stream raw data over TCP; instead, data is sent in discrete units called frames, which are assembled into messages.
Frames have lightweight headers, like a TCP/IP header, and a payload.
Very large data messages are going to be split over multiple frames.
Each frame has an opcode that indicates its type, for instance whether it’s the start or end of a message.
Client-to-server data is masked (obfuscated, not encrypted) by XORing it with a 32-bit key.
Opcodes include ping and pong which can be used to check that the other side is still listening. One side sends the ping and expects to hear the pong.
Closing the connection involves sending a frame with the close connection opcode.
Issues
This is not a pure TCP streaming protocol, though streaming extensions are available.
There is no built in method of message acknowledgement.
Websocket connections can persist indefinitely and clients can open any number of them. Just as short polling can overwhelm a server, too many websocket connections can cause a server to run out of resources.
6.307: WebSockets in Django: Channels
The Django channels package provides Django's implementation of the WebSocket protocol.
It replaces the core of Django routing, views, and serving internals.
A channel becomes a subscribable stream of data, that various user clients can connect to and read data from.
HTTP request handling is implemented as a pre-made channel.
Channels are divided into two components - scopes and events:
Scopes are the details that define the connection and a request to join a channel.
Events are the stream of user interactions for the channel that get sent to every attached client.
So if we imagine a putative chat channel the flow would be:
the user would request to start a chat
the chat scope on that channel will be opened up
The user or users can send one or more messages on that channel
The server pushes each message to all users attached to the channel in that scope, as received
After an unspecified time, a user closes their connection
The chat scope is closed
The HTTP channel works slightly differently, opening temporary scopes:
The user sends an HTTP request
The HTTP scope is established
An http.request event is constructed for the application (handled much like standard Django)
An http.response is constructed and sent to the user
The HTTP scope is closed
A Django channel will have 'consumers', processes that consume events. They are analogous to Django views.
Channel requests are ‘routed’ to the appropriate consumer.
Once a client is connected to a consumer it receives all events on that channel.
Channels support http, WebSockets, and other protocols.
Implementation
The remaining lectures walk through implementing a chat application using Django channels.
6.405 covers configuration.
Setting up the project structure, he creates a consumers.py file for the channel consumers, and a routing.py file, like a urls.py for sockets.
He creates two templates a homepage and a chatroom page.
In the settings, along with the web application (WSGI_APPLICATION), he adds an async application (ASGI_APPLICATION), which is set to application_name.routing.application.
The routing is the mechanism by which the channels package knows how to route different protocols to different parts of the application.
6.406 covers implementing the chat server website.
Starts by implementing the basic page views - the home page, and the chat room page.
Builds the client-side functionality for opening the connection and sending and receiving messages.
6.407 covers implementing the server-side socket handling.
Sets up the websocket router on the core app, then the actual chat app routing, which looks like a urls file.
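The chat app's routing file might look something like this (a sketch assuming Channels 3+, where consumers expose an as_asgi() factory; the URL pattern is illustrative, with the room_name kwarg matching what the consumer reads from its scope):

```python
# chat/routing.py -- maps websocket URLs to consumers, like a urls.py for sockets
from django.urls import re_path

from . import consumers

websocket_urlpatterns = [
    re_path(r'ws/chat/(?P<room_name>\w+)/$', consumers.ChatConsumer.as_asgi()),
]
```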
The ‘view’ that it routes to is in the consumers.py.
The consumers have generic class-based versions, analogous to Django's generic class-based views.
So you can use from channels.generic.websocket import WebsocketConsumer
Then override the methods you need:
import json

from channels.generic.websocket import WebsocketConsumer

class ChatConsumer(WebsocketConsumer):
    def connect(self):
        # could authenticate, log the connection etc.
        self.accept()  # accepts the connection

    def disconnect(self, close_code):
        # could log the session etc.
        pass

    def receive(self, text_data):
        json_data = json.loads(text_data)
        message = json_data['message']
        self.send(text_data=json.dumps({'message': message}))
6.408 Handles making it async and multi-user
We're going to change the architecture so that when a user posts a message, we echo it to Redis as a queue of messages. Every client subscribed to the chatroom picks messages off the queue, and each message is sent to the front end.
We do this by using channel layers. In our project settings we can do this:
CHANNEL_LAYERS = {
    'default': {
        'BACKEND': 'channels_redis.core.RedisChannelLayer',
        'CONFIG': {
            'hosts': [('127.0.0.1', 6379)],
        },
    },
}
Now we have to update the websocket consumer, so it works asynchronously and subscribes to a channel of information on the Redis queue:
class ChatConsumer(AsyncWebsocketConsumer):
    async def connect(self):
        self.room_name = self.scope['url_route']['kwargs']['room_name']
        self.room_group_name = f'chat_{self.room_name}'
        await self.channel_layer.group_add(
            self.room_group_name,
            self.channel_name
        )
        await self.accept()

    async def disconnect(self, close_code):
        await self.channel_layer.group_discard(
            self.room_group_name,
            self.channel_name
        )

    async def receive(self, text_data):
        json_data = json.loads(text_data)
        message = json_data['message']
        await self.channel_layer.group_send(
            self.room_group_name,
            {
                'type': 'chat_message',
                'message': message
            }
        )

    async def chat_message(self, event):
        message = event['message']
        await self.send(text_data=json.dumps({
            'message': message
        }))