Tag Archives: SPDY

The Real-time Web in REST Services at IMVU

By Jon Watte, VP Technology @ IMVU

IMVU has built a rich, graph-shaped REST (REpresentational State Transfer) API (Application Programming Interface) to our data. This data includes a full social network, as well as e-commerce, virtual currencies, and the biggest 3D user generated content catalog in the world. This post discusses how IMVU addresses two of the bigger draw-backs of REST-based service architectures for real-time interactive content: Cache Invalidation (where users want to know about new data as soon as it becomes available,) and Request Chattiness (where request latency kills your performance.)

Cache Invalidation

REST principles like cacheability and hypertext-based documents work great for exposing data to a variety of clients (desktop, web, and mobile,) but runs into trouble when it meets the expectation of real-time interaction. For example, when a user changes their Motto in their profile, they would like for the world to see the new Motto right away — yet, much of the scalability wins of REST principles rely on caching, and the web does not have a good invalidation model. Here is an illustration of the problem:

At 10:03 am, Bob logs in and the client application fetches the profile information about his friend Alice. This data potentially gets cached at several different layers:

  • in application caches at the server end, such as Varnish or Squid
  • in content delivery network caches at the network edge, such as Akamai or Cloudfront
  • in the user’s own browser (every web browser has a local cache)

Let’s say that our service marks the data as cacheable for one hour.

At 10:04 am, Alice updates her Motto to say “there’s no business like show business.”

At 10:05 am, Alice sends a message to Bob asking “how do you like my motto?”

At 10:06 am, Bob looks at Alice’s profile, but because he has a stale version in cache, he sees the old version. Confusion (and, if this were a TV show, hilarity) ensues.

HTTP provides two options to solve this problem. One is to not do caching at all, which certainly “solves” the problem, but then also removes all benefits of a caching architecture. The other is to add the “must-validate” option to the cache control headers of the delivered data. This tells any client that, while it might want to store the data locally, it has to check back with the server to see whether the data has changed or not before re-using it. In the case that data has not changed, this saves on bytes transferred — the data doesn’t need to be sent twice — but it still requires the client to make a request to the server before presenting the data.

In modern web architectures, while saving throughput is nice, the real application performance killer is latency, which means that even the “zero bytes” response of checking in with the server introduces an unacceptable cost in end-user responsiveness. Cache validation and/or E-tags might sound like a big win, but for a piece of data like a “motto,” the overhead in HTTP headers (several kilobytes) dwarfs the savings of 30 bytes of payload — the client might as well just re-get the resource for approximately the same cost.

Another option that’s used in some public APIs is to version all the data, and when data is updated, update the version, which means that the data now has a new URL. A client asking for the latest version of the data would then not get a cached version. Because of HATEOAS (Hypertext As The Engine Of Application State) we would be able to discover the new URL for “Alice’s Profile Information,” and thus read the updated data. Unfortunately, there is no good way to discover that the new version is there — the client running on Bob’s machine would have to walk the tree of data from the start to get back to Alice’s new profile link, which is even more round-trip requests and makes the latency even worse.

A third option is to use REST transfer for the bulk data, but use some other, out-of-band (from the point of view of the HTTP protocol) mechanism to send changes to interested clients. Examples of this approach include the Meteor web framework, and the MQTT based push approach taken by Facebook Mobile Messenger. Meteor doesn’t really scale past a few hundred online users, and has an up-to-10-seconds-delay once it’s put across multiple hosts. Even with multiple hosts and “oplog tailing,” it ends up using a lot of CPU on each server, which means that a large write volume ends up with unacceptably low performance, and a scalability ceiling determined by overall write load, that doesn’t shard. At any time, IMVU has hundreds of thousands of concurrent users, which is a volume Meteor doesn’t support.

As for the MQTT-based mobile data push, Facebook isn’t currently making their solution available on the open market, and hadn’t even begun talking about it when we started our own work. Small components of that solution (such as MQTT middleware) are available for clients that can use direct TCP connections, and could be a building block for a solution to the problem.

The good news is that we at IMVU already have a highly scalable, multi-cast architecture, in the form of IMQ (the IMVU Message Queue.) This queue allows us to send lightweight messages to all connected users in real-time (typical latencies are less than 10 milliseconds plus one-way network delay.) Thus, if we can know what kinds of things that a user is currently interested in seeing, and we can know whether those things change, we can let the user know that the data changed and needs to be re-fetched.

Jonblog1

 

The initial version of IMQ used Google Protocol Buffers on top of a persistent TCP connection for communications. This works great for desktop applications, and may work for some mobile applications as long as the device is persistently connected, but it does not work well for web browsers with no raw TCP connection ability, or intermittently connected mobile devices. To solve for these use cases, we added the ability to connect to IMQ using the websockets protocol, and additionally to fall back to an occasionally polled mail-drop pick-up model over HTTP for the worst-case connectivity situations. Note that this is still much more efficient than polling individual services for updated data — IMQ will buffer all the things that received change notifications across our service stack, and deliver them in a single HTTP response back to the client, when the client manages to make a HTTP request.

To make sure that the data for an endpoint is not stale when it is re-fetched by the client, we then mark the output of real-time updated REST services as non-cacheable by the intermediate caching layers. We have to do this, because we cannot tell the intermediate actors (especially, the browser cache) about the cache invalidation — even though we have JavaScript code running in the browser, and it knows about the invalidation of a particular URL, it cannot tell the browser cache that the data at the end of that URL is now updated.

Instead, we keep a local cache inside the web page. This cache maps URL to JSON payload, and our wrapper on top of XMLHttpRequest will first check this cache, and deliver the data if it’s there. When we receive an invalidation request over IMQ, we mark it stale (although we may still deliver it, for example for offline browsing purposes.)

Request Chattiness (Latency)

Our document-like object model looks like a connected graph with URL links as the edges, and JSON documents as the nodes. When receiving a particular set of data (such as the set of links that comprises my friends list) it is very likely that I will immediately turn around and ask for the data that’s pointed to by those links. If the browser and server both support the SPDY protocol, we could pre-stuff the right answers into the SPDY connection, in anticipation of the client requests. However, not all our clients have this support, and not even popular server-side tools like Nginx or Apache HTTPd support pre-caching, so instead we accomplish the same thing in our REST response envelope.

Instead of responding with just a single JSON document, we respond with a look-up table of URLs to JSON documents, including all the information we believe the client will want, based on the original request. This is entirely optional — the server doesn’t have to add any extra information; the client doesn’t have to pay attention to the extra data; but servers and clients that are in cahoots and pay attention will end up delivering a user experience with more than 30x fewer server round-trips! On internet connections where latency matters more than individual byte counts, this is a huge win. On very narrow-band connections (like 2G cell phones or dial-up modems,) the client can provide a header that tells the server to never send any data more than what’s immediately requested.

Jonblog2

Because the server knows all the data it has sent (including the speculatively pre-loaded, or “denormalized” data,) the server can now make arrangements for the client to receive real-time updates through IMQ when the data backing those documents changes. Thus, when a friend comes online, or when a new catalog item from a creator I’m interested in is released, or when I purchase more credits, the server sends an invalidation message on the appropriate topic through the message queue, and any client that is interested in this topic will receive it, and update its local cache appropriately.

Putting it Together

This, in turn, ties into a reactive UI model. The authority of the data, within the application, lives in the in-process JSON cache, and the IMQ invalidation events are received by this cache. The cache can then know whether any piece of UI is currently displaying this data; if so, it issues a request to the server to fetch it, and once received, it updates the UI. If not, then it can just mark the element as stale, and re-fetch it if it’s later requested by some piece of UI or other application code.

The end-to-end flow is then:

  1. Bob loads Alice’s profile information
  2. Specific elements on the screen are tied to the information such as “name” or “motto”
  3. Bob’s client creates a subscription to updates to Alice’s information
  4. Alice changes her motto
  5. The back-end generates a message saying “Alice’s information changed” to everyone who is subscribed (which includes Bob)
  6. Bob’s client receives the invalidation message
  7. Bob’s client re-requests Alice’s profile information
  8. The underlying data model for Alice’s profile information on Bob’s display page changes
  9. The reactive UI updates the appropriate fields on the screen, so Bob sees the new data

All of these pieces means re-thinking a number of building blocks of the standard web stack, which means more work for our foundational libraries. In return, we get a more reactive web application, where anything you see on the screen is always up to date, and changes respond quickly, both through the user interface, and through the back-end, with minimal per-request overhead.

This might seem complex, but it ends up working really well, and with the proper attention to library design for back- and front-end development, building a reactive application like this is no harder than building an old-style, slow polling (or manually refreshed) application.

It would be great if SPDY (and, future, HTTP2) could support pre-stuffing responses in the real world. It would also be great if the browser DOM had an interface to the local cache, so that the application could tell the browser about a particular URL being invalidated. However, the solution we’ve built up achieves the same benefits, using existing protocols, which goes to show the fantastic flexibility and resilience inherent in the protocols and systems that make up the web!

 

Web Platform Limitations, Part 1 – XMLHttpRequest Priority

The web is inching towards being a general-purpose applications platform that rivals native apps for performance and functionality. However, to this day, API gaps make certain use cases hard or impossible.

Today I want to talk about streaming network resources for a 3D world.

Streaming Data

In IMVU, when joining a 3D room, hundreds of resources need to be downloaded: object descriptions, 3D mesh geometry, textures, animations, and skeletons. The size of this data is nontrivial: a room might contain 40 MB of data, even after gzip. The largest assets are textures.

To improve load times, we first request low-resolution thumbnail versions of each texture. Then, once the scene is fully loaded and interactive, high-resolution textures are downloaded in the background. High-resolution textures are larger and not critical for interactivity. That is, each resource is requested with a priority:

High Priority Low Priority
Skeletons High-resolution textures
Meshes
Low-resolution textures
Animations

Browser Connection Limits

Let’s imagine what would happen if we opened an XMLHttpRequest for each resource right away. What happens depends on whether the browser is using plain HTTP or SPDY.

HTTP

Browsers limit the number of simultaneous TCP connections to each domain. That is, if the browser’s limit is 8 connections per domain, and we open 50 XMLHttpRequests, the 9th would not even submit its request until the 8th finished. (In theory, HTTP pipelining helps, but browsers don’t enable it by default.) Since there is no way to indicate priority in the XMLHttpRequest API, you would have to be careful to issue XMLHttpRequests in order of decreasing priority, and hope no higher-priority requests would arrive later. (Otherwise, they would be blocked by low-priority requests.) This assumes the browser issues requests in sequence, of course. If not, all bets are off.

There is a way to approximate a prioritizing network atop HTTP XMLHttpRequest. At the application layer, limit the number of open XMLHttpRequests to 8 or so and have them pull the next request from a priority queue as requests finish.

Soon I’ll show why this doesn’t work that well in practice.

SPDY

If the browser and server both support SPDY, then there is no theoretical limit on the number of simultaneous requests to a server. The browser could happily blast out hundreds of HTTP requests, and the responses will make full use of your bandwidth. However, a low-priority response might burn bandwidth that could otherwise be spent on a high-priority response.

SPDY has a mechanism for prioritizing requests. However, that mechanism is not exposed to JavaScript, so, like HTTP, you either issue requests from high priority to low priority or you build the prioritizing network approximation described above.

However, the prioritizing network reduces effective bandwidth utilization by limiting the number of concurrent requests at any given time.

Prioritizing Network Approximation

Let’s consider the prioritizing network implementation described above. Besides the fact that it doesn’t make good use of the browser’s available bandwidth, it has another serious problem: imagine we’re loading a 3D scene with some meshes, 100 low-resolution textures, and 100 high-resolution textures. Imagine the high-resolution textures are already in the browser’s disk cache, but the low-resolution textures aren’t.

Even though the high-resolution textures could be displayed immediately (and would trump the low-resolution textures), because they have low priority, the prioritizing network won’t even check the disk cache until all low-resolution textures have been downloaded.

That is, even though the customer’s computer has all of the high-resolution textures on disk, they won’t show up for several seconds! This is an unnecessarily poor experience.

Browser Knows Best

In short, the browser has all of the information needed to effectively prioritize HTTP requests. It knows whether it’s using HTTP or SPDY. It knows what’s in cache and not.

It would be super fantastic if browsers let you tell them. I’ve seen some discussions about adding priority hints, but they seem to have languished.

tl;dr Not being able to tell the browser about request priority prevents us from making effective use of available bandwidth.

FAQ

Why not download all resources in one large zip file or package?

Each resource lives at its own URL, which maximizes utilization of HTTP caches and data sharing. If we downloaded resources in a zip file, we wouldn’t be able to leverage CDNs and the rest of the HTTP ecosystem. In addition, HTTP allows trivially sharing resources across objects. Plus, with protocols like SPDY, per-request overhead is greatly reduced.