The Real-time Web in REST Services at IMVU

By Jon Watte, VP Technology @ IMVU

IMVU has built a rich, graph-shaped REST (REpresentational State Transfer) API (Application Programming Interface) to our data. This data includes a full social network, as well as e-commerce, virtual currencies, and the biggest 3D user-generated content catalog in the world. This post discusses how IMVU addresses two of the bigger drawbacks of REST-based service architectures for real-time interactive content: Cache Invalidation (where users want to know about new data as soon as it becomes available,) and Request Chattiness (where request latency kills your performance.)

Cache Invalidation

REST principles like cacheability and hypertext-based documents work great for exposing data to a variety of clients (desktop, web, and mobile,) but run into trouble when they meet the expectation of real-time interaction. For example, when a user changes their Motto in their profile, they would like for the world to see the new Motto right away — yet, much of the scalability wins of REST principles rely on caching, and the web does not have a good invalidation model. Here is an illustration of the problem:

At 10:03 am, Bob logs in and the client application fetches the profile information about his friend Alice. This data potentially gets cached at several different layers:

  • in application caches at the server end, such as Varnish or Squid
  • in content delivery network caches at the network edge, such as Akamai or Cloudfront
  • in the user’s own browser (every web browser has a local cache)

Let’s say that our service marks the data as cacheable for one hour.

At 10:04 am, Alice updates her Motto to say “there’s no business like show business.”

At 10:05 am, Alice sends a message to Bob asking “how do you like my motto?”

At 10:06 am, Bob looks at Alice’s profile, but because he has a stale version in cache, he sees the old version. Confusion (and, if this were a TV show, hilarity) ensues.

HTTP provides two options to solve this problem. One is to not do caching at all, which certainly “solves” the problem, but then also removes all benefits of a caching architecture. The other is to add the “no-cache” directive (or “must-revalidate” with a max-age of zero) to the Cache-Control headers of the delivered data. This tells any client that, while it might want to store the data locally, it has to check back with the server to see whether the data has changed or not before re-using it. In the case that data has not changed, this saves on bytes transferred — the data doesn’t need to be sent twice — but it still requires the client to make a request to the server before presenting the data.
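
To make that cost concrete, here is a minimal sketch (not IMVU's implementation) of the "store it, but check back before re-using it" pattern, using Node's built-in http and crypto modules; the port, the single-route handling, and the getProfile helper are invented for illustration. Note that even the 304 path still costs the client a full round-trip before it can show anything.

```typescript
// Minimal sketch of "store it, but revalidate before every re-use".
// Not IMVU's code; getProfile and the port are placeholders.
import * as http from "http";
import { createHash } from "crypto";

function getProfile(userId: string) {
  // Stand-in for a real data lookup.
  return { name: "Alice", motto: "there's no business like show business" };
}

http
  .createServer((req, res) => {
    const body = JSON.stringify(getProfile("alice"));
    const etag = '"' + createHash("sha1").update(body).digest("hex") + '"';

    // Clients and intermediaries may store the response,
    // but must check back with us before re-using it.
    res.setHeader("Cache-Control", "no-cache");
    res.setHeader("ETag", etag);

    if (req.headers["if-none-match"] === etag) {
      res.statusCode = 304; // Not Modified: almost no bytes, but still a round-trip
      res.end();
      return;
    }
    res.setHeader("Content-Type", "application/json");
    res.end(body);
  })
  .listen(8080);
```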

In modern web architectures, while saving throughput is nice, the real application performance killer is latency, which means that even the “zero bytes” response of checking in with the server introduces an unacceptable cost in end-user responsiveness. Cache validation and/or ETags might sound like a big win, but for a piece of data like a “motto,” the overhead in HTTP headers (several kilobytes) dwarfs the savings of 30 bytes of payload — the client might as well just re-fetch the resource for approximately the same cost.

Another option that’s used in some public APIs is to version all the data, and when data is updated, update the version, which means that the data now has a new URL. A client asking for the latest version of the data would then not get a cached version. Because of HATEOAS (Hypermedia As The Engine Of Application State) we would be able to discover the new URL for “Alice’s Profile Information,” and thus read the updated data. Unfortunately, there is no good way to discover that the new version exists — the client running on Bob’s machine would have to walk the tree of data from the start to get back to Alice’s new profile link, which means even more round-trip requests and makes the latency even worse.

A third option is to use REST transfer for the bulk data, but use some other, out-of-band (from the point of view of the HTTP protocol) mechanism to send changes to interested clients. Examples of this approach include the Meteor web framework, and the MQTT-based push approach taken by Facebook Mobile Messenger. Meteor doesn’t really scale past a few hundred online users, and has an up-to-10-second delay once it’s put across multiple hosts. Even with multiple hosts and “oplog tailing,” it ends up using a lot of CPU on each server, which means that a large write volume ends up with unacceptably low performance, and a scalability ceiling, determined by overall write load, that doesn’t shard. At any time, IMVU has hundreds of thousands of concurrent users, which is a volume Meteor doesn’t support.

As for the MQTT-based mobile data push, Facebook isn’t currently making their solution available on the open market, and hadn’t even begun talking about it when we started our own work. Small components of that solution (such as MQTT middleware) are available for clients that can use direct TCP connections, and could be a building block for a solution to the problem.

The good news is that we at IMVU already have a highly scalable, multicast architecture, in the form of IMQ (the IMVU Message Queue.) This queue allows us to send lightweight messages to all connected users in real-time (typical latencies are less than 10 milliseconds plus one-way network delay.) Thus, if we know what kinds of things a user is currently interested in seeing, and we know when those things change, we can let the user know that the data changed and needs to be re-fetched.

The initial version of IMQ used Google Protocol Buffers on top of a persistent TCP connection for communications. This works great for desktop applications, and may work for some mobile applications as long as the device is persistently connected, but it does not work well for web browsers with no raw TCP connection ability, or for intermittently connected mobile devices. To solve for these use cases, we added the ability to connect to IMQ using the WebSocket protocol, and additionally to fall back to an occasionally polled mail-drop pick-up model over HTTP for the worst-case connectivity situations. Note that this is still much more efficient than polling individual services for updated data — IMQ will buffer all the things that received change notifications across our service stack, and deliver them in a single HTTP response back to the client, when the client manages to make an HTTP request.
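
As a rough sketch of what such a fallback can look like on the client side (the wss endpoint, the /imq/maildrop URL, the message shape, and the poll interval below are all invented, not IMQ's actual wire protocol):

```typescript
// Connect over WebSocket when available; otherwise poll a "mail-drop" endpoint
// that returns, in one HTTP response, everything queued up since the last pick-up.
type ImqMessage = { topic: string; kind: "invalidate"; url: string };

function connectImq(onMessage: (m: ImqMessage) => void): void {
  if (typeof WebSocket !== "undefined") {
    const ws = new WebSocket("wss://imq.example.com/socket");
    ws.onmessage = (ev) => onMessage(JSON.parse(ev.data) as ImqMessage);
    ws.onerror = () => pollMaildrop(onMessage); // degrade to polling
    return;
  }
  pollMaildrop(onMessage);
}

async function pollMaildrop(onMessage: (m: ImqMessage) => void): Promise<void> {
  for (;;) {
    const res = await fetch("/imq/maildrop", { credentials: "include" });
    const batch: ImqMessage[] = await res.json(); // one response, many buffered messages
    batch.forEach(onMessage);
    await new Promise((resolve) => setTimeout(resolve, 15000)); // poll interval is a guess
  }
}
```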

To make sure that the data for an endpoint is not stale when it is re-fetched by the client, we then mark the output of real-time-updated REST services as non-cacheable by the intermediate caching layers. We have to do this because we cannot tell the intermediate actors (especially the browser cache) about the cache invalidation — even though we have JavaScript code running in the browser, and it knows about the invalidation of a particular URL, it cannot tell the browser cache that the data at the end of that URL is now updated.

Instead, we keep a local cache inside the web page. This cache maps URLs to JSON payloads, and our wrapper on top of XMLHttpRequest will first check this cache, and deliver the data if it’s there. When we receive an invalidation message over IMQ, we mark the corresponding entry stale (although we may still deliver it, for example for offline browsing purposes.)
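
A sketch of such an in-page cache is below; it uses fetch instead of the XMLHttpRequest wrapper mentioned above, and the class and method names are invented rather than taken from IMVU's library.

```typescript
// URL -> JSON cache that lives inside the page, read through on misses,
// and marked stale (not evicted) when an IMQ invalidation arrives.
type Entry = { data: unknown; stale: boolean };

class RestCache {
  private entries = new Map<string, Entry>();

  async get(url: string): Promise<unknown> {
    const hit = this.entries.get(url);
    if (hit && !hit.stale) return hit.data; // serve straight from the page cache
    const res = await fetch(url, { headers: { Accept: "application/json" } });
    const data = await res.json();
    this.entries.set(url, { data, stale: false });
    return data;
  }

  // Called when an IMQ invalidation message names this URL. The old payload is
  // kept around (e.g. for offline browsing) but flagged so it won't be trusted.
  invalidate(url: string): void {
    const hit = this.entries.get(url);
    if (hit) hit.stale = true;
  }
}
```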

Request Chattiness (Latency)

Our document-like object model looks like a connected graph with URL links as the edges, and JSON documents as the nodes. When receiving a particular set of data (such as the set of links that comprises my friends list) it is very likely that I will immediately turn around and ask for the data that’s pointed to by those links. If the browser and server both support the SPDY protocol, we could pre-stuff the right answers into the SPDY connection, in anticipation of the client requests. However, not all our clients have this support, and not even popular server-side tools like Nginx or Apache HTTPd support pre-caching, so instead we accomplish the same thing in our REST response envelope.

Instead of responding with just a single JSON document, we respond with a look-up table of URLs to JSON documents, including all the information we believe the client will want, based on the original request. This is entirely optional — the server doesn’t have to add any extra information; the client doesn’t have to pay attention to the extra data; but servers and clients that are in cahoots and pay attention will end up delivering a user experience with more than 30x fewer server round-trips! On internet connections where latency matters more than individual byte counts, this is a huge win. On very narrow-band connections (like 2G cell phones or dial-up modems,) the client can provide a header that tells the server to never send any data more than what’s immediately requested.
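
As an illustration, the envelope might look something like the sketch below; the field names (id, denormalized) and the example URLs are invented, not IMVU's actual wire format.

```typescript
// One possible shape for a response envelope that carries speculative extras.
interface Envelope {
  id: string;                               // URL of the document that was requested
  denormalized: { [url: string]: unknown }; // URL -> JSON document look-up table
}

// A response to GET /users/bob/friends could pre-stuff the friends' profiles,
// so the client never needs a second round-trip to render the list:
const example: Envelope = {
  id: "/users/bob/friends",
  denormalized: {
    "/users/bob/friends": { friends: ["/users/alice", "/users/carol"] },
    "/users/alice": { name: "Alice", motto: "there's no business like show business" },
    "/users/carol": { name: "Carol", motto: "hello, world" },
  },
};

// A client that understands the envelope seeds its local URL cache from
// `denormalized`; a client that doesn't simply reads the entry keyed by `id`.
```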

Because the server knows all the data it has sent (including the speculatively pre-loaded, or “denormalized” data,) the server can now make arrangements for the client to receive real-time updates through IMQ when the data backing those documents changes. Thus, when a friend comes online, or when a new catalog item from a creator I’m interested in is released, or when I purchase more credits, the server sends an invalidation message on the appropriate topic through the message queue, and any client that is interested in this topic will receive it, and update its local cache appropriately.
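
In sketch form, the server-side half of that arrangement might look like the following; the subscribe/publish interface stands in for IMQ's real API, which isn't shown here, and the topics are simply the document URLs.

```typescript
// Track which documents each client has been sent, and push an invalidation
// on the matching topic whenever the data behind one of them changes.
interface MessageQueue {
  subscribe(clientId: string, topic: string): void;
  publish(topic: string, message: unknown): void;
}

// After building an envelope, subscribe the client to every URL it received,
// so later changes can be delivered as real-time invalidations.
function registerInterest(imq: MessageQueue, clientId: string, sentUrls: string[]): void {
  for (const url of sentUrls) {
    imq.subscribe(clientId, url);
  }
}

// When a write lands (Alice saves a new motto, a catalog item is released, ...),
// broadcast a lightweight invalidation; interested clients re-fetch over REST.
function onDocumentChanged(imq: MessageQueue, url: string): void {
  imq.publish(url, { kind: "invalidate", url });
}
```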

Putting it Together

This, in turn, ties into a reactive UI model. The authority of the data, within the application, lives in the in-process JSON cache, and the IMQ invalidation events are received by this cache. The cache can then know whether any piece of UI is currently displaying this data; if so, it issues a request to the server to fetch it, and once received, it updates the UI. If not, then it can just mark the element as stale, and re-fetch it if it’s later requested by some piece of UI or other application code.
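
A sketch of that wiring, assuming a page-local cache like the one sketched earlier (the binding map and render callbacks are invented for illustration):

```typescript
// If an invalidated URL is currently on screen, re-fetch and repaint right away;
// otherwise just leave the cache entry stale until something asks for it again.
interface UrlCache {
  get(url: string): Promise<unknown>;
  invalidate(url: string): void;
}

const boundViews = new Map<string, (data: unknown) => void>(); // URL -> render function

function bindView(cache: UrlCache, url: string, render: (data: unknown) => void): void {
  boundViews.set(url, render);
  cache.get(url).then(render); // initial paint
}

// Called once per IMQ invalidation message.
async function onInvalidate(cache: UrlCache, url: string): Promise<void> {
  cache.invalidate(url);
  const render = boundViews.get(url);
  if (render) {
    render(await cache.get(url)); // visible right now: fetch fresh data and update the UI
  }
}
```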

The end-to-end flow is then:

  1. Bob loads Alice’s profile information
  2. Specific elements on the screen are tied to the information such as “name” or “motto”
  3. Bob’s client creates a subscription to updates to Alice’s information
  4. Alice changes her motto
  5. The back-end generates a message saying “Alice’s information changed” to everyone who is subscribed (which includes Bob)
  6. Bob’s client receives the invalidation message
  7. Bob’s client re-requests Alice’s profile information
  8. The underlying data model for Alice’s profile information on Bob’s display page changes
  9. The reactive UI updates the appropriate fields on the screen, so Bob sees the new data

All of these pieces mean re-thinking a number of building blocks of the standard web stack, which means more work for our foundational libraries. In return, we get a more reactive web application, where anything you see on the screen is always up to date, and changes propagate quickly, both through the user interface and through the back-end, with minimal per-request overhead.

This might seem complex, but it ends up working really well, and with the proper attention to library design for back- and front-end development, building a reactive application like this is no harder than building an old-style, slow polling (or manually refreshed) application.

It would be great if SPDY (and, in the future, HTTP/2) could support pre-stuffing responses in the real world. It would also be great if the browser DOM had an interface to the local cache, so that the application could tell the browser about a particular URL being invalidated. However, the solution we’ve built up achieves the same benefits, using existing protocols, which goes to show the fantastic flexibility and resilience inherent in the protocols and systems that make up the web!

 

14 thoughts to “The Real-time Web in REST Services at IMVU”

  1. One thing I’ve done also is to optionally include a diff in the invalidation message. That is, a message saying “Alice’s info is now version N. If you had version N-1 of Alice’s info, apply this diff to be up-to-date.” If the receiving client has Alice’s info of version <= N-2, the resource is fetched in its entirety.

    This allows the entire state of the resource to be represented without an additional request. The trade-off, of course, is that the messages may be long for large diffs.
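
    A minimal sketch of one way this versioned-diff handling could look; the message shape, CacheEntry type, and applyDiff helper are invented for illustration:

    ```typescript
    interface Invalidation {
      url: string;
      version: number; // N: the version this message brings you to
      diff?: object;   // patch from N-1 to N (may be omitted if it would be too large)
    }

    type CacheEntry = { version: number; data: object };

    function applyDiff(data: object, diff: object): object {
      return { ...data, ...diff }; // naive shallow merge standing in for a real patch format
    }

    async function onInvalidation(msg: Invalidation, cache: Map<string, CacheEntry>): Promise<void> {
      const entry = cache.get(msg.url);
      if (entry && msg.diff && entry.version === msg.version - 1) {
        // We held version N-1: apply the diff and we're current, no extra request.
        cache.set(msg.url, { version: msg.version, data: applyDiff(entry.data, msg.diff) });
      } else {
        // We held <= N-2 (or nothing): fetch the whole resource instead.
        const data = (await (await fetch(msg.url)).json()) as object;
        cache.set(msg.url, { version: msg.version, data }); // a real response would carry its own version
      }
    }
    ```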

  2. Ok but suppose instead of “Alice” you have “Kim K” with not one or two followers but 23M or so. And, she changes her motto all the time, like every minute. And, her follower list experiences significant churn. Whoops, now she just got into a flame war with her sis “Kendall J” whose 20M own followers are not happy about it one bit!

    How do cache invalidation messages and client side subscriptions scale up 3-4 orders of mag?

  3. Subscribers in this sense are usually people who are actively on the site – not simply those interested in Alice’s updates. When the user disconnects, the subscription is usually destroyed.

    If you’re dealing with hundreds of millions of concurrent connections, then yes, you’d have to address issues like this.

  4. Congratulations, you just re-invented Push notifications (some call that Comet) with light pings.
    If you send updated content with the pings, then it would be called “fat pings”.
    By the way, there are public services where you can offload keeping open those thousands of connections for the whole session, e.g. Google Channel IO.

    Also, things will start getting interesting once many publishers start updating many subscribers, like in Yunn McKrinlew’s comment above. Then you’ll have to reinvent PubSub. Although that’s kinda adventurous, so I’d recommend considering existing solutions.

  5. I’m troubled by the “Meteor doesn’t scale” argument. Yes, if you set up your state to be totally shared between clients, then every write will cause a message to be sent to every connected client, which in a synchronous environment generates noticeable latency. Couldn’t you implement something like a “subscribe” predicate to all write traffic, such that it only gets pushed to a small minority of clients?

  6. You are right — it’s the number of active online users that matters (see the beginning of the article.) Users that log in later don’t need an invalidation message, as they will just fetch the full data.

    In addition — the IMQ messaging fabric is fairly robust and high performance. With a modern networking fabric, a million-subscriber queue would be “chunky” (as in high latency) but probably would still “work” (as in deliver the message to all subscribers.)

    Broadcast latencies within IMQ are typically in the order of 1 millisecond with the queue lengths we’re seeing in practice, and the system is built for linear scaling where it currently just has a few hardware nodes.

  7. That’s an interesting comment. Perhaps I’m not clear enough in the article?

    We have hundreds of thousands of active, open connections, in production, today. Trying to do it on Google App Engine would not be feasible. Also, we already re-invented large-scale pub/sub in 2011, because there was no big enough solution available to purchase/use.

    The gist of this article is to illustrate how web pages and REST services can benefit from a massive, reactive, pub/sub infrastructure, and how the existing web infrastructure (all the way from CDNs and caches to browser DOM APIs) needs to be subverted to make it actually great.

  8. The implementation of Meteor is that each gateway receives the full MongoDB replication stream for all writes. This means that you pay CPU (and networking) cost for the volume of writes, times the number of gateways. And, because both write volume and number of gateways grow with the number of users, Meteor scales as the square of the number of users.

    Once the write volume gets to the point where it saturates a gateway, all the gateways will be saturated, and adding more gateways won’t help. (Although, economically, the tipping point is much earlier.)

    There exist system architectures that get around this problem, including architectures we chose in this system, but the existing Meteor implementation is not there, and it would be a very big job to try to get it there while supporting the existing Meteor client API.

  9. Thank you, Anonumous! The architecture in that article looks fine for broadcasting a single source of data. The problem comes when each new user is also a new source of data. Because the architecture uses multicast to send data changes to all gateways, as more users (and thus more data changes) come online, the gateways will saturate just trying to process the data change stream, which ends up with the same N-squared scalability problem as the Meteor framework (see above) but perhaps at a higher constant-factor level.
