Distributed Redis Transactions

By Eric Hohenstein, IMVU, Inc.

Preface

Four times a year, employees of IMVU (engineers and non-engineers) are encouraged to participate in what we like to call “Hack Week”. Hack week is when we are given the opportunity to work on anything that we want so long as there is the possibility that it could eventually achieve one or more IMVU business objectives. Within that constraint, we have total freedom. On the Monday following hack week we get to show off what we accomplished to the rest of the company. Some people choose to work on relatively small projects by themselves and others will organize and participate in larger projects with many people.

Our last hack week was November 28 through December 2, 2011. This is a description of the project that I worked on in conjunction with two other people, David Johnson and Peter Offenwanger, both software engineers at IMVU.

What we built

We built an ACID-compliant distributed key-value database implementing a small but critical subset of the redis command set and wire protocol (a sketch of the transaction pattern these commands support follows the list):

GET
SET
DEL
WATCH
UNWATCH
MULTI
EXEC
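
These seven commands are enough to express redis’s optimistic check-and-set pattern. As a concrete illustration, here is a minimal sketch using the standard hiredis client (the key name and error handling are simplified); since dtm-redis speaks the redis wire protocol, the same client code should work against either system:

#include <cstdlib>
#include <hiredis/hiredis.h>

// Atomically increment "counter" using WATCH/MULTI/EXEC, retrying if another
// client modifies the key between our GET and EXEC.
void incrementWithRetry(redisContext* c) {
    for (;;) {
        freeReplyObject(redisCommand(c, "WATCH counter"));

        redisReply* reply = (redisReply*)redisCommand(c, "GET counter");
        long long value = (reply->type == REDIS_REPLY_STRING) ? atoll(reply->str) : 0;
        freeReplyObject(reply);

        freeReplyObject(redisCommand(c, "MULTI"));
        freeReplyObject(redisCommand(c, "SET counter %lld", value + 1));

        reply = (redisReply*)redisCommand(c, "EXEC");
        // EXEC replies nil if the watched key changed underneath us; in that
        // case nothing was applied and we simply try again.
        bool committed = (reply->type != REDIS_REPLY_NIL);
        freeReplyObject(reply);
        if (committed) {
            return;
        }
    }
}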

The source is available on github and has been released as open-source under the MIT license.

The system should theoretically be able to scale to arbitrary limits since there is no singular bottleneck in the design. Clients connect to a homogeneous pool of client session hosts. The client sessions hash keys to buckets, and each bucket maps to a separate erlang process and back-end redis storage instance. The number of possible buckets is limited only by the number of bits used to generate the hash (currently 2^128). The client sessions and the buckets communicate with a homogeneous pool of transaction monitors. Only the cluster startup/monitoring process can currently limit scalability, but this process can also be distributed if necessary, though that should only be needed for a frighteningly large cluster.
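
To make the key-to-bucket mapping concrete, here is a toy sketch (illustrative only, not the project’s erlang code); any uniform hash works, dtm-redis just draws bucket ids from a much larger 128-bit hash space:

#include <functional>
#include <string>

// The same key always maps to the same bucket, and each bucket owns one
// back-end redis instance, so a client session can route GET/SET/DEL for a
// key straight to the responsible bucket process.
size_t bucketForKey(const std::string& key, size_t numBuckets) {
    return std::hash<std::string>()(key) % numBuckets;
}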

The motivation for this investigation was to explore the performance limits of redis beyond what a single redis host can offer. Any solution based on standard redis has an upper bound on throughput without sacrificing consistency. Some applications of key-value stores require very strong consistency (for instance, a separate index view of a data set maintained in parallel with the data set). Redis is remarkably performant, especially given that it operates with just a single thread. However, if an application requires strong data consistency and higher throughput than redis (or any other key-value data store) can offer, the options with existing solutions are very limited. The redis team is in the process of adding a clustering solution, but it will not support distributed transactions, only consistent hashing.

How we built it

The project started as an application I built to learn erlang shortly after the previous hack week in July. Before this last hack week started, the project consisted of about 700 lines of code that implemented the entire supported command set, but it was accessible only from the erlang shell and had neither persistence nor a transaction monitor. By the end of this last hack week we had around 1200 lines of code and a working prototype. The main pieces missing from the prototype are crash recovery and process monitoring.

How we benchmarked it

We spent a fair amount of time during hack week trying to reuse a standard redis benchmarking tool. This gave us good visibility into redis and dtm-redis performance on a single host with and without the appendfsync redis configuration. After hack week I spent some time building a custom benchmarking tool in C (included in the dtm-redis project) to work around some limitations of the standard redis benchmarking tool, which include:

  • no support for transactions
  • no latency measurements, only throughput
  • no support for multiple listening hosts

This custom tool accepts a list of host:port parameters on the command line, a parameter for the number of clients per host, the number of seconds to run the test, and the type of benchmark to perform (get_set or trans). It produces output like the following:

ehohenstein@localhost:~/dtm-bench$ ./dtm-bench host1:6379,host2:6379,host3:6379,host4:6379 100 60 get_set
creating 100 clients connecting to each of 4 hosts
starting clients
clients running for 60 seconds
stopping clients
total requests:  17604943
total latency:   23993.386276540
average latency: 0.001363
max latency:     0.217846477
elapsed time:    60.007721772
requests/sec:    293377

When testing transactions, a “request” represented a transaction involving five round trips: WATCH, GET, MULTI, SET, EXEC (exactly the sequence sketched in the example above). The average latency is calculated across the entire transaction, while the max latency is calculated across each individual round trip. In at least one instance this counter-intuitively resulted in the measured max latency being lower than the average latency.

Most tests were performed with 100 clients per listening host. Experimentation showed that throughput grew up to that level and flattened out once that number of clients per host was reached.

The IMVU operations team was very helpful by setting up and configuring 4 hosts in the production cluster with redis installed to use for benchmarking. These hosts had dual six-core Xeon 2.66 GHz processors with hyper-threading enabled and spinning disks. When testing dtm-redis, 8, 16, or 32 buckets were configured, each with its own separate redis instance listening on 127.0.0.X.

Benchmark results

Outside of transactions, the system was able to easily beat standard redis throughput. Throughput appears to increase linearly with the number of hosts (at least up to the number we used in the benchmark test). Latency of dtm-redis was mostly similar to standard redis, but max latency appears to increase exponentially with the number of hosts, which is troubling.

Transactions were tested in both synchronous I/O (durable) and non-synchronous I/O (non-durable) modes. The durable mode was not totally durable since, when performing this test, the redis configuration was not altered to make it use the appendfsync persistence option. It was nevertheless a fair approximation since, in this mode, dtm-redis was performing 2 synchronous writes to disk per transaction or write operation.
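
To make “synchronous write” concrete: in durable mode the commit path cannot acknowledge a transaction until its log record is physically on disk, conceptually something like the following sketch (assuming a POSIX-style binlog file; this is not the project’s actual erlang I/O code):

#include <unistd.h>

// Append one binlog record and force it to the platter before acknowledging.
// Each fsync costs a physical disk flush, which is why a spinning disk caps
// the system at a few hundred durable operations per second.
bool appendDurably(int fd, const void* record, size_t length) {
    if (write(fd, record, length) != (ssize_t)length) {
        return false;
    }
    return fsync(fd) == 0;
}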

In non-durable mode, dtm-redis throughput again beat standard redis and had linear growth with the number of hosts. This time, average latency of dtm-redis was roughly constant and max latency again grew exponentially, exceeding redis by 20x with four hosts.

In durable mode, dtm-redis throughput was extremely low. When testing with just a single host and appendfsync persistence, both redis and dtm-redis allowed roughly 300 requests per second which is roughly 3 orders of magnitude below the throughput without appendfsync. This is not surprising since the disk would be the bottleneck in both cases. In durable mode, dtm-redis again had roughly linear growth in throughput with the number of hosts though only up to a maximum of 1925 transactions per second with four hosts. Average latency and max latency were both roughly constant though note that max latency reached more than 3 seconds in the 3 host test. This is especially alarming given that max latency was measured for individual round trips rather than the transaction as a whole.

Durable mode performance could likely have been greatly increased using a host with an SSD. The IMVU operations team didn’t have one available for us to use for benchmarking, so we aren’t able to compare apples to apples. However, while working on the benchmark tool I noted that on my development system, which does have an SSD, I was able to get over 5000 transactions per second locally, with redis, dtm-redis, and the benchmark tool all sharing a single CPU in a linux VM. Chances are that an SSD would increase durable mode performance by a factor of at least 10x.

One interesting result was that, in durable mode, throughput, average latency, and max latency all improved as the number of clients per host increased. This is almost certainly a side effect of an optimization chosen in dtm-redis: multiple I/O operations waiting in the binlog process queue are batched together.
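
A sketch of that batching idea (again illustrative, not the actual erlang): drain every record waiting in the queue, write them as one blob, and pay for a single fsync on behalf of all of the waiting clients.

#include <string>
#include <unistd.h>
#include <vector>

// Flush all queued binlog records with one write and one fsync. With N
// clients waiting, the cost of the disk flush is amortized over N
// transactions, which is why throughput and latency both improved as the
// number of clients grew.
void flushQueuedRecords(int fd, std::vector<std::string>& queue) {
    std::string batch;
    for (size_t i = 0; i < queue.size(); ++i) {
        batch += queue[i];
    }
    if (write(fd, batch.data(), batch.size()) == (ssize_t)batch.size()) {
        fsync(fd);
    }
    queue.clear();
}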

Another very interesting observation was that, when running the non-transaction and non-durable transaction benchmarks, the hosts under test spent almost all of their CPU time running erlang code. Each of the 8 redis processes had roughly 20-25% CPU utilization while erlang had 800-1000%.

Conclusions

This prototype was built with only 1200 lines of code. If we were to build a fully functional and robust system, it’s probably fair to estimate that it would require at least a 10x increase in the amount of code necessary. That being said, this wasn’t a giant project. I can’t say if this would provide enough value to IMVU to build, but it is at least within reason to imagine us building it.

The use of erlang for this application is somewhat questionable. It was useful as a learning tool to build this prototype in erlang. I suspect, however, that erlang is not efficient enough to match the kind of performance typically expected from redis. The difference in average latency and the unacceptably high max latency of dtm-redis, together with its CPU utilization, suggest that erlang is spending too much time copying data from process to process. It’s hard to be sure without building another prototype, but I believe that C would be a better choice. Building this system in C would require significantly more code, however, with a corresponding increase in development and maintenance costs.

Of course, in durable mode, erlang performance is really not an important factor. In this case, the disk is the limiting factor in performance. The scalability of this system (as currently defined) is actually based solely on the ability to distribute disk I/O. Adding a large number of disks would likely allow a significant performance increase but might also reduce the mean time to failure.

After completing the prototype, it was clear that durable mode is likely not going to be useful due to its very low throughput and high latency. An alternate design that I would like to explore is one more or less equivalent to the default redis mode, where the data set is periodically checkpointed. It should be possible to obtain the performance characteristics seen in non-durable mode while maintaining (check-pointed) consistency by periodically synchronizing all bucket processes and having each of them initiate a save operation. This will cause occasional high latency, but the average latency should remain low.


Gephi Plugins Developers Workshop at IMVU

by Paco Nathan

At IMVU, the Data team often works with large data sets which represent customer interactions across social graphs. This kind of data cannot be explored very well using relational databases, such as MySQL. Instead, we rely on other tools for analyzing social graphs.

One tool which we really like is called Gephi, an open source platform for visualization and analysis of graphs. Our team uses Gephi as an interactive UI for exploring relationships and patterns in graphs. We also use it for calculating social graph metrics and other statistics which help us to characterize large graphs. One favorite feature in Gephi is how readily it can be extended by writing plugins.

We’ve received much appreciated help and guidance from the Gephi developer community. Now we are glad to be able to host the first Gephi workshop about developing plugins. Bring your laptop, install the required software from sources provided, and work with mentors to learn how to write your own Gephi plugins! And get to network with other people in the Gephi developer community.

To RSVP, please click through the Meetup.com link below.

Title: Gephi Plugins Developers Workshop

Organizers: Mathieu Bastian and Paco Nathan

Date and Time: Thursday, October 6, 2011, 7:00-9:30pm

Location: IMVU, Inc., 100 W. Evelyn Ave #110, Mountain View, CA

Cost: No cost

Food and drinks: Food, drinks, and wifi will be provided

Agenda: This is the first Gephi workshop dedicated to Gephi Plugins developers! Come and learn how to write your first Gephi plugins and ask questions.

The workshop will start with a 1-hour presentation of Gephi’s architecture and the different types of plugins that can be written, with examples. Details about Gephi’s API, code examples, and best practices will be presented in an interactive “live coding” way. The Gephi Toolkit will also be covered in detail. The second part of the workshop will be dedicated to helping individuals with their projects and answering questions. Depending on the audience’s needs, we can discuss plugin design, how to code UI, databases, the toolkit, data sources, or any plugin idea you have.

Gephi is modular software and can be extended with plugins. Plugins can add new features like layouts, filters, metrics, and data sources, or modify existing features. Gephi is written in Java, so anything that can be used in Java can be packaged as a Gephi plugin! Visit the Plugins Portal on the wiki and follow the tutorials.

Link to event on meetup.com: Gephi Plugins Developers Workshop

Silicon Valley Open Source Business Intelligence Meetup at IMVU

By Tim Levine and Paco Nathan

At IMVU, our leaders are increasingly leveraging data to inform decisions. In response, we have adopted an off-the-shelf business intelligence solution to enable ready access to data. Considering our needs, present and future, and the personality of the company, we have deployed a commercial version of Pentaho. Using this suite, we have been able to provide fresher data (daily refresh versus monthly), more interactive data (using ROLAP cubes), and newly processed data (a 30-day rolling view of usage).

Our success has gotten to the point where we would like to contribute to and draw from the larger community, so we will be hosting a meetup for those interested in open source business intelligence. To RSVP, please click through the meetup.com link below.

Title: Silicon Valley Open Source Business Intelligence Meetup

Organizers: Tim Levine and Paco Nathan

Date and Time: Thursday, September 15, 2011, 7:30-9:30pm

Location: IMVU Corporate Headquarters, 100 W. Evelyn Ave #110, Mountain View, CA

Cost: No cost

Food and drinks: Food and drinks will be provided

Agenda:

1 Food and networking
2 Introductions around the room, find burning questions
3 Tim Levine and Paco Nathan will present on how Pentaho fits into our technology stack to solve problems involving little and Big Data alike: specifically, how we use Pentaho to fetch, munge, and report data, as well as how we bring in R and Hadoop in the AWS cloud to provide otherwise impossible insights.
4 Discussion of burning questions
5 Solicit ideas for next meeting, decide on frequency
6 Announcements

Link to event on meetup.com: Silicon Valley Open Source Business Intelligence Meetup


Partner Integrations

By Rusty Larner, IMVU

IMVU partners with several companies to provide extra services on our website that are not our core competencies, especially when it makes more sense to use others that already implement the service.  Each of these integration efforts is a new process, but we have found several practices that make them easier to accomplish and more likely to succeed.

Doing a technical review during the initial partner evaluation sounds obvious, but it is critical.  Not just at the marketing level: having our engineers talk with theirs about the expectations we have on performance (connections per second, etc.) as well as about the actual integration work helps reduce confusion before the effort starts.  Several times we have surprised our partners with either the usage we expect for a new feature, or how we want to integrate their services into our entire development cycle.  Sometimes the technical review at this stage determines that the integration won’t really be reasonable – helping us to ‘fail fast’ instead of finding out only after attempting the integration.

Continuing this communication on through the initial implementation effort is also important.  The best way to accomplish this is to have both sets of engineers co-located during the initial integration effort.  We have found that having the engineers from our partner work directly with our engineers helps cut down on the communication delays.  There are always questions during any integration, about API documentation or error messages or just adding logging to determine why a given request isn’t working – having engineers in the same room helps immensely.  The timing on this is also critical – we do not want to have someone fly out before we’ve started the integration work, but we also do not want to go too far down the rabbit hole before having an expert there to help.  Having flexible partners helps – during one integration process we ran into difficulty far earlier than expected – our east-coast partner flew out an engineer the next day to work with our team, and he stayed until we determined the problem.

Another part is being able to integrate tests against our external partners into our continuous integration process.  We have several solutions here – first, we implement a full ‘fake’ version of the partner’s API that we use during standard commits.  This allows our large suite of unit and functionality tests to not be dependent on external connections that would make them slower and sometimes intermittent.  We also have a separate set of tests that we run against the actual API.  These include tests that confirm our fake APIs work the same as the actual APIs.  These external tests are run every couple of hours, so we can detect if the partner’s APIs change.  The combination of these tests allow us to feel confident with our continuous integration efforts, even when doing things like changing our Music Store (which talks to our music partner).
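
As an illustration of that layering (all names here are hypothetical, not our actual code): the integration point is an interface with a real implementation and a fake, the commit-time suite exercises the fake, and the scheduled external suite runs the same assertions against both to confirm the fake still matches reality.

#include <map>
#include <stdexcept>
#include <string>

// The seam our code depends on, rather than the partner's concrete client.
class MusicStoreService {
public:
    virtual ~MusicStoreService() {}
    virtual std::string purchaseTrack(const std::string& trackId) = 0;
};

// In-process fake used by the fast commit-time tests: deterministic and
// network-free, but it mirrors the real API's observable behavior
// (including error cases) so tests exercise realistic code paths.
class FakeMusicStoreService : public MusicStoreService {
public:
    virtual std::string purchaseTrack(const std::string& trackId) {
        if (trackId.empty()) {
            throw std::runtime_error("invalid track id");  // as the real API would
        }
        ++purchases[trackId];
        return "receipt:" + trackId;
    }
    std::map<std::string, int> purchases;  // lets tests assert on what was bought
};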

Metrics and graphing are also a very important part of our process.  They let us quickly determine if there is a problem with the connection to our partner.  The faster you know there is a problem, the faster you can respond!  (A couple of times during my first years at IMVU, we only found out about problems through our forums – not the way to respond quickly to problems!  Plus, it becomes far harder to fix the problem when you don’t know exactly when it started…)  For these integrations, we keep several levels of metrics.  First, each point where we talk to an external partner gets metrics – for example, each music purchase attempt gets counted from our side.  But we also ask our partners to expose a service to report their own metrics to us.  This gives us better pointers to problems when they do occur – e.g. if there is a problem with the network connections between our servers and our partners’ servers.

The metrics and graphing are only a part of the continuing maintenance that we put in place.  Another major part is communication paths.  We create an internal mailing list that we ask our partners to send all email to – so when we internally change responsible individuals we don’t have to have all of our partners change their address books.  We also ask our partners to let us know as soon as possible whenever they detect a problem, then follow up later on to let us know how they solved the problem.  Where available, we’ve asked our partners to include our contact info for some of their monitoring alerts.  Our operations staff also has contact details and a call escalation path for our partner and solution instructions so they know how to handle any alerts that are triggered by our monitoring.

So, in review – for partners of IMVU, here is the checklist we (currently) follow:

* Initial engineering technical review before any contract is signed

* Co-located engineering support early in the integration schedule (usually within the first week or two)

* Regular testing calls being made to the partner’s APIs (including test purchases, etc.)

* Expose a web service containing real-time metrics important for the integration

* Call escalation path for operations if issues arise

* Continuous communication about scheduled maintenance, unplanned issues, or upcoming upgrades.

IMVU and Wealthfront Presenting Panel on Continuous Deployment

Most people following IMVU’s engineering practices know that we are big advocates of Continuous Deployment, the process in which every check-in is shipped live to production just minutes after commit.  IMVU joined with Wealthfront, another big advocate of Continuous Deployment, to put together a panel to share experiences moving fast and always shipping to customers.

The panel represents companies from small startups with 1-2 engineers to larger-scale operations with over 40 engineers, each with very different technology stacks.  It is a great learning opportunity for anybody interested in succeeding with Continuous Deployment.

Space is limited.  If you would like to request an invite, please e-mail your name and work info to cd-panel-invite@imvu.com and you will receive a password to http://continuousdeployment.eventbrite.com/. We look forward to seeing you!

Moderator:

Jez Humble, author of Continuous Delivery

Panel:

Votizen
Industrial Logic
Wealthfront
IMVU

Date and Time:

Thursday, May 26, 2011
6:00 PM – Gathering begins
6:30 Panel discussion
7:30 Socializing / networking

Location:

The location is in Palo Alto, less than a 10-minute walk from the University Ave. Caltrain station.  Details will be provided in the invite (see above to request an invite).

More Upcoming Events

Anybody interested in Continuous Deployment can also see IMVU presenting at upcoming conferences:

Login 2011, Seattle, WA, May 16-18, 2011
Real-time learning using A/B testing and Continuous Deployment

GDC Europe, Cologne, Germany, August 15-17, 2011
Using Continuous Deployment to Move Fast and Release without Pain


IMVU’s Employee-Friendly Policy on Side Projects

By Brett G. Durrett, VP Engineering & Operations


I am excited to share IMVU’s policy regarding projects that employees build in their personal time.

IMVU’s approach breaks from the standard way most companies deal with employee side projects, both by addressing potential conflicts of interest before an employee invests their time and by protecting the employee against future claims to their side project work.  I hope this policy will encourage other companies to adopt policies that recognize how many people apply their professional talents to both work and recreation.   Since I am sharing this in the engineering blog, the focus is programmer-centric; however, the policy applies to all IMVU employees.

What’s the Problem?

To understand why IMVU wanted to take a new approach it is helpful to understand how companies usually handle employee side projects.  State and federal laws generally provide for ownership interests for an employer in work done by employees, and typically this is bolstered by having each employee sign an Employee Inventions Agreement that provides the company with broad ownership rights to work the employee produces.  Most developer-level employees are salaried (not paid hourly) so there is no clear demarcation between “work time” and “personal time” with regard to their professional lives.  Also, companies change their business interests, so what may apparently have been outside of the scope of the Employee Inventions Agreement when an employee joined a company may become subject to company ownership as the company expands or changes its actual or anticipated business interests.  Or a particular employee may be unaware of work being done or contemplated elsewhere in the company.  The combination of these can lead to situations where ownership rights are ambiguous, or worse, lead to litigation to resolve, with much of the burden falling on the employee.

At IMVU, we like to attract and hire creative people who love to build things: curious tinkerers, hackers, people who love to code for both work and pleasure.  When reviewing job candidates, we see involvement with open source and community projects as added value.  This is not unique to IMVU – there are many other companies in Silicon Valley looking for the exact same types of creative builders.

But here is where potential problems arise.  The same “always creating” DNA that makes an employee so appealing for a company can create trouble for them under the broadly defined ownership terms of their Employee Inventions Agreement.  Once at a company, an employee’s personal project can become subject to company ownership if the company asserts that the project used company resources or is related to the business or research interests of the company.  For understandable reasons, “business interests” are open ended or broadly defined in an Employee Inventions Agreement.  So a company in the mobile space may be able to argue that any mobile development is related to the company business.  In practice, when personal projects have insignificant value companies have little motivation to deal with the conflict associated with asserting claim to the work.  However, if a personal project becomes valuable, the company has an incentive to claim ownership.  Even if a company doesn’t immediately claim ownership, this is a possibility later (e.g. an acquiring company or new management looking for potential intellectual property gains in the company’s assets).  In the worst-case scenario, somebody could work on what they believe to be a personal project for years only to find out later that a company owns the entirety of the work.

IMVU’s Approach to Foster Creativity

Rather than leave employees in this ambiguous state, IMVU takes a different approach.   IMVU employees can declare that they are starting a personal project and, before the project is underway, the company will explicitly state whether or not it believes it has an interest in the project.  If the company does not claim interest, it will provide to the employee a formal release that acknowledges the employee’s ownership rights to the project, and the employee will be protected even with future changes to management or control of the company.

While the policy’s benefits to employees may be obvious, I see this policy as a win for the company as well.  In a typical company, conflicts of this kind are identified only after an employee has invested a lot of time in a project, making resolution more difficult and potentially tarnishing the relationship between the company and the employee.  IMVU’s policy allows potential conflict to be addressed at the start of a side project, where the investment is low and there is not yet a tangible asset to contest.  Perhaps most importantly, it encourages honesty and transparency around side projects that are already happening.  In contrast, I am aware of a large company with a “zero side projects” policy that drives the issue underground and results in some employees using pseudonyms and creating side projects anyway (which will not protect the employee if the project becomes valuable).

I hope to see other companies adopt side project policies similar to IMVU’s, or at least encourage companies with zero-tolerance policies to evaluate the need (and value) of such a prohibitive culture.  Instead of making employees hide their recreational projects and fear repercussions from the company, encourage openness and creativity, all the time!

HTML5 – New UI Library for Games [Annotated Slides]

HTML5: The New UI Library for Games

At GDC 2011, on behalf of IMVU Engineering, I presented our thesis that HTML is the new UI library for developing desktop applications and games.

HTML is Winning

The browser wars are hotter than ever. We have four major browser vendors, each with significant market share, competing in earnest. Features such as SVG, Canvas, GPU acceleration, FAST JavaScript engines, and even WebGL are widely available.

It’s a great time to be a web developer!

Some Terminology

When I refer to HTML, I’m really referring to the entire stack of web technologies: markup, style sheets, scripts, canvas, websockets, etc. Also, Mozilla’s branding is confusing: Firefox the browser is built by Mozilla the company and is powered by Gecko/XULRunner the technology. Internally, we use these terms interchangeably.

History of IMVU’s UI

2004-2007: Win32, OpenGL, and C++

In 2004, when the company was started, IMVU’s entire UI was written in Win32, OpenGL, and C++. This is a fairly traditional, though old school, way to build UIs. However, our framework was extremely rigid, and it was hard to hire people to work on it. We tell legendary stories of “I just wanted to move that button from here to there, but I couldn’t figure it out in a few hours, so I gave up”.

In addition, our iteration speeds were terrible. You had to recompile, maybe add some new functionality to our UI framework, and then restart the client. This made UI development hard enough that we simply didn’t do it. The product didn’t change much for 3 years.

2007-2009: Flash

In 2007, a couple engineers realized we had a problem and spent a weekend integrating Adobe Flash. Flash is widely available and, if you get permission from Adobe, you can distribute Flash with your application. We implemented support for Flash UI panels and rendering Flash to translucent overlays atop the 3D scene.

Using Flash to develop UI was definitely a huge improvement. We used the Flex UI library and its standard components to get a head start. We were suddenly able to iterate on our UI, and we could add capabilities like animation and video to our client.

That said, Flash wasn’t all roses. It consumed a great deal of memory, especially address space, and mxmlc compiles were sloooow. Finally, even though Flex is open source, we found several bugs.

2009+: HTML

In 2009, we began a project to rebuild our entire experience from first principles. Our very talented design team spent months iterating with paper prototypes and user tests, and produced a very snazzy client architecture. At that point, the engineering team had some implementation choices:

1) Should we build this UI with a custom framework?
2) Should we use our existing Flash technology?
3) Or should we try integrating a browser and use HTML?

We decided to give a browser a shot, and we started with Firefox, since a couple of our people had familiarity with the Gecko codebase.

Within days we had something up and running, and the performance was amazing! It loaded quickly, was super-responsive, and used hardly any memory. We instantly decided to use HTML for the entire project.

The UI we built in HTML matched the intended design almost to the pixel. Iteration was nearly live, and we could even do our development in Firefox the browser with the Firebug add-on.

We patched Gecko to support rendering documents into textures so that we could overlay them on top of the 3D scene.

This let us build sophisticated UI elements across our entire product.

Benefits of HTML

Lingua Franca

IMVU is a Silicon Valley company, and we hire people from all backgrounds. We don’t necessarily hire people for the skills they already have; instead, we assume they’ll be able to learn whatever we need. That said, it’s valuable for people to have some background knowledge, and almost everyone these days knows a bit of the web stack. You can’t expect expertise in the nuances of the box model, but everyone knows a bit of HTML and JavaScript.

Amazing Tools

The web stack brings with it a plethora of tools like Firebug and Firebug Lite. We use these tools every day to rapidly iterate on our markup and style sheets. We can even integrate Firebug Lite into our downloadable client and work with live data.

Ecosystem of Libraries

In the last few years, we’ve seen a proliferation of amazing JavaScript libraries such as jQuery and YUI. If you build your UI with web technologies, you can leverage these libraries directly.

Advertising

When we integrated web technologies into our application, we discovered an unintended benefit. We could now directly benefit from the Internet advertising industry, worth tens of billions of dollars, which has built an entire advertising pipeline around HTML, JavaScript, and Flash. One engineer integrated ads into our product in less than one week.

Demo

At this point in the presentation, I gave two mini demonstrations. First, I showed what Firebug Lite was capable of: JavaScript commands, real-time editing of attributes and styles.

Then I demonstrated the process by which we iterate on our UI by, in 5 minutes, adding a jQuery accordion to an HTML overlay and demonstrating that the change had already taken effect.

Performance?

The biggest concern we hear from others is “Wouldn’t the web stack be slow and take a lot of RAM?” There was a time when browsers were comparatively slow and heavy, but with today’s fast CPUs and the continual pressure to make browsers faster (rich web experiences, tabbed browsing), we’ve found that HTML has never been a bottleneck for us.

Gecko consumes less than 1 MB of RAM for each loaded document, and we even think we’d be able to make every seat node in the scene into an HTML overlay.

You do need to be careful with DOM structures with huge numbers of children. Many DOM operations are linear in the number of children or siblings, so you’ll want to structure your DOM like a b-tree, keeping each node’s number of children bounded.

You also need to be careful with image tag and canvas memory usage. Avoid allocating for elements that aren’t immediately visible.

Drawbacks

Flash is still better at animation, but I think this will change. This may simply be a tool-chain issue.

3D on the web is possible now, but WebGL isn’t quite prime time. This year we expect WebGL penetration to reach 50% of the market (Chrome and Firefox), but we still depend on our software renderer for our customers without strong or reliable GPUs.

Tracing JITs are still RAM-hungry and don’t match native code performance yet. Core game components will still be written in close-to-the-metal C++.

Who else uses HTML for UI?

The independent game studio Wolfire uses WebKit for an in-game material and shader editor. They used the off-the-shelf editor component, CodeMirror, for their text editors.

Electronic Arts ported WebKit to PlayStation 3 for Skate 3. About 50% of their UI was written in HTML and JavaScript.

Netflix on PlayStation 3 uses WebKit so they can split-test their UI after they’ve shipped. That way they can improve their UI without shipping new software.

Several MMOs have added browsers so they can include forums, wiki, and help within their games.

How do I get started?

There are several open source and commercial libraries to help you get started. I’ve included a list in the slides above.

Let me know how it goes!

I’d like to conclude by saying that HTML has worked very well for us, and I strongly encourage you to give it a shot. Send me a message at chad@imvu.com to let me know if it works for you or not!

Q&A

I’d like to clarify some points I made during the Q&A session:

A man from Khronos approached me and said I was a bit hard on WebGL. If I came across that way, I didn’t mean it. I love WebGL and the work that Khronos has done, but I don’t think it’s prime time _yet_. We can’t port our existing product, which often runs on machines without a reasonable GPU, to WebGL yet. We do, however, intend to write a WebGL backend eventually. By the end of the year, we expect 50% WebGL adoption.

Someone asked about initial startup overhead. I said we probably consumed 30 MB of RAM and address space in base Gecko initialization, but that we’d done absolutely no work to reduce that. On the console, I expect it would be worth optimizing further.

There was some discussion around how our designers and engineers interacted. I mentioned that our teams are often cross-functional: our designers typically know enough CSS to make changes if they want and some of our engineers have strong design sense. We find it more productive to just sit down together and have an engineer and a designer work together over an hour to polish up the experience.

Thanks!

Thanks to Chuck Rector, Ben McGraw, Chris Dolezalek, Enno Rehling, Andy Friesen, James Birchler, Brett Durrett, Jon Watte, Laura Austin, Ben and Staci Scott, and Michael and Samantha Van Waardhuizen for providing valuable feedback during the preparation of these materials.

Buildbot and Intermittent Tests

By Jon Watte, Technical Director at IMVU, Inc.

At IMVU, we do continuous deployment. You may have read previous blog entries describing how we ship code to production 50 times a day. Today, I’d like to talk a little bit about what we’ve learned as IMVU has grown as a company and engineering organization, and what we’re doing to support this growth.

The workflow for anyone who wants to check code or assets into our website looks something like this:



As part of developing a feature increment, we also build the automated tests that prove that the feature works as intended. Over time, as we develop more and more features, we also accumulate more and more tests. Each time we check in code or assets, that check-in is run through all of the tests that have been developed over the years — at last count, this is over 40,000 tests, with seven digits’ worth of individual test assertions (the exact number varies depending on how you count).

The great thing about this is that, if your feature accidentally breaks some part of the code that you didn’t think or know about, the buildbot will tell you. You have the ability to run any test you want in your local sandbox before you check in, and you’ll generally run the tests that you think may be relevant, but in a code base the size and complexity of all of IMVU, you can’t possibly know or run everything.

We have an informal goal for our buildbot to run all the tests in 12 minutes or less. This means that we can build 5 times an hour. Some contributors show up for work at 8 am (Eastern time); others work until late in the night (Pacific time), so there’s opportunity for well over 50 builds per day.

However, not all builds result in success — the whole reason for having a buildbot is to catch the mistakes that inevitably slip in. When a submitted change causes one or more tests to fail, we call that “making buildbot red,” and that change is not allowed out onto the live website. Instead, you as committer will back out those changes (we have scripts that make this easy and painless), and investigate the failure (buildbot saves the appropriate log files and other information) to fix it before you re-submit.

Unfortunately, tests, like code, are written by humans. This means that tests will have bugs. Some bugs will be immediately obvious — but others will sit silently, working 99% of the time (or, more likely, 99.999% of the time). Then, once in a blue moon, some timing issue or bad assumption will surface, and the test will spuriously fail. What’s worse is that, most often, running the test again will likely not fail, so tracking down the failure requires a fair amount of diligence and effort. When this happens in your local sandbox, it’s not so bad, but if one of these intermittent test failures happens on buildbot, it’s a bit of a problem, because nobody can deploy their code to production while buildbot is red. At a minimum, the test failure has to be investigated to the point that someone understands that it’s an intermittent failure, rather than caused by the code that you committed.

Common causes of intermittent tests may be things that are hard to get right for humans (like systems with sophisticated caching strategies), or things that are hard or expensive to control for in a testing environment (assuming some pre-existing data, or absence of data, in particular databases, for example), or just because of the underlying platform. For example, we test user visible web pages using a system that drives web browsers, and then we assert things about both the state of the web page within the browser, the requests that the browser has made, and even screen shots taken of certain pages. Unfortunately, web browsers will sometimes decide to crash for no discernible reason. There are other components that are not 100% reliable, either, including tests that have to integrate with external services over the internet, although I’ll refrain from going too deep into that particular discussion in this post.

We have been trying to flush out intermittent tests by running tests all night, every night. Because no code changes between the runs, any test that goes red during this nightly grind is intermittent. We will look at the status in the morning, and any test that went red is distributed to the team that seems likely to know the most about the particular failure, for fixing.

Still, intermittent tests have been one of the most annoying thorns in the side of this entire process. As we scale the organization, buildbot will be busy most of the day, generally building three to five different change sets as part of the same build. Any of those committers will be prevented from deploying their changes to production until at least the next build has completed. This can add half an hour to the build-test-deploy cycle, which harshes the mellow of continuous-deployment-driven development goodness!

To illustrate, the worst week of our internal buildbot for 2010 looked like this:

Clearly, in an environment that values quick iteration and agile development, if you find yourself in that position, you want to do something about it. So we did!

The key insight is that an intermittent test only fails some of the time, but a real failure caused by new code will fail all of the time. The second insight was that we could gather the specific outcome of each test within each test run, to track metrics on tests over time. We started by just adding these measurements and tracking tests over time. Once we learned that we could predict an intermittent test based on two successive test runs, and that we get clear green-or-red status out of each test run, we re-structured the way that we run tests.

Because we have 40,000 tests to run, allocated over several thousand test implementation files, we split the entire batch of tests across a large number of virtual machines, hosted on a few really beefy server-class machines (raid-6 SSD, 48 GB RAM, dual-CPU six-core-Xeon type machines). Our first implementation spread the test files across different machines, using measurements of how long each test file takes to statically sort the files into reasonably equal run lengths. This worked for many years, but now we have better information!

We updated our test runners to first discover all tests that need to run, and put those into a database of available tests. Each build slave will then request one test at a time from this database (which we call the Test Distributor), run the test, and report back. We start by doling out the slowest tests, and then dole out faster and faster tests, to achieve almost 100% perfect balancing in runtime. Even better, though — we know whether a test fails, while the build is still running. Thus, we made the distributor re-distribute a test that is reported as red to some other build slave. If the test comes back as green, we then classify the test as intermittent, and do not make that one red test get in the way of declaring the change set we’re testing as successful. In addition, we remove the build slave that reported the intermittent from the pool of available builders, so that we can analyze the cause of the intermittency whenever convenient.
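
The re-run rule itself is tiny. In sketch form (hypothetical names, not our actual distributor source):

// A red result triggers exactly one re-run on a different build slave. Since
// the code under test is identical in both runs, a green re-run proves the
// failure was not caused by the change set, so it is recorded as intermittent
// instead of turning the build red.
enum TestOutcome { TEST_GREEN, TEST_RED, TEST_INTERMITTENT };

TestOutcome classifyResult(bool passedOnFirstSlave, bool passedOnRerunSlave) {
    if (passedOnFirstSlave) {
        return TEST_GREEN;
    }
    return passedOnRerunSlave ? TEST_INTERMITTENT : TEST_RED;
}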

Does this work?
Oh, yes! Here is a screen shot from the buildbot availability status this week. You will see that the buildbot caught some real, red check-ins, but intermittent tests are not getting in the way anymore. Thursday, we had an entire day with 100% green availability!

Pretty nifty, huh? I’m glad that I work in an environment as dynamic as IMVU that really cares about keeping our productivity up. The more we can remove annoying roadblocks from the path between developers doing hard work, and them seeing their hard work pay off, the better we will do.

REST and Games Don’t Mix

By Jon Watte, Technical Director at IMVU, Inc.

In web development, the REST (REpresentational State Transfer) approach to services often works well to build scalable, robust systems. Unfortunately, the principles behind REST collide with some necessary requirements of games, and the end result is that many problems faced by multiplayer games cannot be solved using REST techniques.

One pillar of REST design is that all necessary data for servicing a request is included in the request, and a client imposes no load on a server when not within the context of a request. For example, a web server serves a file to you when you pass it a URL; how to serve that file may be governed by cookies sent by your web browser; but once the file is served, the server conceptually needs to do nothing else to satisfy the client. If you are building, say, a photo sharing site, then this makes a lot of sense. A request may upload a new photo for some particular user, identified through URL tokens or cookies. Another request may list photos that match some particular criteria. A third request may remove or otherwise edit metadata surrounding the photos that are uploaded, by uploading some JSON or XML document to go with the photo. All in all, each request is isolated, and the server needs to do nothing (other than store the photos on disk) between requests.

Compare to a game of chess: each move in a game of chess isn’t a complete new document (or chess board), but instead a delta over some previous state. Moves are checked for validity before they are accepted, by being interpreted in the context of the board. You could, conceivably, remember the old board state, and have a player upload a new board state that represents the state after his move, and derive what his move was by running a diff against the old board. Unfortunately, even that is not enough to properly implement the rules of chess, because certain moves (castling, en passant pawn capture) depend on the historical state of the board. To work around this, we can include a full set of historical state with the board, as well as the actual board state, and still implement a chess game using REST techniques.

All that remains, then, is to figure out how your opponent learns about a move having taken place, and knows how to download the new board state to display it. This is something that REST techniques simply do not solve, because the infrastructure of REST depends on the direction of requests, the completeness of state transfer, and the cacheability of data. To get rid of cache problems in the chess game above, you’d have to also keep a “next move id” counter as part of the request URL, so that the state of the board, including history, is addressed by a separate URL for each new move.

Another problem with the implementation of the chess game as a REST service is that it transfers the entire game state, including the board, for each move. This uses more network bandwidth than necessary. For something simple, like a chess board, this doesn’t really matter that much. Broadband makes bandwidth cheap, and even a cell phone will transfer 50 kilobytes of data in less than a second these days. However, for more complex games, like a 3D role-playing game or a real-time military simulation game, the amount of data transferred per frame, and the rate at which frames need to be transferred, makes REST bandwidth requirements a real problem.

However, that’s not the worst part. Some games have rules that require secrecy. For example, in a game of Texas Hold’em poker, some state is known, such as what the community cards are and what the amounts of the pot and each player’s chip stack are. Other state, though, is hidden. I do not know what cards you have in your hand, and to avoid cheating, I should have no way of finding this out until and unless you reveal your hand at the end of the game. What cards each of us have in our hands is state that is shared between only us and the server. This means that a player cannot upload a “new game state” that includes the cards in each hand. Yet, to determine the winner, all the state is required. Additionally, the deck that cards are being dealt from is secret state that only the server knows. Thus, at a minimum, to apply REST to our poker game, we’d have to decompose the game into many separate states (each hand, the deck, the pot, each player’s chip stack, etc), and we’d have to enforce separate access control to each of these values. More troubling is the fact that the cards in a player’s hand are changed by the dealer, which means it’s an action someone else takes that affects my state. Again, REST does not support server-to-client push, nor does it generally support “active” state, so implementing even a simple game like poker requires us to break the REST mold.

The World Wide Web is a large instance of largely REST-ful services. Various work-arounds exist to get around the limitations of the REST architecture, such as long-poll XMLHttpRequests from Javascript, chunked-encoding IFRAME script execution, etc. The main reason to use the Flash player or other browser plug-ins is to break the REST mold. Once games expand outside the simple rules of something like poker or chess, the contortions necessary to work around the limitations of REST become byzantine enough that the benefits of running within a browser may be outweighed by the costs, both in development, deployment and operation. You simply can’t describe a typical server instance of a game like World of Warcraft using a REST API and expect any normal consumer network or computer hardware to keep up.

Why does this matter? The World Wide Web is built on REST. New web standards that allow us to do more and more with the internet browser are driven by the same people who developed REST. As a case in point, the HTML5 standards umbrella does not address streaming media. You can “play” a sound or video URL, and the implementation of those URLs may optimize by not delivering the data until you need it, but the only way to “record” sound or video is to record into a static piece of data, which can later be uploaded. This means that voice or video chat cannot be done in a web browser without relying on third party plug-ins, such as the Flash player. Additionally, HTML5 describes a mechanism called Web Sockets to try to work around the server push limitations of REST, but that technology has run into significant implementation problems, and will not be part of the round of web browsers releasing in the first half of 2011.

It’s clear that people want to play games and do other interactive, bidirectional activities with their network-connected computers. The success of networked games and the success of even non-interactive games like FarmVille or Backyard Monsters leave this without a doubt. It would be great if those who understand interactive entertainment (such as game makers) could make friends with those who are currently driving the web, and find common ground to allow for a bidirectional, interactive, real-time world wide web of human communications. HTML6, anyone? 🙂

How to Write an Interactive, 60 Hz Desktop Application

By Chad Austin

IMVU’s client application doesn’t fit neatly into a single development paradigm:

  • IMVU is a Windows desktop application. Mouse clicks, window resizes, and dialog boxes must all respond with imperceptible latency. Running IMVU should not significantly affect laptop battery life.
  • IMVU is an interactive 3D game. The 3D scene must be simulated and drawn at smooth, interactive frame rates, 60 Hz if possible.
  • IMVU is a networked application. Sending and receiving network packets must happen quickly and the UI should never have to wait for I/O.

Thus, let us clarify some specific requirements:

  • Minimal CPU usage (and thus battery consumption) when the application is minimized or obscured.
  • Minimal CPU usage in low-complexity scenes. Unlike most games, IMVU must never unnecessarily consume battery life while waiting in spin loops.
  • Animation must continue while modal dialog boxes and menus are visible. You don’t have control over these modal event loops, but it looks terrible if animation pauses while menus and dialogs are visible.
  • Animation must be accurate and precise. It looks much better if every frame takes 22 milliseconds (45 Hz) than if some frames take 30 milliseconds and some take 15 milliseconds (averaging 45 Hz).
  • Animation must degrade gracefully. In a really complex room with a dozen avatars, IMVU can easily spend all of a core’s CPU trying to animate the scene. In this case, the frame rate should gradually drop while the application remains responsive to mouse clicks and other input events.
  • Support for Windows XP, Vista, and 7.

Naive Approach #1

Windows applications typically have a main loop that looks something like:

MSG msg;
while (GetMessage(&msg, 0, 0, 0) > 0) {
    TranslateMessage(&msg);
    DispatchMessage(&msg);
}

What went wrong

Using SetTimer/WM_TIMER sounds like a good idea for simulation and painting, but it’s way too imprecise for interactive applications.
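
For concreteness, the SetTimer variant would look roughly like this sketch (simulate_world here stands in for whatever per-frame work the application does). WM_TIMER ticks have granularity on the order of 10-15 ms and, as discussed below, sit at the lowest message priority, so frame scheduling built on them jitters badly:

LRESULT CALLBACK NaiveTimerWndProc(HWND hwnd, UINT msg, WPARAM wparam, LPARAM lparam) {
    switch (msg) {
    case WM_CREATE:
        SetTimer(hwnd, 1, 16, 0);  // ask for ~60 Hz; actual ticks are far less regular
        return 0;
    case WM_TIMER:
        simulate_world();
        InvalidateRect(hwnd, 0, FALSE);  // schedule a WM_PAINT
        return 0;
    case WM_DESTROY:
        KillTimer(hwnd, 1);
        PostQuitMessage(0);
        return 0;
    }
    return DefWindowProc(hwnd, msg, wparam, lparam);
}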

Naive Approach #2

Games typically have a main loop that looks something like the following:

while (running) {
    // process input events
    MSG msg;
    while (PeekMessage(&msg, 0, 0, 0, PM_REMOVE)) {
        TranslateMessage(&msg);
        DispatchMessage(&msg);
    }

    if (frame_interval_has_elapsed) {
        simulate_world();
        paint();
    }
}

What went wrong

The above loop never sleeps, draining the user’s battery and burning her legs.

Clever Approach #1: Standard Event Loop + timeSetEvent

void runMainLoop() {
    MSG msg;
    while (GetMessage(&msg, 0, 0, 0) > 0) {
        TranslateMessage(&msg);
        DispatchMessage(&msg);
    }
}

void customWindowProc(...) {
    if (message == timerMessage) {
        simulate();
        // schedules paint with InvalidateRect
    }
}

void CALLBACK TimerProc(UINT, UINT, DWORD_PTR, DWORD_PTR, DWORD_PTR) {
    if (0 == InterlockedExchange(&inFlight, 1)) {
        PostMessage(frameTimerWindow, timerMessage, 0, 0);
    }
}

void startFrameTimer() {
    RegisterClass(customWindowProc, ...);
    frameTimerWindow = CreateWindow(...);
    timeSetEvent(FRAME_INTERVAL, 0, &TimerProc, 0, TIME_PERIODIC);
}

What went wrong

The main loop’s GetMessage call always returns messages in a priority order. Slightly oversimplified, posted messages come first, then WM_PAINT messages, then WM_TIMER. Since timerMessage is a normal message, it will preempt any scheduled paints. This would be fine for us, since simulations are cheap, but the dealbreaker is that if we fail to maintain frame rate, WM_TIMER messages are entirely starved. This violates our graceful degradation requirement. When frame rate begins to degrade, code dependent on WM_TIMER shouldn’t stop entirely.

Even worse, the modal dialog loop has a freaky historic detail. It waits for the message queue to be empty before displaying modal dialogs. When painting can’t keep up, modal dialogs simply don’t appear.

We tried a bunch of variations, setting flags when stepping or painting, but they all had critical flaws. Some continued to starve timers and dialog boxes and some degraded by ping-ponging between 30 Hz and 15 Hz, which looked terrible.

Clever Approach #2: PostThreadMessage + WM_ENTERIDLE

A standard message loop didn’t seem to be getting us anywhere, so we changed our timeSetEvent callback to PostThreadMessage a custom message to the main loop, which knew how to handle it. Messages sent via PostThreadMessage don’t go to a window, so the event loop needs to process them directly. Since DialogBox and TrackPopupMenu modal loops won’t understand this custom message, we will fall back on a different mechanism.

DialogBox and TrackPopupMenu send WM_ENTERIDLE to their owning windows. Any window in IMVU that can host a dialog box or popup menu handles WM_ENTERIDLE by notifying a global idle handler, which can decide to schedule a new frame immediately or in N milliseconds, depending on how much time has elapsed.

What Went Wrong

So close! In our testing under realistic workloads, timeSetEvent had horrible pauses and jitter. Sometimes the multimedia thread would go 250 ms between notifications. Otherwise, the custom event loop + WM_ENTERIDLE approach seemed sound. I tried timeSetEvent with several flags, but they all had accuracy and precision problems.

What Finally Worked

Finally, we settled on MsgWaitForMultipleObjects with a calculated timeout.

Assuming the existence of a FrameTimeoutCalculator object which returns the number of milliseconds until the next frame:

int runApp() {
    FrameTimeoutCalculator ftc;

    for (;;) {
        const DWORD timeout = ftc.getTimeout();
        DWORD result = (timeout
            ? MsgWaitForMultipleObjects(0, 0, TRUE, timeout, QS_ALLEVENTS)
            : WAIT_TIMEOUT);
        if (result == WAIT_TIMEOUT) {
            simulate();
            ftc.step();
        }

        MSG msg;
        while (PeekMessage(&msg, 0, 0, 0, PM_REMOVE)) {
            if (msg.message == WM_QUIT) {
                return msg.wParam;
            }

            TranslateMessage(&msg);
            DispatchMessage(&msg);
        }
    }
}
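
The FrameTimeoutCalculator itself is assumed above; one minimal way to implement it (a sketch, not necessarily IMVU’s version) is to track the next frame deadline and report how many milliseconds remain until it:

// Uses timeGetTime() from the multimedia timer API (winmm) for millisecond
// resolution. A timeout of 0 tells the loop above to simulate immediately.
class FrameTimeoutCalculator {
public:
    FrameTimeoutCalculator()
        : interval(1000 / 60)
        , nextFrame(timeGetTime() + interval)
    {}

    DWORD getTimeout() const {
        const DWORD now = timeGetTime();
        return (static_cast<int>(nextFrame - now) <= 0) ? 0 : (nextFrame - now);
    }

    void step() {
        nextFrame += interval;
        // If we fell more than a frame behind, resynchronize rather than
        // trying to catch up with a burst of rapid frames.
        const DWORD now = timeGetTime();
        if (static_cast<int>(now - nextFrame) > static_cast<int>(interval)) {
            nextFrame = now + interval;
        }
    }

private:
    DWORD interval;
    DWORD nextFrame;
};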

Well, what about modal dialogs?

Since we rely on a custom message loop to animate 3D scenes, how do we handle standard message loops such as the modal DialogBox and TrackPopupMenu calls? Fortunately, DialogBox and TrackPopupMenu provide us with the hook required to implement frame updates: WM_ENTERIDLE.

When the standard DialogBox and TrackPopupMenu modal message loops go idle, they send their parent window a WM_ENTERIDLE message. Upon receiving WM_ENTERIDLE, the parent window determines whether it’s time to render a new frame. If so, we animate all visible 3D windows, which will trigger a WM_PAINT, which triggers a subsequent WM_ENTERIDLE.

On the other hand, if it’s not time to render a new frame, we call timeSetEvent with TIME_ONESHOT to schedule a frame update in the future.

As we saw previously, timeSetEvent isn’t as reliable as a custom loop using MsgWaitForMultipleObjects, but if a modal dialog or popup menu is visible, the user probably isn’t paying very close attention anyway. All that matters is that the UI remains responsive and animation continues while modal loops are open. Code follows:

static UINT idleMessage;
static HWND frameListenerWindow;

LRESULT CALLBACK ModalFrameSchedulerWndProc(HWND hwnd, UINT message, WPARAM wparam, LPARAM lparam) {
    if (message == idleMessage) {
        stepFrame();
    }
    return DefWindowProc(hwnd, message, wparam, lparam);
}

struct AlmostMSG {
    HWND hwnd;
    UINT message;
    WPARAM wparam;
    LPARAM lparam;
};

void CALLBACK timeForPost(UINT, UINT, DWORD_PTR user_data, DWORD_PTR, DWORD_PTR) {
    AlmostMSG* msg = reinterpret_cast<AlmostMSG*>(user_data);
    PostMessage(msg->hwnd, msg->message, msg->wparam, msg->lparam);
    delete msg;
}

void PostMessageIn(DWORD timeout, HWND hwnd, UINT message, WPARAM wparam, LPARAM lparam) {
    if (timeout) {
        AlmostMSG* msg = new AlmostMSG;
        msg->hwnd = hwnd;
        msg->message = message;
        msg->wparam = wparam;
        msg->lparam = lparam;
        timeSetEvent(timeout, 1, timeForPost, reinterpret_cast<DWORD_PTR>(msg), TIME_ONESHOT | TIME_CALLBACK_FUNCTION);
    } else {
        PostMessage(hwnd, message, wparam, lparam);
    }
}

class ModalFrameScheduler : public IFrameListener {
public:
    ModalFrameScheduler() { stepping = false; }

    // Call when WM_ENTERIDLE is received.
    void onIdle() {
        if (!frameListenerWindow) {
            idleMessage = RegisterWindowMessageW(L"IMVU_ScheduleFrame");
            Assert(idleMessage);

            WNDCLASSW wc;
            ZeroMemory(&wc, sizeof(wc));
            wc.hInstance = GetModuleHandle(0);
            wc.lpfnWndProc = ModalFrameSchedulerWndProc;
            wc.lpszClassName = L"IMVUModalFrameScheduler";
            RegisterClassW(&wc);

            frameListenerWindow = CreateWindowW(
                L"IMVUModalFrameScheduler",
                L"IMVUModalFrameScheduler",
                0, 0, 0, 0, 0, 0, 0,
                GetModuleHandle(0), 0);
            Assert(frameListenerWindow);
        }

        if (!stepping) {
            const unsigned timeout = ftc.getTimeout();
            stepping = true;
            PostMessageIn(timeout, frameListenerWindow, idleMessage, 0, 0);
            ftc.step();
        }
    }
    void step() { stepping = false; }

private:
    bool stepping;
    FrameTimeoutCalculator ftc;
};

How has it worked out?

A custom message loop and WM_ENTERIDLE neatly solves all of the goals we laid out:

  • No unnecessary polling, and thus increased battery life and performance.
  • When possible, the 3D windows animate at 60 Hz.
  • Even degradation. If painting a frame takes 40 ms, the frame rate will drop from 60 Hz to 25 Hz, not from 60 Hz to 15 Hz, as some of the implementations did.
  • Animations continue to play, even while modal dialogs and popup menus are visible.
  • This code runs well on XP, Vista, and Windows 7.