Writing Resilient Unit Tests

By Andrew Wilcox

I happened to be looking at a unit test I wrote a couple months ago and was dismayed to realize it wasn’t working any longer:

test("don't launch the rocket in stormy weather", function () {
    setup_rocket_systems_controller();
    setup_fake_ground_based_internet();
    set_weather_to_stormy();
    equal(ready_to_launch(), false);
});

Today this test will always go green — regardless of whether the code launches the rocket in stormy weather or not.

My unit test worked when I first wrote it.  But my test was also fragile: it was easily broken in the normal course of code development.

Here’s the code being tested as it appears today:

function ready_to_launch() {
     var controller = connect_to_rocket_systems_controller();
     if (! controller) {
         return false;
     }

     if (controller.get_fuel_level() < needed_fuel()) {
         return false;
     }

     var internet = controller.connect_to_ground_based_internet();
     if (! internet) {
         return false;
     }

     if (internet.obtain_weather_report() == WEATHER_STORMY) {
         return false;
     }

     return true;
}

Of course we could argue whether this is the best coding style to use: whether each check could be split out into a separate function, or whether exceptions should be thrown instead of returning false, and so on.  But with working unit tests such improvements in design are easy to make.

Without unit tests (or with broken unit tests!) changing the code can easily break things without us noticing.  With my broken unit test, someone could be working on improving or enhancing the code, and they could make a small typo that broke the code which prevents launches in stormy weather, and they would think that everything was OK because — after all — there’s clearly a unit test that checks for that!

Why did my unit test break?  What happened was:

  • We noticed that launching our rockets without enough fuel often resulted in a crash;

  • An engineer carefully reviewed all the changes that would be needed to implement a fix, and wrote unit tests to check that the code would not launch the rocket when it didn’t have enough fuel;

  • A couple places in the code that needed to be updated were overlooked but caught by unit tests going red, and so were quickly found and fixed;

  • A few unit tests broke by going red because they themselves now needed to fuel the rocket in order to do their test, but they were also quickly found and fixed;

  • My unit test also now needed to fuel the rocket to do its test, but what it checked was just that the rocket wasn’t ready to launch… and so it was now going green not because of stormy weather but because it was failing to fuel the rocket!

At this point the code to check for stormy weather could be inadvertently broken by further code changes and no one might notice.  We could even delete the check for stormy weather entirely and my unit test would still go green!  Bit rot has set in… there’s impressive looking code to avoid launching the rocket when we’re not supposed to… and impressive looking unit tests to test that… and yet when stormy weather comes we could actually launch the rocket anyway.

There are two ways for a unit test to be broken, false-red tests and false-green tests:

  • false-red tests go red even if the code feature they’re testing is correct;

  • false-green tests go green even if the code they’re testing is incorrect.

Because we run all unit tests automatically before pushing to production, red tests are quickly found and fixed.  Thus any buggy unit tests that might be able to sneak into the code base are the false-green ones: the ones that say “fine!  everything’s fine!  no problem!”… even when the code is broken.

“Happy path” unit tests rarely break by going false-green.  A happy path test is one that checks that the desired result is achieved when all the necessary conditions are in place.  E.g., we launch our rocket, and it makes it to low Earth orbit, and it inserts into the correct orbit, and it matches velocity with the space station, and it docks without crashing, etc. etc.  Or, in the case of IMVU, a happy path test would be that customers can place orders when they have enough credits, the product is available, and so on.

Happy path unit tests rarely go false-green because of entropy: there are typically many more ways for something to fail than there are for something to succeed.  It’s possible that I might manage to break a test and break the code at the same time and have my changes conspire to get the test to go green anyway…

test("2 + 2", function () {
    equal(plus(2, 2), 5);
});

but it’s rather unlikely.

“Sad path” unit tests on the other hand check that things do in fact fail when they’re supposed to.  “The rocket can’t be launched in stormy weather” is one example.  Another example would be “customers can’t buy a virtual product when they don’t have enough credits”.

Sad path unit tests are often important for the integrity of the business.  Suppose the check for not being able to buy a product when the customer had insufficient credits wasn’t working.  We might not notice this — just using our application ourselves — because when we use our application we don’t generally try to buy things when we don’t have enough credits.

But though they are often quite important, these sad path unit tests are more easily broken as the code continues to be developed because there are many ways that things can fail — and if the test is just checking for “something failed” then lots of things might be able to do that.  For example, this unit test will go green if the code throws any exception:

test("names must be a string", function () {
    raises(function () { set_name(3.14159) });
});

and so it will be broken by any bug that manifests by throwing an exception.  In fact when I review existing unit tests that merely check if an exception has been thrown, I’ve found that they’ve become useless more often than not!

Thus extra care is needed when writing sad path tests.  It’s not enough to just check whether the code failed — we need to check whether it specifically failed for the reason that we’re testing for.

For this we need to do an analysis: are there any other code paths that could generate the same result as we’re testing for?  For example, it is better to check for the specific exception being thrown — but that could still be buggy if there’s more than one way that specific exception could be thrown.
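
For example, a more specific version of the exception test above might look like this.  This is just a sketch: it assumes a QUnit-style raises() that accepts a second argument used to validate the thrown error, and it assumes set_name() throws a TypeError with a particular message; both of those details would need to match your actual framework and code.

test("names must be a string", function () {
    raises(
        function () { set_name(3.14159) },
        function (error) {
            // Pass only for the specific failure we care about, not for
            // any exception that happens to escape set_name().
            return error instanceof TypeError &&
                /name must be a string/.test(error.message);
        },
        "set_name should reject non-string names"
    );
});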

In the rocket example, one option would be to instrument the code to make it more testable by having it return a string indicating why we’re not ready to launch:

function ready_to_launch() {
    var controller = connect_to_rocket_systems_controller();
    if (! controller) {
        return "the rocket systems controller is offline";
    }

    if (controller.get_fuel_level() < needed_fuel()) {
        return "there is insufficient fuel to launch";
    }

    var internet = controller.connect_to_ground_based_internet();
    if (! internet) {
        return "the ground based Internet service is not available";
    }

    if (internet.obtain_weather_report() == WEATHER_STORMY) {
        return "flight protocols do not permit launching in stormy weather";
    }

    return null;
}

Code will often need some enhancements to make it testable, though you may be able to find ways that make the testability requirement less intrusive.

And so if I had written my unit test at the time to check for the specific condition I was testing for:

test("don't launch the rocket in stormy weather", function () {
    ...
    set_weather_to_stormy();
    equal(
        ready_to_launch(),
        "flight protocols do not permit launching in stormy weather"
    );
});

then when the fuel requirement had been added later my test would have broken by going falsely red instead of going falsely green — and it would have been fixed.

My test would have been a more resilient unit test — one that over time would have had a better chance of being kept working as development proceeded.

Distributed Redis Transactions

By Eric Hohenstein, IMVU, Inc.

Preface

Four times a year, employees of IMVU (engineers and non-engineers) are encouraged to participate in what we like to call “Hack Week”. Hack week is when we are given the opportunity to work on anything that we want so long as there is the possibility that it could eventually achieve one or more IMVU business objectives. Within that constraint, we have total freedom. On the Monday following hack week we get to show off what we accomplished to the rest of the company. Some people choose to work on relatively small projects by themselves and others will organize and participate in larger projects with many people.

Our last hack week was November 28 through December 2, 2011. This is a description of the project that I worked on in conjunction with two other people, David Johnson and Peter Offenwanger, both software engineers at IMVU.

What we built

We built an ACID-compliant distributed key-value database implementing a small but critical subset of the redis command set and wire protocol:

GET
SET
DEL
WATCH
UNWATCH
MULTI
EXEC

The source is available on github and has been released as open-source under the MIT license.

The system should theoretically be able to scale to arbitrary limits since there is no singular bottleneck in the design. Clients connect to a homogeneous pool of client session hosts. The client sessions hash keys to buckets, and each bucket maps to a separate erlang process and back-end redis storage instance. The number of possible buckets is limited only by the number of bits used to generate the hash (currently 2^128). The client sessions and the buckets communicate with a homogeneous pool of transaction monitors. Only the cluster startup/monitoring process can currently limit scalability, but this process can also be distributed if necessary, though that should only be needed for a frighteningly large cluster.
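
To make the key-to-bucket mapping concrete, here is a small illustrative sketch in JavaScript (dtm-redis itself does this in erlang; the hash choice and bucket count below are just examples):

var crypto = require("crypto");

// Illustrative only: map a key to one of bucket_count buckets.
// md5 gives a 128-bit digest, matching the 2^128 hash space described above.
function bucket_for_key(key, bucket_count) {
    var digest = crypto.createHash("md5").update(key).digest();
    // Reduce the first 4 bytes of the digest modulo the number of buckets.
    return digest.readUInt32BE(0) % bucket_count;
}

// Each bucket index would map to a separate erlang process and redis instance.
var bucket = bucket_for_key("user:12345:inventory", 16);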

The motivation for this investigation was to explore the performance limits of redis beyond what a single redis host can offer. Any solution based on standard redis has an upper bound on throughput without sacrificing consistency. Some applications of key-value stores require very strong consistency (for instance, a separate index view of a data set maintained in parallel with the data set). Redis is remarkably performant, especially given that it operates with just a single thread. However, if an application requires strong data consistency and higher throughput than redis (or any other key-value data store) can offer, the options with existing solutions are very limited. The redis team is in the process of adding a clustering solution, but it will not support distributed transactions, only consistent hashing.

How we built it

The project started as an application I built to learn erlang shortly after the previous hack week in July. Before we started this last hack week, the project consisted of about 700 lines of code that implemented the entire supported command set, but it was only accessible from the erlang shell and had no persistence and no transaction monitor. By the end of this last hack week we had around 1200 lines of code and a working prototype. The main pieces missing from the prototype are crash recovery and process monitoring.

How we benchmarked it

We spent a fair amount of time during hack week trying to reuse a standard redis benchmarking tool. This gave us good visibility into redis and dtm-redis performance on a single host with and without the appendfsync redis configuration. After hack week I spent some time building a custom benchmarking tool in C (included in the dtm-redis project) to work around some limitations of the standard redis benchmarking tool which include:

  • no support for transactions
  • no latency measurements, only throughput
  • no support for multiple listening hosts

This custom tool accepts a list of host:port parameters on the command line, a parameter for the number of clients per host, the number of seconds to run the test, and the type of benchmark to perform (get_set or trans). It produces output like the following:

ehohenstein@localhost:~/dtm-bench$ ./dtm-bench host1:6379,host2:6379,host3:6379,host4:6379 100 60 get_set
creating 100 clients connecting to each of 4 hosts
starting clients
clients running for 60 seconds
stopping clients
total requests:  17604943
total latency:   23993.386276540
average latency: 0.001363
max latency:     0.217846477
elapsed time:    60.007721772
requests/sec:    293377

 

When testing transactions, a “request” represented a transaction involving five round trips: WATCH, GET, MULTI, SET, EXEC. The average latency is calculated across the entire transaction, while the max latency is calculated for each individual round trip. In at least one instance this resulted, counter-intuitively, in the measured max latency being lower than the average latency.
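
To make the shape of one such transaction concrete, here is a rough JavaScript sketch of the five round trips (the client object and its callback signatures are hypothetical; the actual benchmark tool is written in C and speaks the redis wire protocol directly):

// Illustrative sketch of one "trans" benchmark request: five round trips
// per transaction, matching the sequence described above.
// `client` is a hypothetical redis client with one async method per command.
function run_one_transaction(client, key, done) {
    client.watch(key, function () {                       // round trip 1: WATCH
        client.get(key, function (value) {                // round trip 2: GET
            client.multi(function () {                    // round trip 3: MULTI
                client.set(key, value + "x", function () {    // round trip 4: SET (queued)
                    client.exec(function (result) {           // round trip 5: EXEC
                        // result is null if the watched key changed and the
                        // transaction was aborted; otherwise it committed.
                        done(result);
                    });
                });
            });
        });
    });
}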

Most tests were performed with 100 clients per listening host. Experimentation showed that throughput grew up to that level and flattened out once that number of clients per host was reached.

The IMVU operations team was very helpful in setting up and configuring 4 hosts in the production cluster with redis installed to use for benchmarking. These hosts had dual six-core Xeon 2.66 GHz processors with hyper-threading enabled and with spinning disks. When testing dtm-redis, 8, 16, or 32 buckets were configured, each with its own separate redis instance listening on 127.0.0.X.

Benchmark results

Outside of transactions, the system was able to easily beat standard redis throughput performance. Throughput appears to increase linearly with the number of hosts (at least up to the number of hosts we used in the benchmark test). Latency of dtm-redis was mostly similar to standard redis, but max latency appears to increase exponentially with the number of hosts, which is troubling.

Transactions were tested in both synchronous I/O (durable) and non-synchronous I/O (non-durable) modes. The durable mode was not totally durable since when performing this test, the redis configuration was not altered to make it use the appendfsync persistence option. It was nevertheless a fair approximation since in this mode, dtm-redis was performing 2 synchronous writes to disk per transaction or write operation.

In non-durable mode, dtm-redis throughput again beat standard redis and had linear growth with the number of hosts. This time, average latency of dtm-redis was roughly constant and max latency again grew exponentially, exceeding redis by 20x with four hosts.

In durable mode, dtm-redis throughput was extremely low. When testing with just a single host and appendfsync persistence, both redis and dtm-redis allowed roughly 300 requests per second which is roughly 3 orders of magnitude below the throughput without appendfsync. This is not surprising since the disk would be the bottleneck in both cases. In durable mode, dtm-redis again had roughly linear growth in throughput with the number of hosts though only up to a maximum of 1925 transactions per second with four hosts. Average latency and max latency were both roughly constant though note that max latency reached more than 3 seconds in the 3 host test. This is especially alarming given that max latency was measured for individual round trips rather than the transaction as a whole.

Durable mode performance could likely have been greatly increased using a host with an SSD. The IMVU operations team didn’t have one available for us to use for benchmarking, so we aren’t able to compare apples to apples. However, while working on the benchmark tool on my development system, which does have an SSD, I noted that I was able to get over 5000 transactions per second locally with only 1 CPU in a Linux VM shared by redis, dtm-redis, and the benchmark tool. Chances are that an SSD would increase durable mode performance by a factor of at least 10x.

One thing that was interesting was that in durable mode the throughput, average latency, and max latency all improved with the number of clients per host. This is almost certainly a side effect of an optimization which was chosen in dtm-redis to batch together multiple I/O operations waiting in the binlog process queue.

Another thing that was very interesting is that when running the non-transaction and non-durable transaction benchmarks, the hosts under benchmark test spent almost all of their CPU time running erlang code. Each of the 8 redis processes had roughly 20-25% CPU utilization and erlang had 800-1000%.

Conclusions

This prototype was built with only 1200 lines of code. If we were to build a fully functional and robust system, it’s probably fair to estimate that it would require at least a 10x increase in the amount of code necessary. That being said, this wasn’t a giant project. I can’t say if this would provide enough value to IMVU to build, but it is at least within reason to imagine us building it.

The use of erlang for this application is somewhat questionable. It was useful as a learning tool to build this prototype in erlang. I suspect, however, that erlang is not efficient enough to match the kind of performance typically expected from redis. The difference in average latency and the unacceptably high max latency of dtm-redis, together with its CPU utilization, suggest that erlang is spending too much time copying data from process to process. It’s hard to be sure without building another prototype, but I believe that C would be a better choice. Building this system in C would require significantly more code, however, with a corresponding increase in development and maintenance costs.

Of course in durable mode, erlang performance is really not an important factor. In this case, the disk is the limiting factor in performance. The scalability of this system (as currently defined) is actually based solely on the ability to distribute disk I/O. Adding a large number of disks would likely allow a significant performance increase but might also reduce the mean time to failure.

After having completed the prototype it was clear that the durable mode is likely not going to be useful due to the very low throughput and high latency. An alternate design that I would like to explore is one that would be more or less equivalent to the default redis mode where the data-set is periodically checkpointed. It should be possible to obtain the performance characteristics of dtm-redis seen in non-durable mode while maintaining (check-pointed) consistency by periodically synchronizing all bucket processes and having them each initiate a save operation. This will cause occasional high latency but the average latency should remain low.


Gephi Plugins Developers Workshop at IMVU

by Paco Nathan

At IMVU, the Data team often works with large data sets which represent customer interactions across social graphs. This kind of data cannot be explored very well using relational databases, such as MySQL. Instead, we rely on other tools for analyzing social graphs.

One tool which we really like is called Gephi, an open source platform for visualization and analysis of graphs. Our team uses Gephi as an interactive UI for exploring relationships and patterns in graphs. We also use it for calculating social graph metrics and other statistics which help us to characterize large graphs. One favorite feature in Gephi is how readily it can be extended by writing plugins.

We’ve received much appreciated help and guidance from the Gephi developer community. Now we are glad to be able to host the first Gephi workshop about developing plugins. Bring your laptop, install the required software from sources provided, and work with mentors to learn how to write your own Gephi plugins! And get to network with other people in the Gephi developer community.

To RSVP, please click through the Meetup.com link below.

Title: Gephi Plugins Developers Workshop

Organizers: Mathieu Bastian and Paco Nathan

Date and Time: Thursday, October 6, 2011, 7:00-9:30pm

Location: IMVU, Inc., 100 W. Evelyn Ave #110, Mountain View, CA

Cost: No cost

Food and drinks: Food, drinks, and wifi will be provided

Agenda: This is the first Gephi workshop dedicated to Gephi Plugins developers! Come and learn how to write your first Gephi plugins and ask questions.

The workshop will start with a 1-hour presentation of Gephi’s architecture and the different types of plugins that can be written, with examples. Details about Gephi’s API, code examples and best practices will be presented in an interactive “live coding” way. The Gephi Toolkit will also be covered in detail. The second part of the workshop will be dedicated to helping individuals with their projects and answering questions. Depending on the audience’s needs we can discuss plugin design, how to code UI, databases, toolkit, data sources or any plugin idea you have.

Gephi is modular software and can be extended with plugins. Plugins can add new features like layout, filters, metrics, data sources, etc. or modify existing features. Gephi is written in Java so anything that can be used in Java can be packaged as a Gephi plugin! Visit the Plugins Portal on the wiki and follow the tutorials.

Link to event on meetup.com: Gephi Plugins Developers Workshop

Silicon Valley Open Source Business Intelligence Meetup at IMVU

By Tim Levine and Paco Nathan

At IMVU, our leaders are increasingly leveraging data to inform decisions. In response, we have adopted an off-the-shelf business intelligence solution to enable ready access to data.  Considering our needs, present and future, and the personality of the company, we have deployed a commercial version of Pentaho.  Using this suite, we have been able to provide fresher data (daily refresh versus monthly), more interactive data (using ROLAP cubes), and newly processed data (30-day rolling of usage).

Our success has gotten to the point where we would like to contribute to and draw from the larger community, and so we will be hosting a meetup of those interested in open source business intelligence.  To RSVP, please click through the meetup.com link below.

Title: 

Silicon Valley Open Source Business Intelligence Meetup

Organizers:

Tim Levine and Paco Nathan

Date and Time:

Thursday, September 15, 2011
7:30-9:30pm

Location:

IMVU Corporate Headquarters
100 W. Evelyn Ave #110 , Mountain View, CA

Cost:
No cost

Food and drink:
Food and drinks will be provided

Agenda:

1. Food and networking
2. Introductions around the room, find burning questions
3. Tim Levine and Paco Nathan will present on how Pentaho fits into our technology stack to solve problems involving little and Big Data alike: specifically, how we use Pentaho to fetch, munge, and report data, as well as how we bring in R and Hadoop in the AWS cloud to provide otherwise impossible insights.
4. Discussion of burning questions
5. Solicit ideas for next meeting, decide on frequency
6. Announcements

Link to event on meetup.com: Silicon Valley Open Source Business Intelligence Meetup


Partner Integrations

By Rusty Larner, IMVU

IMVU partners with several companies to provide extra services on our website that are not our core competencies, especially when it makes more sense to use others that already implement the service.  Each of these integration efforts is a new process, but we have found several practices that make them easier to accomplish and more likely to succeed.

Doing a technical review during the initial partner evaluation sounds obvious, but it is critical.  This review is not just at the marketing level: having our engineers talk with theirs about the expectations we have on performance (connections per second, etc.) as well as the actual integration work helps reduce confusion before the effort starts.  Several times we have surprised our partners with either the usage we expect for a new feature, or how we want to integrate their services into our entire development cycle.  Sometimes the technical review at this stage determines that it won’t really be reasonable to do the integration – helping us to ‘fail fast’ instead of discovering the problem partway through the integration effort.

Continuing this communication through the initial implementation effort is also important. The best way to accomplish this is having both sets of engineers co-located during the initial integration effort.  We have found that having the engineers from our partner work directly with our engineers helps cut down on the communication delays.  There are always questions during any integration, about API documentation or error messages or just adding logging to determine why a given request isn’t working – having engineers in the same room helps immensely.  The timing on this is also critical – we do not want to have someone fly out before we’ve started the integration work, but we also do not want to go too far down the rabbit hole before having an expert there to help.  Having flexible partners helps – during one integration process we ran into difficulty far earlier than expected – our east-coast partner flew out an engineer the next day to work with our team, and he stayed until we determined the problem.

Another part is being able to integrate tests against our external partners into our continuous integration process.  We have several solutions here – first, we implement a full ‘fake’ version of the partner’s API that we use during standard commits.  This allows our large suite of unit and functionality tests to not be dependent on external connections that would make them slower and sometimes intermittent.  We also have a separate set of tests that we run against the actual API.  These include tests that confirm our fake APIs work the same as the actual APIs.  These external tests are run every couple of hours, so we can detect if the partner’s APIs change.  The combination of these tests allows us to feel confident in our continuous integration efforts, even when doing things like changing our Music Store (which talks to our music partner).
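
As a rough sketch of that pattern (all names below are invented for illustration), the fake implements the same interface as the real client, and a slower contract test run every couple of hours confirms that the fake still matches the real API:

// Hypothetical example of the pattern described above; names are illustrative.

// Fake partner API used by the fast unit/functional test suite on every commit.
function FakeMusicPartner() {
    this.purchases = [];
}
FakeMusicPartner.prototype.purchase_track = function (track_id, user_id) {
    this.purchases.push({ track: track_id, user: user_id });
    return { status: "ok", receipt: "fake-receipt-" + this.purchases.length };
};

// Contract test, run every couple of hours against the partner's real sandbox
// API, confirming the fake still behaves like the real thing.
test("fake music partner matches real music partner", function () {
    var fake = new FakeMusicPartner();
    var real = connect_to_real_music_partner_sandbox();  // assumed helper

    var fake_result = fake.purchase_track("track-1", "test-user");
    var real_result = real.purchase_track("track-1", "test-user");

    equal(fake_result.status, real_result.status);
});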

Metrics and graphing are also a very important part of our process.  They let us quickly determine if there is a problem with the connection to our partner.  The faster you know there is a problem, the faster you can respond!  (A couple of times during my first years at IMVU, we only found out about problems through our forums – not the way to respond quickly to problems!  Plus, it becomes far harder to fix the problem when you don’t know exactly when it started…)  For these integrations, we keep several levels of metrics.  First, each point where we talk to an external partner gets metrics – for example, each music purchase attempt gets counted from our side.  But we also ask our partners to expose a service to report their own metrics to us.  This gives us better pointers to problems when they do occur – for instance, whether there is a problem with the network connections between our servers and our partners’ servers.

The metrics and graphing are only a part of the continuing maintenance that we put in place.  Another major part is communication paths.  We create an internal mailing list that we ask our partners to send all email to – so when we internally change responsible individuals we don’t have to have all of our partners change their address books.  We also ask our partners to let us know as soon as possible whenever they detect a problem, then follow up later on to let us know how they solved the problem.  Where available, we’ve asked our partners to include our contact info for some of their monitoring alerts.  Our operations staff also has contact details and a call escalation path for each partner, plus resolution instructions, so they know how to handle any alerts that are triggered by our monitoring.

So, in review – for partners of IMVU, here is the checklist we (currently) follow:

* Initial engineering technical review before any contract is signed

* Co-located engineering support early in the integration schedule (usually within the first week or two)

* Regular testing calls being made to the partner’s APIs (including test purchases, etc.)

* Expose a web service containing real-time metrics important for the integration

* Call escalation path for operations if issues arise

* Continuous communication about scheduled maintenance, unplanned issues, or upcoming upgrades.

IMVU and Wealthfront Presenting Panel on Continuous Deployment

Most people following IMVU’s engineering practices know that we are big advocates of Continuous Deployment, the process in which every check-in is shipped live to production just minutes after commit.  IMVU joined with Wealthfront, another big advocate of Continuous Deployment, to put together a panel to share experiences moving fast and always shipping to customers.

The panel represents companies from small startups with 1-2 engineers to larger-scale operations with over 40 engineers, each with very different technology stacks.  It is a great learning opportunity for anybody interested in succeeding with Continuous Deployment.

Space is limited.  If you would like to request an invite, please e-mail your name and work info to cd-panel-invite@imvu.com and you will receive a password to http://continuousdeployment.eventbrite.com/.  We look forward to seeing you!

Moderator:

Jez Humble, author of Continuous Delivery

Panel:

Votizen
Industrial Logic
Wealthfront
IMVU

Date and Time:

Thursday, May 26, 2011
6:00 PM – Gathering begins
6:30 Panel discussion
7:30 Socializing / networking

Location:

The location is in Palo Alto, less than a 10-minute walk from University Ave. Caltrain station.  Details will be provided in the invite (see above to request an invite).

More Upcoming Events

Anybody interested in Continuous Deployment can also see IMVU presenting at upcoming conferences:

Login 2011, Seattle, WA, May 16-18, 2011
Real-time learning using A/B testing and Continuous Deployment

GDC Europe, Cologne, Germany, August 15-17, 2011
Using Continuous Deployment to Move Fast and Release without Pain


IMVU’s Employee-Friendly Policy on Side Projects

By Brett G. Durrett, VP Engineering & Operations


I am excited to share IMVU’s policy regarding projects that employees build in their personal time.

IMVU’s approach breaks from the standard way most companies deal with employee side projects, both by addressing potential conflicts of interest before an employee invests their time and by protecting the employee against future claims to their side project work.  I hope this policy will encourage other companies to adopt policies that recognize how many people apply their professional talents to both work and recreation.   Since I am sharing this in the engineering blog, the focus is programmer-centric; however, the policy applies to all IMVU employees.

What’s the Problem?

To understand why IMVU wanted to take a new approach it is helpful to understand how companies usually handle employee side projects.  State and federal laws generally provide for ownership interests for an employer in work done by employees, and typically this is bolstered by having each employee sign an Employee Inventions Agreement that provides the company with broad ownership rights to work the employee produces.  Most developer-level employees are salaried (not paid hourly) so there is no clear demarcation between “work time” and “personal time” with regard to their professional lives.  Also, companies change their business interests, so what may apparently have been outside of the scope of the Employee Inventions Agreement when an employee joined a company may become subject to company ownership as the company expands or changes its actual or anticipated business interests.  Or a particular employee may be unaware of work being done or contemplated elsewhere in the company.  The combination of these can lead to situations where ownership rights are ambiguous, or worse, lead to litigation to resolve, with much of the burden falling on the employee.

At IMVU, we like to attract and hire creative people who love to build things: curious tinkerers, hackers, people who love to code for both work and pleasure.  When reviewing job candidates, we see involvement with open source and community projects as added value.  This is not unique to IMVU – there are many other companies in Silicon Valley looking for the exact same types of creative builders.

But here is where potential problems arise.  The same “always creating” DNA that makes an employee so appealing for a company can create trouble for them under the broadly defined ownership terms of their Employee Inventions Agreement.  Once at a company, an employee’s personal project can become subject to company ownership if the company asserts that the project used company resources or is related to the business or research interests of the company.  For understandable reasons, “business interests” are open ended or broadly defined in an Employee Inventions Agreement.  So a company in the mobile space may be able to argue that any mobile development is related to the company business.  In practice, when personal projects have insignificant value companies have little motivation to deal with the conflict associated with asserting claim to the work.  However, if a personal project becomes valuable, the company has an incentive to claim ownership.  Even if a company doesn’t immediately claim ownership, this is a possibility later (e.g. an acquiring company or new management looking for potential intellectual property gains in the company’s assets).  In the worst-case scenario, somebody could work on what they believe to be a personal project for years only to find out later that a company owns the entirety of the work.

IMVU’s Approach to Foster Creativity

Rather than leave employees in this ambiguous state, IMVU takes a different approach.   IMVU employees can declare that they are starting a personal project and, before the project is underway, the company will explicitly state whether or not it believes it has an interest in the project.  If the company does not claim interest, it will provide to the employee a formal release that acknowledges the employee’s ownership rights to the project, and the employee will be protected even with future changes to management or control of the company.

While the policy’s benefits to employees may be obvious, I see this policy as a win for the company as well.  In a typical company, conflicts of this kind are identified only after an employee has invested a lot of time in a project, making resolution more difficult and potentially tarnishing the relationship between the company and the employee.  IMVU’s policy allows potential conflict to be addressed at the start of a side project, where the investment is low and there is not yet a tangible asset to contest.  Perhaps most importantly, it encourages honesty and transparency around side projects that are already happening.  In contrast, I am aware of a large company with a “zero side projects” policy that drives the issue underground and results in some employees using pseudonyms and creating side projects anyway (which will not protect the employee if the project becomes valuable).

I hope to see other companies adopt side project policies similar to IMVU’s, or at least encourage companies with zero-tolerance policies to evaluate the need (and value) of such a prohibitive culture.  Instead of making employees hide their recreational projects and fear repercussions from the company, encourage openness and creativity, all the time!

HTML5 – New UI Library for Games [Annotated Slides]

[Embedded SlideShare presentation: HTML5: The New UI Library for Games (annotated slides)]

HTML5: The New UI Library for Games

At GDC 2011, on behalf of IMVU Engineering, I presented our thesis that HTML is the new UI library for developing desktop applications and games.

HTML is Winning

The browser wars are hotter than ever. We have four major browser vendors, each with significant market shares, competing in earnest. Features such as SVG, Canvas, GPU acceleration, FAST JavaScript engines, and even WebGL are widely available.

It’s a great time to be a web developer!

Some Terminology

When I refer to HTML, I’m really referring to the entire stack of web technologies: markup, style sheets, scripts, canvas, websockets, etc. Also, Mozilla’s branding is confusing: Firefox the browser is built by Mozilla the company and is powered by Gecko/XULRunner the technology. Internally, we use these terms interchangeably.

History of IMVU’s UI

2004-2007: Win32, OpenGL, and C++

In 2004, when the company was started, IMVU’s entire UI was written in Win32, OpenGL, and C++. This is a fairly traditional, though old school, way to build UIs. However, our framework was extremely rigid, and it was hard to hire people to work on it. We tell legendary stories of “I just wanted to move that button from here to there, but I couldn’t figure it out in a few hours, so I gave up”.

In addition, our iteration speeds were terrible. You had to recompile, maybe add some new functionality to our UI framework, and then restart the client. This made UI development hard enough that we simply didn’t do it. The product didn’t change much for 3 years.

2007-2009: Flash

In 2007, a couple engineers realized we had a problem and spent a weekend integrating Adobe Flash. Flash is widely available and, if you get permission from Adobe, you can distribute Flash with your application. We implemented support for Flash UI panels and rendering Flash to translucent overlays atop the 3D scene.

Using Flash to develop UI was definitely a huge improvement. We used the Flex UI library and its standard components to get a head start. We were suddenly able to iterate on our UI, and we could add capabilities like animation and video to our client.

That said, Flash wasn’t all roses. It consumed a great deal of memory, especially address space, and mxmlc compiles were sloooow. Finally, even though Flex is open source, we found several bugs.

2009+: HTML

In 2009, we began a project to rebuild our entire experience from first principles. Our very talented design team spent months iterating with paper prototypes and user tests, and produced a very snazzy client architecture. At that point, the engineering team had some implementation choices:

1) Should we build this UI with a custom framework?
2) Should we use our existing Flash technology?
3) Or should we try integrating a browser and use HTML?

We decided to give a browser a shot, and we started with Firefox, since a couple of our people had familiarity with the Gecko codebase.

Within days we had something up and running, and the performance was amazing! It loaded quickly, was super-responsive, and used hardly any memory. We instantly decided to use HTML for the entire project.

The UI we built in HTML matched the intended design almost to the pixel. Iteration was nearly live, and we could even do our development in Firefox the browser with the Firebug add-on.

We patched Gecko to support rendering documents into textures so that we could overlay them on top of the 3D scene.

This let us build sophisticated UI elements across our entire product.

Benefits of HTML

Lingua Franca

IMVU is a Silicon Valley company, and we hire people from all backgrounds. We don’t necessarily hire people for the skills they already have; instead, we assume they’ll be able to learn whatever we need. That said, it’s valuable for people to have some background knowledge, and almost everyone these days knows a bit of the web stack. You can’t expect expertise in the nuances of the box model, but everyone knows a bit of HTML and JavaScript.

Amazing Tools

The web stack brings with it a plethora of tools like Firebug and Firebug Lite. We use these tools every day to rapidly iterate on our markup and style sheets. We can even integrate Firebug Lite into our downloadable client and work with live data.

Ecosystem of Libraries

In the last few years, we’ve seen a proliferation of amazing JavaScript libraries such as jQuery and YUI. If you build your UI with web technologies, you can leverage these libraries directly.

Advertising

When we integrated web technologies into our application, we discovered an unintended benefit. We could now directly benefit from the tens-of-billion dollar Internet advertising industry, which has built an entire advertising pipeline around HTML, JavaScript, and Flash. One engineer integrated ads into our product in less than one week.

Demo

At this point in the presentation, I gave two mini demonstrations. First, I showed what Firebug Lite was capable of: JavaScript commands, real-time editing of attributes and styles.

Then I demonstrated how we iterate on our UI: in 5 minutes, I added a jQuery accordion to an HTML overlay and showed that the change had already taken effect.
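
The accordion step looked roughly like the following (a minimal sketch using jQuery UI's accordion widget; the element id and markup are invented for illustration):

// Assumes jQuery and jQuery UI are already loaded in the overlay document,
// and that #inventory-panel contains alternating header/content elements.
$(function () {
    // Turn the existing panel markup into an expandable/collapsible accordion.
    $("#inventory-panel").accordion();
});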

Performance?

The biggest concern we hear from others is “Wouldn’t the web stack be slow and take a lot of RAM?” There was a time when browsers were comparatively slow and heavy, but with today’s fast CPUs and the continual pressure to make browsers faster (rich web experiences, tabbed browsing), we’ve found that HTML has never been a bottleneck for us.

Gecko consumes less than 1 MB of RAM for each loaded document, and we even think we’d be able to make every seat node in the scene into an HTML overlay.

You do need to be careful with DOM structures with huge numbers of children. Many DOM operations are linear in the number of children or siblings, so you’ll want to structure your DOM with a b-tree.
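
One way to do that is to chunk a large flat list of children into nested groups, so no single node ever has more than a fixed number of children. A rough sketch:

// Rough sketch: insert `items` under `root` in groups of at most `fanout`
// children per node, so sibling scans stay short.
function append_chunked(root, items, fanout) {
    if (items.length <= fanout) {
        for (var i = 0; i < items.length; i++) {
            root.appendChild(items[i]);
        }
        return;
    }
    var per_group = Math.ceil(items.length / fanout);
    for (var start = 0; start < items.length; start += per_group) {
        var group = document.createElement("div");
        root.appendChild(group);
        append_chunked(group, items.slice(start, start + per_group), fanout);
    }
}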

You also need to be careful with image tag and canvas memory usage. Avoid allocating for elements that aren’t immediately visible.

Drawbacks

Flash is still better at animation, but I think this will change. This may simply be a tool-chain issue.

3D on the web is possible now, but WebGL isn’t quite prime time. This year we expect WebGL penetration to reach 50% of the market (Chrome and Firefox), but we still depend on our software renderer for our customers without strong or reliable GPUs.

Tracing JITs are still RAM-hungry and don’t match native code performance yet. Core game components will still be written in close-to-the-metal C++.

Who else uses HTML for UI?

The independent game studio Wolfire uses WebKit for an in-game material and shader editor. They used the off-the-shelf editor component, CodeMirror, for their text editors.

Electronic Arts ported WebKit to PlayStation 3 for Skate 3. About 50% of their UI was written in HTML and JavaScript.

Netflix on PlayStation 3 uses WebKit so they can split-test their UI after they’ve shipped. That way they can improve their UI without shipping new software.

Several MMOs have added browsers so they can include forums, wiki, and help within their games.

How do I get started?

There are several open source and commercial libraries to help you get started. I’ve included a list in the slides above.

Let me know how it goes!

I’d like to conclude by saying that HTML has worked very well for us, and I strongly encourage you to give it a shot. Send me a message at chad@imvu.com to let me know if it works for you or not!

Q&A

I’d like to clarify some points I made during the Q&A session:

A man from Khronos approached me and said I was a bit hard on WebGL. If I came across that way, I didn’t mean it. I love WebGL and the work that Khronos has done, but I don’t think it’s prime time _yet_. We can’t port our existing product, which often runs on machines without a reasonable GPU, to WebGL yet. We do, however, intend to write a WebGL backend eventually. By the end of the year, we expect 50% WebGL adoption.

Someone asked about initial startup overhead. I said we probably consumed 30 MB of RAM and address space in base Gecko initialization, but that we’d done absolutely no work to reduce that. On the console, I expect it would be worth optimizing further.

There was some discussion around how our designers and engineers interacted. I mentioned that our teams are often cross-functional: our designers typically know enough CSS to make changes if they want and some of our engineers have strong design sense. We find it more productive to just sit down together and have an engineer and a designer work together over an hour to polish up the experience.

Thanks!

Thanks to Chuck Rector, Ben McGraw, Chris Dolezalek, Enno Rehling, Andy Friesen, James Birchler, Brett Durrett, Jon Watte, Laura Austin, Ben and Staci Scott, and Michael and Samantha Van Waardhuizen for providing valuable feedback during the preparation of these materials.

Buildbot and Intermittent Tests

By Jon Watte, Technical Director at IMVU, Inc.

At IMVU, we do continuous deployment. You may have read previous blog entries describing how we ship code to production 50 times a day. Today, I’d like to talk a little bit about what we’ve learned as IMVU has grown as a company and engineering organization, and what we’re doing to support this growth.

The workflow for anyone who wants to check code or assets into our website looks something like this:



As part of developing a feature increment, we also build the automated tests that prove that the feature works as intended. Over time, as we develop more and more features, we also accumulate more and more tests. Each time we check in code or assets, that check-in is run through all of the tests that have been developed over the years — at last count, this is over 40,000 tests, with seven digits’ worth of individual test assertions (the exact number varies depending on how you count).

The great thing about this is that, if your feature accidentally breaks some part of the code that you didn’t think or know about, the buildbot will tell you. You have the ability to run any test you want in your local sandbox before you check in, and you’ll generally run the tests that you think may be relevant, but in a code base the size and complexity of all of IMVU, you can’t possibly know or run everything.

We have an informal goal for our buildbot to run all the tests in 12 minutes or less. This means that we can build 5 times an hour. Some contributors show up for work at 8 am (Eastern time); others work until late in the night (Pacific time), so there’s opportunity for well over 50 builds per day.

However, not all builds result in success — the whole reason for having a buildbot is to catch the mistakes that inevitably slip in. When a submitted change causes one or more tests to fail, we call that “making buildbot red,” and that change is not allowed out onto the live website. Instead, you as committer will back out those changes (we have scripts that make this easy and painless), and investigate the failure (buildbot saves the appropriate log files and other information) to fix it before you re-submit.

Unfortunately, tests, like code, are written by humans. This means that tests will have bugs. Some bugs will immediately be obvious — but others will sit silently, working 99% of the time (or, more likely, 99.999% of the time). Then, once in a blue moon, some timing issue or bad assumption will surface, and the test will spuriously fail. What’s worse is that, most often, running the test again will likely not fail, so tracking down the failure requires a fair amount of diligence and effort. When this happens in your local sandbox, it’s not so bad, but if one of these intermittent test failures happens on buildbot, it’s a bit of a problem, because nobody can deploy their code to production while buildbot is red. At a minimum, the test failure has to be investigated to the point that someone understands that it’s an intermittent failure, rather than caused by the code that you committed.

Common causes of intermittent tests may be things that are hard to get right for humans (like systems with sophisticated caching strategies), or things that are hard or expensive to control for in a testing environment (assuming some pre-existing data, or absence of data, in particular databases, for example), or just because of the underlying platform. For example, we test user visible web pages using a system that drives web browsers, and then we assert things about both the state of the web page within the browser, the requests that the browser has made, and even screen shots taken of certain pages. Unfortunately, web browsers will sometimes decide to crash for no discernible reason. There are other components that are not 100% reliable, either, including tests that have to integrate with external services over the internet, although I’ll refrain from going too deep into that particular discussion in this post.

We have been trying to flush out intermittent tests by running tests all night, every night. Because no code changes between the runs, any test that goes red during this nightly grind is intermittent. We will look at the status in the morning, and any test that went red is distributed to the team that seems likely to know the most about the particular failure, for fixing.

Still, intermittent tests have been one of the most annoying thorns in the side of this entire process. As we scale the organization, buildbot will be busy most of the day, generally building three to five different change sets as part of the same build. Any of those committers will be prevented from deploying their changes to production until at least the next build has completed. This can add half an hour to the build-tests-deploy cycle, which harshes the mellow of continuous deployment driven development goodness!

To illustrate, the worst week of our internal buildbot for 2010 looked like this:

Clearly, in an environment that values quick iteration and agile development, if you find yourself in that position, you want to do something about it. So we did!

The key insight is that an intermittent test only fails some of the time, but a real failure caused by new code will fail all of the time. The second insight was that we could gather the specific outcome of each test within each test run, to track metrics on tests over time. We started by just adding these measurements and tracking tests over time. Once we learned that we could predict an intermittent test based on two successive test runs, and that we get clear green-or-red status out of each test run, we re-structured the way that we run tests.

Because we have 40,000 tests to run, allocated over several thousand test implementation files, we split the entire batch of tests across a large number of virtual machines, hosted on a few really beefy server-class machines (RAID-6 SSD, 48 GB RAM, dual-CPU six-core-Xeon type machines). Our first implementation spread the test files across different machines, and used measurements of each test file’s runtime to statically sort the files into reasonably equal run lengths. This worked for many years, but now we have better information!

We updated our test runners to first discover all tests that need to run, and put those into a database of available tests. Each build slave will then request one test at a time from this database (which we call the Test Distributor), run the test, and report back. We start by doling out the slowest tests, and then dole out faster and faster tests, to achieve almost 100% perfect balancing in runtime. Even better, though — we know whether a test fails, while the build is still running. Thus, we made the distributor re-distribute a test that is reported as red to some other build slave. If the test comes back as green, we then classify the test as intermittent, and do not make that one red test get in the way of declaring the change set we’re testing as successful. In addition, we remove the build slave that reported the intermittent from the pool of available builders, so that we can analyze the cause of the intermittency whenever convenient.
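
In simplified form, the distributor's handling of test results looks something like the sketch below (illustrative JavaScript, not the actual implementation; pending, results, and quarantined_slaves stand in for state the real system keeps in its database):

// Simplified sketch of the re-distribution logic described above.
function report_result(test, slave, passed) {
    if (passed) {
        if (test.failed_once_on) {
            // Red on one slave, green on another: classify as intermittent,
            // don't let it turn the build red, and pull the slave that saw
            // the failure out of the pool for later analysis.
            results[test.name] = "intermittent";
            quarantined_slaves.push(test.failed_once_on);
        } else {
            results[test.name] = "green";
        }
        return;
    }
    if (!test.failed_once_on) {
        // First failure: re-queue the test so some other build slave runs it.
        test.failed_once_on = slave;
        pending.push(test);
        return;
    }
    // Failed on two different slaves: a real red caused by the change set.
    results[test.name] = "red";
}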

Does this work?
Oh, yes! Here is a screen shot from the buildbot availability status this week. You will see that the buildbot caught some real, red check-ins, but intermittent tests are not getting in the way anymore. Thursday, we had an entire day with 100% green availability!

Pretty nifty, huh? I’m glad that I work in an environment as dynamic as IMVU that really cares about keeping our productivity up. The more we can remove annoying roadblocks from the path between developers doing hard work, and them seeing their hard work pay off, the better we will do.

REST and Games Don’t Mix

By Jon Watte, Technical Director at IMVU, Inc.

In web development, the REST (REpresentational State Transfer) approach to services often works well to build scalable, robust systems. Unfortunately, the principles behind REST collide with some necessary requirements of games, and the end result is that many problems faced by multiplayer games cannot be solved using REST techniques.

One pillar of REST design is that all necessary data for servicing a request is included in the request, and a client imposes no load on a server when not within the context of a request. For example, a web server serves a file to you when you pass it a URL; how to serve that file may be governed by cookies sent by your web browser; but once the file is served, the server conceptually needs to do nothing else to satisfy the client. If you are building, say, a photo sharing site, then this makes a lot of sense. A request may upload a new photo for some particular user, identified through URL tokens or cookies. Another request may list photos that match some particular criteria. A third request may remove or otherwise edit metadata surrounding the photos that are uploaded, by uploading some JSON or XML document to go with the photo. All in all, each request is isolated, and the server needs to do nothing (other than store the photos on disk) between requests.

Compare to a game of chess: Each move in a game of chess isn’t a complete new document (or chess board), but instead a delta over some previous state. Moves are checked for validity before they are accepted, by being interpreted in the context of the board. You could, conceivably, remember the old board state, and have a player upload a new board state that represents the state after his move, and derive what his move was by running a diff to the board. Unfortunately, even that is not enough to properly implement the rules of chess, because certain moves (castling, en passant pawn capture) depend on historical state of the board. To work around this, we can include a full set of historical state with the board, as well as the actual board state, and still implement a chess game using REST techniques.

All that remains, then, is to figure out how your opponent learns about a move having taken place, and knows how to download the new board state to display it. This is something that REST techniques simply do not solve, because the infrastructure of REST depends on the direction of requests, the completeness of state transfer, and the cacheability of data. To get rid of cache problems in the chess game above, you’d have to also keep a “next move id” counter as part of the request URL, so that the state of the board, including history, is addressed by a separate URL for each new move.

Another problem with the implementation of the chess game as a REST service is that it transfers the entire game state, including the board, for each move. This uses more network bandwidth than necessary. For something simple, like a chess board, this doesn’t really matter that much. Broadband makes bandwidth cheap, and even a cell phone will transfer 50 kilobytes of data in less than a second these days. However, for more complex games, like a 3D role-playing game or a real-time military simulation game, the amount of data transferred per frame, and the rate at which frames need to be transferred, makes REST bandwidth requirements a real problem.

However, that’s not the worst part. Some games have rules that require secrecy. For example, in a game of Texas Hold’em poker, some state is known, such as what the community cards are and what the amounts of the pot and each player’s chip stack is. Other state, though, is hidden. I do not know what cards you have in your hand, and to avoid cheating, I should have no way of finding this out until and unless you reveal your hand at the end of the game. What cards each of us have in our hands is state that is shared between only us and the server. This means that a player cannot upload a “new game state” that includes the cards in each hand. Yet, to determine the winner, all the state is required. Additionally, the deck that cards are being dealt from is secret state that only the server knows. Thus, at a minimum, to apply REST to our poker game, we’d have to decompose the game into many separate states (each hand, the deck, the pot, each player’s chip stack, etc), and we’d have to enforce separate access control to each of these values. More troubling is the fact that the cards in a player’s hand are changed by the dealer, which means it’s an action someone else takes that affects what my state is. Again, REST does not support server to client push, nor does it generally support “active” state, so implementing even a simple game like Poker requires us to break the REST mold.

The World Wide Web is a large instance of largely REST-ful services. Various work-arounds exist to get around the limitations of the REST architecture, such as long-poll XMLHttpRequests from Javascript, chunked-encoding IFRAME script execution, etc. The main reason to use the Flash player or other browser plug-ins is to break the REST mold. Once games expand outside the simple rules of something like poker or chess, the contortions necessary to work around the limitations of REST become byzantine enough that the benefits of running within a browser may be outweighed by the costs, both in development, deployment and operation. You simply can’t describe a typical server instance of a game like World of Warcraft using a REST API and expect any normal consumer network or computer hardware to keep up.
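
Long polling, for example, fakes server push by holding an ordinary request open until the server has something to report, then immediately re-issuing it. A bare-bones sketch (the /events URL and response format are invented for illustration):

var last_event_id = 0;

function poll_for_events() {
    var request = new XMLHttpRequest();
    request.open("GET", "/events?since=" + last_event_id, true);
    request.onreadystatechange = function () {
        if (request.readyState !== 4) {
            return;
        }
        if (request.status === 200) {
            // The server held this request open until an event was available.
            var event = JSON.parse(request.responseText);
            last_event_id = event.id;
            handle_event(event);   // e.g. "your opponent moved" (assumed handler)
        }
        poll_for_events();         // immediately start the next long poll
    };
    request.send();
}

poll_for_events();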

Why does this matter? The World Wide Web is built on REST. New web standards that allow us to do more and more with the internet browser are driven by the same people who developed REST. As a case in point, the HTML5 standards umbrella does not address streaming media. You can “play” a sound or video URL, and the implementation of those URLs may optimize by not delivering the data until you need it, but the only way to “record” sound or video is to record into a static piece of data, which can later be uploaded. This means that voice or video chat cannot be done in a web browser without relying on third party plug-ins, such as the Flash player. Additionally, HTML5 describes a mechanism called Web Sockets to try to work around the server push limitations of REST, but that technology has run into significant implementation problems, and will not be part of the round of web browsers releasing in the first half of 2011.

It’s clear that people want to play games and do other interactive, bidirectional activities with their network-connected computers. The success of networked games and the success of even non-interactive games like FarmVille or Backyard Monsters leave this without a doubt. It would be great if those who understand interactive entertainment (such as game makers) could make friends with those who are currently driving the web, and find common ground to allow for a bidirectional, interactive, real-time world wide web of human communications. HTML6, anyone? 🙂