Code Reviews: Follow the Data

After years of reviewing other people’s code, I’d like to share a couple of practices that improve the effectiveness of code reviews.

Why Review Code?

First, why review code at all? There are a few reasons:

  • Catching bugs
  • Enforcing stylistic consistency with the rest of the code
  • Finding opportunities to share code across systems
  • Helping new engineers spin up with the team and project
  • Helping API authors see actual problems that API consumers face
  • Maintaining the overall health of the system

It seems most people reach for one of two standard techniques when reviewing code; they either review diffs as they arrive or they review every change in one large chunk. We’ve used both techniques, but I find there’s a more effective way to spend code review time.

Reviewing Diffs

It seems the default code review technique for most people is to sign up for commit emails or proposed patches and read every diff as it goes by. This has some benefits – it’s nice to see when someone is working in an area or that a library had a deficiency needing correction. However, on a large application, diffs become blinding. When all you see is streams of diffs, you can lose sight of how the application’s architecture is evolving.

I’ve ended up in situations where every diff looked perfectly reasonable but, when examining the application at a higher level, key system invariants had been broken.

In addition, diffs tend to emphasize small, less important details over more important integration and design risks. I’ve noticed that, when people only review diffs, they tend to point out things like whitespace style, how iteration over arrays is expressed, and the names of local variables. In some codebases these are important, but higher-level structure is often much more important over the life of the code.

Reviewing diffs can also result in wasted work. Perhaps someone is iterating towards a solution. The code reviewer may waste time reviewing code that its author is intending to rework anyway.

Reviewing Everything

Less often, I’ve seen another code review approach similar to reviewing diffs, but applied to entire bodies of work at a time. This approach can work, but it’s often mind-numbing. See, there are two types of code: core, foundational code and leaf code. Foundational code sits beneath and between everything else in the application. It’s important that it be correct, extensible, and maintainable. Leaf code is the specific functionality a feature needs. It is likely to be used in a single place and may never be touched again. Leaf code is not that interesting, so most of your code review energy should go towards the core pieces. Reviewing all the code in a project or user story mixes the leaf code in with the foundational code and makes it harder to see exactly what’s going on.

I think there’s a better way to run code reviews. It’s not as boring, tends to catch important changes to core systems, and is fairly efficient in terms of time spent.

Follow the Data

My preferred technique for reviewing code is to trace data as it flows through the system. This should be done after a meaningful, but not TOO large, body of work. You want about as much code as you can review in an hour: perhaps more than a user story, but less than an entire feature. Start with a single piece of data, perhaps some text entered on a website form. Then, trace that data all the way through the system to the output. This includes any network protocols, transformation functions, text encoding, decoding, storage in databases, caching, and escaping.

Following data through the code makes its high-level structure apparent. After all, code only exists to transform data. You may even notice scenarios where two transformations can be folded into one. Or perhaps eliminated entirely — sometimes abstraction adds no value at all.

This style of code review frequently covers code that wasn’t written by the person or team that initiated the code review. But that’s okay! It helps people get a bigger picture, and if the goal is to maintain overall system health, new code and existing code shouldn’t be treated differently.

It’s also perfectly fine for the code review not to cover every new function. You’ll likely hit most of them while tracing the data (otherwise the functions would be dead code) but it’s better to emphasize the main flows. Once the code’s high-level structure is apparent, it’s usually clear which functions are more important than others.

After experimenting with various code review techniques, I’ve found this approach to be the most effective and reliable over time. Make sure code reviews are somewhat frequent, however. After completion of every “project” or “story” or “module” or whatever, sit down for an hour with the code’s authors and appropriate tech leads and review the code. If the code review takes longer than an hour, people become too fatigued to add value.

Handling Code Review Follow-Up Tasks

At IMVU in particular, as we’re reviewing the code, someone will rapidly take notes into a shared document. The purpose of the review meeting is to review the code, not discuss the appropriate follow-up actions. It’s important not to interrupt the flow of the review meeting with a drawn-out discussion about what to do about one particular issue.

After the meeting, the team leads should categorize follow-up tasks into one of three categories:

  1. Do it right now. Usually tiny tweaks, for example: rename a function, call a different API, delete some commented-out code, move a function to a different file.
  2. Do it at the top of the next iteration. This is for small or medium-sized tasks that are worth doing. Examples: fix a bug, rework an API a bit, change an important but not-yet-ubiquitous file format.
  3. Would be nice someday. Delete these tasks. Don’t put them in a backlog. Maybe mention them to a research or infrastructure team. Example: “It would be great if our job scheduling system could specify dependencies declaratively.”

Nothing should float around on an amorphous backlog. If a task is important, it will come up again. Plus, it’s very tempting to say “we’ll get to it,” but you never will, and even if you find the time, nobody will have context. So either get it done right away or be honest with yourself and consciously drop it.

Now go and review some code! 🙂

How IMVU Builds Web Services: Part 3

In this 3-part series, IMVU senior engineer Bill Welden describes the means and technology behind IMVU’s web services.

Part 3: Documents and Links

In the previous entry in this series I described how IMVU uses a structured network model to implement the uniform contract for our REST services, and showed how it might apply to a set of services for a hypothetical high school scheduling system.

[Diagram: Node Groups, Nodes, Edge Groups, and Edges]

Under this model we have Node Groups (represented by the different colors of circles in this diagram), Nodes (the circles themselves), Edge Groups (the rounded boxes hanging off of the circles) and Edges (the lines connecting the circles).

We model these various notions using HTTP documents linked together with URLs. There are four kinds of documents, corresponding to each of the four concepts.

A Node Group determines the properties and relationships of the Nodes, and the corresponding Node Group Document contains a list of URLs, each locating a Node.

A Node, through its properties and relationships, is the basic repository of information in the network. The corresponding Node Document contains the values of the Node’s properties, as well as a list of URLs, each locating an Edge Group.

An Edge Group groups together all of the links that go to a particular type of Node. The corresponding Edge Group Document, specific to a Node, contains a list of URLs, each locating an Edge within that group.

An Edge is a link from its Node to another Node, usually of a different class. Edges can also contain properties, as we saw last time. An Edge Document therefore contains a URL for the Node at the other end of the Edge, as well as the values of any properties of the Edge.

Each type of document has a distinctive URL.

The URL for a Node Group Document consists of a single path segment, a singular noun describing the Node Group:

https://api.imvu.com/course

The URL for a Node consists of two path segments: the Node Group URL extended by a segment consisting of a unique identifier for the Node. The Node identifier can be any string, but we encourage the use of Node Identifiers which contain the name of the Node Group:

https://api.imvu.com/course/course-1035

The URL for an Edge Group consists of three segments: the Node URL extended either by a plural noun naming the Node Group that the Edge Group points off to, or by a noun describing the relationship created by the Edge Group:

https://api.imvu.com/course/course-1035/teachers
https://api.imvu.com/course/course-1035/roster

The URL for an Edge consists of four segments: the Edge Group URL extended by an identifier for the Edge. This identifier need only be unique within the Edge Group. Sometimes (for convenience) it is identical to the unique identifier of the Node that the Edge points to, but this is not required:

https://api.imvu.com/course/course-1035/teachers/1
https://api.imvu.com/course/course-1035/roster/student-5331

These URL formats are a constraint on the service design – a strong suggestion but not a strict requirement.

Note especially that, according to the principle of HATEOAS (Hypertext As The Engine Of Application State), the client cannot depend on any specific format, and is not allowed to construct these URLs or attempt to extract information from them. All URLs required by the client must be obtained whole, as opaque strings, from the response to an earlier service request.

Specifically, URLs returned from the server must be fully qualified, including the protocol and server name (“https://api.imvu.com”). However, in our own internal discussions and here in this presentation, we will often leave off the protocol and server name for clarity. So:

/course
/course/course-1035
/course/course-1035/teachers

and so forth. Wherever one of these relative URLs appears in the discussion below, the actual implementation will return a fully qualified version.
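As a concrete illustration of this constraint, here is a minimal Python sketch (using the requests library, the hypothetical scheduling endpoints, and the “nodes” and “relations” document members described later in this post; the response envelope is omitted for brevity) of a client that only follows URLs it was given by the server:

import requests

# Assumption: the client is handed a single entry point; every other URL is
# taken verbatim from a server response, never assembled from identifiers.
ENTRY_POINT = "https://api.imvu.com/course"

course_group = requests.get(ENTRY_POINT).json()

# Each entry in "nodes" is an opaque string; we do not parse it or try to
# build "/course/<id>" ourselves.
first_course_url = course_group["nodes"][0]
course = requests.get(first_course_url).json()

# Edge Group URLs likewise come from the Node's "relations" member.
roster_url = course["relations"]["roster"]
roster = requests.get(roster_url).json()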

One of the biggest differences between REST services and the earlier Remote Procedure Call style is that RPC APIs define as many verbs as required by or convenient for implementation of the application. Designing RPC services is primarily about designing new verbs and specifying their parameters and their semantics.

With IMVU REST services, the verbs are specified, and there are only five of them. Designing a service consists of designing a set of Nodes and Edges, together with their properties and relationships, in such a way that these five verbs can provide all of the required functionality.

The verbs are:

GET (any endpoint),
POST (to a Node or Edge),
POST (to a Node Group or an Edge Group),
POST (to a Node Group, including Edges), and
DELETE (a Node or Edge).

GET retrieves the document associated with the URL, which can be a Node Group Document, Node Document, Edge Group Document or an Edge Document. GETs must be nullipotent, which is to say that they can have no side-effects.

POST, when applied to a Node or Edge URL, makes changes to the properties and relations of the Node or Edge. Such POSTs must be idempotent, which means that sending the POST twice must have exactly the same effect as sending it once.

When POST is applied to a Node Group or Edge Group, it adds a new Node or a new Edge. Such POSTs will not be idempotent, since sending the POST twice will add two new Nodes or two new Edges.

There is a third sort of POST, a POST specifically to a Node Group which adds a new Node, but which also contains data for a set of new Edges to be added along with the Node (including Edge Groups as necessary).

Finally there is DELETE, which is used to delete a Node or an Edge. The implementation of DELETE must be idempotent.
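As a rough sketch of how a client exercises these five verbs (Python with the requests library, the request body formats described later in this post, and URLs written out literally for brevity; a real client obtains them from earlier responses, per HATEOAS):

import requests

BASE = "https://api.imvu.com"

# GET any endpoint (nullipotent: no side effects).
course = requests.get(BASE + "/course/course-1035").json()

# POST to a Node or Edge: change properties or relations (idempotent).
requests.post(BASE + "/course/course-1035",
              json={"data": {"starting_time": "11:00"}})

# POST to a Node Group or Edge Group: add a new Node or Edge
# (not idempotent; posting twice adds two).
requests.post(BASE + "/course/course-1035/roster",
              json={"relations": {"ref": "/student/student-5331"}})

# POST to a Node Group, including Edges: add a Node plus its Edges.
requests.post(BASE + "/student",
              json={"data": {"name": "Andrew"},
                    "edges": {"schedule": [
                        {"relations": {"ref": "/course/course-1035"}}]}})

# DELETE a Node or Edge (idempotent).
requests.delete(BASE + "/course/course-1035/roster/student-5331")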

GET returns the document associated with the URL, but in order to minimize round trips the service is allowed to return any additional documents that it thinks the client may soon need.

You can see this in the format of the JSON document returned by a GET:

{
  "id": "/course/course-1035",
  "status": "success",
  "denormalized": {
    "/course/course-1035": { ... },
    "/course/course-1035/teachers": { ... },
    "/course/course-1035/teachers/1": { ... },
    ...
  }
}

There is a status, “success” or “failure”. If the GET fails, the response includes some information about why; if it succeeds, a package of endpoint documents is included, grouped under the response member “denormalized”.

In this example the client has asked for a Course, and the server has chosen to return not only the Course Node, but the Edge Group and Edges for all of the teachers that teach that course.
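A client can take advantage of this by caching every document in the envelope, keyed by URL. Here is a rough Python sketch (requests library, hypothetical endpoints; for brevity it uses the relative URL form from this discussion):

import requests

API_ROOT = "https://api.imvu.com"
cache = {}  # relative document URL -> document

def get_document(url):
    """Return the document for `url`, using the cache when possible."""
    if url not in cache:
        body = requests.get(API_ROOT + url).json()
        if body["status"] != "success":
            raise RuntimeError("GET %s failed: %r" % (url, body))
        # Cache everything the server chose to denormalize, not just the
        # document we asked for, to save future round trips.
        cache.update(body["denormalized"])
    return cache[url]

course = get_document("/course/course-1035")
teachers = get_document("/course/course-1035/teachers")  # likely already cached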

Each of the four kinds of documents in these responses has a specific JSON format.

A Node Group Document has a member “nodes” which is an array of URLs, one for each Node.

{
  "nodes": [
    "/course/course-1035",
    "/course/course-1907",
    ...
  ]
}

If the client wants to present the list of Nodes in a certain order, it is responsible for sorting them itself. It cannot depend on the order of entries in this array.

A Node Document has two members. The “data” member contains the Node’s properties. The “relations” member contains the URLs that link the Node to its Edge Groups and to other Nodes in the system.

{
  "data": {
    "description": "Algebra I",
    "starting_time": "10:00"
  },
  "relations": {
    "teacher": "/teacher/teacher-372",
    "roster": "/course/course-1035/roster"
  }
}

Names in these objects represent a contract with the client. Based on the name, the server guarantees the semantics of the value: its type, its allowed values, its meaning and, if it is a link, the Node Group of the Node it points to.

Note that here “teacher” is a link to a Teacher Node. This design precludes the possibility of team teaching where there are two teachers for a class.

Links directly from one Node to another create one-to-N relationships: one Teacher to many Courses. As a rule, however, relationships are more often N-to-N than not, and such restrictions on cardinality are a red flag. Not wrong, necessarily, but something that may be called out in design reviews.

The design could be made to support many teachers per course by implementing an Edge Group “teachers” for Course Nodes. This is a more flexible design, because the cardinality restrictions, say a maximum of two teachers, or allowing multiple teachers only for certain courses, can be implemented on the server, where they are easier to change.

An Edge Group Document has a member “edges” which is a JSON array of URLs for the Node’s Edges.

{
  "edges": [
    "/course/course-1035/roster/student-5331",
    "/course/course-1035/roster/student-1070",
    ...
  ]
}

Again, the client cannot depend on the order that the Edges come back from the server.

Finally, an Edge Document has an optional “data” member for when the Edge has properties, and a “relations” member which contains the URL for the Node at the other end of the Edge.

{
  "data": {
    "tardies": 1
  },
  "relations": {
    "ref": "/student/student-5331"
  }
}

Here is an Edge between one Course (course-1035) and one Student (student-5331).

Edges go both ways. Here are the Edges between that same student and his courses.

{
  "edges": [
    "/student/student-5331/schedule/01",
    "/student/student-5331/schedule/02",
    ...
  ]
}

Now it’s not required to implement every Edge Group implied by the Node/Edge model, so it’s acceptable to implement only half of this Edge relationship without its symmetrical partner. When we have an Edge Group in our design, however, we think carefully about the symmetrical Edge Group. It’s often a very interesting view on the data, and it’s seldom very difficult to implement.

Note that the Edge document itself can come back in two different ways, but the underlying database record will be the same. Here is one of the Edge documents linked from the Edge Group above:

{
  "data": {
    "tardies": 1
  },
  "relations": {
    "ref": "/course/course-1035"
  }
}

Whether you look at this link from the Student or the Course perspective, the count of tardies is the same data element.

Nodes and Edges are updated using the same JSON document format as the response from a GET, though there is no denormalization envelope.

Here is the document for a POST to a Student Node (/student/student-513312) intended to update the student’s birth date and counsellor (documents for POSTs to Edges look pretty much the same):

{
  "data": {
    "birth_date": "5/11/96"
  },
  "relations": {
    "counsellor": "/teacher/teacher-121"
  }
}

If a property is missing from the data or relations sections, it is left unchanged in the Node or Edge.

The response which comes back from a POST to a Node or Edge is the same as the response from a GET to that Node or Edge, including additional denormalized data at the discretion of the server.
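The partial-update rule is straightforward to express in code. Here is a small Python sketch (hypothetical storage; not IMVU’s actual server implementation) of merging a POSTed document into a stored Node or Edge:

def apply_post(stored, posted):
    """Merge a POSTed update into a stored Node or Edge document.

    Properties and relations omitted from the POST are left unchanged,
    and replaying the same POST gives the same result (idempotent).
    """
    for section in ("data", "relations"):
        stored.setdefault(section, {}).update(posted.get(section, {}))
    return stored

student = {
    "data": {"name": "Piers", "birth_date": "1/1/96"},
    "relations": {"counsellor": "/teacher/teacher-372"},
}
apply_post(student, {"data": {"birth_date": "5/11/96"},
                     "relations": {"counsellor": "/teacher/teacher-121"}})
# "name" is untouched; birth_date and counsellor now have the new values.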

New Nodes and Edges are added by posting to the corresponding Node Group or Edge Group and providing the data and relations for the new Node or Edge in the body of the POST. This is a document POSTed to /course/course-1035/roster to create a new Edge, enrolling a new student in a course:

{
 "relations": {
   "ref": "/student/student-10706"
 }
}

All of the required properties must be included in this kind of POST. A POST creating a new Node looks pretty much the same.

Again, the response to this kind of POST is the same as you would receive from a GET to the newly created Node or Edge (though you will only know the id of the new Node or Edge once the response comes back).

It is often useful to be able to add a Node and a number of Edges in one POST request. The response would include the new Node, a new Edge Group and all of the specified Edges. Here is a POST to /student which adds a new Student along with Enrollment Edges in two courses:

{
  "data": {
    "name": "Andrew",
    "birth_date": "7/21/95"
  },
  "edges": {
    "schedule": [
      {
        "relations": {
          "ref": "/course/course-1035"
        }
      },
      {
        "relations": {
          "ref": "/course/course-2995"
        }
      }
    ]
  }
}

The only other verb is DELETE, which can be applied to a Node or an Edge by providing the URL of the Node or Edge. No document is passed with a DELETE request. DELETEing a Node will also delete all of the Edge Groups and Edges for that Node.

The service implements much of the functionality of an application by implementing business rules which add (within limits) to the semantics of POSTs and DELETEs.

Business rules cannot change the fundamental semantics of these verbs. A successful POST to a Node Group or Edge Group must still add a new Node or Edge. A successful POST to a Node or Edge must still make the specified modifications to the Node or Edge, and a successful DELETE must still result in the specified deletion.

In particular, POST to a Node or Edge must remain idempotent. POSTing twice to a Node or Edge must have exactly the same effect as POSTing once.

Business rules, however, can reject requests which violate a desired constraint. We might want to disallow deletion of students prior to their 21st birthday.

Business rules can also limit operations to specific users or classes of user. Our school system might have an administrator class who are responsible for placing students in classes. An attempt to add a Student Edge to a class would be rejected if the client making the request was not identified as an administrator.

Business rules also have broad authority to make additional changes to the back end data structures based on a POST or DELETE. We could add a property to Student showing the number of classes each student is enrolled in, and then when the client POSTs to the schedule Edge Group to add a new Course, increment this count in the Student Node.
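Here is a sketch of how such rules might hang off the generic verb handlers (Python, with hypothetical function and field names; the surrounding service framework is not shown):

from datetime import date

class RejectedRequest(Exception):
    """Raised by a business rule to refuse a POST or DELETE."""

def check_student_delete(student_node):
    # Constraint example: refuse to delete students before their 21st birthday.
    # Assume birth_date has already been parsed into a datetime.date.
    birth_date = student_node["data"]["birth_date"]
    age = (date.today() - birth_date).days // 365  # approximate age in years
    if age < 21:
        raise RejectedRequest("students under 21 cannot be deleted")

def on_schedule_edge_added(student_node, new_edge):
    # Side-effect example: keep a denormalized count of enrolled courses
    # on the Student Node whenever a schedule Edge is added.
    data = student_node["data"]
    data["course_count"] = data.get("course_count", 0) + 1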

As I mentioned earlier, the client is not allowed to construct URLs, but is required to retrieve them from the server. It is, however, allowed to append query parameters. These must have no effect on the query other than limiting the set of records returned.

Here are some examples.

GET /student?name=piers*
GET /student/student-10706/schedule?tardies=0
GET /student?schedule.tardies=gt.0

In the first two cases the query is based on a property of the Node or Edge. The first is intended to retrieve all students whose name begins with “piers”, the second to get all courses where the given student is currently enrolled and has no tardies.

The third example shows a query that would be achieved with a join in SQL. It is intended to retrieve all students who have been late for at least one class.

Note that the HTTP query syntax doesn’t support the kinds of database queries we want to perform very well, and because of that we have not yet settled on standards for specifying queries (different projects have come up with different solutions). We are still striking out into new territory, but we remain clear that we want to work toward a company-wide solution in the long term.

Some Node Groups contain a lot of Nodes. At IMVU we have a Node Group for the tens of millions of products in our catalog. We don’t currently support querying our product Node Group, but if we did, we would expect to get back a response something like this:

{ ...
  {
    "nodes": [
      "/product/product-10057",
      "/product/product-10092",
      ...
      "/product/product-11033"
    ],
    "next": "/product?offset=50"
  }
}

The list of Nodes includes only the first fifty, but provides us with a URL – with a built-in query parameter – which allows us to retrieve another group of fifty.

There are a number of services which do paging like this, but note that offsets don’t work very well for paging when Nodes are often being added and deleted, since the offsets associated with specific Nodes can change. Paging is another area in which our standards are still under development.
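Whatever the eventual paging standard looks like, the client side is simple: keep following the “next” URL until the server stops supplying one. A rough Python sketch (requests library, hypothetical /product endpoints, response envelopes omitted):

import requests

def iter_nodes(node_group_url):
    """Yield every Node URL in a Node Group, following "next" links."""
    url = node_group_url
    while url:
        page = requests.get(url).json()
        for node_url in page["nodes"]:
            yield node_url
        # "next" is an opaque, server-supplied URL (per HATEOAS); when it is
        # absent we have seen the whole group.
        url = page.get("next")

for product_url in iter_nodes("https://api.imvu.com/product"):
    print(product_url)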

Finally, in order to allow clients to cache the responses they receive we provide a way for the service to let the client know when cached URL responses are no longer valid.

This involves the use of IMQ, IMVU’s proprietary system for efficiently pushing data to clients in real time. In our case, the data is simply a notification that an earlier response to a particular URL is no longer valid. We’ll go into detail on IMQ and how it works in a future post.

Most endpoints do not provide invalidation. IMVU product information, for example, doesn’t change often enough to make the overhead associated with invalidation worthwhile.

When invalidation is available, the response to an endpoint will include a member called “updates”, which provides the information necessary to subscribe to the appropriate IMQ queue. Here is a response to a GET of a particular enrollment Edge (/student/student-513312/schedule/01):
{
  "data": {
    "tardies": 1
  },
  "relations": {
    "ref": "/course/course-1035"
  },
  "updates": "imq://inv.student.student-513312"
}

Invalidations for different endpoints come in on different queues, and it is the server’s prerogative to choose the queue for each endpoint.

Currently, for the endpoints which support invalidation, we have one queue per Node. Invalidations for the Node come in on this queue, but the same queue also provides invalidations for the Node’s Edge Groups and Edges. This design allows us to achieve a balance between the number of queues we create and the number of subscribers to each queue.

In the example above, only clients that have an active interest in Student 513312 will be subscribed to this queue. With this mechanism, if someone POSTs a change to the number of tardies, every client which has this record cached will receive a notification. If the number of tardies is displayed on the screen, it can be updated immediately.

Note that under this scheme, there can be no queue associated with a Node Group. Such a queue would be used to notify all clients that, for example, a new Student Node had been added. To be useful, however, every client application would need to subscribe to the queue, and IMQ is not built to support such a large number of subscribers. There are solutions to this problem, but for the moment we do not support invalidation for Node Groups.
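On the client side, the bookkeeping for this scheme might look roughly like the following Python sketch (IMQ delivery itself is stubbed out, since it is covered in a future post; the names here are illustrative, not IMVU’s actual client code):

cache = {}       # endpoint URL -> cached response document
queue_for = {}   # endpoint URL -> IMQ queue named in its "updates" member

def subscribe(queue_url):
    # Placeholder: real code would register on_invalidation with IMQ here.
    pass

def remember(url, document):
    """Cache a response and subscribe to its invalidation queue, if any."""
    cache[url] = document
    if "updates" in document:
        queue_for[url] = document["updates"]  # e.g. "imq://inv.student.student-513312"
        subscribe(document["updates"])

def on_invalidation(queue_url):
    """Called when IMQ reports that responses tied to this queue are stale."""
    for url, queue in list(queue_for.items()):
        if queue == queue_url:
            # Drop the stale entry; the next access will re-GET it and can
            # re-render the screen with fresh data.
            del cache[url]
            del queue_for[url]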

The discipline of REST and the specific uniform contract IMVU has adopted give us the power and flexibility to quickly create, enhance and share back-end services. The principles behind REST (and especially the principles of uniform contract, HATEOAS and separation of concerns) provide us with a solid framework as we continue to complete and codify our standards.

We’ll keep you up to date as things develop.

Partner Integrations

By Rusty Larner, IMVU

IMVU partners with several companies to provide extra services on our website that are not our core competencies, especially when it makes more sense to use a partner that already implements the service. Each of these integration efforts is a new process, but we have found several practices that make them easier to accomplish and more likely to succeed.

Doing a technical review during the initial partner evaluation sounds obvious, but it is critical. It shouldn’t happen just at the marketing level: having our engineers talk with theirs about the performance we expect (connections per second, etc.) as well as the actual integration work helps reduce confusion before the effort starts. Several times we have surprised our partners with either the usage we expect for a new feature, or how we want to integrate their services into our entire development cycle. Sometimes the technical review at this stage determines that the integration won’t really be reasonable – helping us to ‘fail fast’ instead of discovering the problem partway through the integration.

Continuing this communication on through the initial implementation effort is also important. The best way to accomplish this is having both sets of engineers co-located during the initial integration effort. We have found that having the engineers from our partner work directly with our engineers helps cut down on communication delays. There are always questions during any integration – about API documentation, error messages, or just adding logging to determine why a given request isn’t working – and having engineers in the same room helps immensely. The timing on this is also critical – we do not want to have someone fly out before we’ve started the integration work, but we also do not want to go too far down the rabbit hole before having an expert there to help. Having flexible partners helps – during one integration we ran into difficulty far earlier than expected; our east-coast partner flew out an engineer the next day to work with our team, and he stayed until we found the problem.

Another part is being able to integrate tests against our external partners into our continuous integration process. We have several solutions here. First, we implement a full ‘fake’ version of the partner’s API that we use during standard commits. This keeps our large suite of unit and functionality tests from depending on external connections that would make them slower and sometimes intermittent. We also have a separate set of tests that we run against the actual API, including tests that confirm our fake APIs behave the same as the real ones. These external tests run every couple of hours, so we can detect if the partner’s APIs change. The combination of these tests allows us to feel confident in our continuous integration efforts, even when doing things like changing our Music Store (which talks to our music partner).
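One way to structure this, sketched in Python with unittest (the class names and the partner API shape are hypothetical; the real fakes and harness are more involved):

import unittest

class FakeMusicPartner(object):
    """In-process stand-in for the partner's API, used on every commit."""
    def purchase_track(self, track_id, account):
        if not track_id:
            return {"status": "error", "reason": "missing track"}
        return {"status": "ok", "track_id": track_id}

class RealMusicPartner(object):
    """Thin client for the partner's real API (HTTP details omitted)."""
    def purchase_track(self, track_id, account):
        raise NotImplementedError("would call the partner's live API")

class PurchaseContractTest(unittest.TestCase):
    """Runs against the fake during normal commits; the external test run
    (every couple of hours) swaps in RealMusicPartner to confirm the fake
    still behaves like the real thing."""
    client = FakeMusicPartner()

    def test_successful_purchase_shape(self):
        result = self.client.purchase_track("track-123", "test-account")
        self.assertEqual("ok", result["status"])
        self.assertIn("track_id", result)

if __name__ == "__main__":
    unittest.main()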

Metrics and graphing are also a very important part of our process.  They let us quickly determine if there is a problem with the connection to our partner.  The faster you know there is a problem, the faster you can respond!  (A couple of times during my first years at IMVU, we only found out about problems through our forums – not the way to respond quickly!  Plus, it becomes far harder to fix a problem when you don’t know exactly when it started…)  For these integrations, we keep several levels of metrics.  First, each point where we talk to an external partner gets metrics – for example, each music purchase attempt gets counted from our side.  But we also ask our partners to expose a service that reports their own metrics to us.  This gives us better pointers to problems when they do occur – for example, if there is a problem with the network connections between our servers and our partners’ servers.

The metrics and graphing are only a part of the continuing maintenance that we put in place.  Another major part is communication paths.  We create an internal mailing list that we ask our partners to send all email to, so when we change the responsible individuals internally we don’t have to have all of our partners update their address books.  We also ask our partners to let us know as soon as possible whenever they detect a problem, then follow up later to let us know how they solved it.  Where available, we’ve asked our partners to include our contact info for some of their monitoring alerts.  Our operations staff also has contact details and a call escalation path for each partner, along with resolution instructions, so they know how to handle any alerts triggered by our monitoring.

So, in review – for partners of IMVU, here is the checklist we (currently) follow:

* Initial engineering technical review before any contract is signed

* Co-located engineering support early in the integration schedule (usually within the first week or two)

* Regular testing calls being made to the partner’s APIs (including test purchases, etc.)

* Expose a web service containing real-time metrics important for the integration

* Call escalation path for operations if issues arise

* Continuous communication about scheduled maintenance, unplanned issues, or upcoming upgrades.

Upcoming Conferences Featuring IMVU’s Development Processes

IMVU engineering and development is being featured at some upcoming conferences.  These are great opportunities to learn more about IMVU, our development processes, and scaling a Lean Startup.

SCALING WITH CONTINUOUS DEPLOYMENT

Web 2.0 Expo, New York, NY, September 29, 2010

http://www.web2expo.com/webexny2010/public/schedule/detail/15825

Continuous Deployment takes continuous integration one step further, where every commit goes live to production servers. When this process is described it is frequently met with skepticism around site reliability and the ability to scale a business this way, but at IMVU it works, it scales (with challenges) and it is embraced by the entire organization. Continuous Deployment offers a substantial advantage, providing the ability to quickly react to new business opportunities. At IMVU the time from an engineer committing code to it being live on the production cluster is approximately 20 minutes. Every commit is pushed live with no interim staging or QA deployment. Success with this system requires both a sophisticated automated testing system and a fail-safe monitoring system that can detect a regression and revert it before it has material customer impact. IMVU adopted this continuous deployment system well after launch, requiring changes to processes, systems and culture. Examples are provided for all components of this system, how IMVU succeeded in the transition and the new challenges faced as the system scales with the organization.

SCALING PRODUCT DEVELOPMENT AT A LEAN STARTUP: AGILE AT IMVU

GDC Online, Austin, TX, October 6, 2010

http://schedule.gdconline.com/session/11330

IMVU is sometimes referred to as the original ‘Lean Startup’, having successfully blended Agile methodologies with Customer Development methodologies. This lecture focuses on IMVU’s product development process, and the hard-won learning that has allowed it to evolve and scale with company growth. Topics include foundations for successful Scrum implementation in the organization, specific Agile methodologies used and why, a detailed description of the Scrum team calendar and planning process, and how IMVU uses retrospectives, post-mortems, and ‘5 Whys’ to continuously evolve the product development process and ensure Engineering delivers successfully for customers and the Product Management organization.

BUILDING A SUCCESSFUL BUSINESS AFTER LAUNCH THROUGH RAPID ITERATION

GDC Online, Austin, TX, October 7, 2010

http://schedule.gdconline.com/session/11329

Online games, social applications and virtual worlds don’t stop development after launch. Understanding customer behavior and iterating on your product is critical to growing your business, so achieving a process that decreases this iteration time greatly improves your chance of success. By creating systems that utilize A/B testing, instant customer metrics and a sophisticated deployment process, IMVU is able to shorten this iteration loop and deploy new software to production up to 50 times per day. This session demonstrates real-world examples of these systems and explains a process for deploying them, even after launch.

MAKING MONEY ON VIRTUAL GOODS: THE IMVU EXPERIENCE

GDC Online, Austin, TX, October 6, 2010

http://schedule.gdconline.com/session/11552

During this presentation, IMVU’s Lee Clancy, SVP of Product Management and GM of Direct Revenue, will demonstrate how MMO developers can leverage the company’s expertise to create a successful virtual goods business. By providing insight into the strategy IMVU used to grow its user base to more than 40 million registered users and its digital catalog (the world’s largest) to more than four million virtual goods, MMO developers will learn how they too can develop a virtual goods business that will help them further monetize their user base. The presentation will also help the audience understand the similarities between social entertainment networks and MMOs so they can see how applying IMVU’s business strategies can help grow their businesses.

Efficiently Rendering Flash in a 3D Scene

By Chad Austin

In my previous post, I talked about how to embed Flash into your desktop application, for UI flexibility and development speed. This time, I’ll discuss efficient rendering into a 3D scene.

Rendering Flash as a 3D Overlay (The Naive Way)

At first blush, rendering Flash on top of a 3D scene sounds easy. Every frame:

  1. Create a DIB section the size of your 3D viewport
  2. Render Flash into the DIB section with IViewObject::Draw
  3. Copy the DIB section into an IDirect3DTexture9
  4. Render the texture on the top of the scene

Ta da! But your frame rate dropped to 2 frames per second? Ouch. It turns out this implementation is horribly slow. There are a few reasons.

First, asking the Adobe Flash player to render into a DIB isn’t a cheap operation. In our measurements, drawing even a simple SWF takes on the order of 10 milliseconds. Since most UI doesn’t animate every frame, we should be able to cache the captured framebuffer.

Second, main memory and graphics memory are on different components in your computer. You want to avoid wasting time and bus traffic by unnecessarily copying data from the CPU to the GPU every frame. If only the lower-right corner of a SWF changes, we should limit our memory copies to that region.

Third, modern GPUs are fast, but not everyone has them. Let’s say you have a giant, mostly-empty SWF and want to render it on top of your 3D scene. On slower GPUs, it would be ideal if you could limit your texture draws to the regions of the screen that are non-transparent.

Rendering Flash as a 3D Overlay (The Fast Way)

Disclaimer: I can’t take credit for these algorithms. They were jointly developed over years by many smart engineers at IMVU.

First, let’s reduce an embedded Flash player to its principles:

  • Flash exposes an IShockwaveFlash interface through which you can load and play movies.
  • Flash maintains its own frame buffer. You can read these pixels with IViewObject::Draw.
  • When a SWF updates regions of the frame buffer, it notifies you through IOleInPlaceSiteWindowless::InvalidateRect.

In addition, we’d like the Flash overlay system to fit within these performance constraints:

  • Each SWF is rendered over the entire window. For example, implementing a ball that bounces around the screen or a draggable UI component should not require any special IMVU APIs.
  • If a SWF is not animating, we do not copy its pixels to the GPU every frame.
  • We do not render the overlay in transparent regions. That is, if no Flash content is visible, rendering is free.
  • Memory consumption (ignoring memory used by individual SWFs) for the overlay usage is O(framebuffer), not O(framebuffer * SWFs). That is, loading three SWFs should not require allocation of three screen-sized textures.
  • If Flash notifies of multiple changed regions per frame, only call IViewObject::Draw once.

Without further ado, let’s look at the fast algorithm:

Flash notifies us of visual changes via IOleInPlaceSiteWindowless::InvalidateRect. We take any updated rectangles and add them to a per-frame dirty region. When it’s time to render a frame, there are four possibilities:

  • The dirty region is empty and the opaque region is empty. This case is basically free, because nothing need be drawn.
  • The dirty region is empty and the opaque region is nonempty. In this case, we just need to render our cached textures for the non-opaque regions of the screen. This case is the most common. Since a video memory blit is fast, there’s not much we could do to further speed it up.
  • The dirty region is nonempty. We must IViewObject::Draw into our Overlay DIB, with one tricky bit. Since we’re only storing one overlay texture, we need to render each loaded Flash overlay SWF into the DIB, not just the one that changed. Imagine an animating SWF underneath another translucent SWF. The top SWF must be composited with the bottom SWF’s updates. After rendering each SWF, we scan the updated DIB for a minimalish opaque region. Why not just render the dirty region? Imagine a SWF with a bouncing ball. If we naively rendered every dirty rectangle, eventually we’d be rendering the entire screen. Scanning for minimal opaque regions enables recalculation of what’s actually visible.
  • The dirty region is nonempty, but the updated pixels are all transparent. If this occurs, we no longer need to render anything at all until Flash content reappears.
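Here is a compact Python sketch of that per-frame decision, covering the four cases above (the region objects and the `helpers` bundle wrapping IViewObject::Draw, the DIB scan, and the texture upload and draw calls are assumptions, not shown here):

def render_overlay_frame(dirty_region, opaque_region, swfs, helpers):
    """One frame of the overlay algorithm, covering the four cases above."""
    if dirty_region.is_empty() and opaque_region.is_empty():
        # Case 1: nothing visible and nothing changed -- free.
        return opaque_region

    if dirty_region.is_empty():
        # Case 2 (most common): nothing changed; just blit the cached
        # texture over the non-transparent parts of the screen.
        helpers.draw_cached_texture(opaque_region)
        return opaque_region

    # Cases 3 and 4: something changed. Re-render *every* loaded SWF into
    # the shared overlay DIB so translucent SWFs composite correctly.
    for swf in swfs:
        helpers.draw_swf_into_overlay_dib(swf, dirty_region)

    # Re-scan for what is actually opaque instead of accumulating dirty
    # rectangles forever (the bouncing-ball problem).
    opaque_region = helpers.scan_overlay_dib_for_opaque_region()

    if opaque_region.is_empty():
        # Case 4: the update left only transparent pixels; draw nothing
        # until Flash content reappears.
        return opaque_region

    helpers.upload_dirty_rects_to_texture(dirty_region)
    helpers.draw_cached_texture(opaque_region)
    return opaque_region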

This algorithm has proven efficient. It supports multiple overlapping SWFs while minimizing memory consumption and CPU/GPU draw calls per frame. Until recently, we used Flash for several of our UI components, giving us a standard toolchain and a great deal of flexibility. Flash was the bridge that took us from the dark ages of C++ UI code to UI on which we could actually iterate.

How to Embed Flash Into Your 3D Application

By Chad Austin

Writing user interfaces is hard. Writing usable interfaces is harder. Yet, the design of your interface is your product.

Products are living entities. They always want to grow, adapting to their users as users adapt to them. In that light, why build your user interface in a static technology like C++ or Java? It won’t be perfect the first time you build it, so prepare for change.

IMVU employs two technologies for rapidly iterating on and refining our client UIs: Flash and Gecko/HTML. Sure, integrating these technologies has a sizable up-front cost, but the iteration speed they provide easily pays for them.

Rapid iteration has some obvious benefits:

  1. reduces development cost
  2. reduces time to market

and some less-obvious benefits:

  1. better product/market fit: when you can change your UI, you will.
  2. improved product quality: little details distinguish mediocre products from great products. make changing details cheap and your Pinto will become a Cadillac.
  3. improved morale: both engineers and designers love watching their creations appear on the screen right before them. it’s why so many programmers create games!

I will show you how integrating Flash into a 3D application is easier than it sounds.

Should I use Adobe Flash or Scaleform GFx?

The two most common Flash implementations are Adobe’s ActiveX control (which has a 97% installed base!) and Scaleform GFx.

Adobe’s control has perfect compatibility with their tool chain (go figure!) but is closed-source and good luck getting help from Adobe.

Scaleform GFx is an alternate implementation of Flash designed to be embedded in 3D applications, but, last I checked, is not efficient on machines without GPUs. (Disclaimer: this information is two years old, so I encourage you to make your own evaluation.)

IMVU chose to embed Adobe’s player.

Deploying the Flash Runtime

Assuming you’re using Adobe’s Flash player, how will you deploy their runtime? Well, given Flash’s install base, you can get away with loading the Flash player already installed on the user’s computer. If they don’t have Flash, just require that they install it from your download page. Simple and easy.

Down the road, when Flash version incompatibilities and that last 5% of your possible market becomes important, you can request permission from Adobe to deploy the Flash player with your application.

Displaying SWFs

IMVU displays Flash in two contexts: traditional HWND windows and 2D overlays atop the 3D scene.

If you want to have something up and running in a day, buy f_in_box. Besides its awesome name, it’s cheap, comes with source code, and the support forums are fantastic. It’s a perfect way to bootstrap. After a weekend of playing with f_in_box, Dusty and I had a YouTube video playing in a texture on top of our 3D scene.

Once you run into f_in_box’s limitations, you can use the IShockwaveFlash and IOleInPlaceObjectWindowless COM interfaces directly. See Igor Makarav’s excellent tutorial and CFlashWnd class.

Rendering Flash as an HWND

For top-level UI elements use f_in_box or CFlashWnd directly. They’re perfectly suited for this. Seriously, it’s just a few lines of code. Look at their samples and go.

Rendering Flash as a 3D Overlay

Rendering Flash to a 3D window gets a bit tricky… Check out my next post!

Agile Business Intelligence: A Core Component of Agile Engineering

By Curtis Pullen

Business intelligence (BI) is not a term that is frequently brought up in the context of agile development – it seems more appropriate in the context of big, slowly moving companies crunching data on million dollar mainframes. The truth, however, is that business intelligence is a critical and integral part of the agile product development process. The goal of agile is to enable teams to react to feedback, and the goal of business intelligence is to provide that feedback.

Business intelligence in the traditional sense is unfortunately not equipped to provide the kind of feedback that an agile engineering team needs. When multiple teams of engineers are iterating in two or three week sprints, continuously deploying their changes to a production system as we do at IMVU, BI needs to be even faster. Those engineers are inventing new dimensions of potential analysis at an explosive rate and transforming user behaviour so quickly that data more than a few months old is of only archaeological interest, and to maintain that pace they need continuous feedback. They need to know when a mistake has been made so they can fix it; they need to know when they’re on to something revolutionary so they can run with it. They need BI that’s as agile as they are.

So what does agile BI look like? It employs many of the same patterns as agile software development: quick iteration, frequent releases, and close communication. While a feature is under development, BI analysts meet with product management to identify the high-level metrics that will best indicate the status of the project post-release, and with engineers, to determine how that data might best be obtained. When a project ships, BI analysts aggregate the data, translate it into meaningful information, present it, and seek feedback. Any questions or concerns raised by engineering or management trigger a new round of the cycle:

1. Discuss requirements with stakeholders.

2. Collect and interpret data.

3. Deliver results to stakeholders and collect feedback.

In order for this to succeed, BI needs to be tightly linked to both engineering and management. At IMVU, we accomplish this by having our engineers take on most of our BI work as an integral part of product development, while more sophisticated analysis is undertaken by a BI team with engineering support. We collect the right data exactly when we need it, we don’t waste time scrutinizing obvious patterns, and we illuminate the details as soon as it’s necessary.

Life moves fast; business intelligence should too.

This is an expanded version of some thoughts I put down in a contest for how best to describe agile BI. You can read the entries and then vote for mine here: http://www.pentaho.com/what_is_agile

IMVU’s Agile Engineering Process

Clinton Keith, the Agile coach and trainer who introduced Agile and scrum to the video game industry in 2003, visited the IMVU offices during the Game Developer’s Conference in March. I spent time with Clint describing IMVU’s product development process, and we even had the opportunity to show Clint some of our scrum process in action. Clint reported a lot of interest in our successful implementation of Agile and Lean Startup methodologies from people he spoke with at GDC, and he asked if I’d be willing to do a Q&A session with him. Here is an excerpt of our interview, posted today on his Agile Game Development blog:


Agile Game Interview – Agile Engineering for Online Communities

CK: What is the overview of the IMVU engineering process?

JB: Our engineering team practices what people refer to as “Agile” or “Lean Startup” methodologies, including Test-Driven Development (TDD), pair programming, collective code ownership, and continuous integration. We refactor code relentlessly, and try to ship code in small batches. And we’ve taken continuous integration a step further: every time we check in server side code, we push the changes to our production servers immediately after the code passes our automated tests. In practice, this means we push code live to our production web servers 35-50 times per day. Taken together with our A/B split-testing framework, which we use to collect data to inform our product development decisions, we have the ability to quickly see and test the effects of new features live in production. We also practice “5 Whys” root cause analysis when something goes wrong, to avoid making the same mistakes twice.

CK: How do you get so many changes out in front of customers without causing problems, either for your customers or your production web servers?

JB: I think it’s important to point out that sometimes we actually do introduce regressions and new bugs that impact our customers.  However, we try to strike a balance between minimizing negative customer impacts and maximizing our ability to innovate and test new features with real customers as early as possible. We have several mechanisms we use to help us do that, and to control how customers experience our changes. It boils down to automation on one hand, and our QA engineers on the other.

First the automation: we take TDD very seriously, and it’s an important part of our engineering culture. We try to write tests for all of our code, and when we fix bugs, we write tests to ensure they don’t regress. Next, we built our code deployment process to include what we call our “cluster immune system,” which monitors and alerts on statistically significant changes in dozens of key error rates and business metrics. If a problem is detected, the code is automatically rolled back and our engineering and operations teams are notified. Next, we have the ability to limit rollout of a feature to one or more groups of customers–so we can expose new features to only QA or Admin users, or ad-hoc customer sets. We also built an incremental rollout function that allows us to slowly increase exposure to customers while we monitor both technical and business metrics to ensure there are no big problems with how the features work in production. Finally, we build in a “kill switch” to most of our applications, so that if any problems occur later, for example, scaling issues, we have fine-grained control to turn off problematic features while we fix them.
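As a simplified sketch of the kind of gating logic behind group-limited and incremental rollout (Python, with hypothetical names; IMVU’s real system is configuration-driven and tied to the cluster immune system, which isn’t shown here):

import hashlib

def feature_enabled(feature, user, settings):
    """Decide whether this user sees this feature on this request.

    `settings[feature]` looks like:
      {"killed": False, "groups": ["qa", "admin"], "rollout_percent": 10}
    """
    conf = settings.get(feature, {})
    if conf.get("killed"):
        return False                       # kill switch overrides everything
    if user["group"] in conf.get("groups", ()):
        return True                        # QA, admin, or ad-hoc customer sets
    # Incremental rollout: hash the user id to a stable bucket in [0, 100)
    # and compare it against the current rollout percentage.
    key = ("%s:%s" % (feature, user["id"])).encode("utf-8")
    bucket = int(hashlib.md5(key).hexdigest(), 16) % 100
    return bucket < conf.get("rollout_percent", 0)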


Read the rest of Agile Game Interview – Agile Engineering for Online Communities