Clinton Keith, the Agile coach and trainer who introduced Agile and scrum to the video game industry in 2003, visited the IMVU offices during the Game Developer’s Conference in March. I spent time with Clint describing IMVU’s product development process, and we even had the opportunity to show Clint some of our scrum process in action. Clint reported a lot of interest in our successful implementation of Agile and Lean Startup methodologies from people he spoke with at GDC, and he asked if I’d be willing to do a Q&A session with him. Here is an excerpt of our interview, posted today on his Agile Game Development blog:
CK: What is the overview of the IMVU engineering process?
JB: Our engineering team practices what people refer to as “Agile” or “Lean Startup” methodologies, including Test-Driven Development (TDD), pair programming, collective code ownership, and continuous integration. We refactor code relentlessly, and try to ship code in small batches. And we’ve taken continuous integration a step further: every time we check in server side code, we push the changes to our production servers immediately after the code passes our automated tests. In practice, this means we push code live to our production web servers 35-50 times per day. Taken together with our A/B split-testing framework, which we use to collect data to inform our product development decisions, we have the ability to quickly see and test the effects of new features live in production. We also practice “5 Whys” root cause analysis when something goes wrong, to avoid making the same mistakes twice.
CK: How do you get so many changes out in front of customers without causing problems, either for your customers or your production web servers?
JB: I think it’s important to point out that sometimes we actually do introduce regressions and new bugs that impact our customers. However, we try to strike a balance between minimizing negative customer impacts and maximizing our ability to innovate and test new features with real customers as early as possible. We have several mechanisms we use to help us do that, and to control how customers experience our changes. It boils down to automation on one hand, and our QA engineers on the other.
First the automation: we take TDD very seriously, and it’s an important part of our engineering culture. We try to write tests for all of our code, and when we fix bugs, we write tests to ensure they don’t regress. Next, we built our code deployment process to include what we call our “cluster immune system,” which monitors and alerts on statistically significant changes in dozens of key error rates and business metrics. If a problem is detected, the code is automatically rolled back and our engineering and operations teams are notified. Next, we have the ability to limit rollout of a feature to one or more groups of customers–so we can expose new features to only QA or Admin users, or ad-hoc customer sets. We also built an incremental rollout function that allows us to slowly increase exposure to customers while we monitor both technical and business metrics to ensure there are no big problems with how the features work in production. Finally, we build in a “kill switch” to most of our applications, so that if any problems occur later, for example, scaling issues, we have fine-grained control to turn off problematic features while we fix them.
Read the rest of Agile Game Interview – Agile Engineering for Online Communities …