Clinton Keith, the Agile coach and trainer who introduced Agile and scrum to the video game industry in 2003, visited the IMVU offices during the Game Developers Conference in March. I spent time with Clint describing IMVU’s product development process, and we even had the opportunity to show him some of our scrum process in action. Clint reported that people he spoke with at GDC were very interested in our successful implementation of Agile and Lean Startup methodologies, and he asked if I’d be willing to do a Q&A session with him. Here is an excerpt of our interview, posted today on his Agile Game Development blog:
CK: What is the overview of the IMVU engineering process?
JB: Our engineering team practices what people refer to as “Agile” or “Lean Startup” methodologies, including Test-Driven Development (TDD), pair programming, collective code ownership, and continuous integration. We refactor code relentlessly, and try to ship code in small batches. And we’ve taken continuous integration a step further: every time we check in server side code, we push the changes to our production servers immediately after the code passes our automated tests. In practice, this means we push code live to our production web servers 35-50 times per day. Taken together with our A/B split-testing framework, which we use to collect data to inform our product development decisions, we have the ability to quickly see and test the effects of new features live in production. We also practice “5 Whys” root cause analysis when something goes wrong, to avoid making the same mistakes twice.
CK: How do you get so many changes out in front of customers without causing problems, either for your customers or your production web servers?
JB: I think it’s important to point out that sometimes we actually do introduce regressions and new bugs that impact our customers. However, we try to strike a balance between minimizing negative customer impacts and maximizing our ability to innovate and test new features with real customers as early as possible. We have several mechanisms we use to help us do that, and to control how customers experience our changes. It boils down to automation on one hand, and our QA engineers on the other.
First the automation: we take TDD very seriously, and it’s an important part of our engineering culture. We try to write tests for all of our code, and when we fix bugs, we write tests to ensure they don’t regress. Next, we built our code deployment process to include what we call our “cluster immune system,” which monitors and alerts on statistically significant changes in dozens of key error rates and business metrics. If a problem is detected, the code is automatically rolled back and our engineering and operations teams are notified. Next, we have the ability to limit rollout of a feature to one or more groups of customers–so we can expose new features to only QA or Admin users, or ad-hoc customer sets. We also built an incremental rollout function that allows us to slowly increase exposure to customers while we monitor both technical and business metrics to ensure there are no big problems with how the features work in production. Finally, we build in a “kill switch” to most of our applications, so that if any problems occur later, for example, scaling issues, we have fine-grained control to turn off problematic features while we fix them.
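As a rough illustration of the rollout controls described above, here is a minimal Python sketch. It is not IMVU’s actual code: the `FeatureGate` class, its field names, and the hash-based bucketing are all assumptions made for the example. It combines the three ideas from the answer: a kill switch, exposure limited to named groups such as QA or Admin, and an incremental percentage rollout.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FeatureGate:
    """Hypothetical per-feature rollout settings (illustrative, not IMVU's schema)."""
    name: str
    enabled: bool = True          # the "kill switch": False turns the feature off for everyone
    allowed_groups: set = field(default_factory=set)  # groups that always see it, e.g. {"qa", "admin"}
    rollout_percent: int = 0      # 0-100: share of all other customers who see the feature

    def _bucket(self, user_id: int) -> int:
        """Map a user to a stable bucket in [0, 100) so exposure does not flip between requests."""
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode()).hexdigest()
        return int(digest, 16) % 100

    def is_on(self, user_id: int, groups: set) -> bool:
        if not self.enabled:                  # kill switch wins over everything else
            return False
        if self.allowed_groups & groups:      # limited rollout to specific groups
            return True
        return self._bucket(user_id) < self.rollout_percent  # incremental rollout


gate = FeatureGate("new_shop", allowed_groups={"qa", "admin"}, rollout_percent=0)
gate.is_on(42, {"qa"})        # True: QA users see the feature first
gate.is_on(42, {"customer"})  # False: no general exposure yet
gate.rollout_percent = 100    # raise exposure step by step while watching metrics
gate.enabled = False          # flip the kill switch if problems show up later
```

Because the bucket depends only on the feature name and the user ID, raising `rollout_percent` only ever adds customers; nobody who already has the feature loses it as exposure grows.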
Read the rest of Agile Game Interview – Agile Engineering for Online Communities …
We’ve heard a lot of interest from folks we’ve talked to in the tech community and at conferences about our Continuous Deployment process at IMVU, and how we push code up to 50 times a day. We’ve also received some questions about how we do this without introducing regressions and new bugs into our product, and how we approach Quality Assurance in this fast-paced environment.
The reality is that we occasionally do negatively impact customers due to our high velocity and drive to deliver features to our customers and to learn from them. Sometimes we impact them by delivering a feature that isn’t valuable, and sometimes we introduce regressions or new bugs into our production environment.
But we’ve delivered features our customers didn’t like even when we were moving more slowly and carefully, and it was actually more costly because it took us longer to learn and change direction. For example, we once spent more than a month of development time on a feature called Updates, similar to the Facebook “friend feed,” and we guessed wrong (way wrong) about how our customers would use it. It took us way too long to ship a feature nobody actually wanted, and as a result we delayed delivery of a feature that our customers were dying to have: Groups.
Asking customers what they want takes guesswork and internal employee opinions out of product development decisions, making it easy to resolve questions such as, “Should we build tools to let users categorize their avatar outfits, or should we build a search tool, or both?” We make the best decisions we can based on available customer data, competitive analysis, and our combined experience and insights—then build our features as quickly as possible to test them with customers to see if we were right.
We’ve found that the costs we incur–typically bugs or unpolished but functional features–are worthwhile in the name of getting feedback from our customers as quickly as possible about our product innovations. The sooner we have feedback from our customers, the sooner we know whether we guessed right about our product decisions and the sooner we can choose to either change course or double down on a winning idea.
Does that mean we don’t worry about delivering high quality features to customers or interrupting their experience with our product? Nothing could be further from the truth.
For starters, we put a strong emphasis on automated testing. This focus on testing and the framework that supports it is key to how we’ve structured our QA team and the approach they take. Our former CTO and IMVU co-founder Eric Ries has described in detail the infrastructure we use to support Continuous Deployment, but to summarize, we have implemented:
- A continuous integration server (Buildbot) to run and monitor our automated tests with every code commit
- A source control commit check to stop new code being added to the codebase when something is broken
- A script to safely deploy software to our cluster while ensuring nothing goes wrong (we wrote a cluster immune system to monitor and alert on statistically significant regressions, and to automatically revert the commit if an error occurs)
- Real-time monitoring and alerting to inform the team immediately when a bug makes it through our deployment process
- Root cause analysis (Five Whys) to drive incremental improvements in our deployment and product development processes
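To make the “statistically significant regressions” idea more concrete, here is a minimal Python sketch of a deploy step that reverts itself. It is an illustration under stated assumptions, not a description of our actual deploy script: the one-sided two-proportion z-test, the threshold of 3.0, and the `push`, `rollback`, `notify`, and `observe` hooks are all hypothetical choices made for the example.

```python
import math


def error_rate_regressed(base_errors, base_requests, new_errors, new_requests,
                         z_threshold=3.0):
    """One-sided two-proportion z-test: is the new error rate significantly higher?

    Assumes both request counts are nonzero. The threshold is an illustrative choice.
    """
    p_base = base_errors / base_requests
    p_new = new_errors / new_requests
    pooled = (base_errors + new_errors) / (base_requests + new_requests)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / base_requests + 1 / new_requests))
    if std_err == 0:
        return False  # both samples are all-success or all-failure: nothing to compare
    return (p_new - p_base) / std_err > z_threshold


def deploy(revision, push, rollback, notify, baseline, observe):
    """Push a revision, compare fresh metrics to the baseline, revert on a regression.

    push, rollback, notify and observe are hypothetical hooks supplied by the caller.
    baseline and the value returned by observe() map metric name -> (errors, requests).
    """
    push(revision)
    current = observe()
    regressed = [
        name for name, (errors, requests) in current.items()
        if error_rate_regressed(*baseline[name], errors, requests)
    ]
    if regressed:
        rollback(revision)
        notify(f"Reverted {revision}: regression in {', '.join(sorted(regressed))}")
        return False
    return True
```

A caller would wire in real hooks and metrics; for instance, a baseline of 50 errors in 100,000 requests followed by 400 errors in 100,000 requests after the push would trigger `rollback` and `notify`, while 52 errors in 100,000 would not.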
This framework and our commitment to automated testing mean that software engineers write tests for everything we code. We don’t have a specialized role that focuses solely on writing tests and infrastructure–that work is done by the entire engineering team. This is one factor that has allowed us to keep our QA team small. But you can’t catch all regressions or prevent all bugs with automated tests. Some customer use cases and some edge cases are too complex to test with automation. Living, breathing, intelligent people are still superior at doing multi-dimensional, multi-layered testing. Our QA Engineers leverage their insight and experience to find potential problems on more realistic terms, using and testing features in the same ways real customers might use them.
Our QA Engineers spend at least half their time manually testing features. When we find test cases that can be automated, we add test coverage (and we have a step in our Scrum process to help ensure we catch these). We also have a Scrum process step that requires engineers to demonstrate their features to another team member–essentially, doing some basic manual testing with witnesses present to observe. Since we have far more features being built than our QA Engineers have time to test, it also forces the team to make trade-offs and answer the question, “What features will benefit most from having QA time?” Sometimes this isn’t easy to answer, and it forces our teams to consider having other members of the team, or even the entire company, participate in larger, organized testing sessions. When we think it makes sense, our QA Engineers organize and run these test sessions, helping the team to find and triage lots of issues quickly.
Our QA Engineers also have two more important responsibilities. The first is a crucial part of our Scrum planning process: writing test plans and reviewing them with product owners and technical leads. This helps ensure that important customer use cases are not missed, and that the engineering team understands how features will be tested. Put another way, our QA Engineers help the rest of the team consider how our customers might use the features they are building.
The second responsibility is what you might expect of QA Engineers working in an Agile environment: they work directly with software engineers during feature development, testing in-progress features as much as possible and discussing those features face-to-face with the team. By the time features are actually “ready for QA”, they have usually been used and tested at some level already, and many potential bugs have already been found and fixed.
Regardless of how much manual testing is completed by the team before releasing a feature, we rely again on automation: our infrastructure allows us to do a controlled rollout live to production, and our Cluster Immune System monitors for regressions, reducing the risk of negatively impacting customers.
Finally, once our features are live in front of customers, we watch for the results of experiments we’ve set up using our A/B split-test system, and listen for feedback coming in through our community management and customer service teams. When the feedback starts rolling in, usually immediately, we’re ready: we’ve already set aside engineering time specifically to react quickly if bugs and issues are reported, or if we need to tweak features to make them more fun or useful for customers.
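For readers unfamiliar with split testing, the mechanics can be sketched in a few lines of Python. This is a simplified illustration, not our A/B system: `assign_branch`, `ExperimentLog`, the two branch names, and the hash-based assignment are assumptions made for the example. The key property is that each customer lands in the same branch every time, so results for each branch can be tallied and compared.

```python
import hashlib
from collections import Counter


def assign_branch(experiment, customer_id, branches=("control", "treatment")):
    """Deterministically place a customer in a branch (hypothetical scheme)."""
    digest = hashlib.sha256(f"{experiment}:{customer_id}".encode()).hexdigest()
    return branches[int(digest, 16) % len(branches)]


class ExperimentLog:
    """Tallies exposures and conversions per branch for one experiment."""

    def __init__(self, experiment):
        self.experiment = experiment
        self.exposed = Counter()
        self.converted = Counter()

    def record(self, customer_id, did_convert):
        """Record one customer's outcome and return the branch they were in."""
        branch = assign_branch(self.experiment, customer_id)
        self.exposed[branch] += 1
        if did_convert:
            self.converted[branch] += 1
        return branch

    def conversion_rates(self):
        """Return branch -> conversions / exposures for branches with any exposure."""
        return {b: self.converted[b] / n for b, n in self.exposed.items() if n}


log = ExperimentLog("outfit_search")
log.record(customer_id=1001, did_convert=True)
log.record(customer_id=1002, did_convert=False)
rates = log.conversion_rates()  # compare branches once enough customers are recorded
```

A real experiment would also need a significance check before acting on the difference between branches; the sketch only shows assignment and tallying.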
We’re definitely not perfect: with constrained QA resources and a persistent drive by the team to deliver value to customers quickly, we do often ship bugs into production or deliver features that are imperfect. The great thing is that we know right away what our customers want us to do about it, and we keep on iterating.
By: James Birchler and Roland Blanton
Yesterday we kicked off another Hack Week at IMVU, a solid week when we put product development in the hands of IMVU engineers. What does this mean? An engineer can spend the week working on something they personally feel is valuable to the company. It’s a way to harness experience and insights from across the company and give everyone more ownership over what we are building here. The buzz in the building is tangible: there are fewer meetings, less process around group work, and people are focused on finishing their features to put them in front of customers.
Hack Week has been an integral part of our engineering culture since 2007, giving our software engineers a chance to guide product development and test their ideas. This tradition has resulted in many popular features like Outfits Management, Turbo Product Loading of 3D assets, IMVU Badges, and shopping directly from a 3D chat. All these features were driven by IMVU engineers during past Hack Weeks and then adopted by our product teams for release to all customers.
To help foster an environment of creativity, we use our A/B experiment system to make it easy and low-risk for us to test product innovations with customers. Rather than rely on the opinions in the room, we prefer getting feedback directly from customers to help guide our decisions.
In order to maximize chances for success, we follow some lightweight processes and rules:
- The goal in most cases is to deliver valuable features live to customers in experiments by the end of the week.
- Engineers choose projects to work on–sometimes from a team’s existing product backlog, and sometimes not.
- We work closely with product owners, user experience designers, technical leads, QA engineers, and other stakeholders to come up with what we think is a good plan.
- We start hacking, ultimately releasing features in A/B experiments to our customers.
- We only work on one project at a time (it’s pretty easy to find yourself starting many projects and never finishing, which runs counter to our overall goal of delivering value to customers).
- Everyone does a demo of their work at the end of Hack Week.
There is a lot of face-to-face, ad-hoc collaboration in the weeks preceding Hack Week, and during Hack Week itself. The week concludes with demos to the entire company, a strong feeling of engagement with our customers and our product, and curiosity about what our customers will tell us about what we’ve built.