March 2014 – IMVU Engineering Blog

What it’s like to use Haskell

March 24, 2014 – IMVU Engineering

By Andy Friesen

Since early 2013, we at IMVU have used Haskell to build several of the REST APIs that power our service.

When the company started, we chose PHP as our application server language, in part, because the founders expected the website to only be a small part of the business! IMVU was primarily about a downloadable 3D client. We needed “a website or something” to give users a place to download our client from, but didn’t expect it would have to be much more than that. This shows that predicting the future is hard.
Years later, we have quite a lot of customers, and we primarily use PHP to serve them. We’re big enough that we run multiple subteams on separate initiatives at the same time. Performance is becoming important to us not just because it matters to our customers, but because it can easily make the difference between buying 4 servers and buying 40 servers to support some new feature.

So, early in 2012, we found ourselves ready to look for an alternative that would help us be more rigorous. In particular, we were ready for the idea that sacrificing a tiny bit of short-term, straight-line time to market might actually speed us up in the long run.

How We Got Here

I started learning Haskell in my spare time in part because Haskell seems like the exact opposite of PHP: Natively compiled, statically typed, and very principled.

My initial exploration left me interested in evaluating Haskell at real scale. A year later, we did a live-fire test in which we taught multiple teammates Haskell while delivering an important new feature under a deadline.

Today, a lot of our backend code is still driven by PHP, but we have a growing amount of Haskell that powers newer features. The process has been exciting not only because we got to actually answer many of the questions that keep people from trying Haskell, but also because it’s simply a better solution.

The experiment to start developing in Haskell took a lot of internal courage and dedication, and we had to overcome a number of quite rational concerns related to adopting a whole new language. Here are the main ones and how they worked out for us:

Scalability

The first thing we did was to replace a single service with a Haskell implementation. We picked a service that was high-volume but not mission-critical.

We didn’t do any particular optimization of this new service, but it nevertheless showed excellent performance characteristics in the field. Our little Haskell server was running on a pair of spare servers that were otherwise set for retirement, and despite this, each machine was handling about 20x as many requests as one of our high-spec PHP servers could manage.

Reliability

The second thing we did was to take our hands off the Haskell service and leave it running until it fell over. It ran for months without intervention.

Training

After the reliability test, we were ready to try a live-fire exercise, but we had to wait a bit for the right project. We got our chance in early 2013.

The rules of the experiment were simple: Train 3 engineers to write the backend for an important new project and keep up with a separate frontend team. Most of the code was to be new, so there was relatively little room for legacy complications.

We very quickly learned that we had also signed up for a lot of catch-up work to bring the Haskell infrastructure in line with what we’d had for years in PHP. We were very busy for a while, but once we got this infrastructure out of the way, the tables turned and the front-end team became the limiting factor.

Today, training an engineer to be productive in our Haskell code is not much harder than training someone to be productive in our PHP environment. People who have prior functional programming knowledge seem to find their stride in just a few days.

Testing

Correctness is becoming very important for us because we sometimes have to change code that predates every current developer. We have enough users that mistakes become very costly, very quickly. Solving these sorts of issues in PHP is sometimes achievable but always difficult. We usually solve them with unit tests and production alerts, but these approaches aren’t sufficient for all cases.

Unit tests are incredibly valuable, but you’re always at the mercy of the discipline of every engineer at every moment. It’s easy to tell your teammates to write tests for everything, but this basically boils down to asking everyone to be at their very best every day. People make mistakes and things slip through the cracks.

With Haskell, the type checker eliminates an entire class of defects that we would otherwise have to write tests for. We have fewer tests to write, so there are fewer cases we can forget to write tests for.

We like unit testing and test-driven development (TDD) at IMVU and we’ve found that Haskell is better with TDD, but also that TDD is better with Haskell. It takes fewer tests to get the same degree of reliability out of Haskell. The static verification takes care of quite a lot of error checking that has to be manually implemented (or forgotten) in PHP. The Haskell QuickCheck tool is also a wonderful help for developers.
The way Haskell separates pure computations from side effects let us build something that isn’t practical in other languages: We built a custom monad that lets us “switch off” side effects in our tests. This is incredible because it means that trying to escape the testing sandbox breaks compilation. While we have had to fight intermittent test failures for eight years in PHP (and at times have had multiple engineers simultaneously dedicated to the problem of test intermittency), our unit tests in Haskell cannot intermittently fail.

Deployment

Deployment is great. At IMVU, we do continuous deployment, and Haskell is no exception. We build our application as a statically linked executable, and rsync it out to our servers. We can also keep old versions around, so we can switch back, should a deployment result in unexpected errors.

I wouldn’t write an OS kernel in it, but Haskell is way better than PHP as a systems language. We needed a memcached client for our Haskell code, and rather than try to talk to a C implementation, we just wrote one in Haskell. It took about half a day to write and performs really well. And, as a side effect, if we ever read back some data we don’t expect from memcached (say, because of an unexpected version change), then Haskell will automatically detect and reject this data.

We’ve consistently found that we eliminate whole classes of bugs by defining new data types that wrap primitive types like integers and strings. For instance, we have two lines of code that say that “customer IDs” and “product IDs” are both represented to the hardware as numbers, but are not mutually convertible. Setting up these new types doesn’t take much work, and it makes the type checker a LOT more helpful. PHP and other popular dynamic server languages, like JavaScript or Ruby, make doing the same very hard.

Refactoring is a breeze. We just write the change we want and follow the compile errors. If it builds, it almost certainly also passes tests.

Not All Sunshine and Rainbows

Resource leaks in Haskell are nasty. We once had a bug where an unevaluated dictionary was the source of a space leak that would eventually take our servers down. We also ran into an issue where an upstream library opened /dev/urandom for randomness, but never closed the file handle. These issues don’t happen in PHP, with its process-per-request model, and they were more difficult to track down and resolve than they would have been in C++.

The Haskell package manager, Cabal, ended up getting in the way of our development. It lets you specify version ranges of particular packages you want, but it’s important for everyone on the team to have exactly the same versions of every package. That means controlling transitive dependencies, and Cabal doesn’t really offer a way to handle this precisely. For a language that is so very principled about type algebra, it’s surprising that the package manager doesn’t follow suit regarding package versioning. Instead, we use Cabal only for basic package installation, along with a custom build tool (written in Haskell).

Hiring

I’ll admit that I was very worried that we wouldn’t be able to hire great people if our criterion was expertise in an uncommon language with a comparatively sparse industrial track record, but the honest truth is that we found a great Haskell hacker in the Bay Area after about 4 days of looking.

We had a chance to hire him because we were using Haskell, not in spite of it.

Final Thoughts

While it’s usually difficult to objectively measure the effect of things like the choice of programming language or software stack, we’re now seeing fantastic, obvious productivity and efficiency gains. Even a year later, all the Haskell code we have runs on just a tiny number of servers and, when we have to make changes to the code, we can do so quickly and confidently.

Charming Python: How to actually close a socket by calling shutdown() before calling close()

March 6, 2014 – IMVU Engineering

By Eric Hohenstein

You may never have to use the Python low-level socket interface but if you ever do, here’s a tip regarding some surprising Python socket behavior that might help you.

The standard Python library has a “socket” module written in Python that wraps an underlying “_socket” module written in C, which in turn wraps OS-level sockets. Together, these expose a cross-platform, Berkeley socket-like interface to Python applications. I say Berkeley socket-like because the socket functions are exposed as methods on an object rather than as global free functions.

The surprise with this interface is that the close() method of the socket object does not close the socket. Instead, it drops its reference to the underlying socket, leaving garbage collection to eventually close the OS-level socket. If you only use the close() method on a socket, the socket will continue using both client and server resources until it is garbage collected. Worse, if the other end continues to send data to the locally discarded socket, it may block when the receive window of the local socket fills up. Even worse than that, if the software handling the other end of the socket is written in Erlang and zero-window packets are dropped somewhere in between (for instance, because of overly aggressive firewall rules), the other end may block until the socket gets garbage collected, even if its socket is configured to time out immediately on send operations that would block.

The solution is to first call shutdown() on the socket before calling close(). You have to be careful doing this if the socket is being read or written by multiple threads simultaneously, but down that path lies madness, so don’t do that.
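The shutdown-then-close sequence can be packaged into a small helper. This is a minimal sketch assuming Python 3.3 or later, where socket errors are instances of OSError; the name close_socket is hypothetical and not part of the socket module. Errors from shutdown() are ignored because a socket that was never connected, or whose peer already reset the connection, has nothing left to shut down.

```python
import socket


def close_socket(sock):
    """Shut down a socket's connection, then release the socket object.

    Hypothetical helper for illustration; not part of the socket module.
    """
    try:
        # Tear down both directions of the OS-level connection now,
        # regardless of any other references to the same descriptor.
        sock.shutdown(socket.SHUT_RDWR)
    except OSError:
        # Typically ENOTCONN: the socket was never connected, or the
        # peer already reset the connection. Nothing is left to shut down.
        pass
    finally:
        # Release this object's reference to the descriptor.
        sock.close()
```

For instance, with a connected pair from socket.socketpair(), calling close_socket() on one end makes recv() on the other end return b"" right away.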

The Python documentation for the socket module actually does briefly mention this but doesn’t really give much of an explanation as to why it’s necessary. Really, the Python documentation on the subject is correct in that calling shutdown is better form. However, in cases where you are sure that the application state of the connection is no longer valid, it should be possible to bypass the shutdown and close the socket to simply free its resources immediately.

Why is it necessary to call shutdown() before close()? Let’s take a look at the socket.py module from the standard Python library. These code snippets are all taken from Python 2.6 source, though I’ve confirmed that Python 2.7 and Python 3.0 behave similarly. First, socket.py imports everything from _socket, which includes a class called socket:

import _socket
from _socket import *

Then, it saves a reference to _socket.socket called _realsocket:

_realsocket = socket

Then it defines a list of functions that will be exposed from the real socket object to the wrapper class (see below):

_socketmethods = (
    'bind', 'connect', 'connect_ex', 'fileno', 'listen',
    'getpeername', 'getsockname', 'getsockopt', 'setsockopt',
    'sendall', 'setblocking',
    'settimeout', 'gettimeout', 'shutdown')

Note that ‘close’ is not in that list but ‘shutdown’ is.

Then it defines a _closedsocket class that will fail all operations:

class _closedsocket(object):
    __slots__ = []
    def _dummy(*args):
        raise error(EBADF, 'Bad file descriptor')
    # All _delegate_methods must also be initialized here.
    send = recv = recv_into = sendto = recvfrom = recvfrom_into = _dummy
    __getattr__ = _dummy

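Then, it defines a class called _socketobject. Note the comment before the start of this class, and note that __init__ binds the methods named in _delegate_methods (recv, recvfrom, recv_into, recvfrom_into, send and sendto) directly onto each instance: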

# Wrapper around platform socket objects. This implements
# a platform-independent dup() functionality. The
# implementation currently relies on reference counting
# to close the underlying socket object.
class _socketobject(object):

    __doc__ = _realsocket.__doc__

    __slots__ = ["_sock", "__weakref__"] + list(_delegate_methods)

    def __init__(self, family=AF_INET, type=SOCK_STREAM, proto=0, _sock=None):
        if _sock is None:
            _sock = _realsocket(family, type, proto)
        self._sock = _sock
        for method in _delegate_methods:
            setattr(self, method, getattr(_sock, method))

Within the _socketobject class, it then generates a forwarding method for each name in _socketmethods; each generated method simply calls the method of the same name on the real _socket.socket object:

    _s = ("def %s(self, *args): return self._sock.%s(*args)\n\n"
          "%s.__doc__ = _realsocket.%s.__doc__\n")
    for _m in _socketmethods:
        exec _s % (_m, _m, _m, _m)
    del _m, _s
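The exec-based loop above is Python 2 code, but the forwarding idea is easy to reproduce without generating source strings. The sketch below assumes Python 3 and uses a closure factory instead of exec; the classes _Real and Wrapper, the tuple _forwarded and the helper _make_forwarder are hypothetical stand-ins for _socket.socket, _socketobject and _socketmethods, not part of socket.py.

```python
class _Real:
    """Hypothetical stand-in for _socket.socket."""

    def fileno(self):
        """Return a fake descriptor number."""
        return 3

    def listen(self, backlog=0):
        """Pretend to listen; echo the backlog back."""
        return backlog


# Plays the role of _socketmethods: only these names get forwarded.
_forwarded = ("fileno", "listen")


def _make_forwarder(name):
    """Build a method that forwards the call `name` to self._sock."""
    def forwarder(self, *args):
        return getattr(self._sock, name)(*args)
    forwarder.__name__ = name
    # Copy the docstring, as the exec'd template does with __doc__.
    forwarder.__doc__ = getattr(_Real, name).__doc__
    return forwarder


class Wrapper:
    """Hypothetical stand-in for _socketobject."""

    __slots__ = ["_sock"]

    def __init__(self):
        self._sock = _Real()


for _m in _forwarded:
    # __slots__ restricts instance attributes, not class attributes.
    setattr(Wrapper, _m, _make_forwarder(_m))
del _m
```

Here Wrapper().listen(5) returns 5 by way of _Real.listen. As in socket.py, a name left out of the tuple (such as close) is simply not forwarded, so the wrapper's close behavior is whatever the class defines itself.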

Remember that close was not in the _socketmethods list. Here’s the close method of the _socketobject class:

    def close(self):
        self._sock = _closedsocket()
        dummy = self._sock._dummy
        for method in _delegate_methods:
            setattr(self, method, dummy)
    close.__doc__ = _realsocket.close.__doc__

Assigning a new value to self._sock drops the wrapper’s reference to the real socket object, allowing it to be garbage collected. Finally, socket.py exposes the _socketobject class under the name socket:

socket = SocketType = _socketobject

Client networking code using the Python socket library would create a socket like so:

import socket
sock = socket.socket()

which will create a socket._socketobject instance whose _sock member is a _socket.socket instance. After connecting the socket to a remote endpoint, calling close() just assigns an instance of _closedsocket to the _sock member, allowing Python to eventually garbage collect the original _socket.socket object. Continuing from the previous code, the following will “leak” the OS socket for some arbitrary amount of time:

sock.connect(("www.foobar.com", 80))
sock.close()
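A comparable deferral can be observed on a current Python 3 release. There, close() on a plain socket releases the descriptor right away, but it is still deferred while file objects created by makefile() remain open. The sketch below assumes Python 3 and uses socket.socketpair() in place of a real remote endpoint; the helper names peer_sees_eof and demo are hypothetical. It closes one end while a makefile() object still references it and checks whether the other end has seen end-of-file, with and without calling shutdown() first.

```python
import socket


def peer_sees_eof(sock):
    """Return True if the other end has shut down or closed its side."""
    sock.setblocking(False)
    try:
        # recv() returns b"" only at end-of-file.
        return sock.recv(1) == b""
    except BlockingIOError:
        # Nothing to read and no EOF: the connection still looks open.
        return False
    finally:
        sock.setblocking(True)


def demo(shutdown_first):
    """Close one end while a makefile() object still references it."""
    local, peer = socket.socketpair()
    reader = local.makefile("rb")  # keeps the descriptor alive
    if shutdown_first:
        local.shutdown(socket.SHUT_RDWR)  # acts on the connection itself
    local.close()  # only marks the object closed while reader is open
    seen = peer_sees_eof(peer)
    reader.close()  # last reference gone: the descriptor is really closed
    peer.close()
    return seen


print(demo(shutdown_first=False))  # False
print(demo(shutdown_first=True))   # True
```

Without the shutdown() call, the peer only learns that the connection is gone once the last reference to the descriptor is released.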

Looking at the C code in _socketmodule.c, there is a close() method defined on the _socket.socket class but there is no code in socket.py that will call it:

static PyObject *
sock_close(PySocketSockObject *s)
{
    SOCKET_T fd;

    if ((fd = s->sock_fd) != -1) {
        s->sock_fd = -1;
        Py_BEGIN_ALLOW_THREADS
        (void) SOCKETCLOSE(fd);
        Py_END_ALLOW_THREADS
    }
    Py_INCREF(Py_None);
    return Py_None;
}

Actually, there is code in socket.py that will call this function, but it’s in the socket._fileobject class returned by the socket._socketobject makefile() method, which typical networking code will likely not use.

If the close() method of socket._socketobject deferred to the _socket.socket close() method, calling close() on a socket.socket object would actually close the socket (if it was open). Since it does not, calling shutdown() is the only way to close the socket immediately. I won’t go into the difference between close() and shutdown() here since it’s fairly complex and mostly unimportant, but a simplified explanation is that shutdown() may wait until any unsent data has been flushed to the network (although it may not) and close() will not.
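Without going into the full comparison, one property of shutdown() is directly relevant here: it acts on the connection itself, no matter how many references (or duplicated descriptors) still point at it, while close() only gives up one reference. A minimal sketch, assuming Python 3 and using socket.socketpair() to stand in for a client and a server (the function name half_close_demo is hypothetical), shows this with a half-close: after SHUT_WR the peer reads end-of-file even though the local socket object is still open and can keep receiving.

```python
import socket


def half_close_demo():
    """Return what each side observes after a one-way shutdown."""
    local, peer = socket.socketpair()
    try:
        # Stop sending from `local`; the Python object itself stays open.
        local.shutdown(socket.SHUT_WR)
        eof_at_peer = peer.recv(16)  # b"": the peer sees end-of-file now
        # The other direction still works: `local` can still receive.
        peer.sendall(b"still open")
        reply = local.recv(16)
        return eof_at_peer, reply
    finally:
        local.close()
        peer.close()


print(half_close_demo())  # (b'', b'still open')
```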

Here’s the implementation of the _socket.socket shutdown() method:

static PyObject *
sock_shutdown(PySocketSockObject *s, PyObject *arg)
{
    int how;
    int res;

    how = PyInt_AsLong(arg);
    if (how == -1 && PyErr_Occurred())
        return NULL;
    Py_BEGIN_ALLOW_THREADS
    res = shutdown(s->sock_fd, how);
    Py_END_ALLOW_THREADS
    if (res < 0)
        return s->errorhandler();
    Py_INCREF(Py_None);
    return Py_None;
}

Calling shutdown() on an instance of socket._socketobject will actually shut down the OS-level socket immediately, before the object is garbage collected. Whether or not shutdown() was called, the descriptor itself is released only when the _socket.socket close() method runs or, failing that, when the object is garbage collected:

static void
sock_dealloc(PySocketSockObject *s)
{
    if (s->sock_fd != -1)
        (void) SOCKETCLOSE(s->sock_fd);
    Py_TYPE(s)->tp_free((PyObject *)s);
}

The conclusion is that Python sockets should always be closed by first calling shutdown() and then calling close().