The Case of the Trailing Space

Solve the case of the trailing space with IMVU’s Senior Engineer, Michael Slezak.

In this post, I wanted to discuss a problem we ran into recently dealing with REST authentication in our IMVU client application, which ultimately boiled down to small discrepancies between JSON encoders. In this case, I’m going to focus on Python’s JSON library and JavaScript’s JSON encoder. Before I dive into the problem, I want to provide a quick background on the client application architecture.

Client Architecture

The IMVU client application contains several layers in its architecture, ranging from rendering and business logic to front-end development. We use C++ for rendering and other low-level functionality such as windows, call stacks, and interfacing with the business logic. Our business/client logic layer is all in Python. We use Python for communicating with the front-end, pinging our servers, and maintaining advanced chat logic and chat state, among many other things. Finally, we use the Gecko SDK (a.k.a. XULRunner) to handle all of our UI needs. This means we can write our front-end using HTML, JavaScript, and CSS. We also have our own library that allows the front-end to call out to Python for things such as user data. Using web technologies for client UI development has allowed us to unify our technology with the website’s, sharing libraries such as jQuery, Underscore.js, and even a few in-house libraries, resulting in increased engineering productivity.

With that said, we’ve recently hand-rolled our own implementation of Promises, as specified by the upcoming ECMAScript 6 proposal, in our open-sourced imvu.js library. I decided to drop this implementation into the client for our immediate use. With this change, I was also able to drop in a new REST client to help with chaining our requests in a more synchronous-looking fashion (despite the fact that it’s asynchronous). Hooray! Asynchronous programming has just gotten easier for the client! However, that wasn’t the case…

XMLHttpRequest, are you there?

We got our feet wet using this exciting change with a new feature that my team is developing. There was one issue: Not authorized for request.

Uh oh.

Looks like we need an auth token for these requests. Specifically, the POST and DELETE requests. Simple enough. I found that we handle authentication in our Python code within this “securePostRaw” function. Every single request ever made in the client goes through this. To my knowledge, POSTing with XMLHttpRequest has never been used in the client. Ever. Needless to say, this was news to me…

It’s Dangerous to go Alone…

Looking at the securePostRaw function, we take the auth token that the server initially gave us, hash everything, and use the result as the new auth token. For example, if there is a request body, we JSON-encode it and also UTF-8 encode it. We then take the customer id, the original auth token, the JSON-encoded body, and the query parameters (if they exist), and run a hash function over the whole concatenated blob.
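In rough pseudocode, the scheme looks something like the sketch below. This is a minimal illustration, not the real securePostRaw: the function name, the field order, and the choice of SHA-256 are my assumptions.

import hashlib
import json

def compute_auth_header(customer_id, original_token, body=None, query_params=''):
    # Hypothetical sketch of the signing scheme described above.
    parts = [str(customer_id), original_token]
    if body is not None:
        # JSON-encode the request body -- the encoding step that bit us
        parts.append(json.dumps(body))
    parts.append(query_params)
    blob = ''.join(parts).encode('utf-8')
    return hashlib.sha256(blob).hexdigest()   # hash over the whole concatenated blob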

OK… a slightly odd way of securing a POST request, since we’re already running over HTTPS. But this is legacy code! So, it’s understandable.

I took this hashing function and copied it into another file for our front-end to call directly. I don’t need Python to send off the request; I just want to set the correct auth headers via XMLHttpRequest so that we can use this new REST client. You might be wondering, “Why not just let Python handle it then?” Partly because we have a bigger vision for the near future, in which we bring in a larger, hand-rolled front-end library that is now ubiquitous in how we write front-end software at IMVU. To achieve this, I need XHR to work so that it’s easier to just “plop it in”.

Anyways, I finally get the right tokens to put in our headers. We’re on our way! All is right with the world, so we test again and: Not authorized for request.

…Take This!

Wait, but I did what you told me. I did all the right things. The old and new client tokens match up! This isn’t fair!!

Luckily, I can run the IMVU client off a local server and dig into what the server is seeing. Through our REST middleware stack on the server, we’re failing authentication! When we attempt to grab the logged in user, it fails to identify us! But the old and new client tokens are the same! I’m in parity! Not so fast…

Here is a dump from the server of when Python makes the request using the securePostRaw function:

[server log dump for the securePostRaw request]

And here is the dump when XHR makes the request:

[server log dump for the XHR request]

sign is what the server comes up with and sent is what the client sent (obviously). Why in the hell are they different? The server is playing tricks on us… So, I decided to log the final output of the string the server uses right before it runs the hash function on it.

The Case of the Trailing Space

The log looked something like this:

[server log of the string being hashed]

…There is an extra space right before the "http://" starts! It turns out JSON.stringify doesn’t emit any spaces like this. Since we let Python compute the hash, Python also encodes the request body into JSON, and Python’s encoder is what introduces the space. Meanwhile we do xhr.send(JSON.stringify(body)), so the client and the server end up hashing two slightly different request bodies (different by one single space!), and the signatures can never match.

Fortunately, the json library in Python has a keyword argument on its dumps function called separators. The code now looks like json.dumps(body, separators=(',', ':')), which gives us a more compact encoding with no extra whitespace. We now match JSON.stringify.
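To make the difference concrete, here’s a minimal comparison (the body is made up; the whitespace behavior is the point):

import json

body = {"url": "http://example.com/avatar"}

print(json.dumps(body))
# {"url": "http://example.com/avatar"}   <- default separators (', ', ': ') insert a space after the colon
print(json.dumps(body, separators=(',', ':')))
# {"url":"http://example.com/avatar"}    <- compact form; matches JSON.stringify(body) byte for byte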

After this change, the hashes finally matched and authentication works!

Conclusion

Several lessons learned from this:

  • Hashing values that depend on variable, encoded data can be problematic. JSON is flexible about whitespace in its encoding, so two perfectly valid encodings of the same body can throw off the whole hash. Inputs that are fully in our control, such as integer values and constant strings, are safer to use.
  • Not all JSON encoders/decoders are created equal.

If you enjoyed reading this article and are excited about solving problems such as unifying web technologies across multiple platforms, you’re in luck! We’re hiring!

Tracing Leaks in Python: Find the Nearest Root

By Chad Austin


Garbage Collection Doesn’t Mean You Can Ignore Memory Altogether…

Garbage collection removes a great deal of burden from programming. In fact, garbage collection is a critical language feature for all languages where abstractions such as functional closures or coroutines are common, as they frequently create reference cycles.

IMVU is a mix of C++ and Python. The C++ code generally consists of small, cohesive objects with a clear ownership chain. An Avatar SceneObject owns a ModelInstance which owns a set of Meshes which own Materials which own Textures and so on… Since there are no cycles in this object graph, reference-counting with shared_ptr suffices.

The Python code, however, is full of messy object cycles. An asynchronous operation may hold a reference to a Room, while the Room may be holding a reference to the asynchronous operation. Often two related objects will be listening for events from the other. While Python’s garbage collector will happily take care of cycles, it’s still possible to leak objects.

Imagine these scenarios:

  • a leaked or living C++ object has a strong reference to a Python object.
  • a global cache has a reference to an instance’s bound method, which implicitly contains a reference to the instance (see the sketch after this list).
  • two objects with __del__ methods participate in a cycle with each other, and Python refuses to decide which should destruct first.
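The second scenario is surprisingly easy to hit. A minimal sketch (the cache name here is invented, but the mechanism is real):

import gc

class Dialog(object):
    def on_close(self):
        pass

_handler_cache = []                 # hypothetical global cache of callbacks

d = Dialog()
_handler_cache.append(d.on_close)   # the bound method quietly holds a reference to d
del d
gc.collect()
# The Dialog instance is still alive: _handler_cache -> bound method -> instance.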

To detect these types of memory leaks, we use a LifeTimeMonitor utility:

a = SomeObject()
lm = LifeTimeMonitor(a)
del a
lm.assertDead() # succeeds

b = SomeObject()
lm = LifeTimeMonitor(b)
lm.assertDead() # raises ObjectNotDead

We use LifeTimeMonitor’s assertDead facility at key events, such as when a user closes a dialog box or 3D window. Take 3D windows as an example. Since they’re the root of an entire object subgraph, we would hate to inadvertently leak them. LifeTimeMonitor’s assertDead prevents us from introducing an object leak.
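The monitor itself isn’t shown in this post, but a minimal sketch of how such a utility might be built, assuming a weakref-based approach (the real implementation may differ), looks like this:

import gc
import weakref

class ObjectNotDead(AssertionError):
    pass

class LifeTimeMonitor(object):
    """Sketch: watch an object through a weak reference so the monitor
    itself never keeps the object alive."""
    def __init__(self, obj):
        self._ref = weakref.ref(obj)

    def assertDead(self):
        gc.collect()                  # give the cycle collector a chance first
        if self._ref() is not None:
            raise ObjectNotDead(repr(self._ref()))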

It’s good to know that an object leaked, but how can you determine why it can’t be collected?

Python’s Garbage Collection Algorithm

Let’s go over the basics of automatic garbage collection. In a garbage-collected system there are objects and objects can reference each other. Some objects are roots; that is, if an object is referenced by a root, it cannot be collected. Example roots are the stacks of live threads and the global module list. The graph formed by objects and their references is the object graph.

In SpiderMonkey, Mozilla’s JavaScript engine, the root set is explicitly-managed. SpiderMonkey’s GC traverses the object graph from the root set. If the GC does not reach an object, that object is destroyed. If C code creates a root object but fails to add it to the root set, it risks the GC deallocating the object while it’s still in use.

In Python however, the root set is implicit. All Python objects are ref-counted, and any that can refer to other objects — and potentially participate in an object cycle — are added to a global list upon construction. Each GC-tracked object can be queried for its referents. Python’s root set is implicit because anyone can create a root simply by incrementing an object’s refcount.

Since Python’s root set is implicit, its garbage collection algorithm differs slightly from SpiderMonkey’s. Python begins by setting GCRefs(o) to CurrentRefCount(o) for each GC-tracked PyObject o. Then it traverses all referents r of all GC-tracked PyObjects and subtracts 1 from GCRefs(r). After that pass, if GCRefs(o) is still nonzero, then o is referenced by something the garbage collector cannot see, and is thus a root. Python traverses the now-known root set and increments GCRefs(o) for any traversed objects. If any object o remains where GCRefs(o) == 0, that object is unreachable and thus collectible.
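Here is a toy model of that bookkeeping, with hand-written refcounts standing in for real interpreter state (only the root-finding phase is shown):

# refcount plays the role of CurrentRefCount(o); referents[o] lists the
# GC-tracked objects that o refers to.
refcount  = {'a': 2, 'b': 1, 'c': 1}
referents = {'a': ['b'], 'b': ['c'], 'c': []}

gcrefs = dict(refcount)              # GCRefs(o) starts at CurrentRefCount(o)
for o in referents:                  # subtract references coming from tracked objects
    for r in referents[o]:
        gcrefs[r] -= 1

roots = [o for o in gcrefs if gcrefs[o] > 0]
print(roots)   # ['a'] -- 'a' has a reference the GC can't account for, so it is a root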

Finding a Path From the Nearest Root to the Leaked Object

Now that we know how Python’s garbage collector works, we can ask it for its set of roots by calculating GCRefs(o) for all objects o in gc.get_objects(). Then we perform a breadth-first-search from the root set to the leaked object. If the root set directly or indirectly refers to the leaked object, we return the path our search took.
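In pure Python, the search looks roughly like the sketch below. This is only an illustration: it creates temporary references of its own, which is exactly the problem the C++ version later in the post is careful to avoid.

import gc
from collections import deque

def find_path_from_roots(roots, target):
    # Breadth-first search from the root set to the target, following
    # gc.get_referents() edges. Returns a root-to-target path, or None.
    seen = set(id(r) for r in roots)
    queue = deque((r, [r]) for r in roots)
    while queue:
        obj, path = queue.popleft()
        if obj is target:
            return path
        for ref in gc.get_referents(obj):
            if gc.is_tracked(ref) and id(ref) not in seen:
                seen.add(id(ref))
                queue.append((ref, path + [ref]))
    return None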

Sounds simple, but there’s a catch! Imagine that the search function has signature:

PyObject* findPathToNearestRoot(PyObject* leakedObject);

leakedObject is a reference (incremented within Python’s function-call machinery itself) to the leaked object, making leakedObject a root!

To work around this, change findPathToNearestRoot so it accepts a singleton list containing a reference to the leaked object. findPathToNearestRoot can borrow that reference and clear the list, ensuring that leakedObject has no untracked references.
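At the call site, that convention looks something like this (hypothetical usage; the important part is that our own frame drops its direct reference before the search runs):

box = [leaked_object]                # wrap the suspect in a singleton list
del leaked_object                    # drop our direct reference; only the list holds one now
chain = findPathToNearestRoot(box)   # borrows the reference and clears the list
# chain is a list of objects forming a path between a root and the leaked object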

findPathToNearestRoot will find paths to expected Python roots like thread entry points and module objects. But, since it directly mirrors the behavior of Python’s GC, it will also find paths to leaked C references! Obviously, it can’t directly point you to the C code that leaked the reference, but the reference path should be enough of a clue to figure it out.

The Code

// Note: the Python-facing code below uses Boost.Python (object, handle<>, borrowed,
// list, throw_error_already_set) on top of the CPython C API.
template<typename ArgType>
void traverse(PyObject* o, int (*visit)(PyObject* visitee, ArgType* arg), ArgType* arg) {
    if (Py_TYPE(o)->tp_traverse) {
        Py_TYPE(o)->tp_traverse(o, (visitproc)visit, arg);
    }
}

typedef std::map<PyObject*, int> GCRefs;

static int subtractKnownReferences(PyObject* visitee, GCRefs* gcrefs) {
    if (gcrefs->count(visitee)) {
        Assert(PyObject_IS_GC(visitee));
        --(*gcrefs)[visitee];
    }
    return 0;
}

typedef int Backlink; // -1 = none

typedef std::vector< std::pair<Backlink, PyObject*> > ReferenceList;
struct Referents {
    std::set<PyObject*>& seen;
    Backlink backlink;
    ReferenceList& referenceList;
};

static int addReferents(PyObject* visitee, Referents* referents) {
    if (!referents->seen.count(visitee) && PyObject_IS_GC(visitee)) {
        referents->referenceList.push_back(std::make_pair(referents->backlink, visitee));
    }
    return 0;
}

static Backlink findNextLevel(
    std::vector<PyObject*>& chain,
    const ReferenceList& roots,
    PyObject* goal,
    std::set<PyObject*>& seen
) {
    if (roots.empty()) {
        return -1;
    }

    for (size_t i = 0; i < roots.size(); ++i) {
        if (roots[i].first != -1) {
            if (goal == roots[i].second) {
                chain.push_back(goal);
                return roots[i].first;
            }
            seen.insert(roots[i].second);
        }
    }

    ReferenceList nextLevel;
    for (size_t i = 0; i < roots.size(); ++i) {
        Referents referents = {seen, i, nextLevel};
        traverse(roots[i].second, &addReferents, &referents);
    }

    Backlink backlink = findNextLevel(chain, nextLevel, goal, seen);
    if (backlink == -1) {
        return -1;
    }

    chain.push_back(roots[backlink].second);
    return roots[backlink].first;
}

static std::vector<PyObject*> findReferenceChain(
    const std::vector<PyObject*>& roots,
    PyObject* goal
) {
    std::set<PyObject*> seen;
    ReferenceList unknownReferrer;
    for (size_t i = 0; i < roots.size(); ++i) {
        unknownReferrer.push_back(std::make_pair<Backlink>(-1, roots[i]));
    }
    std::vector<PyObject*> rv;
    // going to return -1 no matter what: no backlink from roots
    findNextLevel(rv, unknownReferrer, goal, seen);
    return rv;
}

static object findPathToNearestRoot(const object& o) {
    if (!PyList_Check(o.ptr()) || PyList_GET_SIZE(o.ptr()) != 1) {
        PyErr_SetString(PyExc_TypeError, "findNearestRoot must take a list of length 1");
        throw_error_already_set();
    }

    // target = o.pop()
    object target(handle<>(borrowed(PyList_GET_ITEM(o.ptr(), 0))));
    if (-1 == PyList_SetSlice(o.ptr(), 0, 1, 0)) {
        throw_error_already_set();
    }

    object gc_module(handle<>(PyImport_ImportModule("gc")));
    object tracked_objects_list = gc_module.attr("get_objects")();
    // allocating the returned list may have run a GC, but tracked_objects won't be in the list

    std::vector<PyObject*> tracked_objects(len(tracked_objects_list));
    for (size_t i = 0; i < tracked_objects.size(); ++i) {
        object to = tracked_objects_list[i];
        tracked_objects[i] = to.ptr();
    }
    tracked_objects_list = object();

    GCRefs gcrefs;

    // TODO: store allocation/gc count per generation

    for (size_t i = 0; i < tracked_objects.size(); ++i) {
        gcrefs[tracked_objects[i]] = tracked_objects[i]->ob_refcnt;
    }

    for (size_t i = 0; i < tracked_objects.size(); ++i) {
        traverse(tracked_objects[i], subtractKnownReferences, &gcrefs);
    }

    // BFS time

    std::vector<PyObject*> roots;
    for (GCRefs::const_iterator i = gcrefs.begin(); i != gcrefs.end(); ++i) {
        if (i->second && i->first != target.ptr()) { // Don't count the target as a root.
            roots.push_back(i->first);
        }
    }
    std::vector<PyObject*> chain = findReferenceChain(roots, target.ptr());

    // TODO: assert that allocation/gc count per generation didn't change

    list rv;
    for (size_t i = 0; i < chain.size(); ++i) {
        rv.append(object(handle<>(borrowed(chain[i]))));
    }

    return rv;
}