Charming Python: How to actually close a socket by calling shutdown() before calling close()

By Eric Hohenstein

You may never have to use the low-level Python socket interface, but if you ever do, here’s a tip about some surprising Python socket behavior that might help you.

The standard Python library has a “socket” module written in Python that wraps an underlying “_socket” module written in C, which in turn wraps OS-level sockets. Together, these expose a cross-platform, Berkeley socket-like interface to Python applications. I say Berkeley socket-like because the socket functions are exposed as methods on an object rather than as global free functions. The surprise with this interface is that the close() method of the socket object does not close the socket. Instead, it drops its reference to the underlying socket, leaving garbage collection to eventually close the OS-level socket.

If you only use the close() method on a socket, the socket will continue using both client and server resources until it is garbage collected. Worse, if the other end continues to send data to the locally discarded socket, it may block when the receive window of the local socket fills up. Even worse, if the software handling the other end of the socket is written in Erlang and zero-window packets are dropped somewhere in between (for instance because of overly aggressive firewall rules), the other end may block until the socket gets garbage collected, even if the socket is configured to time out send operations immediately when they would block.

The solution is to call shutdown() on the socket before calling close(). You have to be careful doing this if the socket is being read or written by multiple threads simultaneously, but down that path lies madness, so don’t do that.
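As a minimal sketch of that pattern, the two calls can be wrapped in a small helper. The name shutdown_and_close is hypothetical and not part of the socket module, and ignoring a failed shutdown() is an assumption that suits the case where the connection is being discarded anyway.

```python
import socket


def shutdown_and_close(sock):
    """Hypothetical helper: shut the connection down, then close the object."""
    try:
        # SHUT_RDWR stops both sending and receiving on the OS-level socket.
        sock.shutdown(socket.SHUT_RDWR)
    except socket.error:
        # shutdown() raises if the socket is not connected (for example,
        # it never connected or was already shut down). Assumption: the
        # caller is discarding the connection, so the error is ignored.
        pass
    finally:
        # Always release the Python-level socket object as well.
        sock.close()
```

A caller would use shutdown_and_close(sock) wherever it previously called sock.close() on its own.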

The Python documentation for the socket module actually does briefly mention this but doesn’t really give much of an explanation as to why it’s necessary. Really, the Python documentation on the subject is correct in that calling shutdown is better form. However, in cases where you are sure that the application state of the connection is no longer valid, it should be possible to bypass the shutdown and close the socket to simply free its resources immediately.

Why is it necessary to call shutdown() before close()? Let’s take a look at the socket.py module from the standard Python library. These code snippets are all taken from Python 2.6 source, though I’ve confirmed that Python 2.7 and Python 3.0 behave similarly. First, socket.py imports everything from _socket, which includes a class called socket:

import _socket
from _socket import *

Then, it saves a reference to _socket.socket called _realsocket:

_realsocket = socket

Then it defines a list of functions that will be exposed from the real socket object to the wrapper class (see below):

_socketmethods = (
    'bind', 'connect', 'connect_ex', 'fileno', 'listen',
    'getpeername', 'getsockname', 'getsockopt', 'setsockopt',
    'sendall', 'setblocking',
    'settimeout', 'gettimeout', 'shutdown')

Note that ‘close’ is not in that list but ‘shutdown’ is.

Then it defines a _closedsocket class that will fail all operations:

class _closedsocket(object):
    __slots__ = []
    def _dummy(*args):
        raise error(EBADF, 'Bad file descriptor')
    # All _delegate_methods must also be initialized here.
    send = recv = recv_into = sendto = recvfrom = recvfrom_into = _dummy
    __getattr__ = _dummy

Then, it defines a class called _socketobject. Note the comment before the start of this class:

# Wrapper around platform socket objects. This implements
# a platform-independent dup() functionality. The
# implementation currently relies on reference counting
# to close the underlying socket object.
class _socketobject(object):

    __doc__ = _realsocket.__doc__

    __slots__ = ["_sock", "__weakref__"] + list(_delegate_methods)

    def __init__(self, family=AF_INET, type=SOCK_STREAM, proto=0, _sock=None):
        if _sock is None:
            _sock = _realsocket(family, type, proto)
        self._sock = _sock
        for method in _delegate_methods:
            setattr(self, method, getattr(_sock, method))

Within the _socketobject class, it defines a method for each of the names in _socketmethods, each of which forwards the call to the underlying _socket.socket object:

    _s = ("def %s(self, *args): return self._sock.%s(*args)\n\n"
          "%s.__doc__ = _realsocket.%s.__doc__\n")
    for _m in _socketmethods:
        exec _s % (_m, _m, _m, _m)
    del _m, _s
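The exec template can be hard to read, so here is a rough equivalent of the same forwarding technique written with closures instead of exec. RealThing, Wrapper and _make_forwarder are hypothetical names invented for this illustration; this is not how socket.py is written.

```python
class RealThing(object):
    """Hypothetical stand-in for _socket.socket."""

    def fileno(self):
        """Return a made-up descriptor number."""
        return 7

    def shutdown(self, how):
        """Pretend to shut down and report what was requested."""
        return "shutdown(%d)" % how


class Wrapper(object):
    """Hypothetical stand-in for _socketobject."""

    def __init__(self):
        self._sock = RealThing()


def _make_forwarder(name):
    # Build one method that forwards to the same-named method on
    # self._sock, which is what each expansion of the template does.
    def forwarder(self, *args):
        return getattr(self._sock, name)(*args)
    forwarder.__name__ = name
    forwarder.__doc__ = getattr(RealThing, name).__doc__
    return forwarder


# Only the listed names are forwarded; anything else must be
# defined on the wrapper by hand.
for _m in ("fileno", "shutdown"):
    setattr(Wrapper, _m, _make_forwarder(_m))
del _m

w = Wrapper()
print(w.fileno())     # forwarded to RealThing.fileno
print(w.shutdown(2))  # forwarded to RealThing.shutdown
```

The point to take away is that a name left out of the list gets no forwarding method at all, so the wrapper's behavior for that name is whatever the wrapper class defines itself.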

Remember that close was not in the _socketmethods list. Here’s the close method of the _socketobject class:

    def close(self):
        self._sock = _closedsocket()
        dummy = self._sock._dummy
        for method in _delegate_methods:
            setattr(self, method, dummy)
    close.__doc__ = _realsocket.close.__doc__

Assigning a new value to self._sock drops the wrapper’s reference to the real socket object, allowing it to be garbage collected once no other references to it remain. It then exposes the _socketobject class as socket:

socket = SocketType = _socketobject
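The effect of rebinding _sock in close() can be sketched with a toy model that involves no real sockets. FakeRealSocket and FakeWrapper are hypothetical classes invented for this illustration, and the timing of __del__ shown here assumes CPython’s reference counting; other Python implementations may release the object later.

```python
class FakeRealSocket(object):
    """Hypothetical stand-in for _socket.socket that counts open instances."""

    open_count = 0

    def __init__(self):
        FakeRealSocket.open_count += 1

    def __del__(self):
        # Like sock_dealloc: the resource is released only when the
        # object itself is deallocated.
        FakeRealSocket.open_count -= 1


class FakeWrapper(object):
    """Hypothetical stand-in for socket._socketobject."""

    def __init__(self, _sock=None):
        self._sock = _sock if _sock is not None else FakeRealSocket()

    def dup(self):
        # A second wrapper that shares the same underlying object.
        return FakeWrapper(_sock=self._sock)

    def close(self):
        # Only drops this wrapper's reference; nothing is closed here.
        self._sock = None


first = FakeWrapper()
second = first.dup()

first.close()
count_after_first_close = FakeRealSocket.open_count

second.close()
count_after_last_close = FakeRealSocket.open_count

print(count_after_first_close, count_after_last_close)
```

In this model the underlying object survives the first close() because the duplicate still refers to it, and it is only released once the last reference is dropped.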

Client networking code using the Python socket library would create a socket like so:

import socket

sock = socket.socket() 

which will create a socket._socketobject instance that has a _sock member which is a _socket.socket instance. After connecting the socket to a remote endpoint, calling close() just rebinds the _sock member to an instance of _closedsocket, allowing Python to eventually garbage collect the original _socket.socket object. After the previous code, the following will “leak” the OS socket for some arbitrary amount of time:

sock.connect(("www.foobar.com", 80))
sock.close()

Looking at the C code in socketmodule.c, there is a close() method defined on the _socket.socket class but there is no code in socket.py that will call it:

static PyObject *
sock_close(PySocketSockObject *s)
{
    SOCKET_T fd;

    if ((fd = s->sock_fd) != -1) {
        s->sock_fd = -1;
        Py_BEGIN_ALLOW_THREADS
        (void) SOCKETCLOSE(fd);
        Py_END_ALLOW_THREADS
    }
    Py_INCREF(Py_None);
    return Py_None;
}

Actually, there is code in socket.py that will call this function, but it’s in the socket._fileobject class returned by the makefile() method of socket._socketobject, which typical networking code is unlikely to use.

If the close() method of socket._socketobject deferred to the _socket.socket close() method, calling close() on a socket.socket object would actually close the socket (if it was open). Since it does not, calling shutdown() is the only way to close the socket immediately. I’ll not go into the difference between close() and shutdown() here since it’s fairly complex and mostly unimportant but a simplified explanation is that shutdown() may wait until any unsent data has been flushed to the network (although it may not) and close() will not.

Here’s the implementation of the _socket.socket shutdown() function:

static PyObject *
sock_shutdown(PySocketSockObject *s, PyObject *arg)
{
    int how;
    int res;

    how = PyInt_AsLong(arg);
    if (how == -1 && PyErr_Occurred())
        return NULL;
    Py_BEGIN_ALLOW_THREADS
    res = shutdown(s->sock_fd, how);
    Py_END_ALLOW_THREADS
    if (res < 0)
        return s->errorhandler();
    Py_INCREF(Py_None);
    return Py_None;
}

Calling shutdown() on an instance of socket._socketobject will actually shut down the OS-level socket before it is garbage collected. Whether or not it has been shut down, a _socket.socket object that has not been closed will eventually be closed when it’s garbage collected:

 

static void
sock_dealloc(PySocketSockObject *s)
{
    if (s->sock_fd != -1)
        (void) SOCKETCLOSE(s->sock_fd);
    Py_TYPE(s)->tp_free((PyObject *)s);
}
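The immediate effect of shutdown() can be observed from the other end of a connection. The following is a small sketch that uses socket.socketpair() as a stand-in for a real client and server, which is an assumption made to keep the example self-contained; the variable names are invented for the illustration.

```python
import socket

# socketpair() returns two sockets that are already connected to each
# other, standing in for the local and remote ends of a connection.
local, remote = socket.socketpair()
remote.settimeout(2.0)  # avoid hanging forever if something goes wrong

# Shut down the local end while the Python object is still referenced.
local.shutdown(socket.SHUT_RDWR)

# The remote end sees end-of-stream right away: recv() returns an
# empty string instead of blocking until the local object is collected.
data = remote.recv(1)
remote_saw_eof = (len(data) == 0)
print(remote_saw_eof)

# Release both socket objects once the connection is finished.
local.close()
remote.close()
```

The remote end learns that the connection is finished as soon as shutdown() returns, independent of when the local socket object is eventually reclaimed.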

The conclusion is that Python sockets should always be closed by first calling shutdown() and then calling close().
