Thursday, August 22, 2013

Debugging C/API Python extensions

I feel really fracking dumb right about now. But at least I now know how to use GDB (the GNU Debugger) on this kind of project. So I've got that going for me. Which is nice.


OK, so some background: I've been keeping myself amused by learning how to write Python extension modules (in C) because I clearly don't have enough else to do at the moment. (OK, the real reason was that Cython was being stupid about some integer arithmetic stuff and I didn't feel like digging into the guts of the Cython-generated C code without knowing what the hell was supposed to be going on.) And, as always happens, there were failed compiles for trivial reasons and failed compiles because things were the wrong type and core dumps because of programmer logic errors.

And about halfway through the writing process, I started getting an odd compiler warning:
>>> warning: control reaches end of non-void function [-Wreturn-type]

Here's what was going on in that function: I'm writing a quick-and-dirty binary search tree, where the data are lists of Python objects. In highly simplified form:
typedef struct _List {
  struct _List *next;
  PyObject *obj;
} List;

typedef struct _Table {
  struct _Table *left;
  struct _Table *right;
  Py_hash_t hashval;
  List *ObjList;
} Table;

(There are actually other fields, but they're not important for this story.)

OK, so I wrote the obvious basic recursive search function for a structure like this:
static List* search_Table(Table* T, PyObject *key) {
  switch (compare(PyObject_Hash(key), T->hashval)) {
  case -1:
    if (T->left == NULL) {
      /* T does not contain the key, return NULL */
    } else {
      search_Table(T->left, key);
    }
  case 1:
    if (T->right == NULL) {
      /* see above */
    } else {
      search_Table(T->right, key);
    }
  case 0:
    /* look through T's object list for the key */
  }
}
I bet you already see why I was getting that compiler warning, but like millions of coders before me, I was too close to the screen to step back and see it.

So what do I do? To quiet the compiler, I put in a dummy error raise at the end of the function, thinking that it should never be raised.

And of course, as soon as I tested the module on an example of decent size, it got raised.

Here's where the fun started. Because this was a C-level raise of a Python error, the traceback info was... basically useless. In particular, there's no way to use PDB to investigate the stack info, variables, etc. in place in the C-level execution. And going the other way, there's not really a way to run an interactive Python session under GDB. (GDB can start up its own Python session to enable scripting control of debugging, but that's something else entirely. Also, I think it only uses Python 2.) And even if you put your commands in a script and run
$ gdb --args python3 buggyexample.py
GDB will happily report that everything went hunky-dory, because it doesn't think of Python exceptions as problems.


OK, so here's how to proceed: I assume that the extension module is named "spammodule.c" and it has a proper distutils-compliant "setup.py" script.

First build the extension with optimization turned off and debug symbols turned on (so that functions don't get inlined away and GDB can map the compiled code back to source lines):
$ CFLAGS='-Wall -O0 -g' python3 setup.py build
Then copy the .so files (the compiled module code on Linux) into the same directory everything else is in:
$ cp build/lib.linux-x86-x.y/modulename.so .
(note the dot at the end).
Now invoke
$ gdb python3
> break spammodule.c:18
where instead of 18 you want the line of the function you want to start debugging at. In my case, that was the line before I told it to throw the exception (that way, I'd only enter the debugger if the function didn't do what I thought it should). You may get a message saying that GDB doesn't know anything about a file named "spammodule.c"; it'll ask if you want it to listen out for one to get loaded in dynamically, which of course you'll say yes to. Then
> run buggyexample.py
and wait for the breakpoint to hit.

Oh, and if you didn't catch my mistake in the original function: each of the recursive calls to search_Table() should have been return search_Table(...) instead.
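For the record, here's roughly what the fixed search looks like. This is a minimal sketch, not the real module: so that it compiles without Python.h, I'm assuming a plain long (hash_t) in place of Py_hash_t and a C string in place of the PyObject*, and passing the key's hash in as an argument instead of calling PyObject_Hash(). The compare() helper is also a guess at a three-way comparison, since the original isn't shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins (assumptions) so the sketch builds without Python.h:
   hash_t plays the role of Py_hash_t, and a C string plays the
   role of the PyObject* stored in each list node. */
typedef long hash_t;

typedef struct _List {
  struct _List *next;
  const char *obj;
} List;

typedef struct _Table {
  struct _Table *left;
  struct _Table *right;
  hash_t hashval;
  List *ObjList;
} Table;

/* Hypothetical three-way comparison: returns -1, 0, or 1. */
static int compare(hash_t a, hash_t b) {
  return (a > b) - (a < b);
}

/* Return the list node holding key, or NULL if T does not contain it.
   Every path through the switch ends in a return, so control can no
   longer fall off the end of the function. */
static List *search_Table(Table *T, hash_t keyhash, const char *key) {
  List *node;

  assert(T != NULL);
  switch (compare(keyhash, T->hashval)) {
  case -1:
    if (T->left == NULL) {
      return NULL;
    }
    return search_Table(T->left, keyhash, key);
  case 1:
    if (T->right == NULL) {
      return NULL;
    }
    return search_Table(T->right, keyhash, key);
  default:
    /* Same hash: look through T's object list for the key. */
    for (node = T->ObjList; node != NULL; node = node->next) {
      if (strcmp(node->obj, key) == 0) {
        return node;
      }
    }
    return NULL;
  }
}
```

With a return in every branch, the compiler warning goes away on its own, and there's no need for a dummy error raise at the bottom to paper over it.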
