Thursday, September 13, 2012

Subclassing immutable types in Python 3

I've been hacking around in Python 3 for a while now, writing (as I mentioned a while back) a package for implementing arbitrary finite first-order structures.

The natural way to build such a thing is to subclass sets in some way: the traditional way of describing a first-order structure is as a set with additional information attached. But Python has two native set types, set and frozenset, the former mutable and the latter immutable.

Ideally, we'd like to enforce that you can't add elements to or remove elements from a model. I mean, if you plan to iterate through a model M, it would be a poor choice to pop elements off one at a time and throw them away, so a good programmer should make that something a user can't do by accident. Hence, I'm going to subclass frozenset instead of the mutable set class.
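The difference is easy to check in the interpreter. This quick sketch uses only the built-ins, nothing from my Model class: set has the mutating methods, and frozenset simply doesn't have them.

```python
# set is mutable: an element can be popped off and thrown away.
s = set([1, 2, 3])
s.pop()                      # removes an arbitrary element
assert len(s) == 2

# frozenset has no mutating methods at all, so the accident can't happen.
fs = frozenset([1, 2, 3])
for name in ("add", "pop", "remove", "discard", "clear", "update"):
    assert not hasattr(fs, name)

try:
    fs.pop()
except AttributeError as e:
    print("frozenset refuses:", e)
```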

There's only one problem: initializing an immutable object, or a mutable object which inherits from an immutable class, requires doing things a little differently.

I won't go through the whole setup process for my Model class, since that would involve a lot of explaining, but I'll do a simplified example: a class derived from frozenset with an extra data attribute foo.

But first, where does the problem crop up anyway? Let's say that we didn't know about this whole business: how would we usually write a subclass?
class SetWithFoo(frozenset):
    def __init__(self,X,foo_in):
        frozenset.__init__(self,X)
        self.foo = foo_in
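But when you run this code, you get something like
>>> S = SetWithFoo({1,2},"bar")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: SetWithFoo expected at most 1 argument, got 2
What's gone wrong? When you call SetWithFoo(...), Python first calls __new__ with all the arguments you passed, and only then calls __init__ on the result. We never overrode __new__, so frozenset.__new__ receives both {1,2} and "bar", and it only knows how to build a set from a single iterable. And even if the arguments got through, remember how frozensets are immutable? And remember how __init__(self,...) isn't a constructor, because the object self already exists? What that means here is that self gets summoned into existence from the void with certain elements, and those are the only elements which will ever belong to it. By the time __init__ sees the object, it can't change its members.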
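To see the order of events for yourself, here's a small tracing sketch (the class name Traced and the calls list are made up for illustration): __new__ builds the frozenset with its members, and only afterwards does __init__ get the finished object, which by then has no way to change them.

```python
calls = []  # records which method ran and which members existed at that point

class Traced(frozenset):
    def __new__(cls, X):
        obj = super().__new__(cls, X)          # members are fixed right here
        calls.append(("__new__", sorted(obj)))
        return obj

    def __init__(self, X):
        # By the time we get here, self already holds its final members,
        # and frozenset offers no method to alter them.
        calls.append(("__init__", sorted(self)))

t = Traced({2, 1})
print(calls)   # [('__new__', [1, 2]), ('__init__', [1, 2])]
```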

The method which does the summoning is where we need to work. That method is called __new__(...), and unlike the ordinary methods in your class, it doesn't take self as an argument, because self doesn't exist yet! Instead, the first argument to __new__ is the class of the object being created. You the programmer don't have to do anything special about this: just name the first parameter cls, and Python will pass the class in for you:
def __new__(cls,X,foo_in):
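A minimal sketch of what actually arrives in cls (Base and Child are hypothetical names): it's whichever class is being instantiated, including a further subclass, and passing it on to frozenset.__new__ is what makes the result an instance of that class rather than a plain frozenset.

```python
seen = []  # records the cls argument each time __new__ runs

class Base(frozenset):
    def __new__(cls, X):
        seen.append(cls)                  # a class object, not an instance
        return frozenset.__new__(cls, X)  # build an instance of *cls*

class Child(Base):
    pass                                  # inherits Base.__new__

b = Base({1})
c = Child({1, 2})
assert seen == [Base, Child]
assert type(b) is Base and type(c) is Child

# Passing frozenset instead of cls would give a plain frozenset back:
plain = frozenset.__new__(frozenset, {1})
assert type(plain) is frozenset
```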
Now, at this point there are two schools of thought on what to do next. One school says that if you're going to bother overriding __new__, you should do the whole initialization there and leave __init__ alone (don't even explicitly override it). I'm more on the other side, which holds that only what has to be done in __new__ (that is, whatever must happen before the basic data of your object is frozen, in this case the members of the set) should be done there; everything else can be handled profitably in __init__. The one caveat is that calling SetWithFoo(args) passes the same arguments to both methods, so their argument lists (including default values) should match, apart from cls and self; otherwise Python will throw a TypeError.
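Here's a sketch of that caveat in action (Mismatched is a made-up name): because the call hands the same positional arguments to both __new__ and __init__, a mismatch fails with a TypeError whichever method has the shorter argument list.

```python
class Mismatched(frozenset):
    def __new__(cls, X):                  # accepts only the members...
        return super().__new__(cls, X)

    def __init__(self, X, foo_in):        # ...but __init__ also wants foo_in
        self.foo = foo_in

errors = []
for args in [({1, 2}, "bar"), ({1, 2},)]:
    try:
        Mismatched(*args)
    except TypeError as e:
        # first call: __new__ gets too many arguments;
        # second call: __new__ succeeds, but __init__ is missing foo_in
        errors.append(str(e))

print(len(errors))  # 2
```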
Long story short, either of the two following code blocks will do just fine.
def __new__(cls,X,foo_in):
    s = frozenset.__new__(cls,X)
    s.foo = foo_in
    return s  # __new__ must return the new object, or the call gives None
or
def __new__(cls,X,foo_in):
    return frozenset.__new__(cls,X)

def __init__(self,X,foo_in):
    self.foo = foo_in
but obviously not both ;)
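For completeness, here are both variants as complete, runnable classes with a quick check. The names SetWithFooA and SetWithFooB are just labels for the two options above, and I've used super() in place of naming frozenset explicitly, which does the same thing here.

```python
class SetWithFooA(frozenset):
    """Option 1: everything happens in __new__; __init__ is not overridden."""
    def __new__(cls, X, foo_in):
        s = super().__new__(cls, X)   # members are frozen here
        s.foo = foo_in                # subclass instances get a __dict__
        return s


class SetWithFooB(frozenset):
    """Option 2: __new__ fixes the members, __init__ handles the rest."""
    def __new__(cls, X, foo_in):
        return super().__new__(cls, X)

    def __init__(self, X, foo_in):    # same argument list as __new__
        self.foo = foo_in


for cls in (SetWithFooA, SetWithFooB):
    S = cls({1, 2}, "bar")
    assert S.foo == "bar"
    assert 1 in S and 3 not in S
    assert isinstance(S, frozenset)
```

Note that foo lives in the instance's __dict__, so it can still be reassigned after construction; only the members of the set are frozen.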
