Sunday, August 5, 2012

Blogging learning Python: equivalence relations I

 (Continuation here)

So I've finally broken down and started climbing up the learning curve for Python, just like everyone has been telling me to do for a couple of years now. Last summer's debacles with Octave should have been the last straw, but it's amazing what one can't get oneself to do when one actually got the result (despite the algorithm's taking a week to finally fail to compute what I asked for) and can move over to writing a paper instead of laboriously constructing examples.


(While we're on the subject: hey anonymous referee who I totally know who you are, please to be sending my manuscript back to JSL with a big fat approval stamp soonish. Thanks in advance.)

Anyway: what I'm working towards is a library for working with arbitrary finite first-order structures, where "working with" means both doing structural things (forming products, verifying that a map is a homomorphism, etc.) and performing syntactic tests (computing whether \( A \models \varphi \), ...)

In the aforementioned Octave investigations, I'd hacked together tools for some of these things, always conscious of how kludgy a lot of it was, since Octave doesn't do object orientation at all well, doesn't have the right datatypes (it doesn't have a linked list type, for fuck's sake... everything in Octave is in genus 'array'), and so forth. (Don't get me wrong: Octave is a great piece of software for what it does, which is numerical computation and linear algebra on floats. I'll still use it to do my taxes.) Since I already had a good idea of the functionality I'd like from an equivalence relation class, I decided to use that as my first non-baby programming challenge.

(Intellectual property mumbo-jumbo: the following code implements some well-known algorithms which are AFAIK in the public domain. Treat it as published under some share-and-share-alike license the MIT license (thanks Aziz!), and if you use it in any kind of real application please include a reference to me as the author.)

The basic idea of the eqrel class defined below is that equivalence relations arise naturally as connected components of graphs and forests. (Rooted) forests, for their part, can be implemented really easily in a 1-by-N array or an N-element list EQ, by having EQ[x] be x's parent if x is not a root in the forest, and EQ[x] be zero minus the number of descendents of x, including x itself, if x is a root. (Thus x is a root iff EQ[x] < 0.) Since Python indexes starting at 0, it is natural to view an equivalence relation as living on the set \( \{ 0, 1, \ldots, N-1 \} \). For example: the equivalence relation on 6 elements with blocks diagram \( (0,1,2)(3)(4,5) \) could have any of the following tree representations:
[1,2,-3,-1,5,-2]
[1,-3,1,-1,5,-2]
[-3,0,0,-1,-2,4]
or others.

This post, we'll go over all the class methods; in the next post, we'll look at the module methods.

Let's start with the first essential: the initializer method:

class eqrel(list):
    def __init__(self,N,pairs = [], blocks = []):
        """Codes an equivalence relation on the set {0,1,..., N-1}
        from the blocks and generating pairs given."""
        list.__init__(self)
        self.extend([-1] * N)

        for b in blocks:
            for x in b[1:]:
                self[x] = b[0]
                self[b[0]] = -len(b)

        for x1,x2 in pairs:
           self.union(x1,x2)


But wait: how the hell does this work. We're not even done initializing, and we're already calling one of the class's methods?

Yep! Python is just fine with this (and IIRC Java is not. Suck it Java!) because in Python, the initializer method is not a constructor. As soon as __init__ opens up, say because EQ = eqrel(some args) is called somewhere, EQ exists and has type eqrel; it just doesn't have all its structure in place yet. Think of __init__ like a haberdasher: he doesn't create the person, he just gets them presentable for interacting with the rest of the world.

But we do need to define the union method (and its partner in crime, find):

    def find(self,x):
        """EQ.find(x) returns the root of the tree containing x.

        Simultaneously tends the tree by taking all ancestors of x and
        reattaching them as children of x's root.

        EQ.find(x) does not change the roots of EQ."""
        if self[x] < 0: return x
        
        ancestors = []
        active_element = x
        while self[active_element] > 0:
            ancestors.append(active_element)
            active_element = self[active_element]

        root = active_element
        for iter1 in ancestors:
            self[iter1] = root

        return root

    def union(self,x1,x2):
        """EQ.union(x1,x2) glues together the trees containing x1 and x2."""
        root1 = self.find(x1)
        root2 = self.find(x2)

        if root1 != root2:
            if self[root1] < self[root2]:
                #i.e. x1 has more elements in its class than x2
                self[root1] = self[root1] + self[root2]
                self[root2] = root1
            else:
                self[root2] = self[root1] + self[root2]
                self[root1] = root2
OK, so we can find the root of an element's tree, and glue two trees together. Very useful stuff. And already this is better than Octave, since these methods dynamically update their object, which is the whole point of the UNION/FIND algorithm anyway (i.e. the source of its really fucking amazing speed upper bounds). But I frequently need to do more: in particular, I often need to compute the intersection (lattice meet) or union (lattice join) of two equivalence relations. (Technically, lattice join is the transitive closure of the union, but you know what I mean.) For eqjoin, our current methods suffice, but for eqmeet, we'll need an auxiliary class method:

    def alists(self):
        """EQ.alists() returns the adjacency list representation of the forest EQ."""
        N = len(self)
        adjlist = [[] for x in range(N)]
        
        for iter1 in [x for x in range(N) if self[x] >= 0]:
            ##iter1 will be all non-roots
            adjlist[iter1].append(self[iter1])
            adjlist[self[iter1]].append(iter1)

        return adjlist

This is because I don't know of a good algorithm to compute the meet without symmetrizing the forest, being able to search up and down.

Another helpful one-liner:
    def index(self):
        return [x < 0 for x in self].count(True)

Nothing surprising in either of these. ("Index" is the mathematical term for the number of blocks in an equivalence relation. I realize that CS people frequently use it to talk about the position of an object in an iterable; you'll get used to it.)

There's one more set of class methods: my customized comparison ops coding the poset of equivalence relations (under inclusion, aka refinement). However, these methods call the functions eqmeet and eqjoin, so I'll wait to discuss them.

 All code typeset using David Craft's excellent SyntaxHighlighter.

3 comments:

  1. Infamous Heel - I think you might want the MIT License under which to publish your examples.

    /Aziz

    ReplyDelete
    Replies
    1. That looks like exactly what I said, but in perfectly comprehensible legal gibberish. Thanks!

      Delete
  2. I've been dreaming about a python library to handle first order thingies ever since I stumbled upon this beautiful programming language. I'd love to talk with you about what you have in mind, and maybe chip in with my modest python skills.
    One thing that has always kept me from plunging in and start coding wholeheartedly is that there's an impressive amount of stuff already done in java by Ralph Freese in his Universal Algebra Calculator. Recently I learned that it is possible to make these java libraries work in jython (http://universalalgebra.wordpress.com/documentation/uacalc/uacalc-at-the-command-line/).
    I failed to find your email address on your blog so I left this lengthy comment.
    Cheers!

    PS. Line 12 in your definition of find should read >=0 instead of >0.

    ReplyDelete