Tuesday, August 14, 2012

Blogging learning Python: Equivalence Relations II

Part I

Last time we looked at the basic class methods of the eqrel class. Python knows these are class methods because their definition blocks are indented under the main class header. What this means in practice is that a method like union is called on an instance EQ of the equivalence relation class, modifies that instance, and doesn't return anything.

The methods we'll see this time are different: they don't belong to the class but to the overall module, and they return a new instance of the class.

As mentioned, the two main functions we want to be able to compute are the equivalence relation join and meet. Join doesn't require any new technology:

def eqjoin(EQa,EQb):
    """Returns an instance of eqrel coding the least equivalence relation
    containing both EQa and EQb.

    Both arguments must be of the same length."""
    if len(EQa) != len(EQb): raise IndexMismatchError()
    
    N = len(EQa)
    EQr = eqrel(N)

    for iter1 in range(N):
        for iter2 in [y for y in [EQa[iter1],EQb[iter1]] if y >= 0]:
            EQr.union(iter1,iter2)

    return EQr

I've included a custom error message in case __name__ passes stupid data to the method; I'm sure that there are other possible errors I could anticipate, but this is the only one I can remember accidentally tripping over in practice. Here's the custom exception definition:

class IndexMismatchError(Exception):
    def __init__(self):
        self.msg = "Index Mismatch!\n\nAll equivalence relations must have the same length!"

    def __str__(self):
        return repr(self.msg)

Notice that a Python exception is a class (which for obvious reasons inherits from Exception). Basically all I've written this one to do is print a message detailing why it got called; nothing fancy.

Now let's program equivalence relation meet. But to do this, we need a low-wattage implementation of breadth-first search. (Recall that we already have a class method EQ.alists(), shown in the last post, for creating an adjacency-list representation of the tree out of the original parent data.)

def bfs_component(alists,origin):
    """bfs_component(alists) returns a list, the connected component
    of the element origin in the graph coded by alists."""
    N = len(alists)
    is_searched = [False] * N
    queue = [origin]
    ret_component = []

    while len(queue) > 0:
        active_element = queue.pop()
        if is_searched[active_element]:
            pass
        else:
            is_searched[active_element] = True
            ret_component.append(active_element)

            for iter1 in [x for x in alists[active_element] if not is_searched[x]]:
                queue.insert(0,iter1)

    return ret_component

All this implementation does is compute the connected component of whatever vertex number is passed as the origin argument. This will allow us to search downwards, where the array-of-parents data representation allows easy upward search but no downward search. (The tradeoff is that it is more difficult to tend a tree represented as adjacency lists, cut off and reattach elements, etc. Not an insurmountable problem, but annoying.)

Finally we can meet the meet:

def eqmeet(EQa,EQb):
    """eqmeet(EQa,EQb) returns an instance of eqrel coding the greatest
    equivalence relation contained in both EQa and EQb. Both arguments must
    have the same length."""
    if len(EQa) != len(EQb): raise IndexMismatchError()
    
    N = len(EQa)
    alistsa = EQa.alists()
    alistsb = EQb.alists()
    components_b = dict([[root,bfs_component(alistsb,root)] for root in range(N) if EQb[root] < 0])
    ## components_b is a dictionary with items root : root/EQb, where root
    ## takes on all root values in EQb and root/EQb is a list of the elements
    ## in root's tree.
    
    component_list_r = []
    is_handled = [False] * N

    ## Basic procedure: 1: pop an element from the main queue.
    ## 2: find its EQa component and its EQb component.
    ## 2a: the intersection of these two is a component of EQr.
    ## 3. exhaust the remaining elements of the EQa component,
    ## treating it as a queue.

    for x in range(N):
        if is_handled[x]:
            pass
        else:
            cmp_a = bfs_component(alistsa,x)

            while len(cmp_a) > 0:
                x1 = cmp_a[0]
                cmp_b = components_b[EQb.find(x1)]

                cmp_r = [y for y in cmp_a if y in cmp_b]
                ## cmp_r is the intersection of cmp_a and cmp_b
                component_list_r.append(cmp_r)
                for y in cmp_r:
                    is_handled[y] = True

                cmp_a = [y for y in cmp_a if y not in cmp_r]
                ## deletes all elements of cmp_r from cmp_a

    return eqrel(N,blocks = component_list_r)

Python's list comprehension scheme is right up my alley, as you've no doubt already seen. I do sometimes worry about construction like the assignment of cmp_r near the end; abstractly, that could be \( \mathcal{O}(n^2) \), but I think Python has ways of making comprehensions like that run faster than the naive algorithm. (If someone knows if that's true, I'd love to know details. Maybe use some kind of hash and only check clashing pairs?)

So that's it for the methods I planned to write when I sat down to program. But wandering around in the documentation, I found something really fun, which I couldn't resist including. But that's for next time.

No comments:

Post a Comment