Thursday, February 14, 2013

On objects

I haven't had cause to blog about it much at all, but the only course I'm taking this semester is a for-fun course on approximation theory, or more specifically splines.

The guy teaching it, Larry, is... shall we say, old-school. I don't mean that in any way negatively; just that he remembers the days of punch-cards first-hand. However, I do mean to say that his way of computer programming is very different from my own -- at least, as I am now. The computational part of the course has given me an opportunity to reflect on my own changing thoughts, intuitions and preferences when it comes to programming.

It's a truism that programming languages (and related environments like markup languages) end up engendering distinctive thinking styles in their dedicated users. My favorite example of this, when I'm talking to nonspecialists, is the very different ways you think when you're writing in Powerpoint versus in LaTeX. Powerpoint promotes a cognitive style without room for so much as complete sentences, let alone ideas, let alone arguments. LaTeX, on the other hand, somehow manages to slot you into thinking in sentences, paragraphs, and recursively higher nested blocks -- I have no idea quite how this happens, by the way, because the syntax of a LaTeX document is not all that far removed from plain-text-plus-math-markup. The fact remains, however, that my most frequent mistake while typesetting a doc in LaTeX is inserting a period after closing a displayed equation.

My first serious experience with programming was in Matlab; Octave remained my go-to environment for developing solutions in my own mathematical investigations until last year, when I made the decision to learn Python. I'm now quite firmly a Python partisan, when it comes to any task more complicated than straight-up matrix multiplication. I also had a brief run-in with Java; that experience left me scarred and traumatized, as I might someday get the courage to write about. My therapist and I are in talks for the movie rights.

Anyway, Larry has a large-scale, robust package of Matlab code that he uses to teach this splines course. This code, to me, exemplifies the cognitive style of Matlab: "data" means arrays of numbers, and data are manipulated by functions, which explicitly take as passed arguments all the various pieces of "data" which they will need.

Another way to say this: in Matlab,
Data are dumb. Functions are smart, and everything is held together by the user, who must be the smartest of all.

As you've probably guessed by now, Matlab's cognitive style doesn't come naturally to me any more. In any event, I didn't want to spend a hundred bucks on a yearlong student Matlab license, and wanted to give my Python a workout, so I signed up for the course with the explicit intention of building things from scratch. (For some sufficiently advanced notion of "scratch", of course -- I'm not interested in telling the interpreter how to multiply matrices, TYVM.)

And so I run smack-dab into the problem of taking information taught in the cognitive style of Matlab, and recasting it mentally into the cognitive style of Python (and by extension the cognitive style of object-orientation), where the model is
Data are smart. Data know how to do everything. Do not do for your data what your data could just as well do for itself.
Some examples of what I mean will help.

Let's think about a polynomial. Mathematically, a polynomial is a function \[ f(t) = a_0 + a_1 t + \cdots + a_n t^n \] In Matlab, a natural way to store a polynomial would be as an array of coefficients:
fcoeffs = [a_n, ..., a_1, a_0]
where one would then have a function to evaluate the polynomial at a point like so:
y = polyval(fcoeffs,t)
I want to stress that there's nothing wrong with this cognitive style as far as it goes; however, I've grown to find it constraining. Yes, a polynomial "is" a vector, in the sense that the set of all polynomials is a vector space; one could even say that a polynomial "is" a matrix, though that would be a harder sell for me. But these are uncomfortably unsatisfying; vectors aren't, at a base level, functions; polynomials are first of all functions. Matrices are fundamentally functions, but their action does not correspond to the action of the polynomial they are standing in for.

What I've come to expect of the code I write, by contrast, is that if I'm implementing some mathematical object, my programmed object should behave like the mathematical object (within the limits of the computer's processing power) in all ways that I decide are relevant. If \(f\) is, mathematically, a function which takes a real number as input, then in my programming environment I should be able to create
f = myFunctionObject(...)
and then issue the command
y = f(t)
where t is some number, say a float, at which \(f\) is defined.
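
To make that concrete, here is a minimal sketch of what such an object might look like in Python. The class name Polynomial and the lowest-degree-first coefficient order are my own choices for illustration, not anything taken from Larry's package or from a particular library.

```python
class Polynomial:
    """A polynomial a_0 + a_1 t + ... + a_n t^n that can be called like a function."""

    def __init__(self, coeffs):
        # Assumption: coefficients are stored lowest degree first, [a_0, a_1, ..., a_n]
        self.coeffs = list(coeffs)

    def __call__(self, t):
        # Horner's rule: work from a_n down to a_0
        result = 0.0
        for a in reversed(self.coeffs):
            result = result * t + a
        return result


f = Polynomial([1.0, 2.0, 3.0])  # f(t) = 1 + 2t + 3t^2
y = f(2.0)                       # 1 + 4 + 12 = 17
```

The coefficients are still just a list of numbers underneath, but the user never has to hand them to a separate evaluator.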

Now, so far, so good: Matlab has the capability to program \(f\) as a function. But that's all. You can choose to define a function which spits out the value \(f(t)\) when you hand it the parameter \(t\); or you can have data which "stand for" the function, and a separate evaluator function which has to be passed both the coefficient array and the input point.

But choose you must, so choose wisely.

What developing in Python has enabled me to do, instead, is to bypass the proliferation of helper functions which take longer and longer agglomerations of arguments, and to instead program those as tasks which my objects know how to do themselves. To compute the value of a spline (which is, for the uninitiated, a more complicated type of function obtained by gluing several polynomials together), a Matlab programmer would have to manually pass a coefficient vector, another vector representing the glue points, and possibly other data to an evaluator function. In Python, you can build your object so that it knows all its own glue points, knows its own coefficients, and does all the necessary stuff invisibly when the special method .__call__() is invoked (which happens behind the scenes when I, the human user, enter f(t)).
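
Here is a rough sketch of the idea, under some simplifying assumptions of my own: the spline is stored as one list of local coefficients per interval (rather than in the B-spline form a real package would likely use), and the class name PiecewisePolynomial is made up for this post.

```python
from bisect import bisect_right


class PiecewisePolynomial:
    """Polynomial pieces glued together at breakpoints (a simplified, hypothetical spline)."""

    def __init__(self, breaks, pieces):
        # breaks: increasing glue points [x_0, ..., x_m]
        # pieces: m coefficient lists; pieces[i] = [a_0, a_1, ...] describes
        #         a_0 + a_1 (t - x_i) + ... on the interval [x_i, x_{i+1}]
        if len(pieces) != len(breaks) - 1:
            raise ValueError("need exactly one piece per interval")
        self.breaks = list(breaks)
        self.pieces = [list(p) for p in pieces]

    def __call__(self, t):
        # Locate the interval containing t, clamping so that points outside
        # [x_0, x_m] are evaluated with the first or last piece.
        i = bisect_right(self.breaks, t) - 1
        i = min(max(i, 0), len(self.pieces) - 1)
        s = t - self.breaks[i]
        result = 0.0
        for a in reversed(self.pieces[i]):  # Horner's rule in the local variable s
            result = result * s + a
        return result


# t^2 on [0, 1], then 1 + 2(t - 1) on [1, 2]: value and slope match at t = 1
f = PiecewisePolynomial([0.0, 1.0, 2.0], [[0.0, 0.0, 1.0], [1.0, 2.0]])
y = f(0.5)
```

The interval lookup, the shift to the local variable and the evaluation all happen inside the object; from the outside, f(0.5) looks exactly like evaluating any other function.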

Things get even worse for the Matlab developer if they want to do more interesting things, like, say, calculus. For technical reasons, the list of glue points changes when you differentiate or integrate a spline; what this means is that you have to once again pass a whole bunch of data to your differentiator function -- and then it has to spit out not only the coefficients of \( Df \), but also the revised auxiliary information.

Whereas a Pythonista would set it up so that the user can write
Df = f.D()
and f would already "know" everything it needed.
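
A minimal sketch of such a method, again using a made-up PiecewisePolynomial class that stores one list of local coefficients (lowest degree first) per interval. One caveat: in this simplified storage scheme the glue points happen not to change under differentiation. In a representation where they do, the revised list would be computed inside D() and handed to the new object in exactly the same way.

```python
class PiecewisePolynomial:
    """Hypothetical spline container: breakpoints plus per-interval coefficients."""

    def __init__(self, breaks, pieces):
        self.breaks = list(breaks)
        self.pieces = [list(p) for p in pieces]

    def D(self):
        # Differentiate each piece term by term: d/ds (a_k s^k) = k a_k s^(k-1)
        new_pieces = []
        for p in self.pieces:
            dp = [k * a for k, a in enumerate(p)][1:]
            new_pieces.append(dp or [0.0])  # the derivative of a constant is 0
        # Return a complete new object that carries its own auxiliary data
        return PiecewisePolynomial(self.breaks, new_pieces)


# t^2 on [0, 1], then 1 + 2(t - 1) on [1, 2]
f = PiecewisePolynomial([0.0, 1.0, 2.0], [[0.0, 0.0, 1.0], [1.0, 2.0]])
Df = f.D()
```

Because D() returns another object of the same kind, the user can chain it (f.D().D()) without ever touching the coefficients or the glue points directly.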
