Tuesday, October 30, 2012

"Has the package installed correctly?"

This post will either be a staid and serious discussion of what is probably a completely n00b-tastic issue I'm having with Python, or more likely devolve into a rant. Important note: nothing in this post should be regarded as factual without further checking. If you find that I have made a factual mistake, please let me know in comments.


Like beer, the cause and solution of all of life's problems is extensibility. Since we don't want to build the world from scratch every morning, we asked god to allow us to save our work. And since a million monkeys on a million typewriters will eventually result in two of them both titling some function foo(x), god decided to avoid future headaches by spending the eighth day creating namespaces.

OK, so the basic problem is like this: Alice has written some code which creates virtual widgets. Bob finds that he has a use for widgets just like those Alice has written. How does Bob get Alice's widgets to exist on his system?

Method 0: Alice has written her widgets so that their only dependencies are the Python standard library; she has put all the code in one file, widgetsbyalice.py:

"""This is widgetsbyalice.py."""

class Widget(dict):
    def __init__(self,X):
        ...

Then all Bob has to do is get the file widgetsbyalice.py from Alice (say, by downloading it from Alice's website), and save it into some directory on the Python path. Then to create a widget, Bob enters the following in his Python terminal or into a script of his own:

import widgetsbyalice
w = widgetsbyalice.Widget(...)  

So far so easy. The suite of objects created by running the code in widgetsbyalice.py is called a module; the import command populates these objects into Bob's system, in a sub-namespace widgetsbyalice.objectname. (Note: as far as I can tell, it is not correct to refer to the file widgetsbyalice.py as a module; the module itself is the object in Python's virtual world, not something external to the Python interpreter, such as a file on disk or an actual memory location.)
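A quick way to poke at the module-versus-file distinction is sketched below. It uses the standard library's json as a stand-in for widgetsbyalice (an assumption made only so the snippet runs anywhere), and it is written for Python 3.

```python
import sys
import types

import json  # stand-in for widgetsbyalice; any importable module would do

# The name `json` is bound to a module object that lives inside the interpreter.
print(type(json))
print(isinstance(json, types.ModuleType))

# The file on disk is only where the module's code was loaded from.
print(json.__file__)

# Module objects are cached in sys.modules by name, so a second import
# hands back the same object instead of running the file again.
import json as json_again
print(json_again is sys.modules["json"])
```

The point is only that the thing you get back from import is an object with a type, and the file is recorded on it as an attribute.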

OK, great, but most of the time, something useful enough to be shared around is also too big to live in one file. Let's say that Alice has written modules implementing WidgetA, WidgetB, WidgetC, etc. Now, not every user will want to use every one of these widgets, so we'd like a way to import only those widgets that we're actually going to use, while still ensuring that all the modules are distributed together. The first goal suggests that Bob simply download a directory of modules; but the second says that there should be some kind of abstraction superordinate to those modules.

Never fear, Python supports bundling a bunch of related modules into a package. The basic way that could look here is this: the package name (widgetsbyalice) becomes the name of a subdirectory of a directory on the Python search path; that subdirectory contains a script __init__.py, which is run the first time the package, or a module in it, is imported. (The presence of this script tells Python that this directory is a package. Is? Contains? Represents? Ugh, I don't know.) Alice might put this whole directory on her website file-by-file, with the instruction that Bob download the files and arrange them into the correct structure; or more likely, she'd put them into a tarball or other archive and say "extract this into a directory on your path". In any case, once these files are in place on Bob's machine, he should be able to

import widgetsbyalice.WidgetB
# assuming the module WidgetB defines a class that is also named WidgetB
w = widgetsbyalice.WidgetB.WidgetB(...)
or

from widgetsbyalice.WidgetB import WidgetB
w = WidgetB(...)
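To convince yourself that a package really is just a directory with an __init__.py in it, here is a minimal sketch that builds one in a temporary directory and imports from it. The names demo_widgets and widget_b are made up for the illustration; they are not Alice's actual layout.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package on disk. demo_widgets is a hypothetical name.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_widgets")
os.mkdir(pkg_dir)

# __init__.py marks the directory as a package and runs on first import.
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("INIT_RAN = True\n")

# One module inside the package, defining one class.
with open(os.path.join(pkg_dir, "widget_b.py"), "w") as f:
    f.write("class WidgetB(dict):\n    pass\n")

# "Installing" here is nothing more than putting the parent directory
# on the search path.
sys.path.insert(0, root)
importlib.invalidate_caches()

import demo_widgets.widget_b
from demo_widgets.widget_b import WidgetB

w = WidgetB(colour="red")
print(demo_widgets.INIT_RAN)                     # set by __init__.py
print(demo_widgets.widget_b.WidgetB is WidgetB)  # same class either way
```

Note that the two import forms reach the same class object; they differ only in which name ends up bound in the importing namespace.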

Now, this process of creating a directory in the path is all that needs to be involved in "installing" a Python package. However, there are a number of helpful tools and systems designed to automate this and add some helpful layers on top (for example, tracking versions and handling updates). If you started off, as I did, with ActivePython, the package manager that comes bundled is called PyPM; there's a tool called easy_install, which ships with setuptools, that does some of this too; and an increasingly popular tool called pip that I haven't used at all, so on that subject I will keep silent.

One doesn't even need to use any of these installer systems at all; the standard Python package distribution utility, distutils (read about it in Dive Into Python), takes a directory (e.g., the files extracted from a tarball) containing a script setup.py and uses the standardized command

python setup.py install

at the main shell to write all the package files to the right system directories.

However, some of this is still iffy for me. In particular, it's not clear to me what, exactly, a package is from the point of view of the Python interpreter. I'll illustrate with two examples, one of which was the proximate inspiration for this post.

Example 1: There exists a lightweight graph package called Gato. To install Gato, one downloads Gato-x.y.z.tar.gz, unpacks it to a convenient directory, navigates the main shell to that directory, and types

python setup.py install

The result of this command is the following: there is now a directory named Gato inside Python/Lib/site-packages; and when one opens up Python 2.7 (Gato hasn't been ported to Python 3),

import Gato.Graph

results in a module object called Gato.Graph, which is the Graph module of the Gato package. Note that there is also now an object (of type module) called Gato in the main namespace! When I enter the above command in Python, Gato's __init__.py script is run from the directory /Lib/site-packages/Gato.
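The same behaviour can be observed without installing Gato. The sketch below uses the standard library's xml.dom as a stand-in for Gato.Graph (an assumption for portability) and runs on Python 3.

```python
import sys
import types

import xml.dom  # stand-in for `import Gato.Graph`

# The import statement binds the top-level name; the submodule is then
# reached as an attribute of that package object.
print(isinstance(xml, types.ModuleType))
print(xml.dom is sys.modules["xml.dom"])

# Both the package and the submodule are registered in sys.modules.
print("xml" in sys.modules, "xml.dom" in sys.modules)

# From the interpreter's side, a package is a module object that also
# carries a __path__ attribute listing where its submodules are found.
print(hasattr(xml, "__path__"))
```

So one partial answer to "what is a package, from the interpreter's point of view" seems to be: a module object with a __path__.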

This example is how all this "is supposed" to work, at least the way I understand Mark Pilgrim and the Python Tutorial. But consider this second example:

Example 2: in order to use the IPython interactive shell, which has a lot of nice features, one has to first install the distribute package, a new package for (surprise!) distributing and managing packages. (The package is designed as a fork of the existing setuptools package.) So I did the exact same thing for distribute that I did for Gato: I downloaded distribute-0.6.30.tar.gz, unpacked it to my installation files directory (which creates a subdirectory called distribute-0.6.30), cd'd into that subdirectory, and ran

python setup.py install

just as before. Except now here's what happens: a directory named distribute-0.6.30-py2.7.egg is created in Lib/site-packages, which contains no __init__.py script; no directory named distribute is created at all. OK, and what happens in Python? Well,


import distribute

and

from distribute import *

both fail, claiming that Python doesn't know about this 'distribute' fellow. I spent about half a day beating my head up and down, trying various download and install schemes, getting the same errors. I eventually filed this bug report on the package's development page; the response I got was dismissive and snippy: "distribute is the name of the package. The module to import is setuptools."

Lo and behold,

import setuptools 

succeeds (no surprise there; there was already a setuptools module that Python knew about) and

help(setuptools)

reveals that the setuptools importer script __init__.py is found in the distribute-0.6.30-py2.7.egg directory created in the last step! After further digging, it appears that the distribute setup.py script put that .egg folder on my sys.path, and Python finds setuptools there (rather than wherever it was finding the old, pre-"distribute" version).
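A less roundabout way than help() to ask where a name would be loaded from is sketched below. It assumes Python 3.4 or later, where importlib.util.find_spec exists (so not the Python 2.7 used in this post); where_is is a hypothetical helper name, and it only handles top-level module names.

```python
import importlib.util


def where_is(module_name):
    """Return where a top-level module would be loaded from, or None.

    Returns None when the import system cannot find the name at all.
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    return spec.origin


# json is a stdlib stand-in; substitute setuptools if it is installed.
print(where_is("json"))
print(where_is("no_such_module_xyz_12345"))
```

For a regular package the origin is the path of its __init__.py, which is the same information I dug out of help(setuptools) above.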

But now I'm mystified: in what sense is distribute a package at all? I can see calling it a "project", and calling "setuptools" a package, but what is gained by this arcane process of creating a directory with an oviparous name and hiding packages inside there?

Oh, and also, the question in the title: how the fuck was I supposed to test whether distribute had installed correctly? You know how I knew Gato had installed correctly? I imported the fucking thing! I could run unit tests if I wanted to, or create a Hello World graph. But what should I have done for distribute? In particular, without filing that bug report, how the hell would I have known to just try to import a module name that is not mentioned once on the package's page? (The page says it's trying to replace setuptools, not implement another package by the same name!)
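For what it's worth, later versions of Python offer one way to ask this question directly. The sketch below assumes Python 3.8 or later (importlib.metadata does not exist on 2.7) and assumes the installed distribution records its import names in a top_level.txt file, which is a setuptools convention and not guaranteed to be present. import_names_for is a hypothetical helper name.

```python
from importlib import metadata  # Python 3.8+


def import_names_for(dist_name):
    """List the top-level import names an installed distribution provides.

    Returns None if no distribution of that name is installed, and an
    empty list if the distribution does not ship a top_level.txt file.
    """
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return None
    top_level = dist.read_text("top_level.txt")  # None if the file is absent
    return top_level.split() if top_level else []


# The result depends entirely on what is installed on your machine.
print(import_names_for("setuptools"))
print(import_names_for("definitely-not-installed-xyz-12345"))
```

Given a project name, this would at least tell you which names to try importing, which is exactly the piece of information I was missing.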
