Like beer, the cause and solution of all of life's problems is extensibility. Since we don't want to build the world from scratch every morning, we asked god to allow us to save our work. And since a million monkeys on a million typewriters will eventually result in two of them both titling some function foo(x), god decided to avoid future headaches by spending the eighth day creating namespaces.
OK, so the basic problem is like this: Alice has written some code which creates virtual widgets. Bob finds that he has a use for widgets just like those Alice has written. How does Bob get Alice's widgets to exist on his system?
Method 0: Alice has written her widgets so that their only dependencies are the Python standard library; she has put all the code in one file,
"""This is widgetsbyalice.py."""

class Widget(dict):
    def __init__(self, X):
        ...
Then all Bob has to do is get the file widgetsbyalice.py from Alice (say, by downloading it from Alice's website), and save it into some directory on the Python path. Then to create a widget, Bob enters the following in his Python terminal or into a script of his own:
import widgetsbyalice
w = widgetsbyalice.Widget(...)
So far so easy. The suite of objects created by running the code in widgetsbyalice.py is called a module; the import command populates these objects into Bob's system, in a sub-namespace widgetsbyalice.objectname. (Note: as far as I can tell, it is not correct to refer to the file widgetsbyalice.py as a module; the module itself is the object in Python's virtual world, not something external to the Python interpreter, such as a file on disk or an actual memory location.)
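To make the file-versus-module distinction concrete, here is a small sketch that plays both Alice and Bob. It writes a stand-in widgetsbyalice.py into a temporary directory (the body of Widget is invented for illustration, since the original elides it), puts that directory on the path, and then pokes at the resulting module object.

```python
import os
import sys
import tempfile
import types

# Alice's side: a stand-in widgetsbyalice.py (the class body is made up).
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "widgetsbyalice.py"), "w") as f:
    f.write('"""This is widgetsbyalice.py."""\n'
            "class Widget(dict):\n"
            "    def __init__(self, X):\n"
            "        dict.__init__(self, value=X)\n")

# Bob's side: "a directory on the Python path" just means an entry in sys.path.
sys.path.insert(0, module_dir)

import widgetsbyalice

w = widgetsbyalice.Widget(3)

# The module is an ordinary object inside the interpreter...
print(type(widgetsbyalice))
print(isinstance(widgetsbyalice, types.ModuleType))
# ...which merely remembers which file it was built from.
print(widgetsbyalice.__file__)
# Imported modules are cached by name in sys.modules.
print(sys.modules["widgetsbyalice"] is widgetsbyalice)
```

The file is only the recipe; the thing you actually talk to after the import is the object that sys.modules hands back.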
OK, great, but most of the time, something useful enough to be shared around is also too big to live in one file. Let's say that Alice has written modules implementing WidgetA, WidgetB, WidgetC, etc. Now, not every user will want to use every one of these widgets, so we'd like a way to import only those widgets that we're actually going to use, while still ensuring that all the modules are distributed together. The first goal suggests that Bob simply download a directory of modules; but the second says that there should be some kind of abstraction superordinate to those modules. Never fear, Python supports bundling a bunch of related modules into a package. In the abstract, it looks like this: the package name (widgetsbyalice) becomes the name of a subdirectory of a directory on the Python search path; that directory contains a script __init__.py which is run the first time a module in the package is imported (the presence of this script tells Python that this directory is a package. Is? Contains? Represents? Ugh, I don't know.) Alice might put this whole directory on her website file-by-file, with the instruction that Bob download them and arrange them into the correct file structure; or more likely, she'd put them into a tarball or other archive and say "extract this into a directory on your path". In any case, once these files are in place on Bob's machine, he should be able to
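import widgetsbyalice.WidgetB
w = widgetsbyalice.WidgetB.WidgetB(...)

or

from widgetsbyalice.WidgetB import WidgetB
w = WidgetB(...)

(This assumes each module defines a class with the same name as the module. Importing widgetsbyalice.WidgetB gives you the module, and the class lives one level further down inside it.)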
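Here is a sketch of the package layout described above, built in a temporary directory so that it can be run anywhere. The contents of __init__.py and WidgetB.py are invented for illustration, and I'm assuming each module defines a class named after itself, which is why the class is reached as widgetsbyalice.WidgetB.WidgetB.

```python
import os
import sys
import tempfile

# Lay out  <root>/widgetsbyalice/__init__.py  and  <root>/widgetsbyalice/WidgetB.py
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "widgetsbyalice")
os.mkdir(pkg_dir)

# __init__.py marks the directory as a package; its code runs on first import.
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write('description = "widgets by Alice"\n')

# One module per widget (hypothetical contents).
with open(os.path.join(pkg_dir, "WidgetB.py"), "w") as f:
    f.write("class WidgetB(dict):\n"
            "    pass\n")

sys.path.insert(0, root)
# In case a single-file module of the same name was imported earlier in this session.
sys.modules.pop("widgetsbyalice", None)

# Form 1: import the submodule; the class lives inside it.
import widgetsbyalice.WidgetB
w = widgetsbyalice.WidgetB.WidgetB(size=2)

# Form 2: pull the class straight into the current namespace.
from widgetsbyalice.WidgetB import WidgetB
w2 = WidgetB(size=2)

print(widgetsbyalice.description)   # set by __init__.py
print(widgetsbyalice.__path__)      # the directory the package was loaded from
```

Only the modules Bob actually imports get loaded, but they all travel together in one directory, which covers both goals.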
Now, this process of creating a directory in the path is all that needs to be involved in "installing" a Python package. However, there are a number of helpful tools and systems designed to automate this and add some useful layers on top (for example, tracking versions and handling updates). If you started off, as I did, with ActivePython, the package manager that comes bundled is called PyPM; there's a native tool called easy_install that does some of this too; and an increasingly popular tool called pip that I haven't used at all, so on that subject I shall remain silent.
One doesn't even need to use any of these installer systems at all; the standard Python package distribution utility (read about it in Dive Into Python) takes a directory (e.g., the files extracted from a tarball) containing a script setup.py and uses the standardized command
python setup.py install
at the main shell to write all the package files to the right system directories.
However, some of this is still iffy for me. In particular, it's not clear to me what, exactly, a package is from the point of view of the Python interpreter. I'll illustrate with two examples, one of which was the proximate inspiration for this post.
Example 1: There exists a lightweight graph package called Gato. To install Gato, one downloads Gato-x.y.z.tar.gz, unpacks it to a convenient directory, navigates the main shell to that directory, and types
python setup.py install
The result of this command is the following: there is now a directory named Gato inside Python/Lib/site-packages; and when one opens up Python 2.7 (Gato hasn't been ported to Python 3),
import Gato.Graph
results in an object called Gato.Graph which is (a copy of) that module in the Gato package. Note that there is also now an object (of type module) named Gato in the main namespace! Note that when I enter the above command in Python, Gato's __init__.py script is run from the directory /Lib/site-packages/Gato.
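You can watch the same thing happen without installing anything, using a package from the standard library. In this sketch email stands in for Gato and email.utils for one of its modules; that substitution is mine, not anything specific to Gato.

```python
import os
import types

import email.utils   # a dotted import of a submodule...

# ...also binds the parent package's name in the current namespace,
# and that object is itself a module.
print(type(email))
print(isinstance(email, types.ModuleType))

# The package object is built by running the package's __init__ script.
print(email.__file__)

# What distinguishes a package from a plain module is the __path__ attribute,
# which lists the directory searched for its submodules.
print(email.__path__)
print(email.utils.__name__)
```

So from the interpreter's point of view, a package appears to be a module that happens to have a __path__.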
This example is how all this "is supposed" to work, at least the way I understand Mark Pilgrim and the Python Tutorial. But consider this second example:
Example 2: in order to use the IPython IDE, which has a lot of nice features, one has to first install the distribute package, a new package for (surprise!) distributing and managing packages. (The package is designed as a fork of the existing setuptools package.) So I did the exact same thing for distribute that I did for Gato: I downloaded distribute-0.6.30.tar.gz, unpacked it to my installation files directory (which creates a subdirectory called distribute-0.6.30), cd'd into that subdirectory, and ran
python setup.py install
just as before. Except now here's what happens: a directory named distribute-0.6.30-py2.7.egg is created in Lib/site-packages, which contains no
from distribute import *
both fail, claiming that Python doesn't know about this 'distribute' fellow. I spent about half a day beating my head up and down, trying various download and install schemes, getting the same errors. I eventually filed this bug report on the package's development page; the response I got was dismissive and snippy: "distribute is the name of the package. The module to import is setuptools."
Lo and behold,
import setuptools
succeeds (no surprise there; there was already a setuptools module that Python knew about), and checking where it was loaded from reveals that the setuptools importer script __init__.py is found in the distribute-0.6.30-py2.7.egg directory created in the last step! After further digging, it appears that the distribute setup.py script put that .egg folder on my sys.path, and Python finds setuptools there (rather than wherever it was finding the old, pre-"distribute" version).
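For anyone wanting to repeat that digging, the check amounts to asking a module where its file is and then working out which sys.path entry that file sits under. A minimal sketch follows, using the standard-library json package as a stand-in for setuptools so that it runs on any install; path_entry_for is a helper name I made up.

```python
import os
import sys

import json   # stand-in for setuptools


def path_entry_for(module):
    """Return the longest sys.path entry that contains the module's file, or None."""
    location = os.path.abspath(module.__file__)
    best = None
    for entry in sys.path:
        # An empty entry means "the current directory".
        root = os.path.abspath(entry or os.getcwd())
        if location.startswith(root + os.sep):
            if best is None or len(root) > len(best):
                best = root
    return best


entry = path_entry_for(json)
print(json.__file__)   # the __init__ script that was actually run
print(entry)           # the path entry Python found it under
```

In my case the second line would have been the .egg directory, which is the whole story: the egg is just one more entry on the search path.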
But now I'm mystified: in what sense is distribute a package at all? I can see calling it a "project", and calling "setuptools" a package, but what is gained by this arcane process of creating a directory with an oviparous name and hiding packages inside there?
Oh, and also, the questions in the title: how the fuck was I supposed to test whether distribute had installed correctly? You know how I knew Gato had installed correctly? I imported the fucking thing! I could run unit tests if I wanted to, or create a Hello World graph. But what should I have done for distribute? In particular, without filing that bug report, how the hell would I have known to just try to import a module name that is not mentioned once on the package's page? (The page says it's trying to replace setuptools, not implement another package by the same name!)
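For what it's worth, much newer Pythons (3.8 and later) ship importlib.metadata, which can answer "is a project by this name installed, and what am I supposed to import?" without any guessing. None of this existed for the Python 2.7 setup described above, so treat the following as a sketch of how the question can be asked today. To keep it runnable anywhere it fakes an install record in a temporary directory; the record's contents and the helper name top_level_names are invented for illustration.

```python
import importlib.metadata
import os
import tempfile


def top_level_names(dist_name, search_path=None):
    """Return the importable top-level names a project declares,
    or None if no project of that name is found."""
    kwargs = {} if search_path is None else {"path": search_path}
    for dist in importlib.metadata.distributions(**kwargs):
        if dist.metadata.get("Name") == dist_name:
            # top_level.txt is optional metadata; treat a missing file as empty.
            return (dist.read_text("top_level.txt") or "").split()
    return None


# Fake a minimal install record (made-up contents, not the real distribute metadata).
site_dir = tempfile.mkdtemp()
info_dir = os.path.join(site_dir, "distribute-0.6.30.dist-info")
os.mkdir(info_dir)
with open(os.path.join(info_dir, "METADATA"), "w") as f:
    f.write("Metadata-Version: 2.1\nName: distribute\nVersion: 0.6.30\n")
with open(os.path.join(info_dir, "top_level.txt"), "w") as f:
    f.write("setuptools\npkg_resources\n")

found = top_level_names("distribute", [site_dir])
missing = top_level_names("no-such-project", [site_dir])
print(found)     # the names to try importing
print(missing)   # None when the project is not installed
```

Calling top_level_names("some-project") with no search path looks through the real sys.path instead, which is the check I wish I'd had.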