Sunday, June 23, 2013

Reductions in practice: Semantic interpretation

So: last time, we talked about reductions in computer science. We certainly didn't exhaust the topic, but we at least got a definition on the board.

Now, there was one misleading thing that I didn't quite come out and say, but pretty strongly implied: I made it sound like the reason we would find a reduction of problem \(P\) to problem \(Q\), is that we have (or pretend we have for the purposes of proving something) a solution to problem \(Q\), and use that solution to give a solution to problem \(P\).

That is certainly one way to use a reduction. But, and this may come as a surprise, it's far more common to use reductions in another way: to show that problem \(Q\) is hard, because we can reduce problem \(P\) to problem \(Q\) and we somehow already know that \(P\) is hard to solve.

If this sounds like nonsense, I have a different way to describe this idea. The original explanation I gave can be thought of as saying
if there's a reduction from problem \(P\) to problem \(Q\), then the difficulty of problem \(Q\) is an upper bound for the difficulty of problem \(P\)
(whatever we precisely mean by "difficulty"). This new way of thinking about it then becomes
if there's a reduction from problem \(P\) to problem \(Q\), then the difficulty of problem \(P\) is a lower bound for the difficulty of problem \(Q\)
So they really aren't different ideas.

But one of the really annoying things about theoretical computer science is that a lot of problems lend themselves to having lower bounds proved about them in a very pretty, abstract way, while the way to show an upper bound is, basically, to write down the actual algorithm that achieves some upper bound on difficulty (again, whatever that word actually means).

In the current research project I'm involved in, for example, "easy" means "capable of being computed by an abstract computer". The question we're looking at has to do with when a universal-algebraic variety has an "easy" formal theory; however, if you look at the theorems we actually prove, they're almost all of the form "if such-and-such a structural property holds in an algebra, then the variety it generates has formal theory which is as hard as it logically could be".

Believe me, I'd love to have a theorem which shows that an interesting class has decidable formal theory. But those theorems aren't cheap. (As I mentioned, I spent most of last week's conference huddled with a senior colleague who thinks he has a road map for just such a proof... but right now it's just a treasure map with a lot of blank swaths.)

All of which is to say that the first example of a reduction that I wanted to share is one of this type: not one that succeeds in making a problem easy (by reducing it to a solved problem) but one that establishes that a problem is hard.

That's not the goal of this post, however. There's a lot of plumbing that needs to be done first -- not terribly exciting stuff, but without it you're not going to have happy guests. Specifically, I want to talk about the nuts and bolts of a formal theory, and how one would go about building a function which takes formulas from one theory and spits out formulas in a different theory written in a different language.

A Crash Course on First-Order Languages

The strength of a formal language is, paradoxically, its weakness: there are some things which it's possible to express in that language, and there are often a lot of things that are impossible to express. The strength comes from the fact that (if everything is well designed) everything which it is possible to express is either true or false (at least, once all the statements are interpreted as referring to mathematical objects).

What do I mean by that? Let's take a very simple example: the statement \[ x < y \]Now, as it stands, that statement is neither true nor false, but that's because neither \(x\) nor \(y\) stands for an object.

There's one more obstacle, though: not only are \(x\) and \(y\) currently "open" variables, but so is the symbol "\(<\)". If I told you that you should interpret \(x\) as 7 and \(y\) as 17, then it'd be pretty safe to say the statement would be true. But what if I told you that \(x\) is the string "aardvark" while \(y\) is the string "blabber"? What should the meaning of "\(<\)" be? (I know of at least two reasonable interpretations: dictionary order, where "aardvark" does precede "blabber", and prefix order, under which neither string precedes the other -- they're just incomparable.)

If we're going to strip everything down to the very basics, then, we start with symbols intended to convey basic assertions in our formal language. Traditionally, "=" is required to be one of these and all others are optional. Let's agree that our first formal language will have just one more assertion symbol, \(R(x,y)\). What does \(R\) mean?

That's the thing: we have complete freedom over what \(R\) means. Here are some possibilities:
  • \(x\) and \(y\) are users of a network, and \(R(x,y)\) asserts that \(x\) can send data to \(y\);
  • \(x\) and \(y\) are strings, and \(R(x,y)\) asserts that \(x\) is a prefix of \(y\);
  • \(x\) and \(y\) are real numbers, and \(R(x,y)\) asserts that the complex number \(x + iy\) is a zero of the Riemann Zeta Function.
You see the pattern: we specify some underlying set of objects (respectively network users, strings, and real numbers in the examples above), and then define when the assertion \(R(x,y)\) is true or false for \(x\) and \(y\) in that underlying set. Once the definition is in place, for every particular choice of \(x\) and \(y\) in the underlying set, the assertion \(R(x,y)\) is either true or false.

Such a definition (an underlying set, together with a specification of what our symbol \(R(x,y)\) means) is called a (first-order) structure. I like to write a structure like this: \( \mathbf{S} = \langle \text{underlying set}; \text{relation symbols} \rangle \).

And once we can assert a "simple statement", that is, \(x = y\) or \(R(x,y)\), we can assert a few kinds of compound statement: if \( \varphi \) and \( \psi \) are statements, then
  • NOT \( \varphi \) is true iff \( \varphi \) is false, and vice versa;
  • \( \varphi \) AND \( \psi \) is true iff both \( \varphi \) and \( \psi \) are true individually;
  • \( \varphi \) OR \( \psi \) is true iff at least one of \( \varphi \) and \( \psi \) is true;
(These are the so-called "boolean" connectors.) Lastly, if a variable, say \(x\), appears "free" in \(\varphi\) (for example, both \(x\) and \(y\) are free in \(R(x,y)\)), then we can assert
  • FOR ALL \(x\), \(\varphi\) and
  • THERE EXISTS \(x\) SUCH THAT \( \varphi \)
which are true or false depending on what happens when we substitute possible objects from the underlying set in for \(x\).

That's all the rules: every statement in our formal language gets built from simple statements and these five connectives. (For example, if we wanted to assert that some formula \( \varphi \) implies some other formula \( \psi \), we would write \( \psi \) OR (NOT \( \varphi \)), which is traditionally abbreviated \( \varphi \rightarrow \psi \).)

Definable Subsets

Once we have this formal language, what might we do with it? Here are two possible related problems:
  • Fix a structure \( \mathbf{A} = \langle A; R \rangle\), and any subset \( Z \subset A^k \) for some \(k\). Is \[ Z = \{ z \in A^k \colon \varphi(z) \} \]for some formula \( \varphi(x_1, \ldots, x_k) \) in the formal language?
  • Fix a family of structures \( \mathcal{V} = \{ \mathbf{A}_1, \mathbf{A}_2, \ldots \}\), and any formula \( \varphi(x_1, \ldots, x_k) \) in the formal language. What can we say about the sets \[ Z_i = \{ z \in A_i^k: \varphi(z) \} \]Do they have some kind of common structure? Some interesting geometry? (Of course, we would need to have some notion of geometry in the original structures in \( \mathcal{V} \), but that's pretty common already.)
A subset defined by a formula in the language is called, originally enough, a definable subset.

As an example of the first problem type: let's go back to a set we mentioned before: \[Z = \{ (x,y) \in \mathbb{R}^2 \colon \zeta(x + iy) = 0 \}\]the set of zeros of the Riemann Zeta Function. Is this a definable subset? Of course, this depends on what our basic language is, so let's say that the language allows us the following simple statements: "\(x + y = z\)", "\( x \cdot y = z\)", "\(x \leq y\)", and "\( e^x = y\)". It is a celebrated theorem in model theory that, while we can say quite a lot in this language, our set \(Z\) is not definable.

Semantic Interpretation

If you're still with me here, I'm amazed and grateful, because the previous few paragraphs are really the stuff of up to a month of work in a logic course. Let's get some payoff for that work!

The setup: we have two classes, \( \mathcal{V} \) and \( \mathcal{W} \), each in a first-order language. (Note: probably not the same first-order language.) For simplicity, let's say that the language of \( \mathcal{V} \) has just the symbol \( R(x,y)\), like we were using above, while the language of \( \mathcal{W} \) has other symbols, that we won't specify.

Second, for technical reasons that I'll explain later, we need to assume that \( \mathcal{V} \) is finitely axiomatizable (that is, there are sentences \( \alpha_1, \ldots, \alpha_n \) in the language of \( \mathcal{V} \) such that a structure is in \( \mathcal{V} \) if and only if all of the \( \alpha_i \) are true in that structure).

And last, we assume that we have somehow found two special formulas \( \pi(x) \) and \( \rho(x,y) \) in the language of \( \mathcal{W} \), with the following property:
For each \( \mathbf{A} \in \mathcal{V} \), there is
  • a \( \mathbf{B} \in \mathcal{W} \) and 
  • a one-to-one function \(j\) from \(A\) into \(B\), such that
  • \( \pi(b) \) is true (in \( \mathbf{B} \)) iff \(b = j(a) \) for some \(a \in A\), and
  • for any \(a_1, a_2 \in A \), \(R(a_1, a_2) \) is true (in \( \mathbf{A}\)) iff \( \rho(j(a_1), j(a_2))\) is true (in \( \mathbf{B}\)).
Remember what we said before about discovering the "structure" of definable subsets? Well, this is a special case of that question: if all the properties above hold, then these definable subsets in \( \mathcal{W} \) form structural copies of everything in \( \mathcal{V} \)!

And now we can close the loop that we started at the beginning of this post: We're about to build a nontrivial reduction, from the set of formulas satisfiable in \( \mathcal{V} \) to the set of formulas satisfiable in \( \mathcal{W} \). What this will show is that if we know that we can solve the satisfiability problem in \( \mathcal{W} \) -- that is, if there is an algorithm that tells us whether a formula \( \varphi(x) \) is compatible with \( \mathcal{W} \) -- then we can write an algorithm to solve that problem in \( \mathcal{V} \). Or, turning things inside out, if we somehow already know that the satisfiability problem in \( \mathcal{V} \) is not solvable by any algorithm, then we conclude that the same is true for \( \mathcal{W} \).

To do that, we need to define our reduction: that is, our function \(f\) from \( \mathcal{V} \)-formulas to \( \mathcal{W}\)-formulas. Most of the work will be done by a helper function \(g\), which we define by recursion, starting from simple statements:
  • if \( \varphi \) is \( x = y \), then \(g(\varphi)\) is \( \pi(x) \) AND \(\pi(y)\) AND \(x = y\);
  • if \( \varphi \) is \( R(x,y) \), then \( g(\varphi) \) is \( \pi(x) \) AND \( \pi(y) \) AND \( \rho(x,y) \);
  • if \( \varphi \) is NOT \( \psi \), then \( g(\varphi) \) is NOT \(g(\psi)\);
  • if \( \varphi \) is \( \psi_1 \) AND \( \psi_2 \) (resp. \( \psi_1 \) OR \( \psi_2 \)), then \( g(\varphi) \) is \( g(\psi_1) \) AND \( g(\psi_2) \) (resp. \( g(\psi_1) \) OR \( g(\psi_2) \);
  • if \( \varphi \) is FOR ALL \(x\), \( \psi \) then \( g(\varphi) \) is FOR ALL \(x\), \( \pi(x) \rightarrow g(\psi) \);
  • if \( \varphi \) is THERE EXISTS \(x\) SUCH THAT \( \psi \), then \( g(\varphi) \) is THERE EXISTS \(x\) SUCH THAT \(\pi(x) \) AND \( g(\psi) \).
Now we define \(f(\varphi)\) to be \( g(\alpha_1) \) AND \( \ldots \) AND \( g(\alpha_n) \) AND \( g(\varphi) \) (where, we recall, the \( \alpha_i \) are the axioms for \( \mathcal{V} \)). Notice that if \( x_1, \ldots, x_k \) are the free variables of \( \varphi \), then those same variables are the free variables of \( f(\varphi) \).

Theorem: \(f\) is a reduction from the set of \( \mathcal{V} \)-satisfiable formulas to the set of \(\mathcal{W} \)-satisfiable formulas. That is, if \( \varphi \) is satisfiable in \( \mathcal{V} \), then \( f(\varphi) \) is satisfiable in \( \mathcal{W} \), and if unsatisfiable, unsatisfiable.

Proof: The direction "If \( \varphi \) is satisfiable in \( \mathcal{V} \), then \( g(\varphi) \) is satisfiable in \( \mathcal{W} \)" is proved by induction on the complexity of the formula \( \varphi \). The basic idea is to find a structure \( \mathbf{A} \in \mathcal{V} \), which is isomorphic to a definable substructure of some \( \mathbf{B} \in \mathcal{W} \).

In the other direction, we show the contrapositive: suppose that \( f(\varphi) \) is satisfiable in \( \mathcal{W} \). This means that for some structure \( \mathbf{B} \) and some \( b_1, \ldots, b_k \) corresponding to the free variables of \( f(\varphi) \), \( f(\varphi)(b_1, \ldots, b_k) \) is true in the structure \( \mathbf{B} \). By how we've defined \( f\), we know that \( \pi(b_i) \) is true for each \(i\).

Now, the formula \( \pi(x) \) defines some subset \(Z\) of \(\mathbf{B}\), and the formula \( \rho(x,y) \) defines some binary relation \(R\) on this subset. We think of \( \mathbf{A} = \langle Z; R \rangle \) as a structure in the language of \( \mathcal{V} \); but is this structure actually in \( \mathcal{V} \)? Because \(f(\varphi) \) asserts that \( g(\alpha_1) \), \( g(\alpha_2) \), etc are all true, it follows that \( \alpha_1\), \( \alpha_2 \), etc are true in \( \mathbf{A} \). Since these sentences axiomatize \( \mathcal{V} \), we answer yes, \( \mathbf{A} \in \mathcal{V}\)!

But now, since \( f(\varphi)(b_1, \ldots, b_k) \) is true in \( \mathbf{B} \), and \( g(\varphi) \) is one of the conjuncts in \( f(\varphi) \), we know that \( g(\varphi) \) is true in \( \mathbf{B} \). Now another induction on complexity shows that \( \varphi \) is true in \( \mathbf{A} \). QED.

(Exercise for the reader: if you haven't done a proof by induction on formulas before, complete the details of that.)

No comments:

Post a Comment