Monday 11 August 2008

Expressing bidirectional transformations

As part of the preparation for my invited talk at the International Conference on Graph Transformations I have been trying to understand the relationships between triple graph grammars, lenses and QVT. In particular, how does their expressivity compare, and how does it relate to the general notions of computability and coherent transformations? Some useful email from PC chair Reiko Heckel made me realise that I was failing to think carefully enough about basic notions.

A function f : M -> N is called computable if it can be implemented by a Turing machine; it turns out that this is a very robust notion, so that, for example, you can say "it's computable if you can write a program that computes it" and it basically doesn't matter what programming language you're thinking about. In particular, expressivity for actual programming languages is something of a non-question: they are all equi-expressive.

Is the situation equally boring for bidirectional transformations between M and N? We have to decide what M and N are, but that's probably not very important, any more than it's important to know what the type of our ordinary function was. Any of the bidirectional languages we've been thinking about can be looked at as means of specifying three things:
  1. the consistency relation, a subset of M x N
  2. the forward transformation, a function M x N -> N
  3. the backward transformation, a function M x N -> M
Each of these three elements can be considered as an ordinary function, and then we can ask whether the corresponding function is computable or not. If you wanted to write your bidirectional transformation in your favourite ordinary programming language, you could write three programs: the first would take elements of M and N and return true if they were consistent, else false, the second would take elements of M and N and return an element of N, etc. Since your favourite programming language can express all computable functions, and since we can't hope for any more (because if we had a bidirectional transformation language which could express something with an uncomputable component, we could use it as an ordinary programming language by disregarding everything else), this gives us a notion of computable bidirectional transformation, and any standard programming language can express all computable bidirectional transformations. Boring.

Next attempt: the point about bidirectional languages is that you don't write separate programs for those three elements: you write one text, which somehow encodes all three. There's something deeply unsatisfactory about saying "yes, my language can express bidirectional transformations, you just [do something equivalent to writing the three things separately]". Formally, what exactly is unsatisfying? On the face of it, the notion of "one text to express all three things" is not really a formal notion, but a pragmatic one. The point of it is to ease the software engineering process. In order for a bidirectional transformation to be sane (whatever that means) the three elements have to stay consistent. For example, one kind of sanity, discussed last time, is correctness -- after applying a transformation (defined in 2. or 3.) we expect consistency (defined in 1.) to be restored. So, if we change the definition of consistency (1.) we may well have to change the transformations (2. and 3.) to make sure that the whole bidirectional transformation stays correct. Busy software engineers are not very good at remembering that when you do X you also have to do Y. (Let X be "update the code" and Y be "update the documentation" for example...) So the name of the game is to remind the engineer about Y when they do X; or, to be a bit more ambitious, to make it impossible for them to do X without also doing Y. For example, it's better to have a variable called widgetsInStock (and no comment) than to have a variable called x and a comment "x is the number of widgets in stock".

No mention of "same text" yet -- specifying several things in one text is just one possible way to achieve that aim. At the extreme, if we complain to our Java programmer that we don't like the use of Java as a bidirectional programming language, because the three things are specified in different methods, we will not be much happier if the programmer modifies their bidirectional transformation class to have only one method which begins with a switch statement...

What the language of lenses, and to a lesser extent the other languages like TGGs and QVT, do, is in effect to limit the expressivity of the language to only those triples that are well-behaved in some language-dependent sense. In Focal, for example, it's a key part of the language design to ensure that the basic lenses satisfy some lens laws, and that the ways of composing basic lenses into compound lenses preserve those lens laws. The idea is that it is not possible for the lens programmer to write a bidirectional transformation which is insane, for Focal's idea of sanity. Even, maybe, that it isn't possible for the programmer to think about writing an insane program!

That's interesting. Given a language which can express all computable bidirectional transformations -- including the many insane ones -- an obvious question is "is it decidable whether a transformation is sane?" Excuse me while I demonstrate that it's too long since I tutored Computability and Intractability, but it's at least tempting to think that the answer to that is always going to be No, by Rice's Theorem. Here's Rice's Theorem as stated in Papadimitriou's Computational Complexity, p62.

Suppose that C is a proper, non-empty subset of the set of all recursively enumerable languages. Then the following problem is undecidable: Given a Turing machine M, is L(M) in C?

To apply this, let's suppose we wrap up a given computable bidirectional transformation in a Turing machine which takes as input an indicator of what the machine is supposed to do (check consistency, apply a forward transformation, or apply a backward transformation) together with appropriate arguments for the task. If the machine is to check consistency, the appropriate arguments are a pair of models and a boolean: the machine should accept the input iff the boolean correctly indicates whether or not the models are consistent. If it is to do a forward transformation, the arguments are a triple of models, and the machine accepts iff the third model is the transformation's output on being given the first two; and dually.

Regardless of whether the transformation is sane or not, this is a reasonable definition of a Turing machine, so the language accepted by it is recursively enumerable. Modulo some boring coding, it'll have the form of a set of strings which might be
  • (consistent, M, N, true)
  • (consistent, M, N, false)
  • (forward, M, N, N')
  • (backward, M, N, M')
and there will be enough of those strings to specify the whole of the transformation.

Now let C be the subset of that set of recursively enumerable languages which represent sane transformations. The amazing thing about Rice's theorem is that we don't have to decide on a notion of sane: provided we ensure that there is some sane transformation, that's enough: the theorem tells us that it is undecidable whether a given Turing machine's language is in C or not.

Unfortunately, this isn't what we wanted to know. Given a TM, there's no reason to think that it'll be obvious whether or not it even is a wrapped-up bidirectional transformation at all; so it's not very interesting that it's undecidable whether or not it's a wrapped-up sane bidirectional transformation. That said, usually the kind of thing we did there is sorted out by some more boring coding, so I guess that "morally" Rice's thm is going to tell us that no non-trivial notion of sanity will be decidable on any language that can express all computable bidirectional transformations. I guess this with enough confidence that I can't muster the confidence to actually check right now...

Conclusion: that wasn't the right question. Onwards and upwards...

No comments: