Wednesday 15 September 2010

On sequential composition of bidirectional model transformations

This is a partial brain dump prompted by being part of the way through reading this paper by Martin Hofmann, Benjamin Pierce and Daniel Wagner. It's also influenced, naturally, by interesting conversations with Benjamin and various other people over the years. ObDisclaimer: this is a blog post, not a paper, and it does not contain a related work section. Comments and pointers to work I might not know about most welcome! This post may still change.

First a quick introduction for anyone who doesn't already know, following the presentation here because it's my blog: a bidirectional model transformation is a program which allows you to keep two (or more, but two is enough for now) models consistent, when either of them may change. So such a transformation embodies a notion of consistency:

T ⊆ M x N

(just one notion of consistency, not two directional notions) along with a pair of functions

Tforwards : M x N -> N

which tells you how to modify a model from N if you take an old version of it together with an authoritative m in M, and dually

Tbackwards : M x N -> M

In some particularly simple cases, we may be able to do without one of the arguments in one or both of these functions; for example, if N is an abstraction - a view - of M, containing no information not also present in M, then the transformation does not need to consult an old version of n in N and the N argument is redundant. In general, however, either model may contain information that the other does not.

One goes on to ask what constraints must hold on T, Tforwards and Tbackwards to make them form a "reasonable" bidirectional transformation. For a start, the functions have to be "correct" - that is, they must succeed in enforcing consistency - and "hippocratic" - that is, if their arguments are already consistent, then the function must not propose any change. This doesn't feel like enough, however: the problem is that a transformation can be correct and hippocratic and still have the property that (almost) "all bets are off" if its arguments are not consistent to start with. In that case, a transformation which only has to be correct and hippocratic is at liberty to throw away all information from the old model and just pick its favourite from all the models that are consistent with the given authoritative model. Intuitively this is not sensible: although there are real decisions for the transformation writer to make - there can certainly be multiple sensible choices of how to restore consistency - we do expect a transformation, whose job is to restore consistency, to do its job without avoidable vandalism.
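To make the shape of all this concrete, here is a minimal Python sketch (all the names are mine, and the toy transformation is invented for illustration): a transformation is bundled as a consistency predicate plus the two restoration functions, and the two laws become checkable properties.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class BX:
    consistent: Callable[[object, object], bool]    # T ⊆ M x N
    forwards:   Callable[[object, object], object]  # M x N -> N
    backwards:  Callable[[object, object], object]  # M x N -> M

# Toy example: M is a (name, salary) record, N is a name-only view.
# forwards overwrites the view's name; backwards overwrites the
# record's name but keeps the salary, which it cannot learn from N.
toy = BX(
    consistent=lambda m, n: m[0] == n,
    forwards=lambda m, n: m[0],
    backwards=lambda m, n: (n, m[1]),
)

def correct(t, m, n):
    # both restoration directions must re-establish consistency
    return (t.consistent(m, t.forwards(m, n))
            and t.consistent(t.backwards(m, n), n))

def hippocratic(t, m, n):
    # on already-consistent arguments, propose no change at all
    if not t.consistent(m, n):
        return True
    return t.forwards(m, n) == n and t.backwards(m, n) == m

assert correct(toy, ("ada", 100), "bob")
assert hippocratic(toy, ("ada", 100), "bob")   # vacuously: inconsistent args
assert hippocratic(toy, ("ada", 100), "ada")   # consistent args left alone
```

Note that `toy` is also a nice illustration of the "view" special case: here N carries no information absent from M, so forwards never actually needs its N argument.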

There are various ways to handle this: one can put extra constraints ("undoability", or "PUTPUT" in the asymmetric case) on the transformation, but these tend to be considered too strong; or, if transformations are built up from transformation elements as in bidirectional programming, one at least intuitively gets something like this for free (a kind of continuity). Meertens discusses the use of metrics; I think there may be more to be said, though, at least in the context of models specifically, and perhaps more generally.

What has any of this to do with sequential composition? Well, as good little computer scientists, we naturally want to be able to compose things once we've defined them. Consistency is a relation, and we know how to compose relations; it's natural to say that the notion of consistency captured by T1;T2 should be

(T1;T2)(m,p) iff exists n s.t. T1(m,n) and T2(n,p)

This immediately exposes the problem for defining the corresponding transformation functions: the apparent need to "guess" an element of the middle model type, which has been elided. Hmm. Let's try to be good little software engineers instead, take a step back, and think about why we might want to be able to compose bidirectional model transformations, in order to get some requirements for a notion of composition.
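The relational composition itself is easy to compute over finite carriers; a small sketch (with invented toy relations) also makes the guessing problem concrete: there can be several candidate witnesses n, and nothing in the relation tells the functions which one to pick.

```python
Ns = ["n1", "n2"]

T1 = {("m", "n1"), ("m", "n2")}     # T1 ⊆ M x N
T2 = {("n1", "p1"), ("n2", "p2")}   # T2 ⊆ N x P

def compose(T1, T2):
    # (T1;T2)(m,p) iff there exists n with T1(m,n) and T2(n,p)
    return {(m, p) for (m, n) in T1 for (n2, p) in T2 if n == n2}

assert compose(T1, T2) == {("m", "p1"), ("m", "p2")}

# The composed relation is easy; the composed *functions* are not:
# to restore consistency between m and p we must pick a witness n,
# and here both n1 and n2 qualify, with different consequences.
witnesses = [n for n in Ns if ("m", n) in T1]
assert len(witnesses) == 2
```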

In model-driven development, we may have a chain, or even a network, of models describing different aspects of the world and/or the artefacts we are building, each being worked on by different people. At certain points in time, e.g. software release dates, we will want at least a certain collection of these to be guaranteed consistent, in the sense that there exists a model of them all in the model-theoretic sense, viz., the actual software in the world which should be consistent with all (or a specified subset of) the models in the MDD sense. An individual bidirectional transformation captures the consistency of some pair of models. Imagine a graph with model types as nodes and bidirectional transformations as edges.

We would certainly like it if overall consistency meant nothing more nor less than the conjunction of the notions of consistency captured by each of the transformations. We'd also like confluence: if our models are inconsistent, there may be different sequences in which we can apply our bidirectional transformations so as to restore consistency, and we'd like it if they all gave the same result. A little more precisely: suppose we label some of our nodes as authoritative i.e. not to be touched, and others as result. Now imposing consistency means applying the transformations in some order, without ever applying a transformation in the direction of an authoritative model, until nothing changes - that is, the whole set of models is consistent - and then looking at the result model(s). It's only too easy to imagine doing this blindly at 3am when a release is overdue: wouldn't it be nice to know that the result model(s) will depend only on the set of models we started with, and on which of them we deemed authoritative, but not on the order in which we applied the transformations? Unfortunately, a moment's thought shows that this is unachievable without imposing restrictions on the transformations (undoability, or PUTPUT in the lens world) which are generally thought too strong.
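The 3am procedure itself is easy to sketch, even though its confluence is not guaranteed in general; here is a toy version in Python for the simplest shape of network, a chain with the leftmost model authoritative, so that transformations are only ever applied rightwards. All the names are invented for illustration.

```python
def restore_chain(models, forwards_steps):
    # models: list of models left to right; models[0] is authoritative.
    # forwards_steps[i] maps (models[i], models[i+1]) -> new models[i+1],
    # i.e. each step restores consistency away from the authoritative end.
    models = list(models)
    changed = True
    while changed:                      # keep going until nothing changes
        changed = False
        for i, step in enumerate(forwards_steps):
            new = step(models[i], models[i + 1])
            if new != models[i + 1]:
                models[i + 1] = new
                changed = True
    return models

# Toy: each model is a dict; each step copies the authoritative "name"
# into its neighbour while leaving the neighbour's private fields alone.
copy_name = lambda m, n: {**n, "name": m["name"]}

result = restore_chain(
    [{"name": "ada"}, {"name": "bob", "x": 1}, {"name": "carol", "y": 2}],
    [copy_name, copy_name],
)
assert result == [{"name": "ada"},
                  {"name": "ada", "x": 1},
                  {"name": "ada", "y": 2}]
```

For a chain like this, with hippocratic steps, the loop terminates and the order of steps within a pass doesn't matter; the interesting (and, as noted above, generally unachievable) question is whether the same holds for an arbitrary network.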


An important special case is applying a transformation when the non-authoritative model does not exist yet: model sets can be equipped with a special no-information model to act as the argument in this case. In such a case, the transformation functions should compose just as the ordinary functions they are.
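A sketch of this create case, using None as the no-information model (my own encoding, invented for illustration): when the non-authoritative side is absent, the forward function behaves like a plain function of M, and otherwise it preserves what it can of the old n.

```python
def forwards(m, n):
    # invented toy transformation, type M x N -> N, where models are dicts
    if n is None:                  # create case: no old model to consult
        return {"name": m["name"]}
    n = dict(n)
    n["name"] = m["name"]          # update case: keep n's other fields
    return n

m = {"name": "ada", "salary": 100}

created = forwards(m, None)
assert created == {"name": "ada"}

updated = forwards(m, {"name": "bob", "note": "keep me"})
assert updated == {"name": "ada", "note": "keep me"}
```

In the create case the N argument really is redundant, so composing two such applications is just ordinary function composition, as the text says.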

An interesting aspect of this kind of "composition" is that all the models involved may have equal status: in particular, they all actually exist, perhaps even in the same CMS if you're really lucky.

There is at least one other scenario in which we may wish to compose bidirectional transformations, though, and it's pragmatically a bit different. Writing bidirectional transformations correctly is hard work, and there is every chance that you will want to reuse a transformation written by someone else. If you just want straight reuse, well and good, but more likely, you will want to customise the transformation that you buy in or otherwise reuse. IIRR (can't be bothered to check right now) this ability to customise transformations was one of the requirements in the OMG document that led to the QVT standard - however, the standard does not address it, at least for QVT-R.

There are various kinds of customisation you might want to do, of course - perhaps you want parameterised or higher order transformations - but let's suppose that what you want is related to sequential composition in the sense that you want to do one or both of:

- guddling systematically with your models before they are given as input to the reused transformation

- guddling systematically with the output of the transformation, before continuing to use the output model.

First let's consider the easiest case, in which the guddling of a model doesn't depend on any other model and doesn't alter the type of the model. Given T on M x N and f : M -> M we can define

(f;T) (m,n) iff T(f(m),n)

(f;T)forwards (m,n) = Tforwards (f(m),n)

(f;T)backwards (m,n) = Tbackwards (f(m),n)

In fact, looking at the consistency relation and the forwards function we can see that it doesn't matter if f : M' -> M; provided the target of f matches the appropriate input type of T we'll be fine. (The backwards function is the delicate one: Tbackwards(f(m),n) lands in M, not M', so unless M' = M we also need some way back along f, e.g. an inverse, for (f;T)backwards to type.)

Dually, we can define T;g where g : N -> N': no problem.
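Transcribing the definitions above into Python (with an invented toy T, and f taken as M -> M so that everything types without fuss): f simply preprocesses the M-side argument before T sees it.

```python
# Toy T: relates a record m = (name, salary) to a name-only view n.
def T_consistent(m, n): return m[0] == n
def T_forwards(m, n):   return m[0]
def T_backwards(m, n):  return (n, m[1])

# f guddles the record before it reaches T: normalise the name.
f = lambda m: (m[0].lower(), m[1])

# (f;T)(m,n) iff T(f(m),n), and likewise for the two functions:
fT_consistent = lambda m, n: T_consistent(f(m), n)
fT_forwards   = lambda m, n: T_forwards(f(m), n)
fT_backwards  = lambda m, n: T_backwards(f(m), n)

m = ("Ada", 100)
assert fT_forwards(m, "bob") == "ada"            # view sees the normalised name
assert fT_consistent(m, fT_forwards(m, "bob"))   # forwards restores consistency
assert fT_backwards(m, "ada") == ("ada", 100)    # salary survives the round trip
assert fT_consistent(fT_backwards(m, "ada"), "ada")
```

(The last assertion holds for this particular T and f; as with the typing caveat above, backwards correctness of f;T in general depends on how f interacts with what Tbackwards produces.)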

My guess is that this is actually enough to meet most customisation needs, but suppose not. Maybe we buy a nontrivially bidirectional transformation

S: FavouriteObscureModellingLanguage <-> UML

from the vendor of our FavouriteObscureModellingLanguage tool, and another nontrivially bidirectional transformation

T: UML <-> RDBMS

from our database vendor, and suppose we don't care about UML and there's no one in our company who speaks UML, but we want to be able to work on our model in FavouriteObscureModellingLanguage and on our RDBMS schema and keep those in sync. If we are happy to generate an initial UML model from one end or the other and keep it around, then we're back in the original case - it just so happens that the UML model will only change when a transformation tells it to. But what if our CMS expert is a vehement hater of UML and will not permit a UML model to sully his pristine repository? Are there interesting conditions under which we can still somehow use S and T to give us a composed

S;T: FavouriteObscureModellingLanguage <-> RDBMS ?

Obviously, since none of us ever touch UML, the absence of the UML model isn't really depriving our company of any information. In particular, provided we have all previous versions of the models we do touch in our CMS, together with a timeline on when transformations were run, we could in principle just rebuild the latest UML model by reapplying all the previously run transformations, use it to apply this transformation, then throw it away. Storing and using previous versions is pretty much like storing and using edits, and this is closely related to what the graph transformation people do: parse the input graph to see where it came from - hoping fervently that the parse is unique - then do the same on the other side.

What makes this impractical is that to rerun stage n of the transformation - say, the one that ended up modifying r : RDBMS to r' to be consistent with authoritative f : FavouriteObscureModellingLanguage - we really do need to construct the very UML model we had at the end of that transformation stage. It won't do to somehow guess any old u : UML s.t. S(f,u) and T(u,r') and hope. For if the one we really had was u', it could well be that Tforwards(Sforwards(f',u'),r') is not the same as Tforwards(Sforwards(f',u),r'), which would mean we're stuffed next time we try to amend r' in accord with authoritative f'. Indeed, our attempt at composition could fail to be hippocratic, because it could happen that S(f',u') and T(u',r') but one of those fails with our u in place of the genuine u'. Maybe there are some friendly conditions under which this can't happen (beyond the lens case) but I don't know any offhand. It might be useful to compare "keep as much information as we possibly can" vs "keep only the information we're forced to keep" policies for choosing u.

Still to come: relating this properly to the paper that prompted this post, or, come to that, to my own previous work on an algebraic approach to bidirectional transformations.
