Tuesday 21 September 2010

Software testing and a talk by Peter Dybjer

Another thing I wanted to write about and not forget was a lovely LFCS seminar I heard on Tuesday by Peter Dybjer. It was entitled "Program testing and constructive validity" and was about applying insights from software testing to the problem of validating judgements in formal systems, specifically Martin-Löf type theory.

Of course, I was one of a handful of members of the audience more comfortable with testing than with type theory, and I was wondering whether there is anything that can be taken back the other way: can we gain any insight into software testing from this? In the end I think the answer is probably "not really" (and of course, it would not have been reasonable to expect anything else), but I am thinking about testing anyway because I'm very much looking forward to a Dagstuhl workshop on model-based testing that's taking place in a few weeks' time, so the talk was well timed. The rest of this post gives a little background on model-based testing, and then a few words on the way Peter used testing, and on possible connections.


The basic idea of model-based testing is that you make use of a model of the system's behaviour in testing the actual system (the "system under test", SUT). For example, you might have a finite state machine that is supposed to capture the interaction between the system and the outside world, in some suitable sense, and then you analyse the FSM in order both to generate a sufficient set of concrete tests and to check the system's output. A testing model could be a grammar, or a UML model, or what you will, provided that it can somehow express requirements. Given a suitable testing model, test generation, running and results checking can be done automatically, and the hope is that the effort of building or deriving the models for MBT is small compared with the effort of writing or generating tests in other ways. Of course if the testing model is wrong - that is, its behaviour is not what the customer actually requires - we are in trouble; generally, any discrepancy shown up by running a test is supposed to expose a bug in the SUT. If, however, we knew that the SUT was perfect, we could equally well regard a discrepancy as exposing a bug in the testing model. (And hands up everyone who's ever found that a test failed because there was a bug in the test...) The point is that running one test and seeing the expected result is a simple experiment that fractionally, Bayesianly, increases our confidence that the testing model and the SUT are consistent in some appropriate sense; and more importantly, running one test and not seeing the expected result not only reduces our confidence of that, but gives us a strong hint about where in the models to look for the source of the discrepancy. (You would not pay much to a testing company that would run your tests and tell you afterwards how many failed, but not which ones.)
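To make the shape of this concrete, here is a toy sketch in Python. The turnstile example, the state machine, and the deliberately buggy SUT are all inventions of mine for illustration: the finite state machine plays the role of the testing model, tests are generated from it together with their expected outputs, and a failing test points at the discrepancy.

```python
# Testing model: a finite state machine, (state, input) -> (next state, output).
MODEL = {
    ("locked",   "coin"): ("unlocked", "unlock"),
    ("locked",   "push"): ("locked",   "refuse"),
    ("unlocked", "push"): ("locked",   "lock"),
    ("unlocked", "coin"): ("unlocked", "refund"),
}

def generate_tests(model, start, depth):
    """Enumerate input sequences up to length `depth`, paired with the
    expected outputs computed from the model."""
    tests = [([], [])]
    frontier = [(start, [], [])]
    for _ in range(depth):
        new_frontier = []
        for state, inputs, outputs in frontier:
            for (s, i), (s2, o) in model.items():
                if s == state:
                    t = (inputs + [i], outputs + [o])
                    tests.append(t)
                    new_frontier.append((s2, t[0], t[1]))
        frontier = new_frontier
    return tests

class BuggySUT:
    """System under test: forgets to lock again after a push."""
    def __init__(self):
        self.state = "locked"
    def step(self, i):
        if self.state == "locked" and i == "coin":
            self.state = "unlocked"
            return "unlock"
        if self.state == "locked" and i == "push":
            return "refuse"
        if self.state == "unlocked" and i == "push":
            return "lock"            # bug: self.state stays "unlocked"
        return "refund"

def run_test(sut_cls, inputs, expected):
    """Run one generated test against a fresh SUT; report pass/fail."""
    sut = sut_cls()
    actual = [sut.step(i) for i in inputs]
    return actual == expected

failures = [ins for ins, outs in generate_tests(MODEL, "locked", 3)
            if not run_test(BuggySUT, ins, outs)]
```

On this toy example no test of length one or two fails, and every failing sequence passes through the buggy unlocked-push transition, which is exactly the "strong hint about where to look" described above.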

In Peter's talk, the thing under scrutiny is a logical system: a collection of rules for how to construct judgements such as "under these assumptions, a and a' are equal as members of type A". The issue is: how do you convince somebody that your system is correct? Of course there are different kinds of conviction that might be required. Somebody might want to be convinced that the system as a whole has some desirable property, such as not contradicting itself (this is what Peter called a meta-mathematical issue). Someone else might want to be convinced that the rules of your system are individually true, and you might want, as far as possible, to address each rule separately, without depending on their understanding your system as a whole, and without relying on complex mathematical gadgets in your argument. This is what Peter called a pre-mathematical approach: if we do it successfully, then the conviction produced helps to provide a foundation for the system, a reason for considering it reliable as a way of answering questions. The trouble with complex mathematical gadgets, and what he is trying to avoid in this work, is that results produced using them are only as reliable as they are themselves. We may recall the famous Knuth quotation "Beware of bugs in the above code; I have only proved it correct, not tried it."

One traditional way to convince someone that a logical system is correct, which does require a complex mathematical gadget, is to explain it using a realizability model. This is basically where you provide a relationship between things in the system and things the person you need to convince already has a good understanding of and knows how to compute with. You demonstrate that everything in your system has at least one related well-understood thing, and that all the computations that can happen in your system correspond to computations that can happen in the well-understood place. This is fine provided that you have a realizability model that you're happy to accept you understand, but it rests completely on that, and there isn't a way to accept the realizability model bit by bit: you have to swallow it whole.

A testing model, although it can be used to generate dynamic tests together with their expected results, is a static object: it captures a whole set of requirements on the SUT. Running, and checking the result of, an individual test, on the other hand, is a dynamic procedure. This seems a little bit like: a logical system is to a realizability model as a testing model is to the SUT. You produce a test from the testing model, together with its expected results. To find out whether it passes or fails, you have to run it on the SUT. You could, of course, use the same testing model against two different versions of the SUT, run the same test, and find that on one SUT it passed and on the other it failed. In Peter's world, you produce a test from the logical system, but there doesn't seem to be an explicit notion of a model against which the test is run, such that different models might give different results. I get a bit confused here, but I think this is because here it is the logical system itself which is under scrutiny: a discrepancy (between it and the single, implicit model) would be taken as a bug in the logical system itself. Peter makes a strong distinction between the meta-mathematical semantics of a system, in which realizability models live, and the pre-mathematical world in which his testing approach lives. In the pre-mathematical world, the real world does duty for the SUT.

An interesting aspect of Peter's talk which might have applicability in the MBT world is that a running test can get stuck for lack of suitable data needed during the course of its execution, in which case it solves its problem by going back to the logical system to generate that data on demand, rather like the questioning of the questioner that happens in game semantics. This obviates the need to have a complex structure against which to test - all the complication is in the logical system. In MBT, the sort-of-equivalent would be that instead of generating a single-level test and running it to completion against the SUT - which, in the case of a test that invokes complex computation, may lead to difficulty in debugging the SUT if the test fails - one might generate a test which is somehow able and instructed to invoke the testing model's version of certain functionality - which obviously has to be functionality actually present in the testing model, not something deep in the design of the SUT - when required. This is just using the testing model to stub out parts of the system's functionality, but I wonder whether anyone has explored using this approach to increase the efficiency of MBT. There would be some engineering problems in making it practical, I suppose, and I don't know whether it would be worthwhile.
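As a very loose analogy in code (a toy stack example of my own, not from the talk): the testing model doubles as the place the running test goes back to when it needs data, or the model's version of some functionality, part-way through execution.

```python
class Model:
    """Toy testing model for a stack; it also serves as the oracle the
    running test consults on demand."""
    def fresh_item(self, i):
        # generate data on demand, rather than fixing it all up front
        return f"item-{i}"
    def expected_pop(self, pushed):
        # the model's own version of the behaviour under test
        return pushed[-1]

class ListStack:
    """System under test: a stack backed by a Python list."""
    def __init__(self):
        self.xs = []
    def push(self, x):
        self.xs.append(x)
    def pop(self):
        return self.xs.pop()

def run_push_pop_test(model, sut, n):
    """Push n items, then check that pop agrees with the model.
    When the test is stuck for data, it asks the model."""
    pushed = []
    for i in range(n):
        item = model.fresh_item(i)   # stuck for data? go back to the model
        sut.push(item)
        pushed.append(item)
    return sut.pop() == model.expected_pop(pushed)
```

This is indeed just stubbing parts of the behaviour with the model's version, as the paragraph says; the engineering question is whether doing it lazily, mid-run, buys anything.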

Anyway, thanks to Peter for a very thought-provoking talk, and to John Longley for explaining to me (again) what a realizability model is. All errors are mine, and I apologise for them.

Wednesday 15 September 2010

On sequential composition of bidirectional model transformations

This is a partial brain dump prompted by being part of the way through reading this paper by Martin Hofmann, Benjamin Pierce and Daniel Wagner. It's also influenced, naturally, by interesting conversations with Benjamin and various other people over the years. ObDisclaimer: this is a blog post, not a paper, and it does not contain a related work section. Comments and pointers to work I might not know about most welcome! This post may still change.

First a quick introduction for anyone who doesn't already know, following the presentation here because it's my blog: a bidirectional model transformation is a program which allows you to keep two (or more, but two is enough for now) models consistent, when either of them may change. So such a transformation embodies a notion of consistency:

T ⊆ M x N

(just one notion of consistency, not two directional notions) along with a pair of functions

Tforwards : M x N -> N

which tells you how to modify a model from N if you take an old version of it together with an authoritative m in M, and dually

Tbackwards : M x N -> M

In some particularly simple cases, we may be able to do without one of the arguments in one or both of these functions; for example, if N is an abstraction - a view - of M, containing no information not also present in M, then the transformation does not need to consult an old version of n in N and the N argument is redundant. In general, however, either model may contain information that the other does not.
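To fix the notation with a concrete toy instance (the record types and field names are invented for illustration): take M to be records carrying a name and a salary, N records carrying a name and a phone number, and say they are consistent when the names agree. Each side then holds information the other lacks, so neither argument of the transformation functions is redundant.

```python
def consistent(m, n):
    """T ⊆ M x N: consistent exactly when the names agree."""
    return m["name"] == n["name"]

def forwards(m, n):
    """Tforwards : M x N -> N: propagate the authoritative name from m,
    keeping N's private phone field from the old n."""
    return {"name": m["name"], "phone": n["phone"]}

def backwards(m, n):
    """Tbackwards : M x N -> M: dually, keep M's private salary field."""
    return {"name": n["name"], "salary": m["salary"]}
```

For example, taking an authoritative m with name "Ada" and an old n with name "Bea", forwards produces an n that agrees with m on the name but still remembers its own phone number.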

One goes on to ask what constraints must hold on T, Tforwards and Tbackwards to make them form a "reasonable" bidirectional transformation. For a start, the functions have to be "correct" - that is, they must succeed in enforcing consistency - and "hippocratic" - that is, if their arguments are already consistent, then the function must not propose any change. This doesn't feel like enough, however: the problem is that a transformation can be correct and hippocratic and still have the property that (almost) "all bets are off" if its arguments are not consistent to start with. In that case, a transformation which only has to be correct and hippocratic is at liberty to throw away all information from the old model and just pick its favourite from all the models that are consistent with the given authoritative model. Intuitively this is not sensible: although there are real decisions for the transformation writer to make - there can certainly be multiple sensible choices of how to restore consistency - we do expect a transformation, whose job is to restore consistency, to do its job without avoidable vandalism.
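Correctness and hippocraticness can be written down as executable checks over a finite sample of each model set; the following is a sketch with invented names. The problem described above shows up immediately: a "vandal" forwards function that discards the old model whenever the inputs are inconsistent can still pass both checks.

```python
def check_correct(consistent, forwards, backwards, ms, ns):
    """Correct: both functions must actually restore consistency."""
    return all(consistent(m, forwards(m, n)) and consistent(backwards(m, n), n)
               for m in ms for n in ns)

def check_hippocratic(consistent, forwards, backwards, ms, ns):
    """Hippocratic: on already-consistent arguments, propose no change."""
    return all(forwards(m, n) == n and backwards(m, n) == m
               for m in ms for n in ns if consistent(m, n))

# Toy models: integers, consistent exactly when their parities agree.
cons = lambda m, n: m % 2 == n % 2
bwd  = lambda m, n: m if cons(m, n) else m + 1

# A gentle forwards makes a minimal change to the old n...
gentle = lambda m, n: n if cons(m, n) else n + 1
# ...while a vandal throws n away entirely when the inputs are inconsistent.
vandal = lambda m, n: n if cons(m, n) else m % 2
```

Both gentle and vandal satisfy both laws, even though vandal forgets everything about the old n: the laws alone do not rule out the avoidable vandalism.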

There are various ways to handle this: one can put extra constraints ("undoability", or "PUTPUT" in the asymmetric case) on the transformation, but these tend to be considered too strong; alternatively, if transformations are built up from transformation elements as in bidirectional programming, one at least intuitively gets something like this for free (continuity...). Meertens discusses the use of metrics; I think there may be more to be said, though, at least in the context of models specifically, and perhaps more generally.

What has any of this to do with sequential composition? Well, as good little computer scientists, we naturally want to be able to compose things once we've defined them. Consistency is a relation, and we know how to compose relations; it's natural to say that the notion of consistency captured by T1;T2 should be

(T1;T2)(m,p) iff exists n s.t. T1(m,n) and T2(n,p)
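As an executable rendering of that definition (a sketch of mine, assuming the middle model set is finite enough to enumerate):

```python
def compose_consistency(T1, T2, Ns):
    """(T1;T2)(m, p) iff there exists n in Ns with T1(m, n) and T2(n, p)."""
    return lambda m, p: any(T1(m, n) and T2(n, p) for n in Ns)

# Toy relations: T1 relates m to its successor, T2 relates n to its double.
T1 = lambda m, n: n == m + 1
T2 = lambda n, p: p == n * 2
T12 = compose_consistency(T1, T2, range(100))
```

Note that the witness n is existentially quantified away: the composed relation tells you whether m and p are consistent, but not which middle model witnesses it.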

This immediately exposes the problem for defining the corresponding transformation functions: the apparent need to "guess" an element of the elided middle model type. Hmm. Let's try to be good little software engineers instead, take a step back, and think about why we might want to be able to compose bidirectional model transformations, in order to get some requirements for a notion of composition.

In model-driven development, we may have a chain, or even a network, of models describing different aspects of the world and/or the artefacts we are building, each being worked on by different people. At certain points in time, e.g. software release dates, we will want at least a certain collection of these to be guaranteed consistent, in the sense that there exists a model of them all in the model-theoretic sense, viz., the actual software in the world which should be consistent with all (or a specified subset of) the models in the MDD sense. An individual bidirectional transformation captures the consistency of some pair of models. Imagine a graph with model types as nodes and bidirectional transformations as edges.

We would certainly like it if overall consistency meant nothing more nor less than the conjunction of the notions of consistency captured by each of the transformations. We'd also like confluence: if our models are inconsistent, there may be different sequences in which we can apply our bidirectional transformations so as to restore consistency, and we'd like it if they all gave the same result. A little more precisely: suppose we label some of our nodes as authoritative i.e. not to be touched, and others as result. Now imposing consistency means applying the transformations in some order, without ever applying a transformation in the direction of an authoritative model, until nothing changes - that is, the whole set of models is consistent - and then looking at the result model(s). It's only too easy to imagine doing this blindly at 3am when a release is overdue: wouldn't it be nice to know that the result model(s) will depend only on the set of models we started with, and on which of them we deemed authoritative, but not on the order in which we applied the transformations? Unfortunately, a moment's thought shows that this is unachievable without imposing restrictions on the transformations (undoability, or PUTPUT in the lens world) which are generally thought too strong.
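The 3am procedure can be sketched as a fixpoint loop over the network. The encoding here (the graph as a list of directed edges, the models as a dict) is an invention of mine for illustration, and, as just noted, nothing about the loop guarantees that the result is independent of the order in which the edges are tried.

```python
def restore_consistency(models, edges, authoritative, max_rounds=100):
    """models: dict mapping node name -> model value.
    edges: list of (src, dst, propagate), where propagate(m_src, m_dst)
    returns a new value for dst. A transformation is never applied in
    the direction of an authoritative node."""
    for _ in range(max_rounds):
        changed = False
        for src, dst, propagate in edges:
            if dst in authoritative:
                continue
            new = propagate(models[src], models[dst])
            if new != models[dst]:
                models[dst] = new
                changed = True
        if not changed:          # nothing changed: the whole set is consistent
            return models
    raise RuntimeError("no fixpoint reached")
```

On a simple chain a -> b -> c with a authoritative, the loop terminates with a's value propagated along the chain, whatever the edge order; in general, confluence needs the restrictions discussed above.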


An important special case is applying a transformation when the non-authoritative model does not exist yet: model sets can be equipped with a special no-information model to act as the argument in this case. In such a case, the transformation functions should compose just as the ordinary functions they are.
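One way to encode this, as a sketch: reserve a distinguished no-information value, and extend Tforwards with a create function (my name, not from the post) for the case where the non-authoritative model does not exist yet.

```python
NONE_MODEL = object()   # the special no-information model

def forwards_total(forwards, create):
    """Extend Tforwards to accept the no-information model on the N side,
    where `create : M -> N` builds a fresh model from scratch."""
    def f(m, n):
        return create(m) if n is NONE_MODEL else forwards(m, n)
    return f
```

Since the no-information case is just an ordinary function of the authoritative model alone, these extended functions compose as plain functions do.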

An interesting aspect of this kind of "composition" is that all the models involved may have equal status: in particular, they all actually exist, perhaps even in the same CMS if you're really lucky.

There is at least one other scenario in which we may wish to compose bidirectional transformations, though, and it's pragmatically a bit different. Writing bidirectional transformations correctly is hard work, and there is every chance that you will want to reuse a transformation written by someone else. If you just want straight reuse, well and good, but more likely, you will want to customise the transformation that you buy in or otherwise reuse. IIRR (can't be bothered to check right now) this ability to customise transformations was one of the requirements in the OMG document that led to the QVT standard - however, the standard does not address it, at least for QVT-R.

There are various kinds of customisation you might want to do, of course - perhaps you want parameterised or higher order transformations - but let's suppose that what you want is related to sequential composition in the sense that you want to do one or both of:

- guddling systematically with your models before they are given as input to the reused transformation

- guddling systematically with the output of the transformation, before continuing to use the output model.

First let's consider the easiest case, in which the guddling of a model doesn't depend on any other model and doesn't alter the type of the model. Given T on M x N and f : M -> M we can define

(f;T) (m,n) iff T(f(m),n)

(f;T)forwards (m,n) = Tforwards (f(m),n)

(f;T)backwards (m,n) = Tbackwards (f(m),n)

In fact, looking at this we can see that it doesn't matter if f : M' -> M; provided the target of f matches the appropriate input type of T we'll be fine.

Dually, we can define T;g where g : N -> N' : no problem.
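These definitions transcribe directly into code; a sketch of mine, representing a transformation as a (consistent, forwards, backwards) triple of functions:

```python
def precompose(f, T):
    """f;T as defined above: apply f to the M-side argument throughout.
    The same code works verbatim when f : M' -> M, as noted in the text."""
    consistent, forwards, backwards = T
    return (lambda m, n: consistent(f(m), n),   # (f;T)(m, n)
            lambda m, n: forwards(f(m), n),     # (f;T)forwards
            lambda m, n: backwards(f(m), n))    # (f;T)backwards
```

For instance, with a trivial transformation on integers and f = len (so f : str -> int), precompose(len, T) relates strings to integers through their lengths; the dual T;g is entirely analogous on the N side.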

My guess is that this is actually enough to meet most customisation needs, but suppose not. Maybe we buy a nontrivially bidirectional transformation

S: FavouriteObscureModellingLanguage <-> UML

from the vendor of our FavouriteObscureModellingLanguage tool, and another nontrivially bidirectional transformation

T: UML <-> RDBMS

from our database vendor, and suppose we don't care about UML and there's no one in our company who speaks UML, but we want to be able to work on our model in FavouriteObscureModellingLanguage and on our RDBMS schema and keep those in sync. If we are happy to generate an initial UML model from one end or the other and keep it around, then we're back in the original case - it just so happens that the UML model will only change when a transformation tells it to. But what if our CMS expert is a vehement hater of UML and will not permit a UML model to sully his pristine repository? Are there interesting conditions under which we can still somehow use S and T to give us a composed

S;T: FavouriteObscureModellingLanguage <-> RDBMS ?

Obviously, since none of us ever touch UML, the absence of the UML model isn't really depriving our company of any information. In particular, provided we have all previous versions of the models we do touch in our CMS, together with a timeline on when transformations were run, we could in principle just rebuild the latest UML model by reapplying all the previously run transformations, use it to apply this transformation, then throw it away. Storing and using previous versions is pretty much like storing and using edits, and this is closely related to what the graph transformation people do: parse the input graph to see where it came from - hoping fervently that the parse is unique - then do the same on the other side.

What makes this impractical is that to rerun stage n of the transformation - say, the one that ended up modifying r : RDBMS to r' to be consistent with authoritative f : FavouriteObscureModellingLanguage - we really do need to construct the very UML model we had at the end of that transformation stage. It won't do to somehow guess any old u : UML s.t. S(f,u) and T(u,r') and hope. For if the one we really had was u', it could well be that Tforwards(Sforwards(f',u'),r') is not the same as Tforwards(Sforwards(f',u),r'), which would mean we're stuffed next time we try to amend r' in accord with authoritative f'. Indeed, our attempt at composition could fail to be hippocratic, because it could happen that S(f',u') and T(u',r') but one of those fails with our u in place of the genuine u'. Maybe there are some friendly conditions under which this can't happen (beyond the lens case) but I don't know any offhand. It might be useful to compare "keep as much information as we possibly can" vs "keep only the information we're forced to keep" policies for choosing u.

Still to come: relating this properly to the paper that prompted this post, or, come to that, to my own previous work on an algebraic approach to bidirectional transformations.

Wednesday 8 September 2010

Combinatorial species

I'm reading an article (to appear in the proceedings of Haskell'10) on combinatorial species by Brent Yorgey. Very interesting, although as I don't speak Haskell, the explanations were sometimes harder to understand than the things they were explaining. I did offer to tutor Haskell this year in the hope that I'd learn it, but I doubt it's going to happen. For the purposes of reading this paper, the main thing you have to know, e.g. from here, is that a "class" is an interface, and an "instance" of a "class" is a thing that implements the interface! I do hope they don't cover this aspect of Haskell in the course that precedes our first Java course...