Software Engineering (Research): What's still wrong with code generation?

Generating code from models is fine, and can certainly save time compared with actually typing that boilerplate stuff that's obvious from looking at the model. Life gets interesting - and that's sometimes a euphemism - when you start editing the code, while still editing the model too, and you need to keep the two in sync. This is often called "round-trip engineering", but if I remember correctly, that's actually a trademark of Rational. Let's call it model-code synchronisation.

Typically, you'll develop a model in UML or some such, then you'll generate code from the model. The generated code will often have special comments of some kind scattered through it to show the tools which bits of the code are autogenerated, and there will often be a notion of some parts of the code "belonging" to the tool, which is interesting. If you add code, code written by you is distinguished from autogenerated code, and the tool is supposed to leave it alone. If you modify code that was autogenerated, you'll probably have to do something to indicate to the tool that you don't want your changes thrown away; maybe you delete one of its special comments.

Now, what happens when you modify the model? If you add new structural elements, it's pretty simple: when you press the button, new code gets generated for them and it doesn't interfere (much) with what was there already.

What if you delete a structural element from the model? If you do so before any hand-written code has been added that's "owned" by this element, no problem: the autogenerated code should be deleted too. If you have hand-written code for it, then deleting the element from the model is a slightly surprising thing to do. What should the tool do about this? It could restore consistency to the world by deleting the code corresponding to the deleted model element, and in the end that may be the only sensible thing to do. Maybe, though, the presence of that code indicates that more work needs to be done? Maybe it even indicates that deleting the model element is a mistake? It might be better for the tool to indicate in some way that this was a surprising thing to do.

In this particular case, we can probably cover it by something like:

enforce a hard separation between code "owned" by the tool and code that "should not be touched" by the tool;
any time the tool needs to change or delete code that "should not be touched", do something special, e.g., require confirmation.
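To make those two rules concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the `Region` record, the `tool_owned` flag and the `confirm` callback are assumptions, not the behaviour of any particular tool. It only handles the deletion case discussed above.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Region:
    """A chunk of code tied to one model element (hypothetical structure)."""
    element: str      # name of the model element this code belongs to
    text: str         # the code itself
    tool_owned: bool  # True: the tool may rewrite it; False: hand-written


def sync_deletions(
    regions: List[Region],
    model_elements: Set[str],
    confirm: Callable[[Region], bool],
) -> List[Region]:
    """Drop code whose model element is gone, asking before touching hand-written code."""
    kept = []
    for region in regions:
        if region.element in model_elements:
            kept.append(region)   # element still exists: nothing to do
        elif region.tool_owned:
            continue              # tool-owned code: delete without fuss
        elif confirm(region):
            continue              # hand-written, but the user agreed to delete it
        else:
            kept.append(region)   # left alone; code and model stay out of sync
    return kept
```

Note that when the user refuses, the result is not in sync with the model, so a rule like this trades guaranteed consistency for not destroying work.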

However, it's not hard to go on inventing more and more puzzling cases, so we need a systematic way to think about what's reasonable behaviour and what's not.

Since code can be seen as a special kind of model - after all, it records some information about a software system - this is a special case of bidirectional model transformations, which I've been interested in for a while. In general we can think about a definition of model-code synchronisation as involving (a) a definition of what it is for the code and model to be in sync, and (b) a pair of recipes, one for each direction, for fixing things when they aren't. Let's assume that the person pressing the button has just finished editing either the code or the model, and doesn't want their work changed, so that synchronisation should change either the model, or the code, but not both. (How much it complicates things if we allow both to be changed deserves some thought, but the answer is probably "not much".) What should be true? Well, two things seem fairly uncontroversial:

if it happens to be the case that the code and the model are already in sync, then nothing should happen. I called this the transformation being hippocratic - "first, do no harm".
after the synchronisation, the code and the model should be in sync. I called this the transformation being correct, since the job of the transformation is to restore consistency.

The first of these deserves a bit of comment: isn't it automatic from the second? Well, not if there can be more than one model that's in sync with the same code, or vice versa. (That is, not if the consistency relation can be non-bijective.) Is that likely? Absolutely. For example, a UML model can include sequence diagrams for any parts of its behaviour, so there's an endless choice of model for the same code there. For an example the other way round, anything that's specified in code but not visible in the model can be changed without altering consistency, too.
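The two properties, and the reason the first doesn't follow from the second, can be sketched with a deliberately tiny toy. The representation here is an assumption made up for the example: a model is a set of class names plus some notes that have no counterpart in code, code is a map from class name to body, and "in sync" just means the class names agree. Only the model-to-code direction is shown.

```python
from typing import Callable, Dict, FrozenSet, Tuple

# Toy assumption: a model is (class names, notes); the notes stand in for
# diagrams that say nothing the code can contradict.
Model = Tuple[FrozenSet[str], str]
# Toy assumption: code maps each class name to its body.
Code = Dict[str, str]
Transform = Callable[[Model, Code], Code]


def consistent(model: Model, code: Code) -> bool:
    """In sync means: the same set of class names on both sides."""
    return model[0] == frozenset(code)


def repair(model: Model, code: Code) -> Code:
    """Make the code match the model, keeping any existing bodies."""
    return {name: code.get(name, "pass") for name in model[0]}


def regenerate(model: Model, code: Code) -> Code:
    """Make the code match the model by rewriting every body as a stub."""
    return {name: "pass" for name in model[0]}


def is_correct(t: Transform, model: Model, code: Code) -> bool:
    """After synchronisation, the code and the model are in sync."""
    return consistent(model, t(model, code))


def is_hippocratic(t: Transform, model: Model, code: Code) -> bool:
    """If already in sync, nothing changes; other inputs are unconstrained."""
    return not consistent(model, code) or t(model, code) == code


model: Model = (frozenset({"Account"}), "sequence diagram for withdraw")
code: Code = {"Account": "def withdraw(self, n): self.balance -= n"}
```

On this input both `repair` and `regenerate` are correct, but only `repair` is hippocratic: `regenerate` swaps the hand-written body for another body that is equally in sync with the model, which is only possible because the consistency relation is not bijective.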

Notice that in the process of thinking about that, it becomes obvious that there's a real choice about what "being in sync" means, too. How much model is there expected to be for a given code body? If there's a class diagram but no state diagrams, is that in sync, or should there have to be a state diagram for each (important?) class? Which classes are important, and what kind of state diagram? The right answers should, ideally, depend on the needs of the user: they won't be defined once and for all by the tool vendor.

Unfortunately, neither correctness and hippocraticness, nor their near ancestors, the basic lens laws of Foster, Pierce et al., are enough to ensure sensible behaviour. For example, the transformation that silently deletes the user's handwritten code to restore consistency when an element is deleted from the model can perfectly well be correct and hippocratic. More another time on what else we might require...
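As a rough illustration of that last example, here is a toy synchroniser in Python. The representation is an assumption for the sake of the example (the model is just a set of class names, the code a map from class name to body, and "in sync" means the names agree), and the names `Account` and `Audit` are made up.

```python
from typing import Dict, Set


def consistent(model_classes: Set[str], code: Dict[str, str]) -> bool:
    """Toy notion of being in sync: same class names on both sides."""
    return set(code) == set(model_classes)


def silent_sync(model_classes: Set[str], code: Dict[str, str]) -> Dict[str, str]:
    """Restore consistency by changing the code only, without asking anyone."""
    return {name: code.get(name, "pass") for name in model_classes}


code = {
    "Account": "pass",
    "Audit": "def log(self, msg): print(msg)  # written by hand",
}
model = {"Account"}  # "Audit" has just been deleted from the model

after = silent_sync(model, code)
```

Here `after` is in sync with the model, and running `silent_sync` on a pair that is already in sync changes nothing, so both properties hold. The hand-written `Audit` body is gone all the same.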

Monday 4 August 2008
