Well, OK... I didn't promise that this blog was readable, let alone correct. So here goes with some notes; caveat lector.
There are two related concepts.
- Data integration, aka federation, lets you answer queries over multiple data sources as though they were integrated. The data stays where it is, but a new schema is created, against which the user can write queries; something must explain how to translate the query and its results so that the query is done over the actual data sources with their actual schemas, and the results presented to the user in a consistent way.
- Data exchange, aka translation, really does move the data. You take some data with its original schema; you take a new schema; you explain how to fit the old data into the new schema.
The way this field talks about these problems is using schema mappings. A schema mapping is a triple (S,T,\Sigma) [how do I do Greek in this thing, or maths for that matter?! never mind for now!]. S and T are the source and target schemas - let's be thinking about 2. for now - and Sigma is a set of dependencies. There are two kinds of dependencies: source-to-target dependencies Sigma_ST, which model transformation people might think of as the specification of a consistency relation that forms part of a model transformation, and constraints Sigma_T over the target schema, which we might think of as part of the target metamodel. (We see already an example of the common phenomenon that related problems in different areas often put their concept boundaries in different places: another kind of translation problem...)
Then the data exchange problem is to take this triple, and a model I for S, and create a model J for T which satisfies Sigma. We might say: given a pair of metamodels S (plus some constraints which are not specified because we're about to be given the data as well and it is guaranteed to satisfy whatever it has to: let's write S+) and T plus Sigma_T (let's write T+), and a consistency relation Sigma_ST, and a model I for S+, construct J which is a model for T+ such that (I,J) satisfy the consistency relation.
At this point, the database people observe that solutions may not exist, or if they exist, may not be unique, and they go on to define and study the existence-of-solutions problem which is: for a given M and I, does a solution J always exist? This is initially puzzling from a model transformation point of view, because in our world, we would not expect it to be useful to try to apply a model transformation which is defined only by its consistency relation. It's sort of the whole point that we have to have some way of choosing among solutions: we expect the consistency relation to be only part of the specification of a model transformation; we expect to add a constructive procedure of some kind to explain how to construct a target model, at which point non-existence is "just" a bug in the program. Still, I'd like to be able to say confidently whether or not we can write transformations which are not actually executable (at all, never mind non-determinacy) in (fragments of) our favourite bidirectional transformation languages, so maybe it would be useful to think about this more.
Back in the database world, they observe that there's an important question concerning the language in which the constraints are written. FOL is too strong - the existence of solutions problem is undecidable - so they look at fragments. It turns out that what happens is very sensitive to which fragment they choose.
Time to go to the next talk; maybe even more anon. Do let me know if you're reading this: perhaps I should offer a small prize for the first comment.
1 comment:
people are reading. though it can be challenging to follow 'random thoughts', challenging, but also quite interesting.
Post a Comment