The Data-Centric Modus Operandi
Written by Rick Warren
August 16, 2010
DDS stands for "Data Distribution Service." Data distribution is not messaging, and it is not eventing. However, data distribution subsumes messaging and eventing as use cases to a large extent, and as a result it often gets lumped into those categories.
Data distribution is about observing a changing world. A system whose communication is based on this paradigm tends to become data-centric: it becomes more concerned with modeling the first-class concepts of its business domain and less concerned with managing second-class "who-told-whom-to-do-what" middleware concepts like queues and messages. Along the way, it enjoys the benefits of decreased coupling and improved reliability, scalability, and performance.
Data Distribution and Its Kin
Classically, messaging is an evolution of the remote method invocation (RMI) paradigm — an attempt to make that paradigm less coupled and more scalable by making it asynchronous. A message says "I tell you to do this." When compared with RMI, "I" and "you" are more abstract, both in identity and multiplicity, and the request can be queued for processing at a later time or by another party without making the sender wait. These are improvements, but the interaction remains coupled, because the roles of "I" and "you" (often in the guises of "client" and "server" or the trendier "service consumer" and "service provider"), as well as the intention of what action should be performed, are still very much in play.
Eventing, like data distribution, is preoccupied with changes to the world. An event says "I changed in this way." It reduces coupling by entirely removing both the recipient of that information and any notion of intention from your business logic and your mental model; who might receive an event, and what they might choose to do as a result, are not the business of the event source. But state management remains a problem, because in order to understand the change that occurred, all recipients must have an up-to-date understanding of the state of the world prior to the latest event: "the price went up by a dollar" doesn't do me any good if I don't know what the price was before. This temporal coupling means that every recipient must process every event in order, whether those events are interesting or not, just in case the interpretation of a subsequent interesting event should happen to require the state established by a previous otherwise-uninteresting one.
The resulting processing and state management are complex and expensive. As a mitigation, they are frequently factored out of the applications that need the data and into state-management "servers" that "clients" must query using a message-centric or even RMI-based approach — a huge regression in engineering practice! The system becomes complicated by the presence of multiple interacting communication paradigms, and the servers (which serve no business role) introduce performance and fault-tolerance choke points.
A data-centric architecture eliminates these problems by simplifying the interactions. A data sample says simply "the world is like this." It thereby eliminates coupling not only in terms of source, recipients, and their intentions, but also in terms of time. There's no longer any need for recipients to process or store information they don't care about, because each sample stands on its own and does not depend on the samples that came before it. Therefore it becomes perfectly reasonable for one observer to examine the state of the world every second, or every minute, or every hour, and for another to observe every single intermediate state, even if those states change from one to the other many times a second.
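The difference between an event and a data sample can be sketched in a few lines of Python. This is only an illustration of the idea, not DDS code: the names PriceChanged, PriceSample, price_from_events, and price_from_samples are hypothetical, and the prices are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class PriceChanged:
    """Event: "I changed in this way." Meaningless without prior state."""
    delta: float


@dataclass
class PriceSample:
    """Data sample: "the world is like this." Complete on its own."""
    price: float


def price_from_events(initial_price, events):
    """Replay every event in order, starting from a known initial state."""
    price = initial_price
    for event in events:
        price += event.delta
    return price


def price_from_samples(samples):
    """Only the most recent sample matters; earlier ones can be skipped."""
    return samples[-1].price


events = [PriceChanged(+1.0), PriceChanged(-0.5), PriceChanged(+2.0)]
samples = [PriceSample(11.0), PriceSample(10.5), PriceSample(12.5)]

# An observer that saw every event arrives at the right price.
full_replay = price_from_events(10.0, events)

# An observer that missed the first event silently computes the wrong price.
missed_one = price_from_events(10.0, events[1:])

# An observer that saw only the last sample still knows the current price.
latest_only = price_from_samples(samples[-1:])
```

Here full_replay and latest_only agree, while missed_one is off by the delta of the skipped event. That gap is the temporal coupling described above.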
Modeling the World with DDS
A set of DDS entities, and the data they distribute and manage, define a view into this changing "world."
- A "domain" defines the boundaries of the world, the set of information that a collaborating group of applications might find interesting. A "domain participant" defines the presence of some application in that world; it is the data-centric analogue to what is frequently known as a "connection" in messaging middleware.
- A "type" is a structural description of some part of the world. For example, an antelope is brown in color and has four legs and two horns; a Ferrari is red in color and has four wheels and two seats. A type has a formal definition, usually (though not always) in a declarative language like XSD or OMG IDL, and it implies a corresponding definition in the target programming language.
- A "quality-of-service" (QoS) definition defines the fidelity with which one or more parties are able to describe the world. For example, will the description contain every state the world passes through or only a subset? Will observers have access to new states of the world only, or will they be able to see previous states as well? If the latter, how far back will those previous states go?
- A "topic" defines some aspect or subset of the world consisting of similar objects. As such, it combines a type, which defines the structure of those objects, with a QoS definition, which defines how they can be observed to change.
- An "instance" defines a single object in the group defined by a topic. For example, a topic may be used to distribute the positions of airplanes as detected by a radar. Each plane would be an instance. All radar tracks have the same structure (type) and are updated in the same way (QoS). But they are also distinct from one another: it matters whether the plane at a given location happens to be American Airlines flight 123 or Delta flight 456.
- A "data writer" defines a source of information about a particular subset of the world (topic). As such, it may override the QoS of its topic — multiple parties may provide information about the same part of the world but with different degrees of fidelity.
- A "data reader" defines an observer of a particular subset of the world (topic). As such, it may also override the QoS of its topic. Furthermore, it may be able to observe, or interested in observing, only certain states of the world. For example, it may only be interested in airplanes flying over a particular geographic area or in stocks trading at over $20/share.
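The relationship among type, topic, instance, writer, and reader can be pictured with a toy in-memory model. The sketch below is not the DDS API; ToyTopic, RadarTrack, and their methods are hypothetical names chosen for illustration, the coordinates are invented, and QoS is left out entirely. It only shows that a topic holds one current state per instance (identified by a key) and that a reader may ask for a filtered subset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RadarTrack:
    """The "type": every radar track has the same structure."""
    flight: str       # key field: identifies the instance (one airplane)
    latitude: float
    longitude: float


class ToyTopic:
    """A toy stand-in for a topic: one latest sample per instance."""

    def __init__(self, name, key):
        self.name = name
        self.key = key        # function that extracts the instance key
        self.latest = {}      # instance key -> most recent sample

    def write(self, sample):
        """What a data writer does: state how the world is right now."""
        self.latest[self.key(sample)] = sample

    def read(self, content_filter=lambda sample: True):
        """What a data reader does: observe the states it cares about."""
        return [s for s in self.latest.values() if content_filter(s)]


topic = ToyTopic("RadarTracks", key=lambda track: track.flight)
topic.write(RadarTrack("AA123", 40.0, -75.0))
topic.write(RadarTrack("DL456", 33.6, -84.4))
topic.write(RadarTrack("AA123", 40.5, -74.5))   # same instance, new state

all_tracks = topic.read()                               # two instances
northern = topic.read(lambda t: t.latitude > 35.0)      # only AA123
```

Writing AA123 a second time replaces that instance's state rather than adding a third object, which is the sense in which each plane is a distinct instance of the same topic.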
By creating a data reader with a certain QoS definition, an application makes an affirmative statement that it wishes to observe a certain portion of the world under a certain set of circumstances. For example, it may state that it is interested in observing the most recent five states (samples) of the objects (instances) in its part of the world (topic), but it doesn't need to process changes more frequently than once every second.
This statement is one of interest only; it in no way requires the observer to actually observe a certain set of samples in a certain way or within a certain period of time. On the one hand, the observer may choose to be notified asynchronously of every new sample and to respond to it immediately. On the other, it may "go away" to other business and return hours later; when it does, it will find the most recent five samples of each instance, occurring no more frequently than once every second, waiting for it. In the meantime, DDS will have taken care of all of the necessary data reception, filtering, and replacement in order to make that happen.
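The "five most recent samples, no more often than once a second" example can be approximated with a small cache. In DDS these behaviors are requested through the HISTORY and TIME_BASED_FILTER QoS policies; the class below is only a simplified stand-in with hypothetical names (ToyReaderCache, on_sample, take), and it does not reproduce the exact filtering semantics of any DDS implementation.

```python
from collections import defaultdict, deque


class ToyReaderCache:
    """Keeps the last `depth` samples per instance, at least
    `min_separation` seconds apart, until the application asks for them."""

    def __init__(self, depth=5, min_separation=1.0):
        self.min_separation = min_separation
        # A bounded deque drops the oldest sample once `depth` is reached.
        self.history = defaultdict(lambda: deque(maxlen=depth))
        self.last_accepted = {}   # instance -> timestamp of last kept sample

    def on_sample(self, instance, timestamp, value):
        """Called by the middleware for each arriving sample."""
        last = self.last_accepted.get(instance)
        if last is not None and timestamp - last < self.min_separation:
            return False          # too soon after the previous one: filtered
        self.last_accepted[instance] = timestamp
        self.history[instance].append((timestamp, value))
        return True

    def take(self, instance):
        """Called by the application whenever it chooses to look."""
        samples = list(self.history[instance])
        self.history[instance].clear()
        return samples


cache = ToyReaderCache(depth=5, min_separation=1.0)

# Forty updates arrive for one airplane, four per second, for ten seconds.
for i in range(40):
    cache.on_sample("AA123", timestamp=i * 0.25, value=i)

# The application returns later and finds only what it asked for.
waiting = cache.take("AA123")
timestamps = [t for t, _ in waiting]
```

After the loop, timestamps holds five values one second apart, the most recent ones that passed the filter. The application did no work while the forty updates arrived.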
DDS's ability to combine notification and lightweight caching — in effect, to maintain an application's observed state of the world on its behalf — is something no other standards-based technology provides. Developers of data-centric systems reap the benefits: higher performance and scalability, greater tolerance to dynamic network conditions, and ultimately improved ROI and time-to-market.