Software-Defined Is All About the Data

Entry 2 of the Software-Defined Future Blog Series


AI can and will run cars, traffic control, urban air mobility, trains, renewable energy, hospital devices, surgical robots, naval systems, air defense, avionics, simulation and training.  It will make the entire planet run better.  How?  How can we enable the intelligence of our future to control the physical world of our past?  This is the second entry in a blog series striving to answer that question.


As outlined in the first entry, the key to artificial intelligence (AI) is data. And the most important difference between intelligent cloud systems and “smart-world” systems is how the AIs get that data.  In summary:

Intelligence in the cloud needs data.  

Intelligence in the real world needs dataflow. 

To make the world run better, don’t design systems around algorithms, devices, or even functions. To make the world run better, design systems around the dataflow.  This “data centricity” organizes each system around the information its components share. Data centricity adapts to many industries, transforming them all for the better. 

So…What is data centricity? Why does it work for these systems?


Software-Defined Systems and Data Centricity

Basing designs on software enables many new capabilities, but don’t confuse the benefits of adding software to the design with the potential of being “software-defined”.  Sure, connecting to devices may enable new business models, let manufacturers evaluate and maintain devices after sale, or ease product lines that scale from low-end to top-line versions. But a software-defined vehicle isn’t about charging for air conditioning, improving operation in the field, or simplifying configurations. Software definition is about defining the system by the software it runs. It’s a fundamental transition to an entirely new paradigm, as profound as adding a processor to the system in the first place. Software-defined vehicles, transit systems, industrial controllers, and medical patient-monitoring systems bring the vast potential of artificial intelligence to real-world systems.

So, how do these systems get the dataflow they need? Increasingly, intelligent software-defined systems are using a concept known as “data centricity”. Intelligent applications span defense, renewable energy, air traffic control, avionics, trains, ships, metros, launch control, satellites, hospital networks, medical imaging, surgical robotics, sports timing, simulation holodecks, electric vehicles, construction equipment, and even flying cars (aka Urban Air Mobility, UAM). Data centricity is the #1 architectural approach used by all these applications.  Data centricity makes it easier to get the data to and from the AI to make the world run better.

In contrast to the complexity of the real world it controls, data centricity is a strikingly elegant and simple concept! At its core, data centricity just means you figure out what data the modules need, and then design the system components around that data. The opposite is to define servers and clients and routers and networks and applications first…and then figure out what data the components use. In a data-centric architecture, applications don’t communicate directly with each other or with active components like servers. They share data directly according to strictly defined rules. Implementing data centricity is deeply complex, because real-world systems pose so many different challenges. But in concept, it’s simple and profound. Data centricity simply flips the thinking: data first, then system.

Importantly, realize that data centricity is a completely different way of building systems. Traditional systems are built around system components like servers, connections, networks, and topologies. A data-centric system doesn’t do that; it builds around the data. So, for instance, instead of collecting data, a data-centric system enables superb control of dataflow. Instead of connecting to a server, a data-centric system just requests delivery once. Instead of securing a network, a data-centric system secures the data itself. The system components are still important, of course. But their function is just to move data to where it needs to go. By thinking this way, data centricity can completely hide the physical implementation from functional elements: e.g., it doesn’t matter how the data gets there, as long as it gets there on time. That simplifies organization: rather than write code that’s dependent on servers or topology, a data-centric design organizes independent modules. The modules interface only through the shared data. And, if they can also share that data fast enough, precisely enough, and reliably enough to control physical reality, it makes physical systems far easier to build and control. In particular, it eases mobility, integration, scalability, security, feedback, and more (these are the subject of the next blog). Data-centric designs enable the smart world. 


Data Centricity Concept

If you think data first, you can write your software algorithms, device handlers, and interfaces as independent modules and hook them up later. So, the fundamental diagram for a data-centric software module looks like this:




Data Centricity.  The fundamental property of data centricity is building applications around data.  In the case of a database (data-centric storage), all the data is in a central place where every application can get it.  In the more interesting case of data-centric real-world systems, the middleware delivers the right data to every application at the right time.  


That’s it. Two boxes. The diagram may look too simple, but it’s essentially correct. Don’t underestimate the orange box; that box can contain all the data in a smart car with radar, Lidar, AI, and autonomous driving. Or all the data in a hospital system with 200,000 complex instruments. The green box is also very general.  It could represent thousands and thousands of lines of sophisticated code that does complex things: sensor processors, AI algorithms, or consoles in an operations center.  The power of data centricity is that it reduces this complexity to, well, two boxes.

Conceptually, a data-centric system logically puts all the data “inside” the application as if it were in local memory. This sounds like magic, but it’s not; the middleware simply knows what data the application needs and when it needs it. It also knows where to get that data.  So, it “gets” it (actually arranges for it to be sent), and sticks it into the application’s memory at the right time. All the data isn’t really “inside” the application; that’s a virtual abstraction. But the applications can act as if any and all data they need is simply available.

Let’s zoom out just a bit here.  A typical real-world system contains sensors, actuators, algorithms, and usually operator interfaces. With a data-centric world, teams writing code for these can specify what data they need and produce. At run time, the middleware provides every component what it needs to run. It’s conceptually that simple.
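The idea above can be sketched in a few lines of code. This is a toy, in-process illustration only; the class and topic names are invented for this example, and a real system would use production middleware (such as DDS), not this snippet:

```python
# Minimal, illustrative sketch of data-centric dataflow. All names here are
# invented for the example; this is not the API of any real middleware.

from collections import defaultdict

class Databus:
    """Toy middleware: routes data by topic name, not by endpoint."""
    def __init__(self):
        self._subscribers = defaultdict(list)   # topic -> list of callbacks

    def subscribe(self, topic, callback):
        # A component declares WHAT data it needs, not WHERE it comes from.
        self._subscribers[topic].append(callback)

    def publish(self, topic, sample):
        # A producer pushes data; the middleware delivers it to every consumer.
        for deliver in self._subscribers[topic]:
            deliver(sample)

# Two independent modules that never reference each other -- only the data.
bus = Databus()
readings = []
bus.subscribe("VehicleSpeed", readings.append)   # consumer: e.g., an AI module
bus.publish("VehicleSpeed", {"mps": 27.8})       # producer: e.g., a wheel sensor

print(readings)   # the consumer got the sample without knowing its source
```

Note that neither module holds a reference to the other: swap the sensor for a simulator, or add ten more consumers, and nothing else changes. That decoupling is the point.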



Data-Centric System Operation.  On the left, software components specify the data they need and produce.  The Quality of Service (QoS) specifications are sufficiently detailed to decouple components around data.  There are no system concepts like servers, clients, messages, or transactions complicating the interactions.  Each component virtually has the data it needs.  At runtime, the middleware makes this work; it finds the data wherever it lives and delivers it in accordance with the QoS specifications to where it’s needed. Thus, all the modules can work alone, but they can also participate in a larger coordinated system. 

(By the way, this is the origin of RTI’s tagline: “Your Systems. Working as One.”)


Note that this also has implications for software team organization. Data-centric systems provide a data architecture: what data is shared, how applications join, and when and how data is exchanged. With the data architecture defined, teams can focus on just their module. Thus, data centricity not only specifies how modules work together, it specifies how teams work together.




Data-Centric Team Coordination.  The data-centric system defines the data to share and enables components to get the right data at the right time.  Thus, just like the components they write, a team can work alone, but also participate in a larger coordinated system.  Some real-world systems have very large teams; one of RTI’s customers has over 1,500 teams of “cooperating” programmers.  And yet their software integrates cleanly.


Why is data centricity the right foundation for software-defined systems?

At its core, data centricity is a software integration technology. The basic goal is to provide a sufficiently deep interface to separate the problem into manageable chunks, and then to pull off the absolute magic of making those chunks work together. 

To work together, applications must share data. To do that, they have to: a) agree on what to share, b) have some way to join the party, and c) be able to get data when needed. A bit more formally, they need a data architecture, defined as: a) a data model (what is shared), b) interfaces (how applications join), and c) a data-exchange paradigm (when and how data is shared). An effective data architecture also implements strong rules that make these things crisp with guarantees. The data architecture is the core of any system. The architecture has to fit the application: it must solve all these issues in a way that satisfies the use case.


Data Centricity for the Enterprise: the Database

You are already familiar with data-centric storage, also known as a database. The reason databases took over the enterprise is that a database provides exactly the software integration that business applications need. For a relational database, applications share data “tables” that contain rows of data and columns of fields. They join through an API that most commonly connects to a server. And they exchange data on demand (“pull”) by searching with a simple query language (e.g., SQL). The rules (“ACID properties”) ensure consistency; they guarantee that the data is uncorrupted and the same for everyone. 

This database data architecture fits thousands of use cases that have to collect information and then provide it to the system. The database program is the same for each use case, but the data table is very different for an HR system (employee names, functions, and pay) and an auto-parts store (part numbers, inventory, and prices). Importantly, note that information is collected and stored ahead of time and then provided later to applications that find it with search. The whole purpose of the database is to provide the right stored data to the right application when it asks.

That’s a great architecture for most business applications. Databases run essentially every business system in the world today, from HR systems to auto-parts stores.  You wouldn’t dream of building an enterprise system without one. You wouldn’t even run a food truck. The business world runs on data stored in a data-centric database.


Data Centricity for the Software-Defined Future: the Databus

But the database architecture doesn’t work for controlling the real world.  While databases are a part of many real-world intelligent systems, they fundamentally work on stored (old) data and deliver that data only on request.  Real-world systems have to model future events. And then, to react to those events, they have to proactively get the data needed to the right subsystems. They have to do all this at the right time, often measured in milliseconds.

The data-centric architecture that meets this challenge is called a databus.

Databuses are taking over the software-defined future because they provide exactly the software integration that real-world infrastructure needs. Like a database, databuses also have data tables that specify what’s shared (“Topics”). Unlike a database, applications join by calling a local API that creates a virtual shared memory space (“data space”) and specifies what Topics the application produces or consumes in that memory space. They exchange data on production (“push”), which sends the data to other applications that have registered an interest in consuming it. The rules for a databus (“QoS parameters”) are mostly about timing: they specify when and how to deliver data. 

This databus data architecture fits thousands of use cases that have to send information to other applications on time. Like a database, the databus middleware is also the same for each use case, but the data table and QoS settings are very different. With different data models, the same databus executable can be used to track incoming cruise missiles or monitor a heart patient’s ECG signal. Importantly, note that the information is sent immediately to applications that have registered interest to get it. The whole purpose of the databus is to deliver the right information to the right application at the right time.

There is a standard, called the Data Distribution Service (DDS™), that defines all these terms crisply. We’ll get back to that standard later in the series. 




Data-Centric Architectures.  Both a database and a databus are data-centric; applications communicate only through the data directly and not with each other.

  • A database fits use cases that have to collect information and then provide it to the system. It stores (old) data and provides that data on request.  
  • A databus fits use cases that have to deliver information to other applications on time. It proactively gets the data needed to the right place at the right time. 

Databases dominate business applications in the cloud. Databuses are the most prevalent architecture for smart-world systems on the edge.


The best question at this point is: So what? What good is this, really?  For that, stay tuned for blog #3…


Blog Series Overview

The era of evolving artificial intelligence in the real world is perhaps better called the “software-defined” world, because although the explosion of hardware capability enabled this era, it is software that increasingly defines where intelligence can be used. And the most important characteristic of that software is how it controls dataflow. Getting the right data to the right place at the right time enables edge intelligence.

This blog series sets out to examine a key question: How can we enable the intelligence of our future to control the physical world of our past? In the first entry, we looked into the potential and reality of the software-defined future. This entry explains the data centricity concept that underlies most software-defined applications. Next, we will analyze when and how to use data-centric design to implement intelligence. In later entries, we will examine the challenges and benefits of software-defined systems, including autonomy, extensibility, online interactions, new business models, and cost savings. We will look carefully at the implications of the most profound change of all: updating fielded products to enable evolution. And we’ll dig into near-future use cases, including software-defined vehicles, software-defined medical systems, software-defined defense, and software-defined industrial automation. ChatGPT and its cousins will change the way we interact with the Internet and businesses, but it’s “only” a revolution in the online world. Software-defined systems will change the way the real world runs. That, in the end, is far more profound.


A smart world runs better.  Real-time intelligence matters.




About the Author

Stan Schneider is CEO of Real-Time Innovations (RTI), the largest software framework provider for smart machine and real-world systems. 

Stan also serves on the advisory board for IoT Solutions World Congress and the boards of the Teleoperations Consortium and the Autonomous Vehicle Computing Consortium (AVCC). Stan holds a PhD in EE/CS from Stanford University.


Getting Started with Connext

Connext® is the world's leading implementation of the Data Distribution Service (DDS) standard for Real-Time Systems. Try a fully-functional version of Connext for 30 days.
