If you're serious about the performance of your distributed system, you probably read with interest the performance claims made by network middleware vendors. And if you're a network middleware vendor, you've probably published your share of performance claims. (RTI has comprehensive performance numbers available for both our DDS and JMS APIs.) But in order to know which claims are meaningful -- and more importantly, which are useful to you -- it's important to understand what you're reading. In the words of one of my coworkers, "many apples are compared to rhinoceroses."
First of all, there are three primary axes along which people tend to measure network performance.
The whitepaper "The Data-Centric Future" on the RTI website has a good overview of general performance considerations, so I won't reiterate that material here. What I'll focus on today is throughput, and specifically some of the subtleties you'll want to keep in mind when you're evaluating a networking middleware product.
The size of the data samples you send in a throughput test has a huge impact on the throughput you measure. At small data sizes (dozens to hundreds of bytes), performance is dominated by the expense of traversing the layers of software in between your application and the network: your middleware, if any; your operating system's network stack, including any system calls; and your network driver. As the data size increases, these fixed costs become less significant relative to the cost of actually copying the data (including the "copy" of the data across the network).
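To make that concrete, here is a rough back-of-the-envelope model (illustrative only, not RTI benchmark code): treat each send as a fixed per-sample overhead plus a per-byte cost, and watch how the balance shifts with sample size. The 5-microsecond overhead and 1 Gbps wire rate below are assumed round numbers, not measurements.

```python
# Illustrative cost model: per-sample cost = fixed per-send overhead + per-byte cost.
# Both constants are assumptions chosen for the example, not measured figures.

FIXED_OVERHEAD_S = 5e-6          # assumed per-send cost: middleware, syscalls, driver
WIRE_RATE_BYTES_PER_S = 125e6    # assumed 1 Gbps link, ~125 MB/s

for size in (16, 64, 256, 1024, 8192, 65536):
    per_sample_s = FIXED_OVERHEAD_S + size / WIRE_RATE_BYTES_PER_S
    samples_per_s = 1.0 / per_sample_s
    mb_per_s = samples_per_s * size / 1e6
    print(f"{size:>6} B: {samples_per_s:>10,.0f} samples/s   {mb_per_s:>6.1f} MB/s")
```

Under this (assumed) model, the samples-per-second rate is nearly flat for small sizes -- the fixed overhead dominates -- while for large sizes the bytes-per-second rate approaches the wire rate.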
This difference in the throughput profile among different data sizes means that if you're reading a performance report, and you see "bytes" but your application deals with "samples" or "messages," it's important to understand the sample size(s) used to generate that report. It's not sound to generate some data with one sample size, add up the total number of bytes, and then divide by another sample size, because you're not correctly accounting for per-sample constant factors.
One vendor -- an enterprise software vendor newly entering the messaging market, who shall remain nameless -- made a series of throughput measurements for 256-byte samples. This vendor then declared that what their customers really cared about was 16-byte samples, and so immediately multiplied their measurements by 16 (= 256 / 16) and published those extrapolated results! Depending on how they were planning on sending those 16-byte samples, they were assuming either that (a) the cost of performing 16x the number of network sends is zero or (b) the cost of packing and unpacking 16-byte samples into and out of 256-byte chunks is zero. Of course, both of these costs are emphatically non-zero. But that brings me to my next point:
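Using the same illustrative model as above, you can see the shape of the error in that "multiply by 16" step. The constants remain assumptions; the point is only that extrapolated and honestly measured small-sample rates diverge badly.

```python
# Compare the vendor's extrapolation (multiply the 256-byte result by 16) with
# what the illustrative cost model says 16-byte sends would actually achieve.

FIXED_OVERHEAD_S = 5e-6          # assumed per-send overhead
WIRE_RATE_BYTES_PER_S = 125e6    # assumed 1 Gbps link

def samples_per_sec(size_bytes):
    return 1.0 / (FIXED_OVERHEAD_S + size_bytes / WIRE_RATE_BYTES_PER_S)

measured_256 = samples_per_sec(256)
claimed_16 = measured_256 * 16      # the vendor's extrapolation step
modeled_16 = samples_per_sec(16)    # what sending real 16-byte samples yields

print(f"measured at 256 B    : {measured_256:12,.0f} samples/s")
print(f"extrapolated to 16 B : {claimed_16:12,.0f} samples/s")
print(f"modeled honest 16 B  : {modeled_16:12,.0f} samples/s")
```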
Because of the high fixed cost of a network send, especially relative to the cost of copying a small data sample, it is common practice to batch multiple data samples together and send them as a single unit.
Suppose you're publishing 64-byte samples. Remember that each time you send a packet on the network, you're also sending a couple dozen bytes of IP header data and whatever meta-data your middleware requires. That adds up to a 30-100% space penalty -- added to the time penalty discussed above. But if you can amortize these costs over many samples, they become much less important. In fact, batching data can increase your effective throughput by more than an order of magnitude in some cases.
In our experience, you can send 50,000-80,000 smallish packets per second using commodity OS, computing, and networking components. When you see samples-per-second throughput numbers much higher than that, it means those samples are being batched under the hood.
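Here is the rough arithmetic behind those two paragraphs, using the post's own figures: 64-byte samples, a few dozen bytes of per-packet header and metadata, and roughly 60,000 sends per second on commodity hardware. The exact header size and send rate are assumptions picked from within those ranges.

```python
# Back-of-envelope look at how batching amortizes per-packet costs.
# Header size and per-send rate are assumed figures, not measurements.

SAMPLE_BYTES = 64
HEADER_BYTES = 48            # assumed IP/UDP headers plus middleware metadata
SENDS_PER_SECOND = 60_000    # assumed send budget (within the 50,000-80,000 range)

for batch in (1, 4, 16, 64):
    payload = batch * SAMPLE_BYTES
    overhead_pct = 100.0 * HEADER_BYTES / payload
    samples_per_s = batch * SENDS_PER_SECOND
    print(f"batch of {batch:>3}: {overhead_pct:5.1f}% header overhead, "
          f"{samples_per_s:>10,} samples/s")
```

With a batch of one, the header overhead is on the order of the 30-100% penalty mentioned above and the sample rate is capped by the send rate; batching 64 samples per packet cuts the overhead to a percent or two and multiplies the apparent samples-per-second figure by more than an order of magnitude.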
Note that data batching is an intrinsic part of the TCP protocol (Nagle's algorithm coalesces small writes), so any middleware implementation that relies on TCP batches data all the time.
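For reference, this TCP-level batching is the behavior the standard TCP_NODELAY socket option controls. A minimal sketch follows; the host and port are placeholders, and this is not code from any RTI product.

```python
# Minimal sketch: disable Nagle's algorithm (TCP's built-in batching) on a socket
# when per-message latency matters more than packing efficiency.

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # turn off Nagle coalescing
sock.connect(("example.com", 9000))   # placeholder endpoint
sock.sendall(b"x" * 64)               # this small write goes out without waiting to batch
sock.close()
```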
There are two ways to look at throughput: an application-centric view (how many samples a single application can publish or receive per second) and a network-centric view (the aggregate traffic across all of the applications on the network). Which one a given community cares about governs which one gets measured and reported by vendors that market into that community. This means you need to be aware of what you're reading when you see "n samples per second," especially when dealing with a new vendor.
When you're evaluating a throughput claim, be sure you know which one of these scenarios you're looking at! I can tell you that there was a flurry of activity at RTI when a competing vendor started touting 6 million samples per second -- until we read further into that vendor's testing methodology and discovered that the result was an aggregate across 60 applications. For the record, the throughput numbers you will find on RTI's website -- showing over 1 million samples per second -- are measured 1-to-n. That's one publisher on one box publishing to one destination.
To me, 1-to-n numbers are the honest numbers when you have a peer-to-peer solution, as RTI does. That's because, assuming your switch can keep up, there's nothing to test other than the so-called "client" applications themselves. We can saturate a gigabit link for data sizes not much over 100 bytes, and come close even for very small sizes. Do you have more data to send than can fit over a single link? Then add another link. At that point, you're no longer testing the performance of the RTI infrastructure; you're testing the performance of your switch.
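As a quick sanity check on that claim, here is the line-rate arithmetic for a 1 Gbps link at sample sizes around 100 bytes. The 12-byte per-sample overhead for batched samples is an assumed figure, not an RTI specification.

```python
# Line-rate arithmetic: how many samples per second fit on a 1 Gbps link?
# The per-sample overhead is an assumed figure for batched samples.

LINK_BYTES_PER_S = 125e6     # 1 Gbps
PER_SAMPLE_OVERHEAD = 12     # assumed metadata per sample when samples are batched

for size in (64, 100, 128, 256):
    raw = LINK_BYTES_PER_S / size
    with_overhead = LINK_BYTES_PER_S / (size + PER_SAMPLE_OVERHEAD)
    print(f"{size:>4} B: {raw:>12,.0f} samples/s raw, {with_overhead:>12,.0f} with overhead")
```

At roughly 100 bytes per sample, a saturated gigabit link works out to on the order of a million samples per second -- which is why, past that point, you're really measuring the network rather than the middleware.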
Now, hopefully, you have a better understanding of how throughput performance data is measured and reported, what to expect from that data, and what to look out for. You're ready to enter the wide and wild world of performance evaluation. (Care to try your own hand? Pick up a copy of RTI Data Distribution Service and run the comprehensive performance test you can find in RTI's Knowledge Base -- search for "Example Performance Test.")
Of course, there's a lot more to network performance besides throughput. No doubt I (and/or someone else) will be returning to talk about latency, loaded latency, jitter, competitive analysis, and other topics in the future -- stay tuned.