Reliability isn't just for getting everything that was sent....
Written by Howard Wang
May 24, 2012
I got an email from a user that basically stated that "as a general rule, sending data with BEST_EFFORT Reliability qos (i.e., using nominal UDP semantics) should provide better performance than sending data with RELIABLE Reliability QOS on a stable, clean and thus relatively lossless network".
Hmm, that sounds reasonable enough....to use a reliable protocol, the delivery protocol would have to send and process additional network packets like heartbeats and ACK/NACK packets, thus consuming both network bandwidth and additional CPU cycles. This additional overhead should make the "performance" of a reliable connection worse than that of a "best effort" connection. If not "worse", it certainly shouldn't make it better...or would it?
Well, we may want to make some definitions first. What is "performance"? Is it maximum throughput? Or the latency of the data (time it takes from sending to receiving)? Or resources being consumed while sending at a particular data rate (CPU, network bandwidth, memory)?
In general, when sending data slower than the network bandwidth on a "stable, lossless transport", then with Best Effort, there is no additional overhead in CPU/network bandwidth/memory being consumed. Of course, if you use the Reliable mode, you'll get the same throughput performance but at a higher "price" (overhead).
So, no, you do not get better throughput/latency, using Best Effort vs Reliable QOS when sending data below network bandwidth limitations on networks that do not lose data packets. You'll get the same throughput/latency performance...just for lower "cost".
If there is a chance that data packets can be lost on the network no matter what the network load is, then there is an obvious performance difference between Best Effort and Reliable...not necessarily in terms of throughput and latency, but in terms of determinism vs the guaranteed receipt of all data sent in the order sent.
With Best Effort, you may not receive all of the data, but you will receive whatever data was able to get through with minimal latency (more deterministically), and no additional overhead will be incurred even if there is data loss.
With Reliability, the reliable protocol will be able to detect and repair lost packets so that all of the data sent will be received in the order sent, at the expense of additional network packets (HB, ACK/NACK) to detect and repair lost packets, not to mention the increased CPU and memory needed as well. But one could argue that the "performance" of the Reliable connection is better, if less deterministic (i.e., there may be unpredictable delays in receiving data while the system repairs missing data).
That's all good and great when the data rates are well below the network bandwidth...
However, when you send data faster than the system can handle, no matter if the network itself is "lossless", e.g., shared memory, data still can be lost...by DDS or the OS if not by the network hardware.
It's easy to send data faster than the network can handle. Data rate is calculated as (amount of data / time). You can overwhelm a network by sending small (1-200 byte) samples too fast. Or the same can happen by trying to send a MB in a single write() call.
When an application tries to send data faster than the network can handle, data packets are lost.
In Best Effort mode, DDS does not try to detect that it is being asked to send data faster than the network can handle. And in Best Effort mode, there is no mechanism to stop DDS from pushing data through the network stack even though the network is saturated. So the network stack and/or physical network will throw away data packets exceeding the network bandwidth.
So just because the "network" is lossless, doesn't imply that from App to App there isn't a place where data can be thrown out. The physical network may never see a packet because the OS throws out the data packet when the network reports that it can't handle any more. So the packet isn't lost by the physical network, but intentionally dropped by the OS or device driver layer.
e.g., the send socket buffer is full which causes OS to throw out the data being sent before it reaches the Ethernet card.
Or more likely, since it usually takes more CPU to process incoming data than to send outgoing data, sending apps can usually send much faster than receiving apps can process, and thus the receive socket buffer (or shared memory buffer) fills up while the CPU is busy processing received packets....then the Ethernet device or the OS shared memory driver has no choice but to drop the data packets it has received.
So how fast is too fast? Well, assuming a "clean network", it's when the sender tries to send more than the total amount of data that can be buffered in the "system" in one go...without any delay between sends. The "system" being a combination of the send network stack, the network itself (including buffers in switches/routers) and the receive network stack. The main places where significant amounts of data can be stored are the send buffer and the receive buffer.
For RTI's shared memory driver, there is no independent send buffer versus receive buffer vs network buffer, there is only 1 shared memory buffer. So if you send data > the size of the shared memory buffer in one go, then some part of the data will probably be lost.
Let's take the case of sending "large data". Large data is defined as data that is larger than the MTU (maximum transmission unit) of the physical transport. The largest user data packet that can be sent by UDP is 64K. So sending 1 MB of data in a single write() call would require some mechanism, either RTI DDS's builtin large-data fragmentation feature or a user-level software layer, to break up the large data into smaller (MTU-sized) chunks and send the fragments individually through the physical network.
Those fragments are usually sent consecutively without any delay...which, with today's CPU speeds, can easily exceed the maximum bandwidth of most networks.
e.g., sending 1 MB in the 1 ms that it takes a CPU to break up and send 1 MB in 64K chunks through a UDP socket requires a network that can handle 8 Gbps (8 megabits in 1 ms). A 1 Gbps network would not be able to transmit the data that fast.
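The arithmetic behind that 8 Gbps figure can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function name is made up for illustration, 1 MB is taken as 1,000,000 bytes, and protocol header overhead is ignored.

```python
def required_bandwidth_bps(payload_bytes: int, send_time_s: float) -> float:
    """Bits per second needed to move payload_bytes within send_time_s."""
    return payload_bytes * 8 / send_time_s


ONE_MB = 1_000_000          # assumption: decimal megabyte, headers ignored
link_bps = 1_000_000_000    # a 1 Gbps network

burst_bps = required_bandwidth_bps(ONE_MB, 0.001)   # 1 MB pushed out in 1 ms
drain_ms = ONE_MB * 8 / link_bps * 1000             # time the link really needs

print(f"burst rate asked of the network: {burst_bps / 1e9:.1f} Gbps")
print(f"time a 1 Gbps link needs for 1 MB: {drain_ms:.1f} ms")
```

In other words, the sender hands over the data roughly eight times faster than a 1 Gbps link can carry it, so the difference has to be buffered somewhere or dropped.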
With other networks, if the large data being sent is greater than the network can buffer, then data fragments could be lost.
e.g., 1 MB of data. 64K chunks --> 16 data fragments are sent. But if the shared memory buffer only holds 512 KB of data, it's likely that the send side sends much faster than the receive side can process, so up to 8 data fragments could be "lost" (in the case that the send side sends so fast that all of the fragments are "sent" even before DDS on the receive side has a chance to take one packet from the network).
The situation that I just described is exactly what would happen if you try to send too much data in Best Effort mode. There is no throttle. DDS will push the data to the network as fast as the application sends the data. And if the application gives DDS large data (e.g., 1 MB), DDS will send all of the data in fragments without delay. If the "network" loses data, then you'll see your effective throughput either be zero (i.e., the network is always losing the last parts of a large data sample), or be far from high performance (i.e., every now and then you get lucky and all of the fragments of a large data sample do make it through).
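Here is a rough Python sketch of that fragment count. It is a toy model, not how RTI DDS accounts for its buffers: the function name and the `drained_fragments` parameter are hypothetical, sizes use 1 KB = 1024 bytes, and per-fragment headers are ignored.

```python
def simulate_burst(sample_bytes, fragment_bytes, buffer_bytes, drained_fragments=0):
    """Return (total, delivered, dropped) fragments for one back-to-back burst.

    drained_fragments is how many fragments the receiver manages to take out
    of the buffer while the burst is still arriving (0 is the worst case).
    """
    total = -(-sample_bytes // fragment_bytes)             # ceiling division
    slots = buffer_bytes // fragment_bytes + drained_fragments
    delivered = min(total, slots)
    return total, delivered, total - delivered


KB = 1024
total, delivered, dropped = simulate_burst(1024 * KB, 64 * KB, 512 * KB)
print(f"{total} fragments sent, {delivered} buffered, {dropped} dropped")
```

With the numbers from the example (1 MB sample, 64K fragments, 512 KB buffer, receiver takes nothing during the burst) this gives 16 fragments sent and 8 dropped. Every fragment the receiver manages to drain during the burst reduces the loss by one.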
So, what can you do? Put in a mechanism to limit the rate that DDS pushes packets onto a network to something that the network can handle. You can do this open loop, i.e., put in arbitrary delays between sending of data at the application layer and/or use the RTI DDS FlowControl mechanism, or closed loop, by using feedback from the receiving side to let the send side know when it's OK to send more data.
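As an illustration of the open-loop option at the application layer, here is a minimal Python sketch of a paced send loop. It does not use any RTI DDS API: `paced_send`, `send` and `max_bytes_per_s` are hypothetical names, and `send` stands in for whatever call actually writes a fragment.

```python
import time


def paced_send(fragments, send, max_bytes_per_s,
               clock=time.monotonic, sleep=time.sleep):
    """Send each fragment, sleeping so the average rate stays at or below
    max_bytes_per_s. No feedback from the receiver is used (open loop)."""
    next_send_time = clock()
    for frag in fragments:
        now = clock()
        if now < next_send_time:
            sleep(next_send_time - now)
        send(frag)
        # Earliest moment the next fragment may go out at the target rate.
        next_send_time = max(now, next_send_time) + len(frag) / max_bytes_per_s


if __name__ == "__main__":
    sent = []
    # Four 1000-byte fragments at 100 kB/s: about 30 ms in total.
    paced_send([b"x" * 1000] * 4, sent.append, max_bytes_per_s=100_000)
    print(f"sent {len(sent)} fragments")
```

The catch with any open-loop scheme is that the rate limit is a guess. Set it too high and data is still lost; set it too low and bandwidth is left unused.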
The closed-loop mechanism is basically what you're getting with the Reliable mode. By using a limited-sized send queue, the reliable mechanism will block DDS from sending any more data when the send queue is full, and only when there is feedback (ACKs) back from the receive side (indicating that it's able to process more packets) is DDS allowed to send more data. This is also known as "throttling".
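To make the idea concrete, here is a toy Python model of a sender with a limited send queue that only advances when ACKs come back. It is a sketch of the general technique, not RTI's reliable protocol: `WindowedSender` and `run` are hypothetical names, ACKs are cumulative, and lost fragments are not retransmitted.

```python
from collections import deque


class WindowedSender:
    """Sender that allows at most `window` unacknowledged fragments."""

    def __init__(self, window):
        self.window = window
        self.unacked = deque()
        self.next_seq = 0

    def can_send(self):
        return len(self.unacked) < self.window

    def send(self, transmit):
        seq = self.next_seq
        self.next_seq += 1
        self.unacked.append(seq)
        transmit(seq)

    def on_ack(self, acked_up_to):
        # Cumulative ACK: everything up to acked_up_to leaves the send queue.
        while self.unacked and self.unacked[0] <= acked_up_to:
            self.unacked.popleft()


def run(total_fragments, window, rx_capacity):
    """Return (delivered sequence numbers, number of dropped fragments)."""
    sender = WindowedSender(window)
    rx_buffer, delivered, dropped = deque(), [], 0

    def transmit(seq):
        nonlocal dropped
        if len(rx_buffer) < rx_capacity:
            rx_buffer.append(seq)
        else:
            dropped += 1          # receive buffer full: fragment is lost

    while sender.next_seq < total_fragments or rx_buffer:
        # Sender pushes as fast as its send queue allows.
        while sender.next_seq < total_fragments and sender.can_send():
            sender.send(transmit)
        # Receiver processes one fragment per step and acknowledges it.
        if rx_buffer:
            seq = rx_buffer.popleft()
            delivered.append(seq)
            sender.on_ack(seq)
    return delivered, dropped


print(run(16, window=8, rx_capacity=8)[1], "dropped with a send queue of 8")
print(run(16, window=16, rx_capacity=8)[1], "dropped with a send queue of 16")
```

In this model, a send queue no larger than the receive buffer means nothing is dropped, because the sender can never have more fragments in flight than the receiver can hold. Making the send queue bigger than the receive buffer brings the losses back.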
Yes, this will add some amount of overhead...but using the Reliable protocol to throttle the send rate, and thus not lose any data to excessive data rates, costs only the receiving and processing of HB/ACK packets. That is a small price to pay compared to sending data so fast that data is lost and then having to use the same Reliable protocol to repair the lost packets.
So, even when using the Reliable protocol, it's still better to tune the protocol to never send faster than the end-to-end network can handle.
In short, for large data, you're almost guaranteeing that DDS will try to send it faster than the network (even shared memory) can handle, and thus data will be lost. If you're using RTI DDS's internal large data algorithm, the data rate can be throttled using the Reliable protocol. If your own code is breaking up the large data, you can use arbitrary delays in your send loop. Another open-loop approach is to use the RTI DDS FlowControl mechanism, which can be configured to limit the max send rate for a DataWriter to a specified data rate. The FlowController can also be used by the RTI DDS internal large data algorithm.
For those of you who have used TCP for transferring MBs and MBs of data without ever having to worry about this issue...well, TCP internally breaks up data into MTU-sized chunks and uses a reliable protocol for data transfer and limited buffer (queue) sizes so that it doesn't send data faster than the network can handle. You don't actually get to choose if you want to send Best Effort or Reliable, it's always Reliable. And it's hard to tune TCP to work under abnormal conditions. And the MTU size is usually based on the MTU of Ethernet (around 1500 bytes).
So, in conclusion, sending data using Best Effort QOS may not provide the best performance...especially if peak data rates are greater than the network data rate. You can see this on the highways of California...during rush hour, there are metering lights at the on-ramps that regulate when cars get to get on the highway. With the metering lights, the network, aka highway, can be run at higher effective throughput. Without this type of regulation, driving in the SF Bay area or LA during rush hour would be more of a mad house than it is.