Who Is Chopping My Application Data and Why Should I Care?
Written by Sara Granados
July 27, 2017
As you probably know, DDS data is sent on the wire as RTPS messages. As such, these messages include a header and the data payload. The header contains useful information such as host ID, remote ID and sequence numbers; we’ll refer to the payload as ‘data sample’. For instance, in this Wireshark capture you can see the header and two submessages: INFO_TS, which contains the timestamp info, and DATA_FRAG, which is actually a data sample fragment.
Knowing that, it may come as a surprise to discover that RTPS messages are not the ones actually sent on the wire. RTI Connext ® DDS relies on a transport that sends the messages from the host to the remote application over the network. That transport, by default, is UDPv4. So each RTPS message needs to be wrapped as UDP datagrams that the OS (or the IP Stack in general) can send. At the same time, UDP runs on top of an IP stack, which also splits and wraps messages with its own headers.
In short, RTPS messages are first wrapped inside UDP datagrams, which are in turn split into IP fragments. To complicate things a bit more, UDPv4 datagrams have a maximum size of 64KB, while an Ethernet LAN usually has a Maximum Transmission Unit (MTU) of 1500 bytes, which caps the size of each IP packet. This means that an 80KB data sample would need to be split into 2 UDP datagrams: one of roughly 44 IP fragments and another of roughly 11 (see image below).
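If you want to play with these numbers yourself, here is a minimal Python sketch of the arithmetic. It assumes a 20-byte IPv4 header without options, an 8-byte UDP header, and a maximum UDP payload of 65507 bytes; it also ignores the RTPS headers entirely, so treat it as an estimate rather than what Connext DDS actually puts on the wire. The function name `fragmentation_plan` is made up for this illustration.

```python
import math

MTU = 1500            # typical Ethernet MTU, in bytes
IP_HEADER = 20        # assumption: IPv4 header without options
UDP_HEADER = 8
MAX_UDP_PAYLOAD = 65535 - IP_HEADER - UDP_HEADER   # 65507 bytes

def fragmentation_plan(sample_size, max_udp_payload=MAX_UDP_PAYLOAD, mtu=MTU):
    """Return one (udp_payload_bytes, ip_fragment_count) pair per UDP datagram."""
    # Every IP fragment except the last carries a multiple of 8 bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    plan = []
    remaining = sample_size
    while remaining > 0:
        payload = min(remaining, max_udp_payload)
        # The UDP header is part of the data that IP has to fragment.
        fragments = math.ceil((payload + UDP_HEADER) / per_fragment)
        plan.append((payload, fragments))
        remaining -= payload
    return plan

plan = fragmentation_plan(80 * 1024)
for payload, fragments in plan:
    print(f"UDP payload of {payload} bytes -> {fragments} IP fragments")
```

Under these assumptions the exact counts come out one fragment higher per datagram than the rounded figures in the text, because each fragment carries 1480 bytes of data rather than the full 1500.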
As you can see, your DDS data sample is going to be split several times, and then reassembled before getting to your application.
On one hand, for IP fragments, the IP stack will mark each of the fragments as part of a bigger message and indicate the position (offset) of that fragment in that message. Once all of the fragments arrive at the socket/kernel reception buffer, the IP stack will transfer the reassembled UDP packet to the DDS application.
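To make the offset bookkeeping more concrete, here is a toy Python model of it. This is only a sketch: the `fragment` and `reassemble` helpers are hypothetical names for this example, and a real IP stack also tracks identification fields, timeouts and overlapping fragments. The one detail borrowed from IPv4 is that offsets are stored in units of 8 bytes, together with a "more fragments" flag.

```python
def fragment(datagram: bytes, chunk: int = 1480):
    """Split a datagram into (offset, more_fragments, data) tuples.

    chunk must be a multiple of 8, since IPv4 stores offsets in 8-byte units.
    """
    pieces = []
    for start in range(0, len(datagram), chunk):
        data = datagram[start:start + chunk]
        more = start + chunk < len(datagram)   # False only for the last piece
        pieces.append((start // 8, more, data))
    return pieces

def reassemble(pieces):
    """Return the original datagram, or None if any fragment is missing."""
    buffer = bytearray()
    expected = 0          # byte position where the next fragment must start
    last_seen = False
    for offset, more, data in sorted(pieces, key=lambda p: p[0]):
        if offset * 8 != expected:
            return None   # gap: a fragment never arrived
        buffer += data
        expected += len(data)
        last_seen = not more
    return bytes(buffer) if last_seen else None

datagram = bytes(range(256)) * 20             # 5120 bytes of sample data
pieces = fragment(datagram)

in_any_order = reassemble(pieces[::-1])       # arrival order does not matter
one_missing = reassemble(pieces[:2] + pieces[3:])
print(in_any_order == datagram, one_missing)
```

Note how a single missing fragment means the receiver gets nothing at all, not a partial datagram.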
On the other hand, DDS is prepared to reassemble the split data sample fragments (called DATA_FRAG in the DDS standard and in the Wireshark capture above) into your original data sample before notifying the application that new data is available.
Then, why should you care about fragmentation?
There are mainly three scenarios in which fragmentation could affect your communication using DDS:
- IP fragments (pink boxes) of incomplete messages are filling the socket/kernel receive buffer. Some OSes, such as Windows, limit the number of IP fragments that can be held temporarily in the receive buffer. That buffer usually has a cleanup timeout; when it expires, fragments of incomplete packets are dropped. If the buffer fills up with fragments that cannot be reassembled into a UDP packet, or the cleanup timeout is too long, the system may run out of resources to hold new incoming IP fragments, and new fragments are rejected until resources are freed. If fragments are cleaned up before their packet can be reassembled, the data sample will not be reassembled and, therefore, will not be delivered to the DDS application; it could be considered lost (when using reliable communication). Note that some OSes, like Linux, allow you to configure this maximum number of fragments and the cleanup timeout, while others, such as Windows, do not.
- An IP fragment is lost in the network. DDS handles packets down to the UDP level, so its reliability unit is the UDP packet: a DDS application can repair a lost UDP packet, but not a lost IP fragment. Thus, if an IP fragment is lost, the IP stack cannot reassemble the UDP packet and the whole DDS data sample will be considered lost.
- Some switches drop IP packets marked as fragments. This may happen because they are designed to do so or because they want to avoid an IP fragmentation attack.
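For the first scenario, it can help to check how your Linux host is currently configured. The sketch below reads the kernel's IPv4 reassembly settings from `/proc/sys/net/ipv4`. Note that Linux expresses the limit as reassembly memory in bytes (`ipfrag_high_thresh`) rather than as a fragment count, and `ipfrag_time` is the cleanup timeout in seconds. The helper name `read_ipfrag_settings` is just for this example, and any setting that cannot be read (for instance on a non-Linux system) is reported as `None`.

```python
from pathlib import Path

IPFRAG_DIR = Path("/proc/sys/net/ipv4")
IPFRAG_SETTINGS = ("ipfrag_high_thresh", "ipfrag_low_thresh", "ipfrag_time")

def read_ipfrag_settings(base=IPFRAG_DIR):
    """Return the current IP reassembly settings, or None where unavailable."""
    settings = {}
    for name in IPFRAG_SETTINGS:
        try:
            settings[name] = int((base / name).read_text().strip())
        except (OSError, ValueError):
            # Not Linux, no permission, or the kernel does not expose this file.
            settings[name] = None
    return settings

settings = read_ipfrag_settings()
for name, value in settings.items():
    print(f"{name}: {value}")
```

Changing these values requires administrator privileges and is normally done with `sysctl`; which values are appropriate depends on your data sizes and network, so this sketch only reads them.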
If any of these scenarios happens, you may not see any communication at all between your DDS applications. You can review this article on how to confirm whether IP fragmentation is the cause of your lack of communication and how to fix it.
And what about performance?
Even if your applications communicate, IP fragmentation may still affect your performance.
If an IP fragment is lost, the DDS layer will not receive data from the IP layer. From the point of view of Connext DDS, the RTPS packet containing that fragment is lost, and may require resending it.
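To get a feel for how quickly this adds up, here is a back-of-the-envelope Python sketch. It assumes each IP fragment is lost independently with the same probability, which is a simplification (real networks tend to lose packets in bursts), and the function name is hypothetical.

```python
def datagram_loss_probability(fragment_loss: float, fragments: int) -> float:
    """Chance that a UDP datagram is lost when any one of its fragments is lost."""
    # The datagram survives only if every single fragment arrives.
    return 1.0 - (1.0 - fragment_loss) ** fragments

# Assumed loss rate of 0.1% per IP fragment.
for fragments in (1, 12, 45):
    probability = datagram_loss_probability(0.001, fragments)
    print(f"{fragments:>2} fragments -> {probability:.2%} chance of losing the datagram")
```

With a 0.1% loss rate per fragment, a datagram that needs 45 fragments is lost roughly 4.4% of the time, about 44 times more often than a datagram that fits in a single IP packet.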
When you use best effort delivery, nothing is resent. If you are sending large data with best effort (for streaming, for instance), this results in very inefficient use of your network. For example, if you send a full HD image (1920x1080 pixels), you will be sending around 6 MB of data (at 3 bytes per pixel). If one IP fragment (1500 bytes) is lost, you will discard more than 99.9% of the data, all of which arrived correctly at the receiver.
When you use reliable delivery, DDS will only resend the RTPS fragment whose IP fragment was lost. In our example, the 6 MB image will be split into ~95 DATA_FRAGs of the maximum UDP datagram size (if message_size_max is set to 65KB, the default value). When one IP fragment is lost, 64KB of data will need to be resent, which is only about 1% of the data.
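The figures in this example can be checked with a few lines of Python. The sketch assumes an uncompressed image at 3 bytes per pixel (which is where the "around 6 MB" comes from), a 1500-byte IP fragment and 64KB DATA_FRAGs; all variable names are illustrative.

```python
import math

IMAGE_BYTES = 1920 * 1080 * 3     # assumption: 3 bytes per pixel, uncompressed
IP_FRAGMENT_BYTES = 1500          # one lost IP fragment
DATA_FRAG_BYTES = 64 * 1024       # assumption: one DATA_FRAG per 64KB datagram

# Best effort: the whole sample is discarded, so everything else was wasted.
best_effort_discarded = (IMAGE_BYTES - IP_FRAGMENT_BYTES) / IMAGE_BYTES

# Reliable: only the DATA_FRAG that contained the lost fragment is resent.
data_frag_count = math.ceil(IMAGE_BYTES / DATA_FRAG_BYTES)
reliable_resent = DATA_FRAG_BYTES / IMAGE_BYTES

print(f"Image size: {IMAGE_BYTES} bytes in {data_frag_count} DATA_FRAGs")
print(f"Best effort: {best_effort_discarded:.2%} of correctly received data discarded")
print(f"Reliable: {reliable_resent:.2%} of the image resent")
```

The difference between the two modes is the point: one lost fragment costs you nearly the whole image with best effort, but only about 1% of it with reliable delivery.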
In both cases, if your DDS application is continuously resending data sample fragments due to packet loss, it may end up affecting your overall performance (both throughput and latency). When using Connext DDS, there are several QoS considerations to mitigate this performance loss. To learn more about the specifics of these configurations, be sure to read this Knowledge Base post over on the RTI Community portal.