This week we’re talking about administration and monitoring, with a focus on DDS systems and how Connext Tools can help. We continue our discussion with Ken Brophy, principal software engineer, as he explains how to use RTI Tools to improve performance in operational monitoring and customer system integration.
In Episode 26 of The Connext Podcast:
- [0:27] Operational Monitoring: What is it and how is it used?
- [4:17] Integrating Connext DDS with System Management Infrastructures
- [6:38] Minimizing operational disruptions, optimizing results
- [9:29] A use case in maximizing throughput
- [Blog] Implementing Simple Introspection with Connext DDS in C++14
- [Datasheet] RTI Connext Tools
- [Webinar] Accelerate Distributed Systems Development using Connext Tools
- [Webpage] RTI Labs
Steven Onzo: Hello everybody and thanks for joining us for another episode of the Connext podcast. Today, we continue our conversation with RTI Connext Tools Technical Lead, Ken Brophy. In Part II, we take on system administration and monitoring. Ken will give us an overview of operational monitoring and discuss how customers are integrating Connext DDS with existing monitoring infrastructures. Ken, welcome back.
[0:27] Operational Monitoring: What is it and how is it used?
Lee Johnson: Moving on, Admin Console does a lot of things. We want to shift gears, perhaps more towards systems in operation and how tools apply to use cases after deployment, specifically Admin Console. You mentioned RTI Monitor earlier, but before we get there, we often refer to this general area of capability called operational monitoring, and it's coming up more and more frequently with our customers. What is it? What is operational monitoring?
Ken Brophy: Right. Well, I think in a nutshell what we're seeing is that as customers take their system from deployment to production, they want to keep an eye on it and make sure that the system is functioning as it was intended, as it was designed, and proactively be able to keep an eye on that so that if there is a failure, they'll be able to recover quickly and maybe even predict when things might need an upgrade. You might need more memory or you might be running into a network bottleneck. That, to me, is kind of what operational monitoring does, and it's multi-faceted. There's a whole lot of stuff typically in a customer system. It's not just DDS stuff that's happening, they want to keep an eye on the CPU and the memory and the throughput and their switches and their databases and their web servers and all the various components that they use to build up a full application.
Ken Brophy: For us, we are looking at that as a challenge for us to solve in the near future. We'd like to tackle the operational monitoring space with a product offering that fits those needs. Today, we have the RTI Monitor, which is heavily focused on DDS debugging. Now it does show you some CPU information and some memory, but it's kind of a stovepipe today. You bring up the RTI Monitor and it shows you lots of very detailed information about DDS, the protocol statistics and this sort of thing. ACKNACKs, heartbeats, and ...
Lee Johnson: Which are essential if you're concerned about the health of the DDS system.
Ken Brophy: Exactly, but I view that tool more as a development tool than a deployment tool. The deployed systems are typically looking at bigger picture items. The throughput is typically 1,000 samples per second on this topic, and last Wednesday it went up to 3,000. Well, why was that? Well, you're not going to do that with Monitor, but you will debug and tune your system to achieve...1,000 isn't very difficult, depending on your data size, but you use Monitor in those situations on your development side to tune and do deep analysis of how to get the system to where you want it to be, but the operational side has a different set of concerns.
[4:17] Integrating Connext DDS with System Management Infrastructures
Lee Johnson: We do see more and more customers either integrating or wanting to integrate with system management infrastructures, operational monitoring infrastructures, for tracking and measuring health resource and performance attributes of their system. Where do you see some of the particular opportunities for RTI and Connext DDS to integrate with these existing monitoring infrastructures?
Ken Brophy: Alright, well we do provide a deep set of statistics regarding each data reader, data writer, and participant. There's a set of statistics that goes along with them, and what you can do is mine the interesting fields out of that and use them for your operational system. The way that it works is we don't have any proprietary data flows or anything like that with the monitoring technology; we have these 13 DDS topics that anybody can subscribe to and utilize for their needs. We happen to produce a graphical interface that consumes those topics and presents a nice interface to work with and look around and debug, but you could use those same topics to drive your operational system and gain a lot of value from that. We're starting to see people want to do that, and actually they are doing it, and they've been coming to us for advice on how best to configure that and integrate it into their system. I think that's an active area for us to investigate.
Lee Johnson: Yeah, taking those vital signs of a DDS system that we provide out of the box and helping customers transport them to the right place in the process and analyze them and respond to them accordingly.
Ken Brophy: Right, and each one has their own technology stack, so one of the challenges there is what can we offer that will be generally useful for customers solving that issue.
Lee Johnson: Okay, good. Good.
[6:38] Minimizing operational disruptions, optimizing results
Lee Johnson: Continuing on the monitoring topic, we often hear about or get questions about intrusiveness, the impact that instrumenting a system to be able to monitor its behavior has on the system itself. Are there things that we do to manage or minimize the impact that our tools and monitoring capabilities have on a system in operation?
Ken Brophy: Yeah, there's a definite spectrum that you can pick from in terms of how much impact the tooling will have, so the observer effect applies here like it does anywhere else: observing means changing. The monitoring library, which is the instrumentation package that provides those statistics, certainly does have an impact, though it's pretty negligible. We've done some testing and it's not a very big impact at all on throughput or latency, but there are a couple of things you can do. You can, for example, publish the information that the monitoring library gathers on a domain that's different from the ones you're using for your production system. Now it'll still consume bandwidth on your switches and things like that, but you can separate the traffic so that it doesn't add to discovery in your production domains, and the traffic at the switch is segregated onto different ports, so you have a minimal impact there.
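A minimal sketch of what that configuration can look like in a QoS profile, based on the Monitoring Library property names in the RTI documentation (verify the exact names, and the example domain ID of 55, against your Connext version):

```xml
<!-- USER_QOS_PROFILES.xml fragment: enable the monitoring library on a
     participant and publish the monitoring topics on a separate domain
     (55 is an arbitrary example) so they don't share discovery or
     traffic with the production domain. -->
<participant_qos>
  <property>
    <value>
      <element>
        <name>rti.monitor.library</name>
        <value>rtimonitoring</value>
      </element>
      <element>
        <name>rti.monitor.create_function</name>
        <value>RTIDefaultMonitor_create</value>
      </element>
      <element>
        <name>rti.monitor.config.new_participant_domain_id</name>
        <value>55</value>
      </element>
    </value>
  </property>
</participant_qos>
```

With this in place, RTI Monitor (or any DDS subscriber) can join domain 55 to observe the statistics without participating in the production domain.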
The other thing you can do is pick which topics you want the monitoring library to actually publish. There are two flavors among the 13 topics offered, and one flavor carries the QoS settings, so it provides the QoS settings as topics. Now, most people don't need that in their operational system; you need it when you're debugging and when you're developing and tuning. You can stop sending those right away, and that'll cut out a pretty big chunk of the data. You can also fine-tune the per-match topics, the statistics for each matched reader/writer pair. We often recommend omitting those in an operational scenario because they're interesting mainly when you're developing and debugging. They're not all that interesting in a production system where things aren't as dynamic. You're not having machines come up and down every few minutes, you're in a more steady state, so those are less interesting in those scenarios.
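For reference, a hedged sketch of trimming what the monitoring library publishes. The skip_* property names below are as I recall them from the RTI Monitoring Library documentation; they are assumptions to be confirmed against your Connext version before use:

```xml
<!-- Suppress the QoS-flavored monitoring topics in production.
     The rti.monitor.config.skip_* names are recalled from the RTI
     Monitoring Library docs; confirm them for your Connext version. -->
<participant_qos>
  <property>
    <value>
      <element>
        <name>rti.monitor.config.skip_participant_qos</name>
        <value>true</value>
      </element>
      <element>
        <name>rti.monitor.config.skip_datareader_qos</name>
        <value>true</value>
      </element>
      <element>
        <name>rti.monitor.config.skip_datawriter_qos</name>
        <value>true</value>
      </element>
    </value>
  </property>
</participant_qos>
```

Dropping the QoS topics in a steady-state deployment reduces monitoring traffic while keeping the entity statistics that operational dashboards actually consume.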
[9:29] A use case in maximizing throughput
Lee Johnson: Some important capabilities are available today and it sounds like there's some neat stuff coming on the horizon. You mentioned tuning in there, and I want to touch up on that for just a moment because the process, the approach to tuning the performance, the behavior of a distributed system is something that really straddles both the development process and a system in deployment. How do customers look to tune the behavior and performance of their systems, whether that's in the earlier stages of development or after they've deployed a system in operation?
Ken Brophy: Right, well I think there's one typical use case that people are looking to maximize very frequently, which is throughput. It's one of those performance metrics that is very important in and of itself, and also perhaps overemphasized. However, you can use the Monitor as a good tool to do this with. So in Monitor you'll be able to see the throughput measurement, and as you tune your QoS settings you'll be able to see the heartbeats and the ACKNACKs and what your overheads are, so you can make adjustments in your Quality of Service to maximize your throughput. That's a very common scenario people want to look at when they first engage with our product, and we have very high performance numbers for that scenario.
My take on it is that you're really never going to be running at that maximum throughput level, because as soon as you start doing something with the data, you're taking time away from sending data around. What's really more important, I think, is: can DDS handle what my system needs for throughput? I think that's why people look at throughput as one of those measures of goodness, but Monitor will help you tune that. You can look at the protocol information and get all the statistics just right, so that you're maximizing that use case.
Steven Onzo: Ken, thanks for another terrific episode and thanks to all of you listeners out there tuning into another episode of The Connext Podcast. Next week we wrap up our podcast series with Ken Brophy as he covers testing and tuning of DDS systems and answers the top customer questions on Connext tools. If you have suggestions or feedback on this or other episodes, please contact us at podcast@RTI.com. Thanks and have a great day.