In the 2019.2 release, we added built-in performance monitoring to the Inmanta Service Orchestrator. This post explains a bit about the why, what and how.
Why performance monitoring? For both operators (who run the software) and developers (who make the software), in-app performance monitoring is important for keeping track of application health. If an API call becomes a little slower every day, you know there will be trouble eventually. By measuring and tracking performance, problems can be detected and resolved before things explode. Built-in performance monitoring is a must for any kind of server or infrastructure software (e.g. kernels, databases, web servers, application servers, …).
There are many ways to do in-app performance monitoring. We preferred a solution that could work stand-alone, without complicating the setup, so we chose pyformance, a Python port of the Metrics library. Pyformance offers different types of measurements (such as timers and counters), runs completely in-app (it doesn’t require a separate server the way, for example, statsd does), and aggregates the performance data in-app as well.
The built-in measurement primitives are well designed and easy to use. For example, to measure how long a call takes (and count how often it is called, at what rate, what the 99th percentile of its latency is, and much more):
```python
import time

from pyformance import timer

with timer("test").time():
    time.sleep(0.1)
```
The pyformance/Metrics approach is particularly nice for us because it covers both modes: completely stand-alone, with the application offering an API endpoint that exposes the measurement data, and in a setup with a Time Series Database (TSDB) such as Influxdb, Graphite or Prometheus, which collects and stores the performance data over a longer period.
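As a sketch of the stand-alone mode, a minimal HTTP endpoint can serve the aggregated metrics as JSON. Everything here is illustrative rather than Inmanta's actual API: `dump_metrics` is a stand-in for the data a pyformance registry aggregates, and the `/metrics` path is a made-up name.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def dump_metrics():
    # Stand-in for the registry's aggregated data; in the real application
    # this would come from pyformance's in-process aggregation.
    return {"api.call": {"count": 42, "99_percentile": 0.087}}


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/metrics":
            body = json.dumps(dump_metrics()).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, *args):
        # Keep the example quiet instead of logging every request to stderr.
        pass


def serve(port=0):
    # port=0 lets the OS pick a free port; serve in a background thread.
    server = HTTPServer(("127.0.0.1", port), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A monitoring agent (or a curious operator) can then simply poll that endpoint to see the current aggregates.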
Our first target for the measurements was the external API. It is the most crucial thing to measure, for both operators and developers. It gives information about both the end-to-end performance and the usage. Additionally, it takes only one line of code to implement.
Overall, we are very happy with pyformance. It has a well designed API, it is convenient to use, and the overhead is limited.
One aspect of pyformance we didn’t use out-of-the-box: the reporters. To send data off to an external TSDB, pyformance has a suite of reporters. However, these are implemented in a synchronous way, while we use async Python. So we created an async reporter.
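A minimal sketch of what such an async reporter can look like, assuming only that the registry offers a `dump_metrics()` method (as pyformance's registry does) and that `send` is an async callable shipping a snapshot to the TSDB; the class name and details are illustrative, not Inmanta's actual implementation.

```python
import asyncio


class AsyncReporter:
    """Periodically snapshot a metrics registry and ship it asynchronously."""

    def __init__(self, registry, send, interval=10.0):
        self.registry = registry  # needs only a dump_metrics() method
        self.send = send          # async callable: await send(snapshot)
        self.interval = interval  # seconds between reports
        self._task = None

    def start(self):
        # Schedule the reporting loop on the running event loop.
        self._task = asyncio.ensure_future(self._run())

    async def _run(self):
        while True:
            await self.send(self.registry.dump_metrics())
            await asyncio.sleep(self.interval)

    def stop(self):
        if self._task is not None:
            self._task.cancel()
```

The key difference from pyformance's stock reporters is that reporting awaits on the event loop instead of blocking a thread, so it cooperates with the rest of an async server.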
We also implemented an async Influxdb reporter. It is, however, not compatible with the standard pyformance Influxdb reporter: we changed how pyformance metrics are mapped to Influxdb metrics, to better match the way we use Influxdb.
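To illustrate what such a mapping involves, here is one possible (and deliberately naive) way to turn a metrics dump into Influxdb line protocol. This is not the mapping Inmanta uses; the post notes theirs differs, which is exactly why their reporter is incompatible with the stock one.

```python
def to_line_protocol(metrics, timestamp_ns):
    """Render a {name: {field: value}} metrics dump as Influxdb line protocol.

    Line protocol format: "<measurement> <field>=<value>,... <timestamp>".
    Sorting keeps the output deterministic; a real mapping would also decide
    which parts of a metric name become tags versus the measurement name.
    """
    lines = []
    for name, values in sorted(metrics.items()):
        fields = ",".join(
            f"{key}={value}" for key, value in sorted(values.items())
        )
        lines.append(f"{name} {fields} {timestamp_ns}")
    return lines
```

The interesting design choice is hidden in that last comment: whether a metric name like `api.call.get_version` becomes one measurement with tags or many separate measurements determines how easy the data is to query later.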
We have not pushed our extensions to pyformance back upstream, as they aren’t feature complete. If you are interested in using them, feel free to copy them and give us a shout.