Microservices Monitoring

Monitoring used to be a somewhat passive thing. You might use a tool like Nagios to send an alert if something seemed amiss, but mostly it was hands-off. The times, they are a-changing.

User Experience and Microservices Monitoring
Because microservices are released more often, you can try new features and see how they affect usage patterns. With this feedback, you can improve your application. It is not uncommon to employ A/B testing and multivariate testing to try out new combinations of features. Monitoring is more than just watching for failure. With big data, data science, and microservices, monitoring runtime stats is required to know your application's users. You want to know what your users like and dislike, and react accordingly.

Debugging and Microservices Monitoring 
Runtime statistics and metrics are critical for distributed systems, and a microservices architecture uses a lot of remote calls. Microservices metrics can include requests per second, available memory, thread count, connection count, failed authentications, expired tokens, and so on. These parameters are important for understanding and debugging your code. Working with distributed systems is hard. Working with distributed systems without reactive monitoring is crazy. Reactive monitoring allows you to react to failure conditions and ramp up services for higher loads.
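To make the metrics listed above concrete, here is a minimal sketch of an in-process stats registry. It is not taken from any particular library: the class name `RuntimeStats`, its methods, and the metric names are all hypothetical, and a real service would more likely use an existing metrics client.

```python
import time
from collections import defaultdict, deque


class RuntimeStats:
    """Hypothetical in-process registry for counters, gauges, and request rate."""

    def __init__(self, window_seconds=10.0, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock                  # injectable so tests can fake time
        self.counters = defaultdict(int)    # e.g. failed authentications
        self.gauges = {}                    # e.g. thread or connection counts
        self._request_times = deque()       # timestamps of recent requests

    def increment(self, name, amount=1):
        self.counters[name] += amount

    def set_gauge(self, name, value):
        self.gauges[name] = value

    def record_request(self):
        self._request_times.append(self.clock())

    def requests_per_second(self):
        # Drop timestamps that have fallen out of the sliding window.
        cutoff = self.clock() - self.window_seconds
        while self._request_times and self._request_times[0] <= cutoff:
            self._request_times.popleft()
        return len(self._request_times) / self.window_seconds
```

A handler would call `record_request()` on every request and `increment("failed_authentication")` on a rejected login. Anything that reacts to load can then poll `requests_per_second()`.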

Circuit Breaker and Microservices Monitoring 
You can employ the Circuit Breaker pattern to prevent a catastrophic cascade, and reactive microservices monitoring can be the trigger. Downstream services can be registered with a service discovery system so that you can mark nodes as unhealthy, as well as react by rerouting in the case of outages. The reaction can be serving up a deprecated version of the data or service, but the key is to avoid cascading failure. You don't want your services falling over like dominoes.
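The Circuit Breaker pattern described above can be sketched in a few lines. This is one simple variant under stated assumptions, not a reference implementation: the names `CircuitBreaker` and `CircuitOpenError`, the failure threshold, and the reset timeout are all illustrative choices.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is refused because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, fallback=None):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: fail fast instead of piling onto a sick service.
                if fallback is not None:
                    return fallback()
                raise CircuitOpenError("downstream service marked unhealthy")
            # Timeout elapsed: half-open, so let one trial call through.
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            if fallback is not None:
                return fallback()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

The `fallback` callable is where the alternative version of the data or service would be served. While the circuit is open, the failing downstream service is not called at all, which is what stops the dominoes.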

Cloud Orchestration and Microservices Monitoring 
Reactive microservices monitoring enables you to detect heavy load and spin up new instances with the cloud orchestration platform of your choice (EC2, CloudStack, OpenStack, Rackspace, etc.), for example through a client library such as boto.
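The decision of when to spin up instances can be as simple as comparing the observed load with what one instance can handle. The function below is a hypothetical sketch of that arithmetic. Its name, parameters, and default limits are assumptions, and it only computes a target count; actually launching instances is left to whatever platform API you use.

```python
import math


def desired_instances(requests_per_second, capacity_per_instance,
                      target_utilization=0.7, min_instances=1, max_instances=20):
    """Return how many instances keep each one near the target utilization."""
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    # Each instance is only asked to carry a fraction of its full capacity.
    usable_capacity = capacity_per_instance * target_utilization
    needed = math.ceil(requests_per_second / usable_capacity)
    # Clamp so a traffic spike or a bad reading cannot scale without bound.
    return max(min_instances, min(max_instances, needed))
```

Keeping the target utilization below 1.0 leaves headroom for the time it takes new instances to start.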

Public Microservices and Microservices Monitoring

Microservices monitoring of runtime statistics can be used to rate limit a partner's application ID. You don't want partners to consume all of your well-tuned, high-performance microservices resources. It is okay to trust your partners, but use microservices monitoring to verify.

As the Russian proverb says, "doveryai, no proveryai" (trust, but verify). Monitoring public microservices is your way to verify. Once you make microservices publicly available or partner available, you have to monitor and rate limit.

This is not a new concept. If you have ever used a public REST API, from Google for example, you are well aware of rate limiting. A rate limit will do things like limit the number of connections you’re allowed to make. It is common for rate limits to cap the number of certain requests that a client ID or partner ID is allowed to make in a given time period. This is protection.
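A per-client limit of this kind can be sketched with a fixed-window counter. This is one of several possible schemes (token buckets and sliding windows are common alternatives), and the class name, parameters, and client IDs here are hypothetical.

```python
import time


class FixedWindowRateLimiter:
    """Allow at most max_requests per client ID in each time window."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self._state = {}  # client_id -> (window index, count in that window)

    def allow(self, client_id):
        window = int(self.clock() // self.window_seconds)
        seen_window, count = self._state.get(client_id, (window, 0))
        if seen_window != window:
            count = 0  # a new window has started, so the quota resets
        if count >= self.max_requests:
            self._state[client_id] = (window, count)
            return False  # over quota: the caller should reject the request
        self._state[client_id] = (window, count + 1)
        return True
```

A REST endpoint would call `allow(partner_id)` before doing any work and return an error such as HTTP 429 when it gets `False`. This sketch keeps its counts in one process; sharing a limit across a cluster would need a shared counter.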

Deploying public or partner-accessible microservices without this protection is lunacy and a recipe for disaster, unless you like failing when someone decides to hit your endpoints with 10x the load you planned capacity for. Avoid long nights and tears. Monitor microservices that you publish, and limit access to them.

The Reactive Manifesto is a good guide to the types of monitoring you will want to do. It states that your system should react to change instead of just failing.

Microservices Libs and Microservices Monitoring

QBit is a reactive microservices library that comes with a runtime statistics engine which can be used for microservices monitoring. You can query QBit services via WebSocket RPC using JSON or via REST/JSON. The QBit statistics engine is easy to query and use. The stats engine can keep track of counts efficiently across a cluster of services. This allows you to write code that reacts to microservices metrics. QBit stats can be used to implement features like rate limiting, or spinning up new nodes when you detect things are getting overloaded. QBit can also feed stats into StatsD.

StatsD and Microservices Monitoring

StatsD is a network daemon that listens for statistics, such as counters and timers, sent over UDP, aggregates them, and ships the aggregates to backend services such as Graphite or Datadog. StatsD has many small client libs for Java, Python, Ruby, Node, etc. The StatsD server collects stats from clients using a published wire protocol. StatsD is the de facto standard. Although the Etsy StatsD server is the reference implementation (the first implementation was written in Perl), there are other implementations like Go Stats Daemon, Datadog, and many more.

StatsD captures different kinds of metrics: gauges, counters, timing summary statistics, and sets. You decorate your code to capture this type of data and report it. StatsD collects runtime statistics over time and periodically “flushes” the data to the analysis and monitoring engines you choose, although it was originally written with Graphite in mind.

Graphite is used to visualize the state of microservices. It is made up of Graphite-Web (graph and dashboard rendering), Carbon (metric processing daemons), and Whisper (a time-series database library).
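The StatsD wire protocol is small enough to sketch by hand. Each metric is a plain-text line of the form `name:value|type`, where the type is `c` for counters, `ms` for timings, `g` for gauges, and `s` for sets, optionally followed by a sample rate such as `|@0.1`. The client below is an illustrative sketch, not an official library: `StatsdClient`, `format_metric`, and the metric names are hypothetical, and the address assumes a daemon on the conventional StatsD port 8125.

```python
import socket


def format_metric(name, value, metric_type, sample_rate=None):
    """Build one StatsD line: <name>:<value>|<type>[|@<sample_rate>]."""
    line = f"{name}:{value}|{metric_type}"
    if sample_rate is not None:
        line += f"|@{sample_rate}"
    return line


class StatsdClient:
    def __init__(self, host="127.0.0.1", port=8125, prefix=""):
        self.address = (host, port)
        self.prefix = prefix
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def _send(self, name, value, metric_type):
        payload = format_metric(self.prefix + name, value, metric_type)
        try:
            self.sock.sendto(payload.encode("ascii"), self.address)
        except OSError:
            pass  # fire-and-forget: monitoring must never break the caller

    def increment(self, name, count=1):
        self._send(name, count, "c")

    def timing(self, name, milliseconds):
        self._send(name, milliseconds, "ms")

    def gauge(self, name, value):
        self._send(name, value, "g")

    def unique(self, name, value):
        self._send(name, value, "s")
```

For example, `StatsdClient(prefix="checkout.").increment("requests")` sends the datagram `checkout.requests:1|c`. Because UDP needs no connection or acknowledgement, the call returns immediately whether or not a daemon is listening.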

StatsD seems to be the current champion of mind share, mainly due to its simplicity and fire-and-forget protocol. StatsD can’t cause a cascading failure, and its client libs are very small. There are other alternatives that QBit can integrate with as well, like Coda Hale’s Metrics library which uses a Go Daemon.

StatsD can also dump its feed to Kibana or Banana via a Logstash plugin. You can use Kibana and Banana in place of Graphite. There is even commercial support for StatsD via Datadog, which allows monitoring, graphing, alerting, and event correlation. Datadog embedded the StatsD daemon within the Datadog Agent, so it is a drop-in replacement for StatsD. Datadog is a monitoring service for IT, Operations, Development, and DevOps. It attempts to take input from many vendors, cloud providers, open source tools, and servers, and aggregate their data into reactive, actionable metrics.

Reactive Microservices Monitoring

Reactive Microservices Monitoring is an essential ingredient of microservices architecture. You need it for debugging, knowing your users, working with partners, and building reactive systems that react to load and failures without cascading outages. Reactive Microservices Monitoring cannot be an afterthought. Build your microservices with microservices monitoring in mind from the start. Make sure that the microservices lib that you use has monitoring of runtime statistics built in from the start, and that it is a core part of the library. StatsD and Coda Hale Metrics allow you to gather metrics in a standard way. Tools like Graphite, Kibana, Datadog, and Banana help you understand the data and build dashboards. QBit, the Java Microservices Library, includes a queryable stats service which feeds into StatsD and Coda Hale Metrics. QBit can also be used to create reactive features to do rate limiting or spin up new nodes. With big data, data science, and microservices, monitoring microservices runtime stats is required to know your application users, know your partners, know what your system will do under load, etc.