High-Speed Microservices

Microservices Architecture | High-Speed Microservices

This article endeavors to explain high-speed microservices architecture. If you are unfamiliar with the term microservices, you may want to first read this blog post on microservices by Michael Brunton and if have more time on your hands this one by James Lewis and Martin Fowler. 

High-speed microservices is a philosophy and set of patterns for building services that can readily back mobile and web applications at scale. It uses a scale up and out versus just a scale-out model to do more with less hardware.  A scale-up and out model uses in-memory operational data, efficient queue hand-off, and async calls to handle more calls on a single node.

In general, the cloud scale-out model, employs a sense of reckless abandon. If your app has performance issues, no problem spin up 100 servers. Still not fast enough. Try 1000 servers. This has a cost. This does not replace a cloud scale-out model per se. It just makes a cloud scale-out model more cost effective. Do more with less.

This is not to say that there are not advantages to the cloud scale-out model. This is to say that an ability to do more with less hardware has cost savings, and you can still scale out as needed in the cloud. This is nothing new per se. Actor systems like Akka, have been around for a while. Akka is a key ingredient to many high-speed microservices. 

The beauty of high-speed microservices is it gets back to OOP roots where data and logic live together in a cohesive understandable representation of the problem domain, and away from separation of data and logic. Since data lives with the service logic that operates on it. Also, less time is spent dealing with cache coherency issues as the services own or lease the data (own for a period of time). The code tends to be smaller to do the same things.

You can expect to write less code. You can expect the code you write to run faster.  To a true developer and software engineer, this is a boon. Algorithm speed matters again. It is not dust on the scale while you wait for a response from the database. This movement frees you to do more in less time and to have code that runs orders of magnitudes faster than typical IO bound cloud lemming services. 

There are many Java frameworks and libraries for microservices that you can use to build a high-speed microservice system.  Vertx, Akka, Kafka, Redis, Netty, Node.js, Go Channels, Twisted, QBit Java Microservice Lib, etc. are all great tools and technology stacks to build these types of services.  This article is not about any particular technology stack or programming language but a more abstract coverage of what it means to build these type of high-speed services devoid of language or technology stack preference.

The model described in this article is the inverse of how many applications are built. It is not uncommon to need 3 to 20 servers for a service where in a traditional system you might need 100s or even 1,000s. Your EC2 bill could be cut into 1/10th the cost for example. This is not just supposition but an actual observation.

In this model, you typically add extra services to enable failover support not to scale out per se. You will reduce the number of servers needed and your code base will be more coherent if you adopt these strategies.

You may have heard, keep your services stateless. We are recommending the opposite. Make your services own their operational data.

Attributes of High-speed services

High-speed services have the following attributes:
  • High-speed services are in-memory services
  • High-speed services do not block
  • High-speed services own their data
  • Scale out involves sharding services
  • Reliability is achieved by replicating service stores

An in-memory service is a service that runs in-memory. An in-memory service is non-blocking. An in-memory service can load its data from a central data store. In-memory services can load data it owns asynchronously and does not block. It continues to service other requests while the data is loading. It streams in data from its service store whenever leased data is not found, i.e., it faults the data into the services in streams. 

At first blush, it appears that an in-memory service can achieve its in-memory from using a cache. This is not the case. An in-memory service can use caches but and in-memory service owns its core data. Cached data is only from other services that own their data.

Single writer rule: Only one service at any point in time can edit service particular set of service data

In-memory services either own their data or own their data for a period of time. Owning the data for a period of time is a lease model.

Think of it this way. Data can only be written to by one service at any given point in time. Cache inconsistencies and cache control logic is the root of all evil. The best way to keep data in sync with caches is to never use caches or use them sparingly. It is better to use a service store that can keep up with your application vending the data as needed in a lease model. Or to create longer leases on service data to improve speed. More on leasing and service stores is described later. A key ingredient is some sort of persistent message queue or better yet a distributed transaction log to sit in front of calls of an in-memory service like Kafka (Kafka Microservices). 

Avoid the following:
  • Caching (use sparingly)
  • Blocking
  • Transactions
  • Databases for operational data

Embrace the following:

  • In-memory service data and data faulting
  • Sharding
  • Async callbacks
  • Replication / Batching / Remediations
  • Service Stores for operational data

Data faulting and data leasing seem a lot like caching. The key difference is ownership of data and the single writer principle.

Imagine a mobile app with a set of services that contains user data. The first call to any service checks to see if that users data is already loaded in the service. If the user data is not loaded into the service than the call from the mobile app is put into a queue and the call waits for the user data to get loaded asynchronously. The service continues to handle other calls and the service gets notified when the user data loads, and executes the call on the user data load event then. Since we can get many calls to load user data in a given second, we do not load each user one at a time but we load 100 user at a time or 1000 users at a time or we batch load all requests every 50ms (or both) of all user requests in that time. Loading the user data when it needed is called data faulting. Loading 1000 users at a time or all users in the last 50ms or all users since the last user load is called batching request. Batching requests is combining many requests into a single message to optimize IO throughput. Data faulting is the same way your OS loads disk segments into memory pages for virtual memory.

High speed services employ the following:

  • Timed/Size Batching
  • Callbacks
  • Call interception to enable data faulting from the service store
  • Data faulting for elasticity

Data ownership

The more data you can have in-memory the faster your services can run. Not all use cases and data fit this model. Some exceptions can be made. The more important principle is data ownership. This principle comes from the canonical definition of microservices.

In-memory is a means to an end. Mostly to facilitate non-blocking. The more important point is to have the service own the data instead of just being a view into shared data. The more important principle is the single writer principle and the avoidance of cache. 

Let's say that some data is historical data, and historical data rarely gets edited, but it does get edited. Then in this scenario it might make no sense at all to not load the historical data from a database and then update the database directly and skip the service store altogether since the usage is rare and unlikely to hamper the overall performance of the system. 

If size of the data is an issue remember that you can shard the services and you can also fault data into a service server in batches or streams. These two vectors should allow most if not all of the operational data to be loaded into memory and enable the single writer principle. Think of this as more of the Pareto principle. You don't need all. You just need the set of data in-memory that is going to give you the SLA that you need. All would be nice. But you can only have 20% of the data that is faulted in and still have a really fast system.

A lease can be 1/2 hour, 8 hours, or some other period of time. Once the lease has expired which could be based on the last time that service data was used, then the data just waits in the service store. 

Why Lease? Why not just own?

Why not just own data out right. Well you can if the service data is small enough. Leasing data provides a level of elasticity. This allows you to spin up more nodes. If you optimize and tune the data load from the service store to the service then loading users data becomes trivial and very performant. The faster and more trivial the data fault loading, the shorter you can lease the data and the more elastic your services are. In like manner services can save data periodically to the service store or even keep data in a fault tolerant local store (store and forward) and update the service store in batches to accommodate speed and throughput. Leasing data is not the same as getting data from the cache. In the lease model only one node can edit the data at any given time.

Service Sharding / Service Routing

Elasticity is achieved through leasing and sharding. A service server node owns service data for a period of time. All calls for that users data is made to that server. In front of a series of service servers is a service router. A service router could be an F5 (network load balancer) that maintains server/user affinity through an HTTP header. A service router could be a more complex entity that knows more about the problem domain and knows how to route calls to other back end services.

Fault tolerance

The more important the data and the more replication and synchronization that needs to be done. The more important the data the more resources that are needed to ensure data safety.

If a service node goes down, a service router can select another service node to do that work from the service discovery. The service data will be loaded in an async/data faulting/batch. If the service was sending updates to changes then no state is lost except the state that was not sent to the service store since the last update. The more important the state/data, the more synchronization that should be done when the data is modified. For example, the service store can send an async confirmation of a save, the service could enqueue a response to the client. The client or service tier could opt to add retry logic if it does not get a response from the server. You can also replicate calls to services. You can also create a local store and forward for important calls. 

Fault tolerance, service router and service discovery are essential to building a reactive Java microservices architecture.

Service Store

The primary store for a high speed service system is a service store. A service store can treat the service data as opaque. A service store is not a database. A service store is not a cache either. A service store may also keep the data in-memory. The primary function of a service store is vending data quickly to services that are faulting data in. 

A services store also takes care of data replication for data safety and safe storage. A service store should be able to bulk save data (stream saves) and bulk load data (stream loads) to/from a service and to/from replicas. A service store like the service itself should never block. Responses are sent asynchronously. WebSocket or sockets are a great mechanisms to send responses from a service store to a given number of services. JSON or some form of binary JSON is a good transport and storage mechanism for a service store.

Service stores are elastic and typically sharded but not as elastic as service servers. Service stores employ replication and synchronization to limit data loss. Service stores are special servers so the rest of your application can be elastic and more fault tolerant. It is typical to over provision service stores to allow for a particular span of growth. Adding new nodes and setting up replication is more deliberate than it is with services. The service store and the leasing model is what enables the services to be elastic.

By special servers, we mean service store servers might use special hardware like disk level replication and servers which might employ additional monitoring. All service data saved to a service store should be saved in at least two servers. There is a certain level of replication that is expected. Service stores may also keep a transaction log so that others processes can follow the log and update databases for querying and reporting.

In high-speed services, databases are only for reporting, long term storage, backup, etc. All operational data is kept and vended out of the service stores which maintain their own replication and backups for recovery. All modifications to data is done by services. Service stores typically use JSON or some other standard data format for long term storage for both the transaction logs for storage into secondary databases.

A service store is the polar opposite of Big Data. A service store is just operational data. One could tail the transaction logs to create Big Data.

Active Objects / Threading model:

To minimize complex synchronization code that can become a bottle neck one should employ some form of Active Objects pattern for stateful, high-speed services.  One could use an event bus system like Vertx or Node.js or an Actor system like Akka or GO channels or Python Twisted and build their own Active Object service system. Queues and messaging are essential if you want to handle back pressure and create reactive applications.

The active object pattern separates method execution from method invocation for objects that each reside in their own thread of control (or in the case of QBit for a group of objects that are in the same thread of control).

The goal of Active Objects is to introduce concurrency, by using asynchronous method invocation and a scheduler for handling requests. The scheduler could be a resumable thread draining a queue of method calls. This scheduler could also check to see if data needed for this call was already loaded into the system and fault data in from the service store into the running service before the call is made.

The Active Object pattern consists of six elements:

  1. A client proxy to provide an interface for clients. The client proxy can be local (local client proxy) or remote (remote client proxy).
  2. An interface which defines the method request on an active object.
  3. A queue of pending method requests from clients.
  4. A scheduler, which decides which request to execute next which could for example delay invocation until service data if faulted in or which could reorder method calls based on priority or which could work with several related services from one scheduler allowing said services to make local non-enqueued calls to each other.
  5. The implementation of the active object methods. Contains your code.
  6. service callback for the client to receive the result.

Developing high-speed microservices (tools needed)

Docker, Rocket, Vagrant, EC2, boto, Chef, Puppet, testing, perf testing. 

Service discovery and health

Consul, etcd, Zookeeper, Nagios, Sensu, SmartStack, Serf, StatsD.

Similarities to plain microservices?

Data ownership, standalone process, container which has all parts needed, docker is the new war file, etc.

What makes high speed microservices different from plain microservices?

Async, Non-blocking, more focus on data ownership and data faulting.
More focus on being reactive.


Service Store : Sharded fault tolerant opaque storage of service data. The service store enables services to be elastic.
Service Server : A server that hosts one or more services.
High Speed Service: An in-memory high speed service that is non-blocking that owns its service data.
Database: Something that does reporting or long term backups for a high speed service system. A database never holds operational data (there is an uncommon exception to this rule beyond the scope of this article).
Service Router:  The first tier of servers which decide where to route calls to which services based on sharding rules which can be simple or complex. Simple rules could be handled by ha_proxy or an F5. Complex rules can be handled by service routing tier.

Rick Hightower, the author of this blog post is the primary author of QBit and Boon. Boon is a very fast JSON parser, string parsing, and reflection lib. QBit is a fast in-memory service queue lib that employs Actor-like Active Objects.

200 M message in-proc per second, 1 M RPC calls per second = QBit

The above shows the high-speed nature of Boon and QBit.
I put them there to pique your interest.

QBit is a lot more than just raw speed.

Learn more about QBit:

  • [Detailed Tutorial] QBit microservice example
  • [Doc] Queue Callbacks for QBit queue based services
  • [Quick Start] Building a simple Rest web microservice server with QBit
  • [Quick Start] Building a TODO web microservice client with QBit
  • [Quick Start] Building a TODO web microservice server with QBit
  • [Quick Start] Building boon for the QBit microservice engine
  • [Quick Start] Building QBit the microservice lib for Java
  • [Rough Cut] Delivering up Single Page Applications from QBit Java JSON Microservice lib
  • [Rough Cut] Working with event bus for QBit the microservice engine
  • [Rough Cut] Working with inproc MicroServices
  • [Rough Cut] Working with private event bus for inproc microservices
  • Comments