Microservices Resiliency Patterns
In the software application world, Java in particular, there are many ways to improve the performance of an application; let's keep that talk for another time. Instead, let's talk about how to keep services and API offerings performant even in adverse situations, in other words, how to make them 'fault tolerant'.
In the microservices world, implementing the following patterns makes an application resilient to the failures that are often encountered in production environments:
a. Rate Limiter
b. Retry
c. Bulkhead
d. Circuit Breaker
e. Timeout
Rate Limiter Design Pattern
Let's consider a situation when your service is overloaded. There are plenty of reasons why it might happen: a sudden spike in traffic due to a stock market fluctuation, for example.
An application has a variable capacity. This value is dynamic and depends on multiple factors, such as how robust the application code is, the CPU configuration, memory and its footprint, the underlying JVM configuration, the OS, and so on.
What happens when load surpasses capacity?
1. Response time grows.
2. Memory and garbage-collection footprint grow.
3. HTTP requests wait longer or eventually time out.
The Rate Limiter pattern comes in handy in this situation to insulate the application from going under water.
The idea is to refuse some of the incoming load gracefully. Ideally, the limiter drops the extra load above capacity, which lets the application serve the remaining requests in compliance with its SLA.
There are two types of rate limiters:
Request rate limiter
A request rate limiter restricts each user to N requests per second. Request rate limiters implement a quota system for each eligible user to effectively manage a high volume of traffic.
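As a minimal sketch, a per-user request rate limiter can be a counter per time window. The fixed one-second window and the quota of 100 requests per user here are assumptions for illustration (Java 16+ for the record syntax):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Minimal per-user request rate limiter: N requests per one-second window. */
public class RequestRateLimiter {

    private static final int LIMIT_PER_SECOND = 100; // assumed quota

    private record Window(long startMs, int count) {}

    private final Map<String, Window> windows = new ConcurrentHashMap<>();

    /** Returns true if the request fits within the user's quota. */
    public boolean allow(String userId) {
        long now = System.currentTimeMillis();
        Window updated = windows.merge(userId, new Window(now, 1),
            (old, fresh) -> now - old.startMs() >= 1000
                ? fresh                                  // start a new window
                : new Window(old.startMs(), old.count() + 1));
        return updated.count() <= LIMIT_PER_SECOND;      // over quota: refuse
    }
}
```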
Concurrent requests limiter
An API may support a request rate of 500 TPS; a concurrent requests limiter instead says: you can only have 500 API requests in progress at the same time.
Some endpoints are much more resource-intensive than others, and users often get frustrated waiting for the endpoint to return, and then retry. These retries add more demand to the already overloaded resource, slowing things down even more. The concurrent requests limiter helps address this nicely.
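A concurrent requests limiter maps naturally onto a counting semaphore. Here is a minimal sketch in plain Java; the in-flight limit and the rejection fallback (say, an HTTP 429) are illustrative:

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Minimal concurrent requests limiter: at most N requests in flight at once. */
public class ConcurrentRequestLimiter {

    private final Semaphore inFlight;

    public ConcurrentRequestLimiter(int maxConcurrent) {
        this.inFlight = new Semaphore(maxConcurrent);
    }

    public <T> T execute(Supplier<T> request, Supplier<T> onRejected) {
        if (!inFlight.tryAcquire()) {
            return onRejected.get();  // shed excess load gracefully, e.g. HTTP 429
        }
        try {
            return request.get();
        } finally {
            inFlight.release();       // free the slot even if the request fails
        }
    }
}
```

Note that permits are released only when a request finishes, so slow responses automatically reduce how much new work is admitted, which is exactly the property argued for next.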
The request rate limiter is more widely used, but it doesn't provide guarantees as strong as a concurrency limit does, so if you wish to choose one, stick with the concurrency limit, and here's why.
When configuring a request rate limiter, we think we are enforcing "this service can process N requests per second at any point in time". But requests are not uniform: when responses slow down, admitting N new requests per second still lets in-flight work pile up without bound, whereas a concurrency limit caps the in-flight work directly.
Retry Design Pattern
Transient faults can occur: a momentary loss of network connectivity to services, the temporary unavailability of a service, or timeouts during busy periods. These faults are often self-healing; if the action is repeated after a suitably short delay, it is likely to succeed.
If your request failed, wait a bit and try again. That's basically it. Retrying makes sense because the network might degrade for a moment, or the request might have been refused due to high traffic volume. Now, imagine having a chain of microservices where A calls B, which calls C, which calls D.
What happens if we set the number of total attempts to 3 at every service, and service D suddenly starts serving 100% errors?
It will lead to a retry storm: a situation in which every service in the chain starts retrying its requests, drastically amplifying the total load. In this situation, B will face 3x load, C 9x, and D 27x.
Redundancy is one of the key principles in achieving high availability, but I doubt you would have enough free capacity on the server instances of C and D. Setting total tries to 2 doesn't help much either, and it makes the user experience worse on small blips. Two practices mitigate this:
- Distinguish "retryable" exceptions from "non-retryable" ones. It's pointless to retry a request when the user doesn't have permissions or the payload is not valid. Setting a maximum retry count is also a good practice (see the sketch after this list).
- Adopt error budgeting: a technique where you stop retrying if the rate of retryable errors exceeds a threshold. For example, if 20% of interactions with service D result in errors, stop retrying it and try to degrade gracefully. The number of errors can be tracked with a rolling window over the last N seconds.
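A minimal sketch of bounded retries with exponential backoff in plain Java; treating IOException as the transient, retryable fault, and the attempt and backoff values, are assumptions for illustration:

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Minimal retry helper: bounded attempts, exponential backoff,
 *  and retries only for faults we consider transient. */
public final class Retries {

    private static final int MAX_ATTEMPTS = 3;       // cap total tries
    private static final long BASE_BACKOFF_MS = 200; // 200ms, 400ms, ...

    public static <T> T callWithRetry(Callable<T> call) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (IOException transientFault) {   // treated as retryable
                if (attempt == MAX_ATTEMPTS) {
                    throw transientFault;            // retry budget exhausted
                }
                Thread.sleep(BASE_BACKOFF_MS << (attempt - 1));
            }
            // Any other exception (e.g. invalid payload, missing permissions)
            // propagates immediately: retrying it is pointless.
        }
    }
}
```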
Bulkhead Design Pattern
A bulkhead is used to partition a ship into sections, so that sections can be sealed off if there is a hull breach. You can use bulkheads similarly in software systems, to partition your system and protect it against cascading errors. Essentially, a bulkhead assigns limited resources to specific (groups of) clients, applications, operations, client endpoints, and so on.
In an application, bulkheads can be set up per client, for instance when multiple clients are hosted in one cluster, or by assigning dedicated resources to a particular function within the application.
For example, if you are familiar with the main-thread/worker-thread model, one can configure the number of worker threads available in a bulkhead; beyond that, it will not handle requests (sketched below).
If you happen to call a downstream service or API whose SLAs or TPS (transactions per second) are defined, whereas your application does not have the same gates defined, that is often a good place to define a semaphore-based bulkhead to shield the downstream system from unbounded requests, and to save your application from sinking if the downstream calls happen to be blocking calls.
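A minimal sketch of the worker-thread bulkhead described above, built on a plain ThreadPoolExecutor with a bounded queue; the pool and queue sizes are illustrative:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Minimal thread-pool bulkhead: a fixed set of worker threads and a
 *  bounded queue; beyond that capacity, work is rejected, not buffered. */
public class ThreadPoolBulkhead {

    private final ThreadPoolExecutor workers;

    public ThreadPoolBulkhead(int workerThreads, int queueCapacity) {
        this.workers = new ThreadPoolExecutor(
            workerThreads, workerThreads,            // fixed pool size
            0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<>(queueCapacity), // bounded backlog
            new ThreadPoolExecutor.AbortPolicy());   // reject when saturated
    }

    /** Throws RejectedExecutionException once pool and queue are full. */
    public <T> Future<T> submit(Callable<T> task) {
        return workers.submit(task);
    }
}
```

For the semaphore-based variant mentioned above, the ConcurrentRequestLimiter sketch from the rate limiter section works equally well as a bulkhead around blocking downstream calls.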
Circuit Breaker Design Pattern
A circuit breaker can be explained as a stricter version of the error-budgeting technique: when the error rate is too high, the function won't be executed at all and a fallback result is returned instead, which could be, say, a read from the cache, if one has been configured.
A small proportion of requests still needs to be executed in order to understand whether the other service has recovered or not. What we want is to give the other service a chance to recover without any manual work, and if the other service is behind a load balancer, another application instance can take care of the request.
One could argue that it doesn't make sense to enable a circuit breaker if the function is on the critical path, but bear in mind that this short and controlled 'outage' is likely to prevent a big and uncontrollable one.
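A minimal hand-rolled sketch of the idea; the failure threshold, cool-down, and cache-backed fallback are assumptions, and a production system would more likely use a library such as Resilience4j:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

/** Minimal circuit breaker: opens after N consecutive failures, serves a
 *  fallback while open, and lets one trial call through after a cool-down. */
public class SimpleCircuitBreaker<T> {

    private final int failureThreshold;   // e.g. 5 consecutive failures
    private final Duration cooldown;      // e.g. 30 seconds open
    private final Supplier<T> fallback;   // e.g. read from the cache

    private int consecutiveFailures = 0;
    private Instant openedAt = null;      // non-null means the circuit is open

    public SimpleCircuitBreaker(int failureThreshold, Duration cooldown,
                                Supplier<T> fallback) {
        this.failureThreshold = failureThreshold;
        this.cooldown = cooldown;
        this.fallback = fallback;
    }

    // Synchronized for brevity; a production breaker would use atomics.
    public synchronized T call(Supplier<T> protectedCall) {
        if (openedAt != null && Instant.now().isBefore(openedAt.plus(cooldown))) {
            return fallback.get();            // open: don't execute the call
        }
        try {
            T result = protectedCall.get();   // closed, or a half-open trial
            consecutiveFailures = 0;
            openedAt = null;                  // success closes the circuit
            return result;
        } catch (RuntimeException e) {
            if (++consecutiveFailures >= failureThreshold) {
                openedAt = Instant.now();     // trip (or re-trip) the breaker
            }
            return fallback.get();
        }
    }
}
```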
Timeout Design Pattern
A timeout is a specified period of time one is willing to wait for some event to occur. With HTTP, the timeout pattern is pretty straightforward, and many HTTP clients have a default timeout configured. The goal is to avoid unbounded waiting times for responses, treating every request for which no response was received within the configured (or default) timeout as failed.
You also want your timeouts to be high enough to allow slower responses to arrive, but low enough to stop waiting for a response that is never going to arrive.
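A minimal sketch with Java's built-in HTTP client (Java 11+); the endpoint URL and the timeout values are illustrative:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpTimeoutException;
import java.time.Duration;

public class TimeoutExample {
    public static void main(String[] args) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(2))            // cap connection setup
            .build();

        HttpRequest request = HttpRequest.newBuilder(
                URI.create("https://example.com/api/orders")) // hypothetical endpoint
            .timeout(Duration.ofSeconds(3))                   // cap the whole exchange
            .GET()
            .build();

        try {
            HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("Status: " + response.statusCode());
        } catch (HttpTimeoutException e) {
            // No response within 3 seconds: treat the request as failed
            // instead of waiting unboundedly (fall back, retry, or surface it).
            System.err.println("Request timed out: " + e.getMessage());
        }
    }
}
```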
Thoughts for consideration
Implementing and running a reliable service is not an easy job, given the complexity of real use cases. It takes a lot of effort, planning, and time to build an unsinkable ship.
Reliability has many levels and aspects, so it is important to find the best solution for your application. You should make “Reliability” a factor in your business decision processes and allocate enough budget and time for it.
- Dynamic environments and distributed systems — like microservices — lead to a higher chance of failures.
- Services should fail separately and should not bring down the application as a whole.
- Most outages are caused by changes; reverting a build is not a bad thing.
- Fail fast and independently. Teams have no control over their service dependencies.
- Architectural patterns and techniques like caching, bulkheads, circuit breakers, and rate limiters help to build reliable microservices.