====== Design patterns for scaling microservices applications, reports from the fields======
=== Advanced Software Design Essay-2 ===
=== Prepared by Tsaturyan Konstantin ===
==== Contents ====
[[arch:2024:design_patterns_for_scaling_microservices_applications#Introduction|Introduction]] \\
[[arch:2024:design_patterns_for_scaling_microservices_applications#Key_Design_Patterns_for_Scaling_Microservices|Key Design Patterns for Scaling Microservices]] \\
[[arch:2024:design_patterns_for_scaling_microservices_applications#Reports_from_the_fields|Reports from the fields]] \\
[[arch:2024:design_patterns_for_scaling_microservices_applications#Discussion|Discussion]] \\
[[arch:2024:design_patterns_for_scaling_microservices_applications#Conclusion|Conclusion]] \\
[[arch:2024:design_patterns_for_scaling_microservices_applications#References|References]] \\
===== Introduction =====
In today's digital landscape, applications are expected to handle rapid growth, changing user demands, and complex operational environments. This leads development teams to choose microservices architectures, where software is broken into smaller, independent units that can be developed, tested, and operated separately. However, services need to manage rapidly growing traffic, maintain performance, and remain stable even when dependencies fail. As these systems grow, ensuring that they scale effectively becomes a serious challenge. To meet it, developers rely on design patterns tailored to scaling microservices. These patterns have been refined over time through real-world projects and reports, offering valuable information on how to handle complexity and maintain reliable systems.
This essay focuses on how practical design patterns help scale microservices applications. It examines several of these patterns, considers case studies, and highlights what practitioners have learned from implementing them. Their experiences make it clearer which strategies work well in which contexts. Ultimately, this discussion aims to provide insights that guide technical teams through key design patterns for scaling microservices applications. \\
===== Key Design Patterns for Scaling Microservices =====
As developers have built microservices architectures, certain design patterns have emerged that keep these systems responsive and stable. These patterns address various aspects, from managing communication between services to handling sudden increases in requests. Each pattern solves a particular problem, so understanding them makes it easier to choose the right approach for the problem at hand. \\
\\
=== Circuit breaker ===
One widely known pattern is the Circuit Breaker [[arch:2024:design_patterns_for_scaling_microservices_applications#References|[1][2]]], which helps prevent cascading failures when one service depends on another that has failed (become unavailable or slow). By placing a Circuit Breaker between a caller and a potentially failing resource, the system can quickly detect problems and stop making calls that are likely to fail. Instead of waiting on the unhealthy dependency, the breaker returns an error immediately. This approach keeps the rest of the application stable and prevents minor issues from growing into major outages. \\
{{:arch:2024:circuit_breaker_pattern.png?direct&800|}} \\
Fig. 1 Circuit breaker pattern \\
Nithy, 2023. Microservice Architecture Design Patterns: The Secret to Building Scalable, Resilient, and Maintainable Microservices. [[arch:2024:design_patterns_for_scaling_microservices_applications#References|[2]]] \\
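
The behaviour described above can be sketched in a few lines of Python. This is a minimal, single-threaded illustration and not the implementation used by any particular library: the class name ''CircuitBreaker'', the ''failure_threshold'' and ''reset_timeout'' parameters, and the injectable ''clock'' are all assumptions made for the example.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the dependency."""


class CircuitBreaker:
    # Hypothetical parameters: open after `failure_threshold` consecutive
    # failures, allow a trial call again after `reset_timeout` seconds.
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: fail fast instead of calling the unhealthy dependency.
                raise CircuitOpenError("dependency marked unhealthy")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # A successful call closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each remote call, for example ''breaker.call(fetch_profile, user_id)'' with a hypothetical ''fetch_profile'' function, and treat ''CircuitOpenError'' as a signal to return an error or a fallback right away.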
\\
=== Bulkhead ===
Related to the Circuit Breaker idea is the Bulkhead pattern [[arch:2024:design_patterns_for_scaling_microservices_applications#References|[1]]], inspired by ship design. It limits the damage caused by one failing part of a system. By segmenting resources into separate areas, a problem in one section does not spread to others. Whereas a Circuit Breaker monitors calls and cuts them off when a dependency looks unhealthy, a Bulkhead divides system resources and services into isolated pools, so that if one pool fails or is exhausted, the resources reserved for the others are unaffected. \\
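
One common way to build such isolated pools is to cap the number of concurrent calls per dependency. The sketch below assumes a semaphore-based bulkhead; the names ''Bulkhead'' and ''BulkheadFullError'' are hypothetical, and a real system might use separate thread pools or connection pools instead.

```python
import threading


class BulkheadFullError(Exception):
    """Raised when a pool has no free slot for another call."""


class Bulkhead:
    def __init__(self, max_concurrent):
        # Each bulkhead owns its own slots, so pools cannot starve each other.
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def run(self, func, *args, **kwargs):
        # Reject immediately rather than queueing behind a slow dependency.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError("pool exhausted; rejecting call")
        try:
            return func(*args, **kwargs)
        finally:
            self._slots.release()
```

With one ''Bulkhead'' instance per downstream service, a slow dependency can use up only its own slots, while calls to other dependencies still find free capacity.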
\\
=== Database-per-service ===
Another common approach is to give each service its own dedicated database [[arch:2024:design_patterns_for_scaling_microservices_applications#References|[1][2]]]. This pattern defines strong boundaries, allowing developers to choose the best storage technology for each service. It also prevents changes in one service's data structure from breaking others. However, keeping data consistent across services becomes harder, since traditional transactions don't easily span multiple databases. Nevertheless, the database-per-service pattern leads to simpler codebases and clear ownership of data, making it easier to scale individual services. \\
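
The ownership boundary can be illustrated with two services that each hold a private database and talk to each other only through methods. This is a toy sketch: the service names, table layouts, and the use of in-memory SQLite are assumptions for the example, and in practice the call between services would go over the network.

```python
import sqlite3


class CustomerService:
    def __init__(self):
        # Private database: no other service gets a handle to it.
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

    def add_customer(self, customer_id, name):
        self._db.execute("INSERT INTO customers VALUES (?, ?)", (customer_id, name))

    def get_name(self, customer_id):
        row = self._db.execute(
            "SELECT name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        return row[0] if row else None


class OrderService:
    def __init__(self, customers):
        self._customers = customers  # reached via its API, never via its tables
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
        )

    def place_order(self, order_id, customer_id, total):
        if self._customers.get_name(customer_id) is None:
            raise ValueError("unknown customer")
        self._db.execute("INSERT INTO orders VALUES (?, ?, ?)", (order_id, customer_id, total))

    def describe(self, order_id):
        customer_id, total = self._db.execute(
            "SELECT customer_id, total FROM orders WHERE id = ?", (order_id,)
        ).fetchone()
        return f"{self._customers.get_name(customer_id)}: {total}"
```

Because ''OrderService'' never queries the ''customers'' table directly, the customer schema can change without breaking it. The price is that no single transaction covers both databases.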
\\
=== SAGA ===
As noted above, distributed systems struggle with transactions that span more than one service or database. SAGA helps solve this by breaking a large transaction into a series of local actions [[arch:2024:design_patterns_for_scaling_microservices_applications#References|[1][2]]]. Each service performs its part and emits an event that triggers the next step. If any step fails, the system runs compensating transactions to undo the changes already made. SAGA is effective for keeping data synchronized in an environment where services must remain independent. This approach reduces the risk of partial updates leaving data inconsistent or causing errors. \\
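
The compensation logic can be sketched as follows. Note that this example uses a central coordinator (an orchestrated saga) rather than the event-driven chaining described above, because it is shorter to show; the function name ''run_saga'' and the step format are assumptions for illustration.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order.

    Each element of `steps` is a pair of zero-argument callables. If an
    action raises, the compensations of all previously completed steps
    run in reverse order and the saga reports failure.
    """
    completed = []
    for action, compensation in steps:
        try:
            action()
        except Exception:
            # Undo the steps that already succeeded, most recent first.
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensation)
    return True
```

The failed step itself is not compensated, since its local transaction never committed. A production saga would also need to persist its progress and retry compensations that fail, which this sketch leaves out.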
\\
=== CQRS ===
Command Query Responsibility Segregation (CQRS) separates reading and writing operations into different models [[arch:2024:design_patterns_for_scaling_microservices_applications#References|[1][2]]]. In other words, commands update data, while queries simply retrieve it. Because of this split, each side can evolve independently, allowing for better performance under high read loads and simpler scaling strategies for write operations. CQRS can improve performance and clarity, especially when a service's read load and write load differ greatly. \\
{{:arch:2024:cqrs_pattern.png?direct&800|}} \\
Fig. 2 CQRS pattern \\
Nithy, 2023. Microservice Architecture Design Patterns: The Secret to Building Scalable, Resilient, and Maintainable Microservices. [[arch:2024:design_patterns_for_scaling_microservices_applications#References|[2]]] \\
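
A minimal sketch of the split, assuming an order-handling service: the write model accepts commands and notifies subscribers, and a separate read model keeps a precomputed view for queries. The class names and the in-process callback used to propagate changes are hypothetical; real systems often use a message broker, and the read side then lags slightly behind the write side.

```python
from dataclasses import dataclass, field


@dataclass
class OrderWriteModel:
    """Command side: validates and stores orders."""
    orders: dict = field(default_factory=dict)
    subscribers: list = field(default_factory=list)

    def place_order(self, order_id, customer, total):
        if order_id in self.orders:
            raise ValueError("duplicate order")
        self.orders[order_id] = {"customer": customer, "total": total}
        # Publish the change so read models can update their views.
        for notify in self.subscribers:
            notify(order_id, customer, total)


@dataclass
class CustomerSpendReadModel:
    """Query side: a view optimized for one question."""
    totals: dict = field(default_factory=dict)

    def on_order_placed(self, order_id, customer, total):
        self.totals[customer] = self.totals.get(customer, 0) + total

    def total_spent(self, customer):
        return self.totals.get(customer, 0)
```

After ''writes.subscribers.append(reads.on_order_placed)'', queries never touch the order store, so the read model could be replicated or reshaped without affecting command handling.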
\\
=== API Gateway ===
As microservices proliferate, clients may struggle to know which service to call directly. That's where an API Gateway comes in. Acting as a single entry point, the gateway receives client requests and routes them to the correct microservices, possibly combining data before sending a single response [[arch:2024:design_patterns_for_scaling_microservices_applications#References|[1][2]]]. This pattern simplifies the client-side experience while giving developers a central place to implement security, rate limiting, and request transformations. Though it adds another layer to the architecture, the API Gateway gives better control over how traffic flows through the system. \\
{{:arch:2024:api_gateway_pattern.png?direct&800|}} \\
Fig. 3 API Gateway pattern \\
Nithy, 2023. Microservice Architecture Design Patterns: The Secret to Building Scalable, Resilient, and Maintainable Microservices. [[arch:2024:design_patterns_for_scaling_microservices_applications#References|[2]]] \\
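
The routing role of the gateway can be sketched as a table of path prefixes plus one centralized check. Everything here is an assumption for illustration: the ''Gateway'' class, the prefix-based routing rule, and the bare API-key check that stands in for real authentication.

```python
class Gateway:
    def __init__(self):
        self.routes = {}  # path prefix -> handler for the backing service

    def register(self, prefix, handler):
        self.routes[prefix] = handler

    def handle(self, path, api_key=None):
        # Cross-cutting concern handled once, at the edge.
        if api_key is None:
            return 401, "missing API key"
        # The longest matching prefix wins, so specific routes take priority.
        for prefix in sorted(self.routes, key=len, reverse=True):
            if path.startswith(prefix):
                return 200, self.routes[prefix](path[len(prefix):])
        return 404, "no route"
```

Clients see only the gateway's address. Which service sits behind ''/users/'' or ''/orders/'' can change without any client update.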
\\
These patterns form an essential toolkit for developers and architects. Real-world applications show that systems using these patterns experience fewer outages, shorter recovery times, and improved user satisfaction. Over time, this experience has led the community toward proven solutions. By selecting the right patterns, teams can create systems that evolve gracefully as demands shift, workloads grow, and technologies change. Each pattern contributes one piece to the overall system, and when combined effectively, they form a solid foundation for long-term scalability in microservices applications. \\
===== Reports from the fields =====
Spotify and Netflix both offer valuable insights into how the Circuit Breaker pattern improves large-scale microservices. In Spotify's case, engineers had to manage thousands of servers scattered across the globe, which caused latency issues. Each incoming request could trigger several subrequests: for example, when a user logs in, their profile triggers queries to retrieve personal data, advertising preferences, and so on. A single failure in this chain risked spreading errors throughout the system. Initially, Spotify relied on precise exception handling and error codes to mitigate such problems, but still encountered issues like memory leaks. Eventually, Spotify adopted a fail-fast strategy with Circuit Breakers, which monitor response times and error rates [[arch:2024:design_patterns_for_scaling_microservices_applications#References|[3]]]. If a particular service is deemed unhealthy according to developer-defined thresholds, the breaker trips and immediately halts further calls to that service. This prevents the rest of the platform from being affected by one malfunctioning component. Once the Circuit Breaker sees that the problematic service has recovered, it lets calls through again. This change improved resiliency and allowed Spotify's backend to focus on healthy services during partial failures, increasing overall stability. \\
Netflix, which handles over a billion daily incoming calls and billions more outgoing calls to internal subsystems, relies on Circuit Breakers too [[arch:2024:design_patterns_for_scaling_microservices_applications#References|[4]]]. When an internal dependency starts slowing down or failing, Circuit Breakers keep it from tying up the threads that serve other requests. Netflix combines this mechanism with fallback logic, returning cached or default data rather than no response, so that users experience minimal disruption while the failing system recovers. \\
Netflix also uses Zuul as an API Gateway, a single entry point into its ecosystem for all external requests from a wide range of devices [[arch:2024:design_patterns_for_scaling_microservices_applications#References|[5]]]. This service intercepts traffic, applies filters for authentication, dynamic routing, or load balancing, and then forwards each request to the relevant microservices. By centralizing and simplifying the routing logic, Netflix can rapidly roll out new features, test different loads, and update configurations without changing every service. Combined with Circuit Breakers, this approach significantly reduces the probability of widespread downtime. \\
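
The fallback idea of returning cached or default data can be sketched as below. This is not Netflix's implementation; the helper ''with_fallback'', its arguments, and the plain dictionary used as a cache are assumptions for the example.

```python
def with_fallback(fetch, cache, key, default):
    """Call `fetch(key)`; on failure, serve the last cached value or a default."""
    try:
        value = fetch(key)
    except Exception:
        # Degrade gracefully: stale or default data instead of no response.
        return cache.get(key, default)
    cache[key] = value  # remember the latest good answer for later failures
    return value
```

A recommendations call, for instance, could fall back to the last list it returned for that user, or to an empty list, while the dependency recovers.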
===== Discussion =====
From Spotify's global server network to Netflix's billions of daily requests, these real-world stories illustrate how Circuit Breakers and an API Gateway can reinforce microservices architectures. The fail-fast model of the Circuit Breaker reduces the risk of blocked threads, memory leaks, and system-wide slowdowns. Equally important is the concept of a fallback response, which lets a service degrade gracefully rather than fail completely. \\
Meanwhile, Netflix's use of the Zuul API Gateway shows the advantages of centralizing traffic control. Instead of scattering authentication, logging, and routing logic across multiple services, an edge layer can quickly adapt to new demands or failures. \\
Taken together, these two patterns, like the others described earlier, illustrate how thoughtful architectural choices limit the impact of failures and make it easier to scale and evolve the underlying systems. \\
===== Conclusion =====
By examining the field reports from Spotify and Netflix, we see how design patterns such as Circuit Breakers and API Gateways improve modern microservices. Moreover, other patterns like Database-per-service and CQRS give each component autonomy to grow without risking the entire system. Together, these strategies help build scalable, flexible apps that adapt to evolving demands, reduce downtime, and minimize cascading failures. In doing so, design patterns for scaling microservices not only protect core functionalities but also foster rapid innovation, making software more resilient and future-proof. \\
===== References =====
1. Atlassian - Microservices design patterns for DevOps teams. https://www.atlassian.com/microservices/cloud-computing/microservices-design-patterns \\
2. Nithy, 2023. Microservice Architecture Design Patterns: The Secret to Building Scalable, Resilient, and Maintainable Microservices. https://gnithyanantham.medium.com/microservice-architecture-design-patterns-the-secret-to-building-scalable-resilient-and-82747886c017 \\
3. Hiren Dhaduk, 2022. Top 10 Microservices Design Patterns that Every Developer Should Know. https://www.simform.com/blog/microservice-design-patterns/ \\
4. Netflix Technology Blog, Ben Christensen, 2012. Fault Tolerance in a High Volume, Distributed System. https://netflixtechblog.com/fault-tolerance-in-a-high-volume-distributed-system-91ab4faae74a \\
5. Netflix Technology Blog, Mikey Cohen, Matt Hawthorne, 2012. Announcing Zuul: Edge Service in the Cloud. https://netflixtechblog.com/announcing-zuul-edge-service-in-the-cloud-ab3af5be08ee \\