A comparison of load balancing approaches in cloud-hosted Kubernetes (k8s) clusters

by Askretkova Valentina, vdaskretkova@edu.hse.ru

Kubernetes is a well-proven instrument for building fault-tolerant, scalable clusters. It provides a wide range of tools for fine-grained configuration and balancing of the software deployment environment, from memory and disk-bandwidth settings to dynamic scaling of those settings based on the number and load of running applications and on the regions a node has access to. However, given the prevalence of cloud-based infrastructure services, this essay focuses on examining and comparing the cluster balancing tools that affect the software side of the cluster.

The Kubernetes scaling mechanism is based on the concepts of nodes and pods. Each pod can be brought back up automatically by the system after a failure thanks to an auxiliary “pause” service container [1]. This container holds the environment variables and network parameters of the pod set, and its functioning lies at the heart of the fault tolerance and scaling of a Kubernetes cluster.
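As a minimal sketch of this self-healing behavior (all names and the image below are illustrative assumptions, not taken from the essay), a Deployment declares a desired number of pod replicas, and the Kubernetes control loop recreates any replica that fails:

```yaml
# Minimal sketch: Kubernetes recreates any of the three replicas that fails.
# All names and the image are illustrative assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-demo
spec:
  replicas: 3                 # desired pod count; the control loop enforces it
  selector:
    matchLabels:
      app: web-demo
  template:
    metadata:
      labels:
        app: web-demo
    spec:
      containers:
      - name: web
        image: nginx:1.25
        ports:
        - containerPort: 80   # port the container serves on
```

Applying this manifest with `kubectl apply` and then deleting one of the pods demonstrates the restart behavior described above.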

As Gianluca Turin describes in [2], each node in the Kubernetes network has its own IP address and can communicate with all other nodes over a flat network without NAT. Kubernetes requires that the IP address a node identifies as its own match the IP address by which the node is reachable in the Kubernetes network; setting up this correspondence, however, is the system administrator's responsibility. The following sections examine and compare methods of implementing this mechanism and of balancing nodes both within and outside the cluster, using various types of services as well as Ingress.

For stable cluster performance, and for the reasons described above, pods require a long-lived address and port. The obligation to ensure their availability is assumed by an entity called a “service”. The multitude of service types and configuration options enables the balancing of incoming and outgoing traffic across various levels of the cluster.

The first type of service is ClusterIP. It is the default option and generates a distinct IP address visible exclusively within the cluster. It cannot be used from outside the cluster unless routing tables are created and maintained from every possible client up to the Kubernetes nodes to route requests into the cluster. This service type is suitable only for load balancing among nodes internal to the cluster and forms the foundation for more complex scaling policies; on its own, it is not applicable in a production environment.
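A minimal sketch of such a service follows (the service and label names are illustrative assumptions); it exposes a stable in-cluster address in front of all pods matching its selector:

```yaml
# Sketch of a default (ClusterIP) service; all names are illustrative.
apiVersion: v1
kind: Service
metadata:
  name: web-demo
spec:
  type: ClusterIP             # the default; this line may be omitted
  selector:
    app: web-demo             # pods carrying this label receive the traffic
  ports:
  - port: 80                  # stable port on the cluster-internal service IP
    targetPort: 80            # container port the traffic is forwarded to
```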

The second type of service is NodePort. This service type maps the list of ports from the “.spec.ports[*]” configuration option to ports on the physical nodes, and the mapping applies to all nodes in the cluster. NodePort permits the exposure of Kubernetes-created ports to an external network, providing a set of targets for an external proxy. In this way NodePort allows the most flexible configuration for accessing services from an external network, especially when using external balancers.

This option can be used if you already have a proxy with access, balancing, and security policies configured. It also provides L3/L4 balancing, which is faster than balancing at the application (L7) level. Thus, if a service needs to be configured with L3 mechanisms and an external proxy is responsible for the company's balancing policy, NodePort is a suitable option. However, this option places full responsibility for balancing on the external service and does not leverage Kubernetes' internal tools for monitoring the status of pods.
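A minimal NodePort sketch might look as follows (names and port numbers are illustrative assumptions); the same nodePort is opened on every node of the cluster, giving an external proxy a uniform set of targets:

```yaml
# Sketch of a NodePort service; names and port values are illustrative.
apiVersion: v1
kind: Service
metadata:
  name: web-demo-nodeport
spec:
  type: NodePort
  selector:
    app: web-demo
  ports:
  - port: 80                  # in-cluster service port
    targetPort: 80            # container port
    nodePort: 30080           # opened on every node (default range 30000-32767)
```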

The third type of balancing service is LoadBalancer. This service provides ready-made reverse-proxy mechanisms and predefined load balancing algorithms; a minimal manifest sketch is given after the list below. By default, it offers the following algorithms for balancing incoming requests:

  • Round Robin: This method is used by default and distributes requests evenly among active pods. It is applicable when containers are identical, network traffic is uniform, and response times are consistent. Round Robin can be employed in development environments or at the prototyping stage, where strict load balancing is not critical; it becomes impractical later, as the real-world conditions required for it to work well are rarely achievable. A variant supporting sticky sessions is also offered for handling prolonged sessions.
  • Ring Hash: Based on computing a hash of the service key, this method allows the management of connections between multiple services, at the cost of additional hash computation. It is suitable for relatively small cluster topologies; beyond that, it generates lookup tables that no longer fit into the processor cache, resulting in a sharp increase in system latency.
  • Maglev: Operating on a principle similar to Ring Hash, Maglev consumes less memory and computes hashes more quickly. A significant drawback, however, is the need to regenerate the lookup table in the event of node failure. The algorithm is therefore suitable only for long-lived nodes, which somewhat contradicts the design principles of Kubernetes, as shown by Maria Toeroe in [6]. Personally, I have seen this option used only for a custom firewall inside the cluster.
  • Least Connection: Redirects requests to the node with the fewest active connections at the moment. It is applicable to systems handling many lightweight requests with quick response times, but unsuitable for heavy, long-running workloads such as machine learning or statistics, where the connection count poorly reflects the actual load.
  • Least Response Time: Redirects requests to the node with the fastest response time at the moment, taking into account the number of active connections. Amit Dua et al. in [7] showed that this algorithm may be well-suited for systems with an approximately linear dependence of response time on the number of processed requests.
  • Least Bandwidth: Redirects requests to the node with the lowest network utilization. It is applicable for systems operating on streaming traffic. Note that this algorithm may not perform well if servers differ in computational and storage capabilities, as it assumes that all servers can handle an equivalent load. I have found this option suitable for log-collecting services.
  • Token-based Aware Algorithm: This algorithm operates on a set of “tokens” representing a quota for the number of requests to a group of servers, essentially implementing the “token bucket” pattern widely used under unstable networks or fluctuating loads. It considers the number of active connections and node performance, helping to prevent traffic spikes and DDoS attacks on the service; for this reason, it is frequently used in real-life clusters.
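Here is the manifest sketch promised above (names are illustrative assumptions). Which external balancer is provisioned, and which of the algorithms above it uses, depends on the cloud provider or on-premises implementation; the sessionAffinity field shown is Kubernetes' built-in sticky-session mechanism:

```yaml
# Sketch of a LoadBalancer service; names are illustrative, and the
# provisioned balancer and its algorithm depend on the (cloud) provider.
apiVersion: v1
kind: Service
metadata:
  name: web-demo-lb
spec:
  type: LoadBalancer
  selector:
    app: web-demo
  sessionAffinity: ClientIP   # built-in sticky sessions keyed by client IP
  ports:
  - port: 80
    targetPort: 80
```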

It is worth mentioning the use of LoadBalancer with custom third-party tools, such as MetalLB [3] and inlets [4]. This load balancing strategy allows the configuration of a load distribution policy for the service that takes into account arbitrary custom application metrics. Some implementations, such as the one described in [5], even include resource consumption prediction functionality to further optimize tools that use custom metrics. It also provides access to all protocols that the third-party tool can implement, significantly expanding the range of load balancing options from a network perspective.
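For illustration, a minimal MetalLB configuration sketch follows; it assumes MetalLB is already installed in the metallb-system namespace, and the address range is an arbitrary assumption. It defines which external IPs the cluster's LoadBalancer services may hand out and announces them on the local network:

```yaml
# Minimal MetalLB sketch; assumes MetalLB is installed in metallb-system,
# and the address range below is an illustrative assumption.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: demo-pool
  namespace: metallb-system
spec:
  addresses:
  - 192.168.1.240-192.168.1.250   # external IPs available to services
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement             # announce the pool via ARP/NDP on the local segment
metadata:
  name: demo-l2
  namespace: metallb-system
spec:
  ipAddressPools:
  - demo-pool
```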

The LoadBalancer type thus enables detailed configuration of load distribution and access policies for the node set. However, a separate LoadBalancer is created for each service, incurring additional costs for cluster maintenance.

Thus, LoadBalancer stands as a powerful, flexible, and customizable tool suitable for balancing traffic both within a cluster and at its boundary. It proves effective for medium-scale topologies or those with relatively few heterogeneous services; however, its functionality may fall short when configuring multi-service balancing policies.

Unlike the aforementioned examples, Ingress is not itself a service. Instead, it serves as a proxy for multiple services, acting as an “intelligent router” or the entry point into the cluster. It furnishes an API contract under which the actual load balancer implementation operates (e.g., Nginx, HAProxy, AWS ALB, Istio, etc.).

This mechanism supports plugins that allow the system administrator to configure traffic filtering and balancing with whatever restrictions are required: from SSL certificate management to domain name grouping and packet inspection.

The balancing and filtering of traffic occur through a set of rules applied to all incoming requests. These rules can be combined with network policies, providing high-performance mechanisms for both subnet isolation and load balancing, as demonstrated in experiments conducted by Gerald Budigiri in [8]. This means that, with Ingress, one can effortlessly establish traffic routing rules without creating multiple LoadBalancers or exposing each service on the node. Consequently, it emerges as the optimal choice for deployment in production environments.
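A minimal Ingress sketch is shown below (the hostname, TLS secret, and service names are illustrative assumptions, and an installed Ingress controller such as ingress-nginx is assumed); a single resource routes two paths of one hostname to different services, with no per-service LoadBalancer:

```yaml
# Sketch of an Ingress routing two paths to different services.
# Hostname, TLS secret, and service names are illustrative assumptions;
# an Ingress controller (e.g., ingress-nginx) must be installed.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo-ingress
spec:
  tls:
  - hosts:
    - demo.example.com
    secretName: demo-tls          # pre-created TLS certificate secret
  rules:
  - host: demo.example.com
    http:
      paths:
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api-demo        # hypothetical API service
            port:
              number: 8080
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web-demo        # hypothetical web frontend service
            port:
              number: 80
```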

Kubernetes is software with a rich toolkit designed to cover any requirements encountered in the creation of a scalable and resilient network. Load balancing algorithms are no exception.

If you have an external proxy connected to traffic analysis databases and security analysis tools such as Suricata, ClusterIP may be suitable as a data source for the cluster access policy.

For clusters composed of a small number of services that do not require group-level management, LoadBalancer is the preferred choice: it allows quick and flexible configuration of access policies for each individual service, both externally and internally. When group-based balancing policies between service groups are needed, the Ingress mechanism with detailed customization becomes the optimal choice. Though more labor-intensive, this approach provides the full functionality of traffic balancing, including within the cluster, unlike the ClusterIP and external-proxy options.


  1. Marko Lukša, “Kubernetes in Action”, 2019, pp. 166-195, 416-421.
  2. Gianluca Turin et al., “A Formal Model of the Kubernetes Container Framework”, https://link.springer.com/chapter/10.1007/978-3-030-61362-4_32
  3. MetalLB project, https://metallb.universe.tf
  4. inlets project, https://inlets.dev
  5. G. Turin et al., “Predicting Resource Consumption of Kubernetes Container Systems”, https://www.sciencedirect.com/science/article/pii/S0164121223001450
  6. Maria Toeroe et al., “Deploying Microservice Based Applications with Kubernetes: Experiments and Lessons Learned”, 2018, pp. 971-973.
  7. A. Dua, S. Randive, A. Agarwal and N. Kumar, “Efficient Load Balancing to Serve Heterogeneous Requests in Clustered Systems using Kubernetes,” 2020 IEEE 17th Annual Consumer Communications & Networking Conference (CCNC), Las Vegas, NV, USA, 2020, pp. 1-2, doi: 10.1109/CCNC46108.2020.9045136.
  8. G. Budigiri, C. Baumann, J. T. Mühlberg, E. Truyen and W. Joosen, “Network Policies in Kubernetes: Performance Evaluation and Security Analysis,” 2021 Joint European Conference on Networks and Communications & 6G Summit (EuCNC/6G Summit), Porto, Portugal, 2021, pp. 407-412, doi: 10.1109/EuCNC/6GSummit51104.2021.9482526.