The Consistent Hashing Algorithm: Balancing Load Efficiently in Distributed Systems

In the realm of distributed systems, managing the load across multiple servers is a critical challenge. Load balancers play a vital role in distributing client requests to servers effectively. As the number of clients increases, the load on the system rises, necessitating the addition of more servers. However, this can result in an uneven distribution of load among the servers, requiring a process to rebalance the load. This is where the Consistent Hashing algorithm comes to the rescue. In this article, we will explore how Consistent Hashing aids in load balancing and minimizes the need for re-allocations.

You can find an implementation of the Consistent Hashing algorithm, together with an interactive example, at the bottom of the page.

Load distribution

Load balancers act as intermediaries between clients and servers, evenly distributing the incoming client requests across a cluster of servers. As the number of clients grows, the load on the system increases, which can lead to performance degradation and potential bottlenecks. To handle the increased load, additional servers are added dynamically to the cluster.

Reassigning clients to different servers to achieve load balance is a delicate process. Frequent re-allocations not only introduce overhead but also disrupt the client-server mapping, potentially causing additional latency and complexity. Therefore, it is essential to minimize the number of re-allocations while ensuring a fair distribution of load.

Shortcoming of regular hashing

In regular hashing, the hash function distributes clients evenly across the available servers based on their hashed values, typically by taking the hash modulo the number of servers. However, when a server is added to or removed from the system, the number of servers changes, and with it the result of the mapping for most clients. Adding a server shifts the distribution and forces most clients to be re-allocated to a different server. Removing a server has the same effect: it is not only the clients of the removed server that move, but clients across all the remaining servers. With regular hashing, even a small change in the number of servers results in large changes to the client-server mapping. That is a problem.
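To make the scale of the problem concrete, here is a minimal sketch that assumes "regular hashing" means hash modulo the number of servers. The helper names (`stable_hash`, `assign_modulo`, `fraction_moved`) and the client names are made up for this illustration.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash; Python's built-in hash() is salted per process."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def assign_modulo(client: str, num_servers: int) -> int:
    """Regular hashing (assumed scheme): server index = hash mod server count."""
    return stable_hash(client) % num_servers


def fraction_moved(clients, old_count: int, new_count: int) -> float:
    """Share of clients whose server changes when the server count changes."""
    moved = sum(
        1
        for c in clients
        if assign_modulo(c, old_count) != assign_modulo(c, new_count)
    )
    return moved / len(clients)


clients = [f"client-{i}" for i in range(10_000)]  # hypothetical client ids
print(f"moved when going from 4 to 5 servers: {fraction_moved(clients, 4, 5):.0%}")
```

A client keeps its server only when its hash gives the same remainder modulo 4 and modulo 5, which holds for about one hash in five. So roughly 80% of clients should move when a single server is added.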

The Power of Consistent Hashing

Consistent Hashing, an algorithm commonly employed in load balancers, provides an elegant solution to this re-allocation problem. Unlike regular hashing, Consistent Hashing keeps the number of re-allocations small when servers are added or removed, offering a more stable and scalable approach. In Consistent Hashing, both clients and servers are mapped onto a circular hash space, represented by a ring. Each server is associated with one or more positions on the ring, determined by applying a hash function to the server's identifier or address. Clients are also mapped onto the ring using the same hash function.

To determine the server for a given client, we place the client on the ring and search for the closest server by traversing the ring clockwise.
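The lookup can be sketched as a binary search over the sorted server positions. This is a minimal illustration rather than the implementation linked from this page. It assumes a 64-bit ring derived from SHA-256 and treats "clockwise" as "towards increasing positions, wrapping around at the end". The names `HashRing` and `ring_position` are hypothetical.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a key to a point on a ring of size 2**64 (assumed ring size)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    def __init__(self, servers):
        # Sort servers by ring position so the lookup can use binary search.
        self._points = sorted((ring_position(s), s) for s in servers)
        self._positions = [pos for pos, _ in self._points]

    def lookup(self, client: str) -> str:
        """Return the first server at or after the client's position."""
        if not self._points:
            raise LookupError("the ring has no servers")
        idx = bisect.bisect_left(self._positions, ring_position(client))
        if idx == len(self._points):
            idx = 0  # walked past the last server: wrap around the ring
        return self._points[idx][1]


ring = HashRing(["server-a", "server-b", "server-c"])  # hypothetical names
print(ring.lookup("client-42"))
```

Keeping the positions in a sorted list makes each lookup logarithmic in the number of ring positions, instead of a linear walk around the ring.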

To add a new server, we calculate its hash and place the server on the hash ring. The new server may now be the closest server for some clients. As illustrated below, the new server does not affect the allocation of clients on the other side of the ring. In Consistent Hashing, adding or removing a server affects only a limited portion of the ring.

Only the clients c5 and c6 need to be re-allocated.
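The same property can be checked in code. The sketch below uses the same assumptions as before (a 64-bit SHA-256 ring, clockwise meaning increasing position) with hypothetical server and client names. It compares the assignment before and after adding a server and confirms that every client that moved now belongs to the new server.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a key to a point on a ring of size 2**64 (assumed ring size)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build_ring(servers):
    """Return the ring as a sorted list of (position, server) pairs."""
    return sorted((ring_position(s), s) for s in servers)


def lookup(ring, client: str) -> str:
    """First server clockwise from the client, wrapping past the end."""
    positions = [pos for pos, _ in ring]
    idx = bisect.bisect_left(positions, ring_position(client)) % len(ring)
    return ring[idx][1]


clients = [f"client-{i}" for i in range(2_000)]  # hypothetical client ids
before = build_ring(["s1", "s2", "s3", "s4"])
after = build_ring(["s1", "s2", "s3", "s4", "s5"])  # "s5" is the new server

moved = [c for c in clients if lookup(before, c) != lookup(after, c)]

# Every client that changed servers was taken over by the new server;
# no client is shuffled between the servers that were already there.
assert all(lookup(after, c) == "s5" for c in moved)
print(f"{len(moved)} of {len(clients)} clients moved, all of them to s5")
```

How many clients move depends on where the new server happens to land on the ring, so the count printed here varies with the server names.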

Shortcomings of Consistent Hashing: Uneven distribution of servers

In the illustration above, the servers are depicted with evenly distributed distances between them, each covering an equal fraction of the ring. However, in real-world scenarios, the distances between servers may not be uniform, resulting in certain servers covering larger portions of the ring than others. Because a client is assigned to the first server found clockwise from its position, a server that follows a large gap on the ring collects more clients and carries a higher load.

To address this issue, Consistent Hashing incorporates the concept of virtual nodes. Rather than representing each physical server with a single node on the ring, multiple virtual nodes are assigned to each server.

By utilizing virtual nodes, load distribution becomes more balanced across the servers. Each physical server now handles multiple portions of the hash ring, allowing for a finer-grained distribution of load. As a result, the likelihood of overloaded or underutilized servers is reduced, and the load is more evenly distributed.
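A minimal sketch of virtual nodes, under the same assumptions as the ring described above: each server is hashed several times under a derived label, and every resulting position points back to the physical server. The `server#index` label scheme and all names here are assumptions made for illustration, not taken from the linked implementation.

```python
import bisect
import hashlib
from collections import Counter


def ring_position(key: str) -> int:
    """Map a key to a point on a ring of size 2**64 (assumed ring size)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build_ring(servers, vnodes: int):
    """Place `vnodes` virtual nodes per server; each maps to its physical server."""
    return sorted(
        (ring_position(f"{server}#{i}"), server)  # assumed label scheme
        for server in servers
        for i in range(vnodes)
    )


def load_per_server(servers, clients, vnodes: int):
    """Count how many clients each physical server receives."""
    ring = build_ring(servers, vnodes)
    positions = [pos for pos, _ in ring]
    counts = Counter()
    for client in clients:
        idx = bisect.bisect_left(positions, ring_position(client)) % len(ring)
        counts[ring[idx][1]] += 1
    return {server: counts[server] for server in servers}


servers = ["s1", "s2", "s3", "s4"]  # hypothetical server names
clients = [f"client-{i}" for i in range(10_000)]
for vnodes in (1, 20):
    print(vnodes, "virtual node(s) per server:", load_per_server(servers, clients, vnodes))
```

With more virtual nodes per server, the per-server counts should generally move closer to an even split. The trade-off is a larger ring to store and search.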

The use of virtual nodes mitigates the impact of uneven distances between servers, ensuring that no single server bears a disproportionately higher load. This enhanced load balancing mechanism contributes to improved system performance, as resources are optimally utilized across the cluster.

In summary, virtual nodes play a crucial role in Consistent Hashing by facilitating a better spread of load among servers. By representing each server with multiple virtual nodes on the hash ring, load imbalances caused by variations in distances between servers are effectively mitigated. The utilization of virtual nodes enhances the efficiency and fairness of load distribution, ultimately leading to a more stable and reliable distributed system. The interactive example below uses 20 virtual nodes per server.

Interactive Example

You can find an implementation of the Consistent Hashing algorithm, enhanced with an interactive example here.