What is CONSISTENT HASHING and Where is it used?

Gaurav Sen

21 Apr 2018 · 10:50

Summary

TLDR: The video explains consistent hashing, a technique for load balancing in distributed systems that keeps most locally stored data in place when servers are added or removed. It shows how hashing both request IDs and server IDs onto a ring spreads requests roughly uniformly and limits how many requests move when the set of servers changes. It also covers the practical problem of uneven server placement on the ring and introduces virtual servers, created with multiple hash functions, to keep the load balanced.

Takeaways

🔄 The core issue discussed is not load balancing itself but the addition and removal of servers, which changes which requests, and therefore which locally stored data, belong to each server.
📐 The concept of a 'ring' is introduced where requests are hashed based on their IDs and mapped onto positions in a circular data structure.
🔄 Servers are assigned IDs and hashed using a function to determine their position on the ring, ensuring a uniform distribution of requests.
🔄 A clockwise approach is used to find the nearest server to handle a request, ensuring a simple and effective load distribution method.
📈 With N servers and a uniform hash, each server is expected to receive about 1/N of the requests, so on average every server handles roughly an equal share.
🔧 When a new server is added, it takes over only the requests that lie between the previous server's point and its own point on the ring; those requests come from its clockwise neighbor, and every other server is unaffected.
⚠️ The loss of a server results in its requests being redistributed to the next clockwise server, which can lead to a skewed load if there are few servers.
🛠 To address potential load imbalances, the concept of 'virtual servers' is introduced, using multiple hash functions to create multiple points for each server on the ring.
🔢 By using K hash functions, each server effectively has K points on the ring, greatly reducing the likelihood of skewed load distribution.
💡 Choosing K on the order of log(N) or log(M), where N is the number of servers and M the number of positions on the ring, can significantly reduce the risk of load imbalance and promote a more uniform distribution of requests.
🌐 Consistent hashing is a widely used technique in distributed systems such as web caches and databases, offering flexibility and efficient load balancing.
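
The ring and the clockwise lookup described in the takeaways can be sketched in a few lines of Python. This is a minimal illustration, not the video's implementation: the use of SHA-256, the server names, and the helper names `ring_position`, `build_ring`, and `find_server` are all assumptions made for the example.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Hash a string to a position on the ring (SHA-256 is an illustrative choice)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build_ring(server_ids):
    """Return the ring as a sorted list of (position, server_id) pairs."""
    return sorted((ring_position(s), s) for s in server_ids)


def find_server(ring, request_id: str) -> str:
    """Walk clockwise from the request's position to the first server point."""
    # "" sorts before any server name, so this finds the first point
    # whose position is >= the request's position.
    idx = bisect.bisect_left(ring, (ring_position(request_id), ""))
    return ring[idx % len(ring)][1]  # wrap around past the last point


# Hypothetical servers and requests, used only for demonstration.
ring = build_ring(["server-A", "server-B", "server-C", "server-D"])
assignments = {f"request-{i}": find_server(ring, f"request-{i}") for i in range(1000)}
print(assignments["request-0"])
```

Keeping the ring as a sorted list makes each lookup a binary search; the modulo on the index is what closes the list into a circle.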

Q & A

What is the primary issue discussed in the script regarding server management?
-The primary issue discussed is the problem of adding and removing servers, which changes the local data on each server, affecting load balancing.
What concept is introduced to address the problem of server data changes?
-The concept of consistent hashing is introduced, which uses a ring structure to map hashed request IDs to server IDs, ensuring a more uniform load distribution.
How does the ring structure in consistent hashing work?
-The ring structure maps hash values of both requests and server IDs to positions on a circular ring, allowing requests to be directed to the nearest server in a clockwise manner.
What is the significance of hashing server IDs in the consistent hashing algorithm?
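-Hashing server IDs places the servers on the same ring as the requests, so each request can simply be routed to the nearest server in the clockwise direction. Because a server's position depends only on its own ID, adding or removing a server affects only the requests next to its point on the ring.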
Why is it important to have a uniform load distribution among servers?
-Uniform load distribution ensures that no single server is overwhelmed with requests, maintaining system performance and reliability.
What happens when a server is added in the consistent hashing model?
-When a server is added, it is mapped to a point on the ring, and the requests that fall between the previous server's point and the new server's point are reassigned to the new server. This reduces the load on the next clockwise server and leaves all other servers untouched.
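
A small experiment can make this concrete. The sketch below assumes one point per server, SHA-256 as the hash, and hypothetical server and request names; it compares assignments before and after adding `server-E` and checks which requests moved.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Illustrative hash: first 8 bytes of SHA-256 as an integer."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


def find_server(server_ids, request_id: str) -> str:
    """Rebuild the ring and return the first server clockwise from the request."""
    ring = sorted((ring_position(s), s) for s in server_ids)
    idx = bisect.bisect_left(ring, (ring_position(request_id), ""))
    return ring[idx % len(ring)][1]


requests = [f"request-{i}" for i in range(2000)]
old_servers = ["server-A", "server-B", "server-C", "server-D"]
new_servers = old_servers + ["server-E"]  # the newly added server

before = {r: find_server(old_servers, r) for r in requests}
after = {r: find_server(new_servers, r) for r in requests}

moved = [r for r in requests if before[r] != after[r]]
print(f"{len(moved)} of {len(requests)} requests moved")
print("moved from:", {before[r] for r in moved})
print("moved to:  ", {after[r] for r in moved})
```

Every request that moves ends up on the new server, and all of them come from a single existing server; the rest of the assignments are unchanged.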
What is the impact of losing a server in the consistent hashing model?
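-Losing a server means that its requests are reassigned to the next server in the clockwise direction, which increases the load on that server. With a single point per server the neighbor absorbs the whole load; when each server has several points on the ring, the extra load is spread across several servers instead.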
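
The removal case can be checked the same way. This is again only a sketch under assumed names (SHA-256 hash, one point per server, `server-B` chosen arbitrarily as the server that fails); it counts the load per server before and after the removal.

```python
import bisect
import hashlib
from collections import Counter


def ring_position(key: str) -> int:
    """Illustrative hash: first 8 bytes of SHA-256 as an integer."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


def find_server(server_ids, request_id: str) -> str:
    """Rebuild the ring and return the first server clockwise from the request."""
    ring = sorted((ring_position(s), s) for s in server_ids)
    idx = bisect.bisect_left(ring, (ring_position(request_id), ""))
    return ring[idx % len(ring)][1]


requests = [f"request-{i}" for i in range(2000)]
servers = ["server-A", "server-B", "server-C", "server-D"]
remaining = [s for s in servers if s != "server-B"]  # server-B is lost

before = {r: find_server(servers, r) for r in requests}
after = {r: find_server(remaining, r) for r in requests}

moved = [r for r in requests if before[r] != after[r]]
print("load before:", dict(Counter(before.values())))
print("load after: ", dict(Counter(after.values())))
print("requests inherited by:", {after[r] for r in moved})
```

Only the requests that belonged to the lost server move, and with one point per server they all land on the same neighbor, which is the skew the answer above refers to.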
What is the role of virtual servers in the consistent hashing algorithm?
-Virtual servers, created by using multiple hash functions for each physical server, increase the number of points on the ring and further reduce the likelihood of skewed load distribution.
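
Virtual servers can be sketched by giving each server K points on the ring. In the example below the K hash functions are simulated by salting a single SHA-256 hash with an index, which is an assumption of this sketch rather than something the video prescribes; the server names and the values of K are also illustrative.

```python
import bisect
import hashlib
from collections import Counter


def ring_position(key: str) -> int:
    """Illustrative hash: first 8 bytes of SHA-256 as an integer."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


def build_ring(server_ids, k: int):
    """Place each server at k points; the salt '#i' stands in for the i-th hash function."""
    return sorted((ring_position(f"{s}#{i}"), s) for s in server_ids for i in range(k))


def find_server(ring, request_id: str) -> str:
    """Return the server owning the first point clockwise from the request."""
    idx = bisect.bisect_left(ring, (ring_position(request_id), ""))
    return ring[idx % len(ring)][1]


def load_per_server(server_ids, k: int, requests):
    """Count how many requests each physical server receives with k points each."""
    ring = build_ring(server_ids, k)
    return Counter(find_server(ring, r) for r in requests)


servers = ["server-A", "server-B", "server-C", "server-D"]
requests = [f"request-{i}" for i in range(10000)]

for k in (1, 10, 100):
    load = load_per_server(servers, k, requests)
    print(f"K={k}:", dict(load))
```

Running it with increasing K lets you compare how evenly the requests are spread; the lookup logic is unchanged, only the number of points per server grows.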
Why is it recommended to use multiple hash functions for each server?
-Using multiple hash functions for each server creates multiple points on the ring for each server, which helps to distribute the load more evenly and reduces the impact of server addition or removal.
How does consistent hashing provide flexibility in distributed systems?
-Consistent hashing provides flexibility by allowing for efficient load balancing and easy scalability, as servers can be added or removed with minimal impact on the overall system.
What practical considerations are there for implementing consistent hashing?
-Practical considerations include choosing the right number of hash functions (K), ensuring a sufficient number of servers to avoid skewed distributions, and handling edge cases where load might still become uneven.