Google SWE teaches systems design | EP26: Redis and Memcached Explained (While Drunk?)
Summary
TLDR: In this candid video, the speaker, who starts the recording inebriated and re-records it the next morning, dives into the technical differences between Redis and Memcached, two popular caching solutions for large-scale distributed systems. Redis, presented as an enhanced version of Memcached, offers additional features like data persistence, transactions, and support for various data types. The video also touches on Memcached's lack of built-in replication and Redis's high availability through Redis Cluster. The speaker emphasizes choosing the right tool for the specific system requirements, highlighting the trade-off between flexibility and built-in functionality.
Takeaways
- 🕒 The video was recorded at 1:39 a.m. on a Saturday by a speaker who was intoxicated, taking advantage of having the house to themselves.
- 📹 The speaker initially recorded a video while drunk but decided to re-record it the next morning for clarity.
- 🔑 Both Memcached and Redis are used for caching in large-scale distributed systems, primarily because they store data in RAM for faster access.
- 🔄 Memcached lets you build a distributed hash map, but its nodes are unaware of each other, so a client library uses consistent hashing to route each request to the right node.
- 🚫 Memcached lacks built-in failure handling, replication, or availability measures, so large deployments build their own, such as Facebook's practice of replacing a failed instance with a fresh "gutter" instance that repopulates over time.
- 🌟 Redis offers more features than Memcached, including support for various data types, transactions, range queries, and disk persistence.
- 💾 Redis supports disk persistence through checkpointing, which is faster but can lose writes made since the last checkpoint, or a write-ahead log, which persists every write at the cost of slower writes.
- 🔄 Redis Cluster provides high availability through replication and automatic failover, with nodes exchanging heartbeats over a gossip protocol; because replication is asynchronous, some writes can be lost during a failover.
- 🔑 Redis Cluster uses a fixed number of partitions (16,384 hash slots) instead of consistent hashing; the slots are spread across nodes to balance load and avoid hotspots.
- 🤔 Choosing between Redis and Memcached depends on specific needs; Redis offers more out-of-the-box features, but Memcached allows for more customized solutions.
- 💡 The importance of understanding the differences between Redis and Memcached is highlighted for making informed decisions in system design and interviews.
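The fixed-partition takeaway can be made concrete. Redis Cluster maps each key to one of 16,384 hash slots by taking CRC16 of the key modulo 16,384, and when a key contains a `{...}` hash tag only the text inside the braces is hashed; this is the "hashtag" trick the video mentions for controlling where keys land. The sketch below is a minimal Python illustration, not Redis's own code: it relies on `binascii.crc_hqx`, which computes the same CRC16 (XMODEM) variant, while the node names and the even split of slots across nodes are assumptions made for the example (a real cluster lets operators move slots between nodes).

```python
import binascii

NUM_SLOTS = 16384  # Redis Cluster's fixed number of hash slots


def hashed_part(key: str) -> str:
    """Return the part of the key that is hashed.

    If the key has a '{' followed later by a '}' with at least one character
    in between, only that substring (the hash tag) is hashed; otherwise the
    whole key is.
    """
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:
            return key[start + 1:end]
    return key


def key_slot(key: str) -> int:
    """CRC16 (XMODEM variant, via binascii.crc_hqx) of the hashed part, mod 16384."""
    return binascii.crc_hqx(hashed_part(key).encode("utf-8"), 0) % NUM_SLOTS


def assign_slots(nodes: list[str]) -> dict[str, range]:
    """Hypothetical layout: split the slots into contiguous, near-equal ranges."""
    per_node, extra = divmod(NUM_SLOTS, len(nodes))
    layout, start = {}, 0
    for i, node in enumerate(nodes):
        size = per_node + (1 if i < extra else 0)
        layout[node] = range(start, start + size)
        start += size
    return layout


def node_for_key(key: str, layout: dict[str, range]) -> str:
    """Find which node owns the key's slot."""
    slot = key_slot(key)
    for node, slots in layout.items():
        if slot in slots:
            return node
    raise KeyError(f"slot {slot} is not assigned")
```

Because `user:{42}:profile` and `user:{42}:orders` share the tag `42`, they land in the same slot and therefore on the same node, which is what allows multi-key commands and transactions on them inside a cluster.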
Q & A
What are the primary use cases for Redis and Memcached?
-Redis and Memcached are primarily used for caching in large-scale distributed systems. They are beneficial because they store data in RAM, which allows for the creation of a distributed hash map with fast access times.
Why is RAM used for caching in Redis and Memcached?
-RAM is used because it provides faster access times compared to databases on disk. This is crucial for high-performance caching in distributed systems.
What is the main difference between Memcached and Redis?
-Memcached is more limited in features compared to Redis. While both can build a distributed hash map, Redis offers additional features such as support for various data types, transactions, range queries, and disk persistence.
What is consistent hashing and how is it used in Memcached?
-Consistent hashing is a technique used to distribute keys across multiple nodes in a way that minimizes reorganization when nodes are added or removed. In Memcached, it helps in directing requests to the correct node.
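To make the client-side routing concrete, here is a minimal Python sketch of a consistent hashing ring of the kind a memcache client library might build from the list of nodes it is given. The node names, the use of MD5 to place points on the ring, and the 100 virtual points per node are illustrative assumptions, not the behavior of any particular client.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Client-side ring: every app server builds the same ring from the same node list."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes  # virtual points per node, to smooth out the key distribution
        self._points = []     # sorted hash positions on the ring
        self._owner = {}      # hash position -> node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value: str) -> int:
        # MD5 only spreads keys around the ring here; it is not used for security
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            point = self._hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove_node(self, node: str) -> None:
        for i in range(self.vnodes):
            point = self._hash(f"{node}#{i}")
            self._points.remove(point)
            del self._owner[point]

    def node_for(self, key: str) -> str:
        # first point clockwise from the key's hash, wrapping around past the end
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

Removing a node only remaps the keys that lived on it; every other key keeps its node, which is the property that makes adding or removing cache servers cheap.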
What is the LRU (Least Recently Used) cache and its role in Memcached?
-LRU (Least Recently Used) is an eviction policy applied when a cache instance reaches its capacity: it removes the least recently used items to make room for new data. In Memcached, it manages memory by evicting old data when necessary.
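A minimal Python sketch of the LRU policy itself, built on `collections.OrderedDict`. It counts entries rather than bytes and keeps one global order, which are simplifying assumptions; Memcached's real eviction works per memory slab class and is more elaborate than this.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity map that evicts the least recently used key when full."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()  # ordered from least to most recently used

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # a read counts as a use
        return self._data[key]

    def set(self, key, value) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the least recently used entry
```

With a capacity of 2, setting `a` and `b`, reading `a`, and then setting `c` evicts `b`, because `a` was touched more recently.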
How does Redis differ from Memcached in terms of data types and structures?
-Redis supports a variety of data types and structures, such as strings, lists, sets, and sorted sets, along with atomic operations. This is in contrast to Memcached, which is primarily a hash map from strings to strings.
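Range queries are where the richer types pay off: a plain hash map can only look keys up one at a time, whereas a sorted set keeps its members ordered by score. Redis exposes this through commands such as `ZADD` and `ZRANGEBYSCORE` and implements sorted sets with a skip list plus a hash table; the toy below uses a sorted Python list instead, so it shows the idea rather than Redis's data structure or its complexity. The class name is hypothetical.

```python
import bisect


class SortedSet:
    """Toy sorted set: member -> score map plus a list kept sorted by (score, member)."""

    def __init__(self):
        self._scores = {}
        self._ordered = []  # sorted list of (score, member) pairs

    def add(self, member: str, score: float) -> None:
        # re-adding a member updates its score, like ZADD
        if member in self._scores:
            self._ordered.remove((self._scores[member], member))
        self._scores[member] = score
        bisect.insort(self._ordered, (score, member))

    def range_by_score(self, low: float, high: float) -> list[str]:
        """Members with low <= score <= high, in score order."""
        # (low,) sorts before every (low, member) pair, so this finds the first score >= low
        start = bisect.bisect_left(self._ordered, (low,))
        result = []
        for score, member in self._ordered[start:]:
            if score > high:
                break
            result.append(member)
        return result
```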
What is the significance of transactions in Redis?
-Transactions in Redis (MULTI/EXEC) let multiple commands be executed serially and as an atomic unit on a single node, so no other client's commands are interleaved with them. Redis does not roll back the rest of a transaction if one command in it fails.
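A toy model of that transaction behavior, assuming a single node. Commands are queued (as after `MULTI`) and then applied back to back under one lock (as on `EXEC`), so nothing from another client interleaves. The class names are hypothetical and only `SET`, `INCRBY` and `GET` are modeled; as in Redis, a failing command reports an error while the others still run and nothing is rolled back.

```python
import threading


class ToyNode:
    """Hypothetical single-node store that runs queued transactions atomically."""

    def __init__(self):
        self._data = {}
        # Stands in for Redis executing commands one at a time on one node.
        self._lock = threading.Lock()

    def execute(self, command):
        """Run a single command outside any transaction."""
        with self._lock:
            return self._apply(command)

    def exec_transaction(self, queued):
        """Run every queued command back to back; no other command can interleave."""
        results = []
        with self._lock:
            for command in queued:
                try:
                    results.append(self._apply(command))
                except ValueError as err:
                    # A failing command reports an error; the rest still run, nothing rolls back.
                    results.append(err)
        return results

    def _apply(self, command):
        op, key, *args = command
        if op == "SET":
            self._data[key] = args[0]
            return "OK"
        if op == "INCRBY":
            # int() raises ValueError when the stored value is not a number
            self._data[key] = int(self._data.get(key, 0)) + args[0]
            return self._data[key]
        if op == "GET":
            return self._data.get(key)
        raise ValueError(f"unknown command {op}")


class Transaction:
    """Queues commands like MULTI, then submits them together like EXEC."""

    def __init__(self, node: ToyNode):
        self._node = node
        self._queue = []

    def queue(self, *command):
        self._queue.append(command)
        return self  # allow chaining

    def exec(self):
        return self._node.exec_transaction(self._queue)
```

For example, `Transaction(node).queue("SET", "a", "1").queue("INCRBY", "a", 5).queue("GET", "a").exec()` returns `["OK", 6, 6]`.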
What is disk persistence in Redis and why is it important?
-Disk persistence in Redis is the ability to save the dataset to disk, which makes Redis more viable as a database. It helps prevent data loss and allows for data recovery in case of a system failure.
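The trade-off between the two persistence styles the video describes can be sketched with a toy store. This is a hypothetical illustration, not how Redis writes its files: Redis's real options are RDB snapshots and the append-only file (AOF), tuned with settings such as `appendfsync`. Here `snapshot()` plays the role of checkpointing, and `log_writes=True` forces each write to disk before applying it in memory, following the video's write-ahead description.

```python
import json
import os
import tempfile


class ToyStore:
    """Hypothetical in-memory store with two durability options.

    snapshot(): checkpoint the whole dataset; writes after the last snapshot
    are lost on a crash.
    log_writes=True: append each write to a log and fsync it before applying
    it in memory; recovery replays the log, so no acknowledged write is lost.
    """

    def __init__(self, directory: str, log_writes: bool):
        self.snapshot_path = os.path.join(directory, "snapshot.json")
        self.log_path = os.path.join(directory, "writes.log")
        self.log_writes = log_writes
        self.data = {}

    def set(self, key: str, value: str) -> None:
        if self.log_writes:
            with open(self.log_path, "a") as log:
                log.write(json.dumps([key, value]) + "\n")
                log.flush()
                os.fsync(log.fileno())  # the slow part: one disk sync per write
        self.data[key] = value

    def snapshot(self) -> None:
        with open(self.snapshot_path, "w") as f:
            json.dump(self.data, f)

    @classmethod
    def recover(cls, directory: str, log_writes: bool) -> "ToyStore":
        """Rebuild the in-memory state after a crash."""
        store = cls(directory, log_writes)
        if os.path.exists(store.snapshot_path):
            with open(store.snapshot_path) as f:
                store.data = json.load(f)
        if log_writes and os.path.exists(store.log_path):
            with open(store.log_path) as f:
                for line in f:
                    key, value = json.loads(line)
                    store.data[key] = value
        return store


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = ToyStore(d, log_writes=False)
        store.set("a", "1")
        store.snapshot()
        store.set("b", "2")  # never checkpointed, so a crash loses it
        print(ToyStore.recover(d, log_writes=False).data)  # {'a': '1'}
```

Recovering from the snapshot alone loses the write to `b`, while the logged store replays both writes; the price is an `fsync` on every `set` call.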
How does Redis Cluster provide high availability and consistency?
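-Redis Cluster provides high availability through built-in single-leader replication with automatic failover. Nodes detect failures by exchanging heartbeats over a gossip protocol, and a replica is promoted only when a majority of master nodes vote for it in a given epoch, which prevents split-brain scenarios. Because replication is asynchronous, however, writes the leader acknowledged but had not yet replicated can be lost when it fails, so Redis Cluster does not guarantee strong consistency.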
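The split-brain argument rests on one rule: each master grants at most one vote per epoch, and a replica needs votes from a majority of all masters. The toy below checks that rule in isolation. It is a simplified, hypothetical model (no timeouts, no gossip, no configuration propagation) rather than Redis Cluster's actual failover code.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    name: str
    last_voted_epoch: int = 0

    def grant_vote(self, epoch: int) -> bool:
        """Vote at most once per epoch; refuse stale or already-used epochs."""
        if epoch <= self.last_voted_epoch:
            return False
        self.last_voted_epoch = epoch
        return True


@dataclass
class Cluster:
    masters: list
    winners: dict = field(default_factory=dict)  # epoch -> promoted replica

    def try_failover(self, candidate: str, epoch: int, reachable: list) -> bool:
        """Promote `candidate` only if a majority of ALL masters vote for it in `epoch`."""
        votes = sum(1 for m in reachable if m.grant_vote(epoch))
        if votes > len(self.masters) // 2:
            self.winners[epoch] = candidate
            return True
        return False
```

Two majorities of the same set of masters must overlap, and the overlapping master refuses a second vote in the same epoch, so two replicas can never both be promoted in one epoch; a network partition that leaves only a minority of masters reachable simply cannot promote anyone.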
Why might someone choose Memcached over Redis despite its fewer features?
-One might choose Memcached over Redis if they require a simpler system or need to implement custom features such as strong consistency, alternate replication patterns, or a coordination service for partition management.
What is the importance of caching in large-scale distributed systems as illustrated by Facebook's use case?
-Caching is crucial in large-scale distributed systems to reduce load on databases and improve performance. Facebook's example shows that caching handles 99% of their read requests, demonstrating its significant impact on system efficiency.
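The read path behind a hit rate like that is cache-aside: check the cache, fall back to the database on a miss, then fill the cache. The hypothetical sketch below also models the failure handling described in the video, where a failed instance is replaced by an empty one that repopulates over time, and counts how many reads actually reach the database. All class and function names are illustrative.

```python
class CacheInstance:
    """Stand-in for one memcached instance: a plain dict."""

    def __init__(self):
        self.data = {}


class Database:
    def __init__(self, rows: dict):
        self.rows = dict(rows)
        self.reads = 0  # how much load actually reaches the database

    def read(self, key):
        self.reads += 1
        return self.rows.get(key)


def cached_read(cache: CacheInstance, db: Database, key):
    """Cache-aside read: try the cache, fall back to the database, then fill the cache."""
    if key in cache.data:
        return cache.data[key]
    value = db.read(key)
    cache.data[key] = value
    return value


if __name__ == "__main__":
    db = Database({f"user:{i}": i for i in range(10)})
    cache = CacheInstance()
    for _ in range(100):
        for i in range(10):
            cached_read(cache, db, f"user:{i}")
    print(db.reads)  # 10: only the first read of each key missed

    cache = CacheInstance()  # the failed instance is replaced by an empty one
    for i in range(10):
        cached_read(cache, db, f"user:{i}")
    print(db.reads)  # 20: each key missed once more while the new instance warmed up
```

After the replacement, each key misses exactly once before the new instance is warm again, so the database absorbs a short burst of extra reads rather than the full read load.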
Outlines
🍻 Drunk Tech Talk: Redis and Memcache Basics
In this humorous and candid video, the speaker, despite being inebriated, attempts to discuss Redis and Memcache, two popular caching solutions used in large-scale distributed systems. The speaker acknowledges their state and promises a more coherent re-recording the next day. They explain that both systems store data in RAM to create a distributed hash map for fast data access. The main difference is that Memcache is a simpler system without built-in replication or availability features, while Redis offers more complex data structures, transactions, and disk persistence options.
🔑 Redis vs. Memcache: Features and Use Cases
This paragraph delves deeper into the technical differences between Redis and Memcache. It explains that Memcache relies on a client library that routes each request to the appropriate node using consistent hashing, evicts entries with an LRU policy, and has no failure handling. Redis, on the other hand, is described as an enhanced version of Memcache with support for various data types, transactions, range queries, and disk persistence. The paragraph also touches on Redis Cluster, which adds replication with gossip-based failover for high availability and uses fixed partitioning. The speaker concludes by discussing scenarios where one might prefer Memcache over Redis due to its simplicity and flexibility for custom implementations.
Keywords
💡Redis
💡Memcached
💡Caching
💡Distributed Hash Map
💡Consistent Hashing
💡LRU Cache
💡Replication
💡Failover
💡Gossip Protocol
💡Partitioning
💡Checkpointing
💡Write-Ahead Logging
Highlights
Redis and Memcache are both used for caching in large-scale distributed systems, storing data in RAM for fast access.
The speaker is recording while intoxicated, aiming to discuss Redis and Memcache despite the state.
The Ballmer curve is mentioned as a way to gauge the speaker's ability to discuss coding topics after drinking.
Memcache allows building a distributed hash map with nodes unaware of each other, using consistent hashing.
Memcache evicts entries with LRU when an instance fills up and has no built-in failure handling or replication.
Facebook's approach to Memcache instance failure involves adding new instances without data replication.
Redis is described as Memcache with additional features, supporting various data types and atomic operations.
Redis supports transactions and range queries on a single partition, enhancing its capabilities over Memcache.
Redis offers disk persistence through checkpointing or write-ahead logging, increasing its viability as a database.
Redis Cluster provides high availability with built-in replication and automatic failover, though asynchronous replication means some writes can be lost.
Redis uses a gossip protocol for node communication and heartbeat sharing, crucial for failover processes.
Redis prevents split brain using a quorum of master nodes and epoch numbers during failover.
Redis employs fixed partitioning with 16,384 hash slots, unlike many other databases that use consistent hashing.
Redis's built-in features can make it harder to diverge from its design pattern compared to Memcache.
The choice between Redis and Memcache depends on specific needs like strong consistency or replication patterns.
Caching is crucial for large-scale distributed systems, with Facebook handling 99% of read requests through cache.
The video concludes by emphasizing the importance of understanding the differences between Redis and Memcache.
Transcripts
hello everyone it is uh i think 1 39 a.m
on a saturday morning if you can't tell
i am actively hammered
why am i doing this right now it's
because my roommate is at his
girlfriend's house and uh i have the
free house to record so you know what
screw it let's get this done um
i'm gonna talk about redis and memcache
today and assuming my brain can support
it because i guess the ballmer curve
tells me that i can have a couple drinks
and and you know talk about code then uh
hopefully this video will be coherent
that being said i'm also about eight or
nine drinks in so i'm a little bit past
the ballmer curve but
you know what i can still talk about
computer science so let's get it done
hey yeah so this is me from the next
morning uh
i actually did record that video and uh
i looked back on it and i was like damn
i am slurring a lot of words right now
so
um we're gonna do a re-recording and uh
hopefully i'll explain a little bit more
concisely this time so let's do memcache
and redis
so basically in terms of what memcache
and redis are they're both solutions
that are typically used for caching in
large-scale distributed systems the
reason for this is that they generally
speaking store their data in ram random
access memory and that allows you to
basically build something called a
distributed hash map which can access
keys and set keys in O(1) time which
is great because obviously databases on
disk can't do that
as we can see though there are pretty
significant differences between the two
services themselves
mainly in the sense that
memcache is basically like a subset of
redis but you know sometimes it's good
to have a more limited feature set
because you can build out more
so
what is memcache like i said memcache
allows you to build a distributed hash
map amongst a bunch of nodes however the
nodes basically don't know about one
another so generally speaking you're
actually kind of just using this client
library to go ahead and wire all those
requests to the proper node how do we
wire each request to the right node well
generally speaking we're using
consistent hashing so basically you give
all of your you know application servers
that are going to be using memcache
instances a list of all the nodes that
are running memcached and then that will
allow them to create a consistent
hashing ring
additionally you have an lru cache so if
each memcache instance gets too big or
basically there's too much data in there
and you need to evict something in order
to make room for a new element you're
using the lru algorithm and then you
know a couple of other features that are
built in are
you know basically just like compare and
set um generally speaking there's not
really any failure handling for memcache
they don't have any built-in like
replication or availability measures so
um what actually a company like facebook
did and i'll link this lecture
from like mit basically in the video
description is that they use something
called like a gutter instance
where every time um a redis not a redis
instance um a memcache instance failed
they basically go ahead and throw a new
memcache instance in there to just take
its place and then eventually it'll get
repopulated over time but there's no
like you know copying over of the data
from one instance to another which is
interesting
okay so what is redis well at least on a
single node
redis is a modification of memcache that
basically has the following features so
obviously it's an lru distributed hash
map but it's got some other stuff too so
instead of just being a hash map from
strings to strings you can actually have
other data types as values in the hash
maps so those can be sets
you know strings but with atomic
operations like appending to the string
maps and lists and you can also even
have transactions so if you want to make
multiple writes to a single node and
make sure that those are
executed both serially and also as an
atomic unit you can do that you can also
even make range queries on a single
partition and there's also a way to kind
of
go ahead and hash keys using only part
of the key
doing something called the hashtag which
allows you to kind of have some control
over where each key is going to be sent
to in a partitioning scheme
then finally there's also a concept of
actual disk persistence which makes
redis a little bit more viable as an
actual database for your application as
well and you can do disk persistence via
checkpointing which is probably faster
but obviously comes with the cost of
losing some writes if you don't
checkpoint everything or you can use a
write ahead log
where basically every single write is
going to the disk before it's written in
memory and this obviously comes at the
cost of slower writes
in terms of redis cluster redis cluster
is basically what redis calls
you know running a bunch of redis nodes
in a distributed manner so basically the
point here is that this provides both
high availability and consistency so how
do they provide high availability well unlike
memcache they actually support
replication out of the box so that uses
single leader replication with an
automatic failover
the thing with single leader replication
here is that some writes can be lost so
if the leader has some writes and it's
replicating them asynchronously to
some of its replicas and then the leader
goes ahead and fails before all of those
writes get properly sent out to all of
the followers then those writes are
probably just going to be lost um so how
do we actually do a failover well
there's a gossip protocol between all
the nodes where you're basically sharing
heartbeats that convey the you know kind
of the state of the node as well as the
partitions that it's holding and then uh
in order now to basically you know put a
new replica to the master what needs to
happen is that a quorum of master nodes
amongst all partitions need to basically
go ahead and agree that this new
follower is going to become the leader
and they use an epoch number to do that
and so this basically allows us to
prevent split brain because obviously a
quorum of nodes can't make a
conflicting decision for a single epoch
so that's pretty smart there how they
make sure to prevent split brain and
then finally
in order to kind of do partitioning
unlike a bunch of other database
solutions which most of which we've kind
of just seen using
consistent hashing so far this actually
uses the fixed number of partition
solution that i discussed from ddia
where there are exactly 16,384 fixed
partitions with fixed ranges and then
your job is basically to make sure that
you know you're putting the right number
of partitions on each node such that
there aren't too many hotspots and
obviously you know due to the fact that
there's replication if there are certain
hotkeys hopefully the load won't be too
great on them because you'll actually be
able to serve a lot of requests from
those replicas as opposed to just having
to serve them all from one instance
okay so in terms of a comparison between
memcached and redis
as you can see um redis is basically
just memcached with a bunch of
additional features built in out of the
box so why would you ever not want to
use it well
by virtue of having all these features
built in it makes it harder to kind of
diverge from that design pattern so
let's say you needed strongly consistent
data maybe you'd be better off just
using memcache and then kind of building
out your own system using something with
like a coordination service in order to
ensure strong consistency
maybe you want alternate replication
patterns like a leaderless replication
scheme which you know kind of resembles
a dynamo database then perhaps you'd be
better off using memcache and then you
know kind of implementing that yourself
or maybe even you want to use just a
coordination service in general for all
of that partition management as opposed
to just using a gossip protocol
because gossip protocol even though it
does generally work you know it's just a
little bit harder to reason about
sometimes
you know you can do that too
so in conclusion both redis and memcache
are super useful systems for
implementing caching and in an interview
i imagine it probably won't really come
up if you just said yeah i'm going to
use redis or memcache in order to
implement my cache but i think it's
important to know the subtle differences
between the two of them because that's
kind of what this channel is all about
is being able to you know hear all of
the names of these different technology
services and basically go and say okay
well actually now i understand why one
is different from the other
obviously caching is hugely beneficial
for any large scale distributed system
and as a result of that you should
basically be using it whenever you can
until it pretty much prices you out so
you know as long as you can afford it
you should be using caching
that video i mentioned about caching at
facebook basically says that pretty much
99% of their read requests are handled by
their cache in order to take load off
their databases and allow them to keep
operating so okay i hope this video was
useful sorry i couldn't post the drunk
one but it was probably double as long
and very incoherent but uh have a good
one guys
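The speaker's point about lost writes under asynchronous single-leader replication can be reproduced in a few lines. In this hypothetical toy, the leader acknowledges a write before shipping it to its replicas; if it crashes in between, the promoted replica never saw the write. The class names are illustrative, not Redis internals.

```python
class Replica:
    def __init__(self):
        self.data = {}


class Leader:
    """Hypothetical leader that acknowledges writes before replicating them."""

    def __init__(self, replicas: list):
        self.data = {}
        self.replicas = replicas
        self.pending = []  # acknowledged writes not yet sent to replicas

    def write(self, key, value) -> str:
        self.data[key] = value
        self.pending.append((key, value))
        return "OK"  # the client hears OK before any replica has the write

    def replicate(self) -> None:
        """Ship pending writes to every replica (a background task in a real system)."""
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()


if __name__ == "__main__":
    replicas = [Replica(), Replica()]
    leader = Leader(replicas)
    leader.write("a", 1)
    leader.replicate()
    print(leader.write("b", 2))  # OK, but the leader crashes before replicating
    promoted = replicas[0]       # failover promotes a replica
    print(promoted.data)         # {'a': 1}: the acknowledged write to b is gone
```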