Distributed Systems in One Lesson by Tim Berglund
Summary
TLDR This video explains the problems that come up when building distributed systems and the open source projects that address them. It approaches the topic from three angles (storage, computation, and messaging) and details the challenges that arise in a distributed system compared with a single-machine system. It shows how open source projects such as Cassandra, Spark, and Kafka overcome the problems in each area. Building distributed systems is hard, but the video delivers clear, practical insight with a dose of humor.
Takeaways
- 🔑 Distributed systems raise far more problems than a single computer system, with distinct challenges in each of three areas: storage, computation, and messaging.
- 📚 With a single-master relational database, coping with scale forces you to sacrifice consistency and your data model. Scalable non-relational databases such as Cassandra, by contrast, are designed as distributed systems from the start.
- 🧮 Distributed computation uses frameworks such as MapReduce and Spark; Spark is far more developer-friendly than MapReduce.
- 💬 Messaging systems enable loose coupling between microservices. Apache Kafka is a distributed message broker that scales through partitioning.
- 🌐 Per the CAP theorem, a distributed system can achieve only two of the three properties of consistency, availability, and partition tolerance at the same time.
- 🔄 In a distributed system, solving one problem tends to create another; for example, replicating data creates consistency problems.
- 🌉 Using a message bus thoroughly can change the architecture itself: services can process events directly as streams.
- ⚖️ Capabilities that are easy on a single server (consistency guarantees, message ordering guarantees, and so on) are hard to achieve in a distributed system.
- 🔩 Sharding for scale puts constraints on the data model, such as losing the ability to join across shards.
- ✨ Building distributed systems is hard, but open source projects such as those from Apache help overcome the challenges.
Q & A
What is a distributed system?
-A distributed system is a collection of computers that behaves as a single computer. It has three characteristics: it is made up of many computers, they operate concurrently, and parts of the system can fail independently.
What are the main challenges in building a distributed system?
-The main challenges are storing data, distributing computation, and messaging. On a single system these are relatively easy, but in a distributed environment they become very hard. Each area raises its own problems and forces its own trade-offs.
What are the main challenges in storing data, and how are they handled?
-The main challenge is the trade-off between consistency and scalability. You end up adding read replicas and sacrificing consistency, or sharding for scalability and giving up the data model. Cassandra, a key-value style store, offers remedies such as tunable consistency levels.
What about distributed computation?
-MapReduce was widely used with Hadoop, but its programming model was very painful. Spark is more developer-friendly than MapReduce and also supports stream processing. Kafka is a streaming platform built on event streams and can be used for distributed computation as well.
What problems come up in distributed messaging?
-A single-server message queue guarantees message ordering and exactly-once delivery but lacks scalability and redundancy. Apache Kafka, a distributed message broker, achieves scalability and redundancy through partitioning and replication, but global ordering is lost.
What is the typical approach to building a distributed system?
-Typically you distribute the components of a single system gradually, choosing trade-offs as needed while the system grows. It is increasingly common to adopt different distributed solutions at the storage, compute, and messaging layers and to move toward a microservices-style architecture.
What role does messaging play in a microservices architecture?
-Messaging plays a key role in loosely coupled communication between microservices. Platforms such as Kafka remove dependencies between services, enabling independent deployment and scaling, and they help realize event-driven architectures.
Any general advice for building distributed systems?
-Distributed systems come with unavoidable problems, but as you gradually distribute a single system it is important to weigh the trade-offs carefully and choose appropriate solutions. Make good use of open source tools and platforms, and stay flexible.
Explain the CAP theorem for distributed systems.
-The CAP theorem states that a distributed system cannot simultaneously satisfy all three of consistency, availability, and partition tolerance. When a network partition occurs, you must choose between consistency and availability. It is important to keep this trade-off in mind when designing a system.
Explain consistency in distributed systems in more detail.
-Consistency refers to the integrity of the data. On a single system you can always read what you just wrote, but in a distributed environment replication can leave copies out of sync. Systems such as Cassandra let you control the consistency level by tuning how many replicas must take part in each read and write. A trade-off between eventual consistency and availability is unavoidable.
Outlines
👋 Opening: introduction and background
Tim Berglund introduces himself and frames the talk as a broad overview of distributed systems. Distributed systems are terrible, he says, yet an unavoidable reality, and he would still recommend choosing a simpler system if you can. He promises to explain the problems of distributed systems from three angles (storage, computation, and messaging) and to introduce the open source projects used in each area.
💾 Distributed storage: challenges and remedies
He walks through the problems that appear as a single-server relational database is scaled up. Scaling with read replication gives up consistency, and sharding then breaks the data model, for example by making joins across shards impossible. He then explains Cassandra, a distributed storage product built to address these problems: its architecture, data model, and approach to consistency. He notes that in distributed systems solving one problem creates another, and briefly touches on the CAP theorem.
💻 Distributed computation: challenges and remedies
Computation on a single server or processor is easy, but distributing it makes things much harder. After explaining how MapReduce works and where it falls short, he points out the problems Hadoop had and introduces Apache Spark as its successor. He also mentions Apache Kafka as a framework for distributed computation over streaming data. The point is to have an environment where you can run distributed computation without worrying about where the data lives.
📩 Distributed messaging: challenges and remedies
Messaging has drawn renewed attention with the spread of microservices, so he explains its basic concepts and benefits. Guarantees that are easy on a single server, such as delivery guarantees, become hard in a distributed environment, and Apache Kafka is presented as a solution. He goes on to explain that by using distributed messaging thoroughly you can put the event stream at the center of the system and achieve loose coupling between services.
🚀 Wrap-up and plugs
In closing, the talk recaps the challenges of distributed systems across storage, computation, and messaging, and the open source remedies for each. Distributed systems are terrible but an unavoidable reality, and what matters is how well you put them to work. He ends with a plug for his online course and for regional Kafka meetups.
Keywords
💡Distributed systems
💡Storage
💡Computation
💡Messaging
💡Consistency
💡Scalability
💡Microservices
💡Queueing systems
💡Partitioning
💡Vector clocks
Highlights
Distributed systems are made up of lots of computers that operate concurrently, fail independently, and don't share a global clock.
In a single server database, reads are usually cheaper than writes, so read replication is used to scale reads, but this creates an eventually consistent system.
Sharding a database can help scale writes, but it breaks the data model and ability to join across shards.
Cassandra uses consistent hashing to distribute data across nodes and allows tuning consistency on a per-request basis.
The CAP theorem states that in a distributed system, you can only have two of three properties: consistency, availability, and partition tolerance.
MapReduce funnels all computation through map and reduce functions, which is a terrible thing to do, but it was useful for large distributed data.
Spark provides a much more developer-friendly way to do distributed computation than MapReduce, with abstractions like RDDs and Datasets.
Kafka treats everything as a stream, allowing computation on data in flight without the need for a cluster.
Messaging enables loose coupling of systems, which is useful for microservices communication.
Kafka partitions topics across brokers, maintaining ordering within partitions but losing global ordering across the topic.
The lambda architecture, now regarded as an anti-pattern, forced developers to write the same code twice for batch and stream processing.
Treating everything as an event stream in Kafka allows processing in-place and decoupling services from production and consumption of events.
Moving from a single server relational paradigm to distributed systems like Cassandra requires rethinking consistency, data modeling, and system architecture.
The speaker encourages attending Kafka meetups to learn more about distributed messaging and streaming.
The speaker makes a shameless plug for his O'Reilly video on distributed systems, which goes into more detail on the topics covered in this talk.
Transcripts
all right
i'm on
hey that's convenient
good afternoon everybody
time to get started
i am tim berglund
this is distributed systems in one
lesson
so these guys realize that's not what
they wanted to be
uh
see you guys
if you're expecting to be in distributed
systems in one lesson then you are in
the right place
um
a little bit about me i already said my
name i'm tim
as of a couple of months ago i work for
a company called confluent
and a lot of my attention is shifting
to kafka and that that whole ecosystem
that's what confluent does uh i'm the
director of developer experience there
so my team is in charge of documentation
and evangelism and making sure meetups
are staffed uh and training and all
those things so
uh it's not a big team yet
but we're hiring and it's an exciting
time to be
at confluent
also an exciting time to be here
i realized
just so you know i picked up a bottle of
water and i thought it was still water
and it turns out it is light sparkling
water and do you guys know
when you have a microphone
if you have a chance between a choice
between still water
and sparkling water
always go with the still water
so i might not drink very much of this
anyway we've got about 50 minutes to
spend together and i want to talk about
what a distributed system is
and why they are terrible
now this might seem strange coming from
a guy who makes his living in
distributed systems and has for some
years but that's kind of a theme of this
talk is you really
if there's anything i can do
to talk you out of building a
distributed system and choosing a
simpler life
i want to now i know i'm going to fail
i know you're going to do it anyway but
we'll talk about the various things that
go wrong when we try to do simple things
like store data compute data move data
from one place to another all of those
things are relatively easy
at small scale on a single computer when
we get multiple computers involved they
become terrible so i'm going to show you
the things that go wrong and for each
category of things we'll talk about uh
storage and computation and messaging
for each one of those things i'll try to
dive into
an open source
technology that addresses that problem
that's really what we're doing today
kind of an introduction to some topics
an introduction to some open source
technologies that solve the problems
that come up when we build distributed
systems a definition might be in order
andrew tanenbaum
defines a distributed system as a
collection of independent computers that
appear to its users
as one computer
so you might say something like
amazon.com
i guess it looks like one computer so
that would be a distributed system a
cassandra cluster
a kafka cluster all of these things at
one level of abstraction or another they
act like a single system but in fact
they're composed of many computers
now distributed systems have three
characteristics for something to qualify
as a distributed system it has to be
made up of lots of computers they have
to operate concurrently that's kind of
obvious right those computers are all
running at the same time
they might fail independently and the
possibility of one of our machines
failing is
uh a specter that is always looming over
us we always have to deal with that and
those are frankly two
trivial things this last one is a little
subtle and this is what makes things
interesting uh the computers don't share
a global clock so if you ever have a
collection of processors that are in
some sense synchronized
uh
that their clocks are synchronized uh
that technically is not going to
function as a distributed system and all
of the various terrible things that
happen when we build conventional
distributed systems some of those
terrible things won't happen if you have
a global clock but that's just not
practical that's not how we build real
systems in the real world so we have
independent clocks independent failure
and of course computers operating
concurrently that's what i mean
when i say distributed system so as i
said we will dive into three topics here
storage computation and messaging now in
the single server
single processor even single thread
version of these three things life is
grand these are more or less solved
problems and everybody knows how to do
them like everybody knows how to use a
single master relational database that's
a that's a fairly uh commodity skill
uh everybody knows how to do computation
in a single thread if you're a
programmer at all you know how to do
that but when we try to distribute these
problems more broadly life gets real
hard uh and like i said i'll look at
each one of those and it's it's my my
my goal here again is to convince you
don't do this
get an easier job
build a smaller system
live in the country
slower pace of life
i'm just looking at you i know you're
not going to do it i wish you would i'm
trying
i got i got a few minutes left to
convince you but let's let's see where i
get
also by way of a shameless plug there is
also an o'reilly video available on
safari with the same title distributed
systems in one lesson that's like three
or four hours long i go into more detail
uh
and yeah it's by me so that's the
shamelessness of the plug if you're
interested and want more
all right
let's talk about storage now to remind
you
single master storage the the pleasant
life you used to lead
the salad days the simpler times when
you have a database that exists on one
server it's very simple but
life gets hard life gets complicated and
if you could imagine the behavior of a
database
and if if you had like a slider where
you could turn the scale of the system
up and down
certain things happen to us as we crank
that scale slider up and the system gets
busier and busier
now typically in
in a lot of web systems there are more
reads than writes right there's more
read traffic than write traffic now
in
a relational database reads are usually
cheaper than writes so that's okay but
imagine as we're turning this scale
slider up at some point this single
server runs out of gas we've got all
these reads happening not as many writes
happening and we need to scale reads
what do we do we all know the answer we
do read replication okay that's easy
enough
here we'll take writes that go to the
master and we'll propagate them to uh
the two follower servers
and you're able to do reads against
those keep the writes going to the
master and you buy yourself some time
right as this scale slider is turning up
now you've broken something already
you've created a distributed system a
distributed database it's just not a
particularly intentional one yet
and
what is the thing that we have broken
we have broken consistency
when we were back in the simpler days
i can't say i didn't warn you uh anytime
i write to this database and then i read
from it i am always going to read the
thing that i wrote that is a guarantee
and um
you know for for any for any database
we're ever likely to use we're going to
read the thing i write now when i go
to read replication that's not
necessarily true i'm going to write here
and when i read here that write may not
have propagated yet so i now have an
eventually consistent uh database
accidentally
and
the the bummer here is that i may be
telling myself well wait this is a
relational database i've got
strong consistency guarantees it's okay
but i've broken that so a little bit of
life has gotten bad
also this only works for so long as i
continue to turn up that slider what
happens well i can keep adding follower
servers for reads and scale my reads as
much as i want but all of the writes
still go here and so i'm going to run
out of gas on the master on the
leader server
so then i do an even more terrible thing
and that is sharding
again this is not a new technique people
have been sharding relational databases
for approximately as long as there have
been relational databases but what i do
here is i will
find some key i'm using name here and i
basically just break up the database
into three pieces based on that key
so if i'm storing people's settings
based on their username let's just say i
will pick username as that key and put a
through f here f through n here and n
through z as my people say on the last
server i suppose i should say zed huh
no i'm going to keep saying z all right
and each one of these is its own little
read replicated cluster so i can
sort of make the database as big as i
want there but i've broken something
else i already broke consistency and now
kind of i've broken my data model if
this is a relational database and when
one scales like this it almost certainly
is a relational database
now for example i can't join across
shards i've got completely independent
databases from shard to shard
and that's a little bit of a bummer that
works sometimes
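A minimal sketch of the range-based sharding described above; the shard names and key boundaries are hypothetical, invented for illustration:
```python
# hypothetical shards keyed on the first letter of the username,
# mirroring the a-f / f-n / n-z split from the talk
SHARDS = [
    ("a", "f", "shard_1"),  # usernames starting a..e
    ("f", "n", "shard_2"),  # usernames starting f..m
    ("n", "{", "shard_3"),  # usernames starting n..z ("{" is the char after "z")
]

def shard_for(username: str) -> str:
    first = username[0].lower()
    for low, high, shard in SHARDS:
        if low <= first < high:
            return shard
    raise ValueError(f"no shard for {username!r}")

# every read and write for "tim" is routed to shard_3; a join across
# shard_1 and shard_3 is impossible, since they are independent databases
print(shard_for("tim"))
```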
i have a talk later this afternoon where
we're going to talk about sharding as a
way of scaling an entire system
and in that case um
sharding a whole system sometimes i get
away with that real well sometimes i
don't and can be a bit of a bummer
anyway briefly that's just what you do
with
a database that's not built
as a distributed database uh that's how
you can make it get a little bit bigger
again the bad thing is certain behaviors
and guarantees and claims of a
relational database that i start off
with little by little i give those up in
fact it's even a little worse than that
um
there are other things if you can
imagine that scale slider going up and
up i have to do things with the topology
to address that i have to add computers
and make a distributed system i'll also
have to take things away from my data
model as i turn that slider up right so
suppose for example
your database comes under pressure
latency is going up uh you're having
higher transactional volume and suddenly
reads are not performing properly
what are you going to do and what's the
one of the first things you'll do in
response to that
most people will say at this point
add an index
kind of an okay answer right maybe
that'll help maybe you're missing an
index that's good uh suppose you've
already done all the index optimizing
you can that's trivial uh after that
what you do is you you realize oh wait
the reason reads are slow is because of
this wonderful relational model i'm
doing all this joining because my
schema is properly normalized i have
to take joins away in order to make
reads perform and how do you take joins
away
well you denormalize
which we all know is immoral
right that's what we're taught in school
so you actually become what you think is
a terrible person and you denormalize
because you your scale is forcing you
you know you're
just you have no choice
also
if your first answer was to add indexes
that is another pressure that you'll
come under on the write side as
write traffic increases
uh before you shard or even after you
shard you're likely to give up on
indexes which will then have further
pressure on you denormalizing your data
model down the line so all of the
wonderful things that you thought were
true about your database when you
started they're just gone they're taken
away from you by the time you have to
scale it and you end up with something
that doesn't really look all that
relational
by the time you're done so why not just
do it right and here is an approach that
is taken by uh really i'll be talking in
terms of cassandra because it's a
database that i know fairly well uh but
this is sort of
if you look at at the the
scalable non-relational databases that
family of products they all kind of work
like this they use a technique called
consistent hashing
in fact consistent hashing is a nice
thing to know when you're looking at
other systems
and other open source projects in the
distributed systems world
this method often kind of pokes up
its head like it's winking at you from
various systems it's front and center
in cassandra but it shows up in other
places so just be advised now
here's an example of how you build a
database from the ground up to be
distributed now it's still doing bad
things to you all right there's still
lots of features cassandra's not
gonna give you that you wish you had but
it's very upfront about it right right
from the beginning it's saying here here
are the limitations of the data model
here's how scale works
period and and sort of just deal with it
and uh move forward from there so
suppose i've got this database here i've
got eight computers and they're all the
same they're all peers it's a pretty
simple architecture they're just
identified with a numeric token and in
the case of cassandra we always draw
this in a circle now that's not because
they're actually connected that way
you know these are regular computers on
an ethernet network just as if you were a
normal person
but they all have this unique token this
unique numerical id and it helps to draw
them in order in uh clockwise in a
circle and i'll show you why
right now
i'm storing favorites
and per the
coffee
picture because i forgot to say this at
the opening slide as many of these
examples as possible i will draw from a
coffee shop just to be cute so suppose
this web scale coffee shop is trying to
remember people's favorite coffee
basically i'm an americano guy sometimes
if i'm feeling a little crazy
almond milk latte that's about as fancy
as it gets but
we want to store my uh my favorite
coffee americano what we do is we take
that key value pair
cassandra's not a key value database but
this is an easy abstraction to kind of
get on board and this is as far as we'll
go here in the summary take the key and
run it through a hash function
doesn't matter what the hash function is
as long as it's always the same hash
function we hash it we say okay where is
that number going to fit in this ring we
want to see okay it's bigger than eight
thousand it's less than a thousand so
just based on the way those arrows are
going i'm going to write that key into
that node simple as that now the cluster
can grow the hash values can change and
as long as internally the database moves
keys around as those things as those
things change all i ever need to do when
i'm writing to the database is look at
it and see what are your tokens right
now how many nodes do you have what are
your tokens and
hash my key and i know where to write
and i know where to read from later on
if i say hey what's tim's favorite
coffee there's a new barista he doesn't
know what my favorite coffee is it's
fine this is web scale he can't remember
all those things we'll hash that we'll
say oh nine f seven two fine that goes
to node one thousand i'll go there get
that key and pull it out
so that's basically how you read and
write
with a consistently hashed database
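To make the clockwise walk concrete, here is a minimal sketch of consistent hashing in the spirit of what he describes, including the next-nodes-down-the-ring replica placement he turns to next. The token values and hash function are assumptions for illustration, not Cassandra's actual implementation:
```python
import bisect
import hashlib

# hypothetical tokens for an eight-node cluster, in clockwise ring order
TOKENS = [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000]

def key_hash(key: str) -> int:
    # any hash works as long as it's always the same one; here an md5
    # digest squeezed into the ring's 0..9999 token space
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big") % 10000

def owner_node(key: str) -> int:
    # walk clockwise to the first node whose token covers the hash,
    # wrapping past the last token back around to the first node
    i = bisect.bisect_left(TOKENS, key_hash(key)) % len(TOKENS)
    return TOKENS[i]

def replica_nodes(key: str, count: int = 3) -> list:
    # the owner plus the next nodes down the ring hold the copies
    i = bisect.bisect_left(TOKENS, key_hash(key)) % len(TOKENS)
    return [TOKENS[(i + k) % len(TOKENS)] for k in range(count)]

print(owner_node("tim"), replica_nodes("tim"))
```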
of course
we would never want to have just one
copy of of a piece of data that would be
crazy
so we can do things like replicate
if i say well you know the the first
copy would have gone to this node i'll
just go to the next two nodes down
uh the ring
in the cluster and put a copy on
those nodes as well right that fixes the
problem of now that i'm in a distributed
system what was my second characteristic
computers fail independently
we know this we have no way to predict
when a node's going to die i would never
want a key in just one place so now i've
got three copies of it i've solved that
problem i'm i'm now able to tolerate
hardware failure and be up
sounds great
i told you distributed systems are
terrible i created a problem by solving
one and what is that problem
consistency i have three copies on three
computers of something that can change
i'm a fickle coffee drinker it's
americano today tomorrow maybe it's
triple skinny two thirds decaf half
chocolate no whip grande mocha with
extra foam you have no idea what i might
come up with tomorrow i am the very
definition of mutable state so
now that i have three copies of
something that can change i have created
a consistency problem and to uh to be
clear one of these
if i if i write tomorrow
my new that crazy thing i just said um
maybe one of these nodes is down while
i'm doing the write
and it doesn't succeed and it has an old
copy of
the favorite coffee and the other two
nodes have the new copy entirely
possible and then when i go to read how
do i know who's telling me the truth
all of these things become problems now
basically i want to give you a rule for
thinking about consistency in a database
like this
when i write to the database i'm going
to require that a certain number of
nodes take the write
so as a client as a client application
i'm going to say
you two at least two of you need to
succeed if only one of you succeeds i'm
going to consider it to be a failure or
i'll say only one of you needs to
succeed i know i have three replicas but
i'm in a hurry i'll take one those are
options that i have i could be i could
be
particularly fastidious and say i need
all three of you to succeed
nobody would do that really just one and
two are the practical options likewise
when i read i need to say well i'm gonna
hear from two of you or one of you
i have a simple formula here
for knowing whether you are using your
database in a strongly consistent way if
the number of nodes i write plus the
number of nodes i read
is greater than the number of replicas
then i have strong consistency
so if i if i write one and read one
and there's three replicas that's
eventually consistent if i write two and
read two and i have three replicas
that's strongly consistent cassandra is
a pretty cool database if it's if you
want a big database it's kind of the way
to go
and it lets you it lets you
tune this
on on a request by request basis you can
you can change your consistency
semantics on the fly
pretty cool
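The write-plus-read rule and the per-request tuning are easy to show in code. In this sketch the arithmetic is exactly the rule from the talk, while the driver calls assume the DataStax Python driver and a hypothetical coffee keyspace:
```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

def strongly_consistent(w: int, r: int, n: int) -> bool:
    # the rule from the talk: if write nodes + read nodes > replicas,
    # every read overlaps at least one fully written replica
    return w + r > n

assert not strongly_consistent(1, 1, 3)  # write 1, read 1 of 3: eventual
assert strongly_consistent(2, 2, 3)      # write 2, read 2 of 3: strong

# per-request consistency, assuming the DataStax driver and a
# hypothetical "coffee" keyspace with a "favorites" table
session = Cluster(["127.0.0.1"]).connect("coffee")
query = SimpleStatement(
    "SELECT drink FROM favorites WHERE username = %s",
    consistency_level=ConsistencyLevel.QUORUM,  # 2 of 3 replicas must answer
)
print(session.execute(query, ("tim",)).one())
```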
so even though
everything's a problem in a distributed
system and solving one problem creates
another one there are some things that
have elegant answers um and and
cassandra is one such
all right
the cap theorem should we talk about the
cap theorem just a tiny bit
it
back when nosql was new it was like 2010
reading hacker news in the morning which
is a thing i do
it's like my newspaper
it was painful because about once a week
there would be a new and newly
misinformed blog post that would make it
to the front page of hacker news that
was something about the cap theorem
everybody cared all of a sudden because
everybody was making scalable databases
and so people had to to blog about it
and say things that weren't true it's
really not hard to grasp and it comes up
sometimes still in distributed systems
discussions so i want to give you a
simple way of understanding it
the cap theorem says if you're building
let's just make it simple a distributed
database
you you want your database to have three
properties you want to be consistent
that is that's the c
when i read something i want that to be
the most recent thing i have written
so i write and then i read and i see
what i read
i want it to be available and that
simply means when i try to write it
works and when i try to read it works i
also want it to be partition tolerant
that means it's a cluster i've got lots
of nodes being a database together and
if some of those nodes go away for a
little bit and then come back together i
don't want the database to completely
fall apart i wanted to be able to deal
with the fact
that sometimes
parts of the system can't talk to the
the rest of the system that's partition
tolerance so
um imagine if you would you're in a
coffee shop like i said we're oh wait i
forgot the slide consistency
availability partition tolerance you
guys got this all right imagine you're
in a coffee shop forget about databases
for a minute and you are writing
something together you're working on
your screenplay with a friend your
co-writer and you're old school you're
doing it on a yellow legal pad with a
pen
it's you it's not me i don't know why
you're doing this this way but you are
and both of you have a copy of the
screenplay one of you makes a change
here the other one makes a change on
their copy that person makes a change
you delete something you keep your
copies consistent because it's how you
want to do it it's a little strange but
it's what you're doing
the coffee shop closes i'm not going to
judge you i don't know why you're doing
it this way but you just are coffee shop
closes but you're in the zone so you go
home and you keep going over the phone
and you keep in sync over the phone you
say well i added this line of dialogue i
removed this scene heading and you're
keeping things up to date but then
your battery dies
and you can't talk anymore you lost your
charger you now have a network partition
you keep writing because you're in the
zone presumably your partner keeps
writing
your spouse comes in the room and says
hey
how's the screenplay going
read me back your most recent page
which is great that your spouse is so
interested i think
here we have a decision so back to the
the venn diagram here you can't have all
three of these things that's the
fundamental insight of the cap theorem
in the presence of a partition
in this situation
you can either
choose to be unavailable
that is you say to your spouse i can't
tell you
because i might tell you the wrong thing
and i would hate to do that that would
be unfair to my writing partner or you
could say well i'll just give you an
answer it might be wrong
um
so i could i could give up on
availability and keep consistency by not
answering or i could
keep availability and give up on
consistency that's the fundamental
insight of the cap theorem is i can only
have two of these three good things i
always want all three
but sometimes i can't have all three
um it's uh
again
there's a there's a
uh
a lot of writing on the internet about
what this means and what its
implications are and when it applies and
so forth but just think of distributed
databases it's a simple trade-off
between those three things that you want
all right
enough of storage
let's talk about computation
again computation is super easy if you
have just one
processor um if you go to multiple
threads on one processor and you try to
do computation in a threaded way that's
usually pretty terrible and uh there are
talks at this conference about ways to
avoid that terribleness
and
i should say but
you got multiple computers and you're
trying to distribute a computation task
among multiple computers
life gets really hard and we've created
some truly terrible tools for solving
this problem the good news is life is
getting a lot better
trying to do this 10 years ago or 12
years ago was bad trying to do it now is
starting to not feel like somebody is
hitting you with a 2x4 so i think
there's a lot of a lot of reason for um
for optimism and mapreduce is not one of
those reasons but let's talk about it
just to make sure we're on the same page
and you know the badness that we've come
from so the typical way to explain
mapreduce is with word count
and this is a union rule people i have
to do this
so
i'm going to use a word count example
just to show you basically what's going
on here so i've got this big chunk of
text now if i'm actually
oh this works really well in the u.s
because everybody knows this
it's an american poet he's from
baltimore
his name is edgar allan poe
the american football team
from baltimore does anybody know the
name of the team
the ravens
because this poet is from baltimore this
is his most favorite poem it's an
amazing poem strongly recommended
and i'm not even a football fan but i
kind of love the ravens for this reason
anyway if you actually had the text of
the raven i think it's about 1800 words
the right way to count that is to use
the wc command don't screw around that's
a small data problem but this is helpful
just to see how mapreduce works map
reduce
funnels all of your computation through
two functions one called map and one
called reduce that is a terrible thing
to do to a person but here's how it
works i would dump the text into my map
function and what map would do is it
would it would tokenize it it would
split it up into words
and then it would turn those words into
key value pairs
this is i'd like to go through this
because this is kind of a neat insight
into the way you would solve a problem
like this in a distributed way
what we begin by doing
is counting words
that is what i have just done here
that's the first step in the mapreduce
canonical mapreduce word count example
is you count the words now what people
say is i tokenize the text and i turn
the words into key value pairs the word
is the key and the value is the number
one really what i have done though is
i've just counted each word
and i've seen each word one time it's
like very naive stateless counting i
don't remember whether i've seen the
word a or came or rapping before i just
count each one one time all right
then
there is a step in between that always
gets ignored it's called shuffle so
after i do that mapping i count all my
words once then potentially i have to
move these words around the network
to different computers to make sure
similar words are uh nearby each other
so at or rather a
rapping tapping i want to get those in
order
then i take these collection of
collections of similar words and i dump
them into a function called reduce
and what reduce is going to do is
basically add up the numbers
so it's a less naive counting it says
well i counted the word a 3 000 times
let me just add up all those numbers
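The three phases he just walked through fit in a few lines of single-process code. This is a sketch of the idea only, since the whole point of MapReduce is that each phase can be spread across many machines:
```python
from collections import defaultdict

def map_phase(text):
    # naive stateless counting: every word seen once becomes (word, 1)
    return [(word, 1) for word in text.lower().split()]

def shuffle_phase(pairs):
    # group equal keys together, as if moving them to the same machine
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups

def reduce_phase(groups):
    # the less naive counting: add up the ones for each word
    return {word: sum(counts) for word, counts in groups.items()}

text = "once upon a midnight dreary while i pondered weak and weary"
print(reduce_phase(shuffle_phase(map_phase(text))))
```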
now i am skipping a bunch of steps here
in the spirit of summary and a bunch of
complexity here in the spirit of summary
um
but
uh this i end up with with counted words
and i'm able to do this
um i'm able to split up the work of
mapping and the work of reducing and
distribute it among lots and lots of
computers this is a really good idea if
your data is already say in a
distributed file system
if you've got so much data that you need
300 computers to store it all and
replicate it all
then
doing your compute this way
it seemed like a good idea at the time
and it was a moderately good idea a lot
of people built i shouldn't say a lot
some people built systems using
mapreduce that were worthwhile all right
and it by itself mapreduce is just a
pattern it's not a product but it's the
pattern the compute pattern used by
hadoop
hadoop is
still widely deployed
rapidly becoming in my view in the view
of many others a legacy technology it
does not it is not the cool new thing
anymore at all
but you should know
that whole mapreduce thing yep that's
the fundamental layer uh the bottom of
hadoop and
the other the other i think more
long-lived part of hadoop is the
distributed file system hdfs
that is in the words of one of its
co-creators a cockroach
nobody is really building any new
mapreduce systems
anymore and that's not you don't see
that in lots of new designs
hdfs is going to be with us
uh until our grandchildren are writing
software i think it's it's probably
going to be a very long-lived piece of
infrastructure and for good reason it
works just fine
hadoop spawned an enormous ecosystem and
not necessarily
in a good way in my opinion at least it
became very very complicated because the
underlying programming
model was so terrible
that we needed this big ecosystem of of
more pleasant layers on top of it and
that's kind of what happened things like
hive uh grew up and that became a
relatively pleasant way to use hadoop
actually programming mapreduce not many
people did that which is why
a tool as we keep going through
the world of distributed computation a
tool like spark
has
really taken over the uh the mind share
of mapreduce in the last three years
the way people
have been talking about spark
at least you know when it was new
they would start off by saying hey this
is a way better thing than hadoop
just believe me and uh you know
mapreduce is dumb and then you start
looking at spark internally you kind of
learn the api you're like yeah this is
way better than mapreduce instead of map
i have transform instead of reduce i
have action
i like it
so the the underlying paradigm is still
very similar we still assume we've got a
bunch of data sitting out on a bunch of
computers
and then we need to go to that data and
do stuff to it and if that's what you're
going to do then that basic scatter
gather paradigm is always going to
happen
and it's probably always going to follow
that
basic map reduce
sort of duality
now uh spark's insight was to say
let's create an abstraction on top of
the data
that's officially a part of our api
and
originally
in early versions of spark that
abstraction was the rdd now the
front-facing abstraction in spark is a
thing called the dataset it's very
similar but uh in contrast to hadoop
spark gave you an object that you could
program
when you're writing
mapreduce code you feel like
um
like someone is hitting you with a stick
when you're writing spark code you feel
like a programmer
you've got an object that represents
some big collection of data and what do
you do you call methods on that object
and it does stuff to it now maybe
they're essentially map and reduce that
are that are just have a little bit of
syntactic sugar on them uh but
the spark programming model is uh so
much more developer friendly than hadoop
that mapreduce has even in in the eyes
of the hadoop vendors at this point
essentially been completely taken over
by spark and it it deserves it
spark is a much much more pleasant way
to write code it also doesn't have
storage opinions
hadoop's mapreduce really said you could
put data anywhere you want as long as
it's hdfs spark says of course i will
operate on hdfs data i'll operate on
cassandra data i will operate on dang
parquet files in s3 i don't care it
just wants to make rdds and data sets
and let you transform them and do
actions on them and do that distributed
computation now the assumption is still
in the world of spark again i've got a
bunch of computers i have a distributed
storage problem that's why we started
with distributed storage i have i have a
distributed storage problem and now i
have to do compute over all that data
that's sitting out there
so if i have a distributed file system
or a distributed database
that's that's data at rest out there
then spark the spark paradigm still
makes sense
hadoop is the mapreduce paradigm really
you could think of that as yesterday's
news and spark is more of a today thing
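For comparison with the MapReduce sketch earlier, here is the same word count as a Spark job; this assumes PySpark, and the HDFS path is invented for the example:
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("raven-wordcount").getOrCreate()

counts = (
    spark.sparkContext
    .textFile("hdfs:///poems/the-raven.txt")     # hypothetical location
    .flatMap(lambda line: line.lower().split())  # a transformation
    .map(lambda word: (word, 1))                 # another transformation
    .reduceByKey(lambda a, b: a + b)             # the shuffle happens here
)

print(counts.take(10))  # an action: only now does distributed work run
spark.stop()
```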
now kafka it's not quite time for
kafka's day in the sun but because we're
talking about computation and in the
last year kafka has entered this world
uh i should mention
it is an approach to distributed
computation in which everything is a
stream
so as long as you've got streaming data
you can do computation on that streaming
data as the data is in flight you don't
have the assumption that you have put
the data out somewhere and you're going
to go to that data and do compute on it
this is a stream processing framework
where streams are first-class citizens
and you don't even need a cluster to do
your computing
spark importantly also has
a stream processing api and and it's not
just api there's actual infrastructure
in a spark cluster that helps with
stream processing
there are other solutions to the stream
processing problem but kafka needs
mention here because it is uh not just a
messaging messaging framework but is
also a
computation framework now
all right
we are at about the t-minus 15-minute
mark
oh you know what i didn't say
at first i didn't say if you have a
question ask it
did you assume that it's a little late
now because i i i won't i will never
leave time for questions at the end i
always prefer that those just happen
when they come too late dang it
now you know if you got a question ask
it the worst thing i'll say is i don't
know
all right let's talk about
my third promise and that is messaging
um messaging is great because it lets us
loosely couple systems and that lets us
build little chunks of functionality and
really release them and version them
independently and still have them
communicate and
does that sound like anything we do
these days if i say little chunks of the
system that are released and versioned
independently
that sounds kind of micro service-y
that's amazing so because
of microservices in the last three years
messaging and streaming are coming
way more to the foreground in the last
year or so people are realizing that
microservices are a super good idea for
a number of reasons and people have been
struggling with how to get them to talk
um
you know maybe this is
your microservices deployment and i
should be respectful but kind of an
early approach to microservices is well
let's just have a bunch of little pieces
of code and one big database and they
talk through the database
people realize that was a terrible idea
it didn't really work and so getting
those micro services to talk
suddenly messaging is coming back into
the foreground so
it's a way of loosely coupling systems
we usually use the terms publishers
or producers
and subscribers which we might call
consumers all these kinds of
words we almost always
organize messaging systems
into named topics which
are just namespaces for similar messages
and there's usually this this this uh
term broker which is just a computer in
the messaging framework
um we typically think of messages as
being persistent over the short term
now it's up to you how you define short
term
uh the new york times is a kafka open
source kafka user and they keep all of
their newspaper content going back to
the 1860s
in kafka so you get to set
the data retention yourself
and believe me 1860s in america is a
long time ago
so
that's that's a lot of newspaper data
for us
now yeah the basic idea here is uh
you've got a producer it sends
messages into a topic and then the
consumer gets to read those out at its
own
uh rate
the problem is
when that producer gets big
what if a topic gets too big for one
computer now how can it be too big well
you can retain too much data suppose you
want to retain data into the 19th
century
hey again i'm not going to judge it's
your system so that's big or your
messages are big or um
you are transacting messages too fast
for one computer to keep up maybe data
retention isn't a big thing but you're
reading and writing too fast for one
machine given the fastest machine you
can reasonably deploy right now what if
one computer is not reliable enough
remember they break
and you would like your system not to go
down
um what if you really want to be able to
to guarantee delivery
even if a computer is down well
enter apache kafka
i should say when you have a messaging
system and it is a single server
messaging system it can do all kinds of
fantastic things like it can trivially
guarantee that messages are delivered
exactly one time
what it can't do is be particularly
resilient to failure or particularly
scalable
oh i should say also it can always
guarantee that messages are ordered
this is a message queue after all and we
would expect messages to be ordered and
when i have a single server message
queue or message broker then
everything's great my messages are
always going to be in order i'm going to
have to give up on some of those things
when i scale
but i'll get things in return and it's
as it turns out a lot of awesome things
so
kafka now just a few quick definitions
in case i use these words of course
message
that's the basic
thing topic i've already said that
producer this is a client process
outside the kafka cluster that puts
stuff in consumer this is a client
process outside the kafka cluster that
takes messages out and broker is simply
one of the computers that the kafka
cluster is made up of
so a
a broker will have many topics for
example i'm not really deep diving on
kafka here just like i wasn't deep
diving on cassandra so it does things
like replication also and it answers the
consistency question also in various
ways so all of those problems come up
and all of those problems are solved by
this particular distributed message
broker
but the money in kafka the really
interesting thing is when we get past
just being a message queue so i want to
show you how it works as a message queue
and where
the distributed system ugliness creeps
in
like what kafka does to you that's mean
because it's distributed
and of course what we get in return for
that
then i want to talk real quickly in
closing about what happens
when
we have made everything into a message
like how
how thoroughly can you live that life
and what sorts of things happen uh when
you do so kafka all right again producer
consumer topic
this is the trivial account of kafka
kafka as a pipe
and this is currently how most people
think of kafka because its
value-added features that are more than
just pipe are really only a year old so
let's just go through the pipe version
when that topic
gets big
what do we do
of course we partition it we'll just
split that up onto several computers so
let's say instead of one computer i've
got that single topic partitioned over
three
computers now each of these partitions
of course i would also replicate just
put that to the side for a moment we're
not going to talk about that we've got
these three partitions so these three
uh rows here are those three brokers
each of which has one partition of the
topics these are independent computers
the gears over there are producers
they're going to produce messages and
these diligent people working at uh like
2010 imacs those are
uh the consumers so
when i create a message
um as a as a producer what's gonna
happen is we're going to look at some
part of that message
now maybe i'll just hash the whole
message maybe i'll look into
a field of the message like
username or ip address or something that
means something to me as a producer but
i'm going to take some part of that
message up to all of it and i'm going to
hash it
now this is like consistent hashing over
there
poking its head up and winking at you
like i warned you it would i'm going to
hash that message mod the number of
partitions and that's going to tell me
what partition to write to
within each partition
i will be able to keep things in order
each partition is ordered so i'm going
to keep writing in here and i will kind
of randomly really or uniformly have
these messages assigned
to my partitions
uh oops i consumed one let's not consume
that yet
now already you can't remember what
order these happened in
system-wide or topic-wide ordering is
lost
because this is a distributed message
queue
i can only have ordering within a
partition that's a limitation i have to
accept i don't get global topic-wide
ordering that's a bummer i would love to
have it there is no day i will wake up
in the morning
roll out of bed
have a cup of coffee and say i literally
do not want global ordering on my topics
darn it i wish i didn't have it shake my
fist at the heavens it's just not going
to happen i would always want it but i
can't have it
it's just not a possibility but i do
have it within partitions and so i
that's why i said that at the at the
producer level you can be smart about
what part of the message gets hashed and
so i can guarantee within a topic that
um messages from the same user or from
the same host or from the same whatever
will always end up in the same partition
and then i can consume them in order
from each consumer and those consumers
are independent computers operating
independently scaling my system like a
boss so
cool
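Both halves of that, the hash-mod-partitions routing and choosing a key so one user's messages stay in order, can be sketched briefly. The kafka-python client and the local broker address are assumptions:
```python
import hashlib

def partition_for(key: bytes, num_partitions: int) -> int:
    # the routing described above: hash some part of the message,
    # mod the number of partitions (Kafka's real default partitioner
    # uses murmur2; md5 here is just to make the idea concrete)
    return int.from_bytes(hashlib.md5(key).digest()[:4], "big") % num_partitions

assert partition_for(b"tim", 3) == partition_for(b"tim", 3)  # stable routing

# producing with a key, assuming the kafka-python client:
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
# every message keyed b"tim" lands in the same partition, so tim's
# messages keep their order even though topic-wide order is lost
producer.send("favorites", key=b"tim", value=b"americano")
producer.flush()
```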
i have a message bus
i have solved the problem of distributed
messaging i already came up with a way
to solve distributed storage we talked
about one
i came up with there are a few different
ways to solve distributed computation
now i've got this distributed messaging
system so all my services can
communicate with one another through
topics in this elastically scalable
essentially infinitely scalable message
bus that's great
and that's what i did yesterday i would
generate events into this bus this is
kind of how people start out and then a
service would consume one of those
messages and
what do i do i write it to a database
so i've got this event and i turn that
event into a row and that row sits in a
place
the data is sitting in a place somewhere
maybe
tomorrow instead of doing that instead
of saying here's an event i'll put it
here and i'll go back here when i need
it maybe tomorrow i could just do
computation on the events after all i've
got the giant message bus sitting there
and everybody has an api everybody is is
coupled to that bus through that api
they're producing they're consuming
maybe i could do that so without
that
this thing that is now really an
anti-pattern emerged about seven years
ago this was considered pretty cool and
it's it's the so-called lambda
architecture that means i've got all
these events out here and i have to
build
two parallel systems to process them
this is what happens if you don't if you
don't use your message bus thoroughly i
have one system that all my batch
processing takes place in that's my
distributed database
those are slow but they're very thorough
those jobs up there they can take
minutes or hours and then i've got this
other system down here
that really is just my temporary message
bus thing and i might do little bits of
stream processing for quick summaries
all right this lambda architecture
emerges this is bad because essentially
what lambda did to us and if you come to
my talk this afternoon
i'll talk about this in more detail
is it forced people to write the same
code twice you had to write the stream
code and the batch code
and
yeah they say the best code is the code
you never write
well the worst code would be the code
that you write two or more times that's
a terrible idea so
the
the basic idea once i've got a message
bus and i say hey look how about i stop
obsessing over taking events and writing
them to some place
in a database and why not just let them
be events and process them in place
through the uh you know through a stream
processing api
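In the simplest terms, processing events in place can be as small as a consumer loop that computes over messages as they arrive instead of parking them in a table first. A sketch assuming the kafka-python client:
```python
from collections import Counter
from kafka import KafkaConsumer

consumer = KafkaConsumer("favorites", bootstrap_servers="localhost:9092")

# keep a running computation over the event stream itself, rather than
# writing each event to a database row and querying it back later
drink_counts = Counter()
for message in consumer:
    drink_counts[message.value] += 1  # compute on the event in flight
    print(drink_counts.most_common(3))
```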
at the very top level i can do something
like this i've got a kafka cluster and
i've got various services attached to it
i've got some service maybe that's doing
some analytics or some monitoring a
service that's doing some analytics i've
got a couple of little databases that's
okay for a microservice to maintain a
local database but all of these are
services i might even have a hadoop
cluster because it's still there we
spent a lot of money on it the cio
doesn't want to get in trouble
for not using it so you have to keep
using the hadoop cluster i might have an
elastic search cluster down here all
doing different kinds of things that my
system needs based on this messaging bus
that they all talk to
um
pardon me so
i already did this and i just wasn't
looking at the screen so i have events
out in the world that go into that
cluster and get propagated to the
services that need them
and the nifty thing here is
each service doesn't really need to
worry about what other services might be
doing with it an event is just an event
it happens it's there in a topic and a
new service can grow up and say hey i'm
interested in that event without having
to have any coupling to the way that
event is produced or who else might
consume it
so once i start putting things into a
message queue
it can cause me to think very
differently about the architecture of
the system this can be an impactful
thing this distributed messaging thing
it's possible to go from my single
server relational paradigm to a
distributed cassandra paradigm i have to
change my mind about some things about
the way consistency works about uh the
way data modeling works right like
that's a big deal when i go from single
threaded programming to distributed
computation that's kind of hard uh
that's a tough thing to do when i when i
get streaming into my blood suddenly the
world is very very different and i start
building i start building systems in a
very different way indeed
now um
with that
just about out of time
and we have covered storage computation
and messaging a brief introduction to a
few things that go wrong and right when
you build distributed systems and how a
few open source projects solve them
shameless plug
videos by the same name on o'reilly if
you care to learn more
absolutely not shameless plug for kafka
meetups because this is what i do if
you're interested in more uh check that
out take a picture of that slide if you
want to go to a kafka meetup i would
love to see you there sometimes i am
even seen
in europe not often enough
but thanks for having me here today
have a good day