Distributed System Design

Menchita F. Dumlao
6 Jan 2023, 28:24

Summary

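TLDR: This lecture covers the concepts, benefits, and design patterns of distributed systems. A distributed system uses a cluster of computers to improve an application's scalability and fault tolerance, which suits big-data and high-concurrency workloads. The lecture introduces key components such as the MapReduce framework, stateless and stateful systems, and the Raft consensus algorithm, along with design patterns such as CQRS, two-phase commit, and Saga. These patterns help reduce system complexity and improve data-processing efficiency, but each has its own trade-offs. The lecture stresses the value of adopting a distributed framework in systems development, and of choosing design patterns and a NoSQL database type that match the application's needs.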

Takeaways

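  • 🌐 Distributed systems make it easier to develop scalable applications, because growing software applications, data, and files need large amounts of storage.
  • 🔄 Many companies today use complex distributed systems to handle varied request types and storage requirements, which lets their applications scale fully.
  • 💡 A distributed system is a collection of machines that share state and operate concurrently, and that can fail independently without affecting the whole system.
  • 📈 The benefits of distributed systems include scalability, modular growth, fault tolerance, cost effectiveness, low latency, efficiency, and parallel computing.
  • 🔄 Scalability means the application can work across different platforms and scale horizontally across different types of systems.
  • 🔧 Fault tolerance means that if one machine or node fails, the other machines or nodes are not affected.
  • 💰 On cost: the initial cost of a distributed system is higher than that of a traditional system, but its capacity to scale quickly makes it more cost effective.
  • 🚀 MapReduce is a framework developed by Google for processing large amounts of data efficiently; it suits big-data applications such as online shops and social networking sites.
  • 📊 Distributed system design patterns give us a way to build systems that fit particular use cases; they are building blocks that let us draw on existing knowledge instead of starting from scratch.
  • 🛠️ Distributed system design patterns fall into three categories (object communication, security, and event-driven) that help developers understand how to structure and design a system.
  • 🔍 Choosing a suitable design pattern and NoSQL database type is crucial in systems development, and the choice depends on the kind of data you have and the information you need to store.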

Q & A

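  • What are the main benefits of distributed systems?

    -The main benefits are scalability, fault tolerance, cost effectiveness, low latency, efficiency, and parallel computing. Distributed systems can handle large volumes of data, support horizontal scaling, and share state across machines that operate concurrently; individual machines can fail independently without bringing down the whole system.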

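  • How does the MapReduce framework help with big data?

    -MapReduce improves efficiency by splitting a large data-processing job into smaller parts. It has two main phases: the map phase processes the data and emits intermediate key-value pairs, and the reduce phase sorts, groups, and aggregates those pairs. Users can focus on the high-level logic of their programs without worrying about the underlying processing details.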

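  • What are stateless and stateful systems?

    -A stateless system keeps no state about past events; it acts only on the inputs it is given. A stateful system is responsible for maintaining and mutating state, so it records things like a user's history of actions and changes to the system.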

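  • What role does the Raft protocol play in a distributed system?

    -Raft keeps a replicated state machine correct and its replicated log consistent. It supports multiple consecutive rounds of consensus among the nodes of a cluster, so the nodes stay in agreement and the system remains stable even when some nodes fail.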

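  • What are the main categories of distributed system design patterns?

    -There are three: object communication, security, and event-driven. Object communication patterns describe the messaging protocols and permissions between components of the system. Security patterns deal with confidentiality, integrity, and availability. Event-driven patterns describe the production, detection, and consumption of events and the system's response to them.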

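  • What does the CQRS pattern do in a distributed system?

    -CQRS (Command Query Responsibility Segregation) improves scalability and security by separating read and write operations. Commands write data and queries retrieve it. A command center handles modifications to the data and notifies the read service, which updates the read model so that users see the change.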

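  • How does the two-phase commit (2PC) protocol keep distributed transactions consistent?

    -2PC uses two phases, prepare and complete (more commonly called commit). As the lecture describes it, all participating services are locked during the prepare phase while they get their data ready to send. In the complete phase the coordinator unlocks the services one by one and requests their data. Once all prepared data has been sent, the services are unlocked to accept new tasks.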

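  • What advantages does the Saga pattern have for distributed transactions?

    -In a Saga, the services of a microservice system communicate over an event bus. Each participating service runs a local transaction and publishes an event that triggers the next service. The advantages are that it can handle much longer transactions, it suits decentralized distributed systems, and it reduces bottlenecks and back-and-forth communication.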

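  • Why are distributed systems especially important for modern applications?

    -Modern applications, especially online shops and social networking sites, must handle large volumes of data and requests. A distributed systems framework lets such an application run on a collection of computers as if they were a single computer, which provides scalability, high availability, and efficient data processing.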

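  • How should you choose a NoSQL database type when designing a distributed system?

    -Consider the characteristics of your data and the kind of information you need to store. There are several types of NoSQL database, including document, key-value, column-family, and graph databases; the choice depends on the storage requirements and the specific use case of the application.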

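  • How do distributed system design patterns help developers?

    -They give developers a set of standard models for system design and help them understand how to build a system for a particular use case. The patterns are building blocks based on existing knowledge, so developers can draw on proven solutions instead of starting from scratch, which improves both development efficiency and system quality.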
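
The Saga flow described in the Q&A above (local transactions chained together by events on a bus) can be sketched roughly as follows. This is a minimal in-process illustration, not how any particular broker or cloud service implements it: `EventBus`, `run_order_saga`, the three services, and the event names are all hypothetical, and a real bus would be asynchronous and networked.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class EventBus:
    """In-process stand-in for a message broker (hypothetical, synchronous)."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.log: List[str] = []  # every published event name, in order

    def subscribe(self, event: str, handler: Handler) -> None:
        self._handlers[event].append(handler)

    def publish(self, event: str, payload: dict) -> None:
        self.log.append(event)
        for handler in list(self._handlers[event]):
            handler(payload)


def run_order_saga(bus: EventBus, order: dict) -> List[str]:
    """Wire three hypothetical services together and start the saga."""

    def reserve_stock(o: dict) -> None:
        # Inventory service: local transaction, then announce the result.
        bus.publish("StockReserved", o)

    def charge_payment(o: dict) -> None:
        # Payment service: succeeds only if the customer has enough credit.
        if o["amount"] > o["credit"]:
            bus.publish("PaymentFailed", o)
        else:
            bus.publish("PaymentCompleted", o)

    def release_stock(o: dict) -> None:
        # Compensating transaction that undoes the earlier reservation.
        bus.publish("StockReleased", o)

    bus.subscribe("OrderCreated", reserve_stock)
    bus.subscribe("StockReserved", charge_payment)
    bus.subscribe("PaymentFailed", release_stock)
    bus.publish("OrderCreated", order)
    return bus.log
```

Note that no coordinator appears anywhere: each service only reacts to events, and a failure is handled by a compensating event rather than a global rollback.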

Outlines

00:00

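🌐 Overview of distributed systems

This segment introduces the basic concepts of distributed systems and stresses their importance in today's software development. Distributed systems make it easier to build scalable applications, because the data and files of modern software keep growing and need more storage. Many companies use complex distributed systems to handle varied requests and storage requirements. A distributed system consists of multiple computers that share state and operate concurrently, and that can fail independently without affecting the whole system. Although distributed systems are complex and hard to deploy and maintain, they offer many benefits, such as scalability, fault tolerance, cost effectiveness, and low latency. The segment also previews building blocks such as the MapReduce framework and how they help with big-data applications.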

05:00

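🚧 Failures a distributed system may encounter

This segment discusses the types of failure a distributed system may run into: system failure, communication failure, secondary storage failure, and method failure. A system failure can lose the contents of primary memory, while secondary memory remains safe. A communication failure can result from a failed communication link or from nodes shifting. A secondary storage failure occurs when the information on a secondary storage device becomes inaccessible. A method failure can halt the distributed system or leave it unable to execute anything. Understanding these basics is essential for designing and maintaining distributed systems.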

10:02

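🔍 Key components and design patterns of distributed systems

This segment covers the key components and design patterns of distributed systems in detail. It starts with MapReduce, the framework Google developed for processing big data efficiently. It then discusses stateless and stateful systems, and the Raft protocol, which establishes a replicated state machine and its associated replicated log. Finally it turns to distributed system design patterns, which provide standard models for building systems for particular use cases. These patterns fall into three types (object communication, security, and event-driven), each with its own use cases and strengths.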

15:03

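🛠️ Common distributed system design patterns

This segment introduces the five most commonly used distributed system design patterns, which are tried and tested ways of building a system for a particular scenario. It revisits the three categories: object communication describes the messaging protocols and permissions between components of the system; security deals with confidentiality, integrity, and availability; and event-driven describes the production, detection, and consumption of system events and the response to them. These patterns help developers understand how to structure and design a distributed system and how to reuse code during development.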

20:05

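📈 Applying distributed system design patterns

This segment looks closely at how the design patterns are applied, in particular Command and Query Responsibility Segregation (CQRS) and Two-Phase Commit (2PC). CQRS improves scalability and security by separating read and write operations. 2PC is a transactional approach that relies on a central coordinator but processes partitions by their type and stage of completion. The segment also introduces Saga, an asynchronous pattern that uses no central controller; instead, the services of a microservice system communicate over an event bus. Each pattern has its own advantages and disadvantages and suits different scenarios, such as data-intensive applications, high-stakes transactions, and decentralized distributed systems.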
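
As a rough illustration of the coordinator role in 2PC, the sketch below follows the textbook form of the protocol (every participant votes in a prepare phase, then all commit or all abort), which is slightly more conventional than the lock-and-unlock wording used in the lecture. `Participant` and `two_phase_commit` are hypothetical names, and the sketch ignores timeouts, crashes, and durable logging.

```python
from typing import List


class Participant:
    """A hypothetical service taking part in a distributed transaction."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self) -> bool:
        """Phase 1: vote yes only if the local work can be completed."""
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: List[Participant]) -> bool:
    """Coordinator: commit everywhere only if every participant voted yes."""
    votes = [p.prepare() for p in participants]  # phase 1: prepare / vote
    if all(votes):
        for p in participants:  # phase 2: commit on all participants
            p.commit()
        return True
    for p in participants:  # phase 2: abort on all participants
        p.rollback()
    return False
```

The single coordinator loop is also where the pattern's weaknesses come from: every participant waits on it, which is the bottleneck and blocking behavior the lecture lists as a disadvantage.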

25:08

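🎓 Summary of distributed system design

The final segment sums up the key points of distributed system design and stresses using MapReduce and a NoSQL platform to build strong systems that can handle big data and stream processing. By introducing the different types of distributed system design pattern, the lecture aims to give insight into how to develop scalable systems, and shows how large companies build applications that run on different servers and how those servers communicate with each other.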

Keywords

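💡Distributed system

A distributed system is made up of multiple computers that work together and appear to the outside as a single system. In the video, distributed systems are presented as the key technology for handling large volumes of data and requests, which lets applications such as online shops and social networking sites scale.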

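💡Scalability

Scalability is a system's ability to adapt to a growing workload or data volume. In the video, scalability is a major benefit of distributed systems: the system can handle more data by adding servers or nodes, ideally without degrading performance.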

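💡MapReduce

MapReduce is a programming model developed by Google for processing and generating large data sets. It breaks a big-data job into two phases, Map and Reduce, which makes processing more efficient.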
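
To make the two phases concrete, here is a small single-process word-count sketch of the partition, map, group, and reduce steps. It only mirrors the shape of the workflow; the function names (`partition`, `map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative choices, not the API of Google's MapReduce or of Hadoop, and a real framework would run each chunk on a separate worker.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def partition(records: List[str], num_chunks: int) -> List[List[str]]:
    """Split the input into roughly equal chunks, one per map worker."""
    return [records[i::num_chunks] for i in range(num_chunks)]


def map_phase(chunk: Iterable[str]) -> List[Tuple[str, int]]:
    """User-defined map function: emit an intermediate (word, 1) pair per word."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key before they are reduced."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """User-defined reduce function: sum the counts for each key, in key order."""
    return {key: sum(values) for key, values in sorted(grouped.items())}


def word_count(records: List[str], num_chunks: int = 2) -> Dict[str, int]:
    intermediate: List[Tuple[str, int]] = []
    for chunk in partition(records, num_chunks):
        # In a real cluster each chunk would be mapped on its own worker.
        intermediate.extend(map_phase(chunk))
    return reduce_phase(shuffle(intermediate))
```

The user only writes `map_phase` and `reduce_phase`; everything else is what the framework would normally hide.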

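💡Fault tolerance

Fault tolerance is a system's ability to keep running when some of its components fail. It matters a great deal in distributed systems, because individual machines can fail independently without stopping the system as a whole.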

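💡Low latency

Low latency means the system responds to user requests or processes data in a very short time. In a distributed system, nodes can be deployed in multiple geographic locations so that a user's request reaches the nearest node, which cuts waiting time and improves the user experience.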

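💡Parallel computing

Parallel computing means using multiple processors or computers at the same time to solve a computational problem. In a distributed system, a complex problem is divided into small chunks that are handed to multiple processors to work on simultaneously, which can improve computational efficiency significantly.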
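
The divide-and-combine idea can be sketched with Python's standard `concurrent.futures` module. The helper names `chunked` and `parallel_sum` are hypothetical. Threads are used here only to keep the sketch simple; in CPython they illustrate the structure rather than deliver real CPU speed-up, for which separate processes or machines would be needed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def chunked(values: List[int], size: int) -> List[List[int]]:
    """Break the problem into smaller chunks of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def parallel_sum(values: List[int], chunk_size: int = 1000, workers: int = 4) -> int:
    """Sum each chunk on a worker, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = pool.map(sum, chunked(values, chunk_size))
        return sum(partial_sums)
```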

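💡Stateless and stateful

A stateless system stores no session information or history of user requests; every request is independent. A stateful system stores session information or state and can handle the current request in light of earlier interactions.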
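
A tiny contrast may help here. In the sketch below, `apply_discount` is stateless (same inputs, same output, nothing remembered), while `ShoppingCart` is stateful (each call depends on and changes what earlier calls stored). Both names are made up for illustration.

```python
from typing import List


def apply_discount(price_cents: int, percent: int) -> int:
    """Stateless: the result depends only on the arguments passed in."""
    return price_cents * (100 - percent) // 100


class ShoppingCart:
    """Stateful: the cart remembers earlier requests and mutates its state."""

    def __init__(self) -> None:
        self.items: List[int] = []

    def add(self, price_cents: int) -> int:
        """Store an item and return how many items the cart now holds."""
        self.items.append(price_cents)
        return len(self.items)

    def total(self) -> int:
        return sum(self.items)
```

Because `apply_discount` keeps nothing between calls, any server in a cluster can answer any request; `ShoppingCart` needs its state kept somewhere that every server handling that user can reach.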

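💡Raft

Raft is a consensus algorithm for managing log replication and state machine replication in a distributed system. The nodes elect a leader, which coordinates the others; each node is in one of three states (leader, follower, or candidate), and the protocol keeps the state of all nodes consistent.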
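
The three node states and the transitions between them can be sketched as a small state machine. This is a heavily simplified illustration that assumes a fixed cluster size and leaves out log replication, heartbeats, and the rules for granting votes; the `Node` class and its method names are hypothetical, not part of any Raft library.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    FOLLOWER = "follower"
    CANDIDATE = "candidate"
    LEADER = "leader"


@dataclass
class Node:
    node_id: int
    cluster_size: int
    role: Role = Role.FOLLOWER
    term: int = 0
    votes: int = 0

    def on_election_timeout(self) -> None:
        """No heartbeat from a leader: start an election for a new term."""
        self.role = Role.CANDIDATE
        self.term += 1
        self.votes = 1  # a candidate votes for itself

    def on_vote_granted(self) -> None:
        """Count a vote; a strict majority of the cluster makes this node leader."""
        if self.role is not Role.CANDIDATE:
            return
        self.votes += 1
        if self.votes > self.cluster_size // 2:
            self.role = Role.LEADER

    def on_message_term(self, term: int) -> None:
        """A message carrying a higher term forces a step down to follower."""
        if term > self.term:
            self.term = term
            self.role = Role.FOLLOWER
            self.votes = 0
```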

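💡Design patterns

A design pattern is a standard solution to a common problem in a particular context. In distributed systems, design patterns provide a way of building a system so that its nodes can communicate and cooperate effectively.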

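💡NoSQL database

A NoSQL database is a non-relational database; it does not follow the structure of a traditional relational database, such as rows and columns. NoSQL databases are especially suited to large-scale, unstructured, or semi-structured data, and they are commonly used in distributed systems.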

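💡CQRS

CQRS (Command Query Responsibility Segregation) is an architectural pattern that separates reads (queries) from writes (commands) to improve a system's scalability and performance. By delegating read and write requests to separate parts of the system, it reduces the complexity of each part and improves data-processing efficiency.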
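
A minimal sketch of the read/write split, assuming a hypothetical bank-deposit example: `DepositCommand`, `WriteModel`, and `ReadModel` are illustrative names, and the read side is updated synchronously here, whereas production systems often propagate changes asynchronously.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DepositCommand:
    account_id: str
    amount: int


class ReadModel:
    """Query side: a denormalized view that only answers reads."""

    def __init__(self) -> None:
        self._balances: Dict[str, int] = {}

    def apply(self, command: DepositCommand) -> None:
        """Update the view after the write side has accepted a change."""
        current = self._balances.get(command.account_id, 0)
        self._balances[command.account_id] = current + command.amount

    def balance(self, account_id: str) -> int:
        return self._balances.get(account_id, 0)


class WriteModel:
    """Command side: validates and stores changes, then notifies the read side."""

    def __init__(self, read_model: ReadModel) -> None:
        self._accepted: List[DepositCommand] = []
        self._read_model = read_model

    def handle(self, command: DepositCommand) -> None:
        if command.amount <= 0:
            raise ValueError("deposit amount must be positive")
        self._accepted.append(command)
        self._read_model.apply(command)
```

Only `WriteModel.handle` can change data, which is the property the lecture credits with reducing unexpected changes to shared data.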

Highlights

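Distributed systems make it easier to develop scalable applications.

Data and files in modern software applications keep growing and require large amounts of storage.

Many companies use complex distributed systems to handle many types of requests and storage requirements.

A distributed systems framework is like a collection of many computers that work independently yet are interconnected, as if they were working as a single computer.

Machines in a distributed system share state and operate concurrently, and they can fail independently without affecting the whole system.

Distributed systems are complex and hard to deploy and maintain, but they offer many benefits, such as scalability, fault tolerance, cost effectiveness, low latency, efficiency, and parallel computing.

MapReduce is a framework developed by Google for handling large amounts of data efficiently.

MapReduce uses numerous servers for data management and distribution, and it abstracts away the underlying processes that run while a user command executes.

The MapReduce workflow consists of partitioning, map, intermediate files, reduce, and aggregate.

Stateless and stateful systems are very important concepts in distributed systems.

The Raft protocol establishes a replicated state machine and its associated replicated log, and supports multiple consecutive rounds of consensus.

Distributed system design patterns give us a way to build systems that fit particular use cases; they are building blocks that let us draw on existing knowledge instead of starting from scratch.

Distributed system design patterns fall into three categories: object communication, security, and event-driven.

The CQRS (Command Query Responsibility Segregation) pattern focuses on separating the read and write operations of a distributed system to improve scalability and security.

Two-phase commit (2PC) is a transactional approach that relies on a central coordinator, with partitions processed by their type and stage of completion.

Saga is an asynchronous pattern that uses no central controller; services communicate with each other over an event bus.

A distributed systems design framework, or parallel computing framework, uses MapReduce and considers a NoSQL platform.

According to the lecture, AWS uses Saga-based designs in many features, such as Step Functions and Lambda functions.

The lecture aims to give an idea of how to develop strong, scalable systems within a distributed design framework.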

Transcripts

play00:07

good day to everyone today our topic

play00:11

will be distributed systems

play00:13

as part of our discussion and lecture

play00:16

for systems analysis and design and

play00:21

implementation so our discussion for for

play00:24

distributed systems

play00:26

um considering your design could

play00:29

possibly and should possibly be a

play00:33

workable

play00:34

for distributed systems

play00:40

so we're going to discuss in here what

play00:44

are distributed systems

play00:50

okay so we as you can see in our

play00:54

module here distributed systems makes it

play00:57

easier

play00:58

to develop scalable applications

play01:03

and it is because nowadays more of

play01:07

software applications

play01:10

are growing

play01:12

its data and data file and it requires a

play01:18

large storage because data that it

play01:21

accumulates every day are increasing in

play01:25

exponentially and many companies uses

play01:28

complex distributed systems

play01:31

to handle many types of requests

play01:34

and storage requirement

play01:38

and that's the reason why we can see

play01:41

that most of the applications that we

play01:44

use nowadays especially those online

play01:46

shops and social networking sites are

play01:50

fully scalable it is because

play01:54

their platform are based on distributed

play01:57

systems framework

play01:59

which is like collection of many

play02:01

computers

play02:02

which means their systems are enabled to

play02:06

work

play02:07

by way of many servers and they are

play02:10

interconnected to each other as if it is

play02:13

working in a single computer

play02:16

and in distributed systems

play02:19

there are collections of machines that share

play02:23

the same state

play02:25

and operate concurrently okay or

play02:28

simultaneously and these machines can

play02:31

also fail independently without

play02:33

affecting the entire system and that is

play02:36

the reason why distributed system is one

play02:40

of the framework which have been used by

play02:43

many companies now

play02:46

actually this type of framework or

play02:50

distributed systems is quite complex and

play02:53

it is difficult to deploy and maintain

play02:56

but the performance has a lot of

play02:59

benefits like scaling modular growth

play03:03

fault tolerance

play03:05

and it is also cost effective

play03:08

and it has low latency

play03:11

efficient

play03:13

and you can do parallel

play03:17

um Computing so these are some of the

play03:21

um known benefits

play03:24

of distributed systems

play03:31

okay so we can discuss it one by one

play03:35

um

play03:36

scaling so it means that

play03:39

um your applications are scalable or it

play03:42

is working through different platforms

play03:44

okay and it allows you to do horizontal

play03:50

communication with different types of

play03:52

systems

play03:53

modular growth so there's almost no cap

play03:56

on how much you can scale and it is fault

play04:00

tolerance

play04:01

it means that if one machine or node

play04:05

malfunctions the other machines will not

play04:07

be affected and it's cost effective so

play04:11

the initial cost is higher than the

play04:13

traditional systems but because their

play04:15

capacity to scale they quickly become

play04:17

more cost effective

play04:20

and low latency you can have a node in

play04:22

multiple locations so traffic will hit

play04:25

the closest node and an efficiency

play04:28

distributed systems break complex data

play04:31

into smaller pieces

play04:33

and parallelism distributed systems can

play04:36

be designed for parallelism where

play04:39

multiple processors divide up a complex

play04:41

problem into smaller chunks

play04:45

so in distributed systems failure so

play04:49

these are some of the common failure

play04:51

that you may encounter if you try to

play04:55

develop a system which is like a

play04:58

distributed system

play05:00

okay so we all know that system failure

play05:05

may occur anytime okay and it is usually

play05:10

the result in the loss of content of

play05:12

primary memory but the secondary memory

play05:14

secondary Memory Remains safe so that's

play05:17

what's uh advantage in terms of a

play05:21

handling system failure in distributed

play05:23

systems whenever there's a system

play05:25

failure the process fails to perform the

play05:27

execution and the system may reboot or

play05:30

freeze communication medium failure so

play05:33

it occurs as a result of communication

play05:35

link failures or shifting of nodes

play05:40

and there is also another failure it's

play05:43

possible failure in distributed systems

play05:45

which is secondary storage failure that

play05:48

occurs when the information are stored

play05:50

on the secondary storage device is

play05:53

inaccessible

play05:55

okay and it can be caused by many things

play05:58

like head crashes, dirt on your devices and

play06:02

parity errors

play06:05

how about method failures so it's

play06:07

usually a halt the distributed system

play06:10

and make it unavailable to perform

play06:13

any execution at all so it may enter a

play06:17

deadlock state or do protection

play06:19

violations during the method failures

play06:23

so what are the fundamentals of

play06:26

distributed system

play06:28

so in here we are going to discuss some

play06:30

of the lists of our fundamental

play06:33

components of

play06:36

distributed system okay first is

play06:39

mapreduce what is mapreduce it is a

play06:42

framework developed by Google to handle

play06:44

large amounts of data in an efficient

play06:47

manner so if you're trying to develop a

play06:50

system which is like for example an

play06:52

online shop or a social like a social

play06:55

networking sites or some kind of

play07:00

automation okay or reservation sites so

play07:04

you you expect that in the future it is

play07:07

going to collect lots of data or require

play07:10

you to hold large amount of data and

play07:13

with that mapreduce is one of the

play07:16

framework that can be used as part of

play07:19

being distributed systems okay so we

play07:21

encourage the development of new systems

play07:25

like application of mapreduce for

play07:29

distributed systems in your

play07:32

system's design

play07:35

so mapreduce uses numerous servers for

play07:37

data map management and distribution and

play07:40

the framework provides an abstraction to

play07:42

underlying process happening during the

play07:44

execution of user command so mapreduce

play07:48

as it is being applied for all big data

play07:52

applications

play07:53

include fault tolerance partitioning data

play07:56

and aggregate data so it allows user to

play08:00

focus on High level logic of their

play08:02

programs while trusting the framework to

play08:04

smoothly continue the process so in this

play08:07

diagram we're showing the map reduce

play08:11

workflow so first is partitioning and

play08:14

then map intermediate files reduce and

play08:17

aggregate

play08:18

so what is partitioning so in

play08:21

partitioning the data is usually in the

play08:23

form of a big chunk so just like the big

play08:25

data and then

play08:28

partitioning it into smaller portions

play08:31

making it more manageable pieces okay

play08:34

that can be handled

play08:36

um through a map and then map or map

play08:39

worker receives the data in the form of

play08:42

key value pair so we discussed in

play08:45

our discussion that the most of uh no

play08:50

SQL

play08:52

database platform it could also be key

play08:55

value pair so if we try to develop a

play08:58

distributed systems then we are

play09:00

considering we have to consider no SQL

play09:04

platform for our database so this data

play09:07

is processed by the map worker and

play09:09

according to the user defined

play09:10

map function to generate intermediate key

play09:12

value pair

play09:13

and then intermediate files the data is

play09:16

partitioned into R partitions with R

play09:20

being the number of reduce workers so

play09:22

these files are buffered in the memory

play09:24

until the primary node forwards them to

play09:28

reduce workers and then reduce so what

play09:31

is reduce as soon as the reduce

play09:34

workers get the data stored in the

play09:36

buffer they sort it accordingly and

play09:39

group data with the same key okay so

play09:43

that's really a good way to manage your

play09:46

data storage and that's why distributed

play09:48

system is really well encouraged for

play09:51

um

play09:52

web applications that could handle big

play09:55

data

play09:56

and it is the future of storage

play09:59

aggregate the primary node is notified

play10:01

when the reduce workers are done with

play10:04

their task and in the end the sorted

play10:07

data is aggregated together and R

play10:10

output files are generated for the user

play10:13

so you see how systematic it is

play10:17

now let's discuss another

play10:21

um

play10:21

another component of distributed system

play10:24

which is stateless and stateful system

play10:27

so stateless and stateful systems are

play10:30

important

play10:31

very important concept of distributed

play10:34

systems

play10:35

okay because a system can be stateless

play10:37

or stateful so we say a stateless system

play10:41

is the one that maintains no state of

play10:44

past event

play10:45

okay and it executes based on the inputs

play10:49

that we provide to it how about stateful

play10:51

systems these are responsible for

play10:54

maintaining and mutating a state so

play10:57

those are stateful and stateless

play11:03

and then another component of

play11:04

distributed system is raft so

play11:07

Raft establishes the concept of a

play11:10

replicated State machine and Associated

play11:13

replicated log of commands so as first

play11:16

class citizens and support

play11:18

multiple consecutive rounds of consensus

play11:22

by default it requires a set of nodes

play11:24

that form a consensus group or Raft

play11:27

clusters

play11:29

so each node can be in one of these

play11:33

states leader follower and candidate so

play11:37

in this diagram you can see the

play11:39

different Raft states

play11:41

okay

play11:45

is the implementation of Raft

play11:48

okay so now let's discuss distributed

play11:51

systems design patterns

play11:53

so what are design patterns it gives us

play11:56

a way to build systems that fit

play11:59

particular use cases so they are like

play12:02

building blocks that allows us to pull

play12:05

from existing knowledge rather than

play12:08

start every system from scratch so they

play12:11

also create a set of standard models for

play12:13

system design that helps other

play12:15

developers see how their projects can

play12:18

interface with the system

play12:22

okay so these are

play12:26

some of the examples so creational

play12:29

design patterns provide Baseline when

play12:31

building new objects so structural

play12:33

patterns Define the overall structure of

play12:37

a solution and behavioral patterns on

play12:40

the other hand describe objects and how

play12:43

they communicate with each other

play12:46

okay so distributed system design

play12:48

patterns outline a software architecture

play12:50

of how different nodes communicate with

play12:53

each other

play12:56

so most of distributed system patterns

play12:59

fall into one of the three categories

play13:01

below object communication so that's one

play13:05

of the types of distributed

play13:08

system design pattern another is security

play13:11

and then

play13:12

another is event driven so for object

play13:16

communication it describes the messaging

play13:19

protocol and permissions for different

play13:21

components of system and security it

play13:25

handles confidentiality Integrity

play13:28

availability that concerns to ensure the

play13:31

system is secure from unauthorized access

play13:34

and then event driven it describes the

play13:37

production detection consumption and

play13:39

response to system events okay so if you uh try

play13:45

to design your system

play13:48

um and consider

play13:49

um having a a type of distributed system

play13:52

for your development so you have to find

play13:57

or you have to think of

play14:00

um which type of distributed design

play14:02

pattern that is applicable for your case

play14:05

it could be object communication

play14:07

security and event driven and also

play14:11

considering your database applications

play14:14

it has to start with what type of

play14:18

applications you would like to

play14:22

develop it's because you know

play14:26

um no SQL databases May differ from one

play14:31

another there are many types okay four

play14:34

or five types of

play14:36

um

play14:36

no SQL databases and the choice of type

play14:40

that you will consider will depend on

play14:45

what type of data you have and what type

play14:49

of information you would like to store

play14:51

as

play14:53

um no SQL systems is very concerned with

play14:56

data storage so you really need to think

play14:59

of what is the best fit

play15:02

type of no SQL and considering that

play15:08

um

play15:08

a type of no SQL

play15:12

choosing a type of design pattern for

play15:15

distributed system is also one of

play15:18

crucial considerations that you have to

play15:21

do okay so those are interlinked with each

play15:23

other

play15:25

okay

play15:26

so now let's continue our discussion and

play15:28

we're going to discuss

play15:31

um

play15:33

the top five uh distributed system

play15:37

design patterns so these are the top

play15:40

five

play15:41

commonly used design patterns so these

play15:43

are tried and tested way of building

play15:45

system that could fit in a particular

play15:48

case

play15:49

uh they are abstract way of structuring

play15:52

your

play15:53

system or designing your system or the

play15:56

structure of your system so most design

play15:59

patterns have been developed and updated over the

play16:01

years by many different developers so they

play16:04

are often

play16:07

very efficient starting points okay

play16:11

design patterns are building blocks that

play16:14

allow programmers to pull from existing

play16:17

knowledge rather than starting from

play16:19

scratch with every system so that is now

play16:22

the new way of systems development okay

play16:26

the old ways of

play16:28

development like starting first from

play16:32

scratch is already gone and in this new

play16:36

era in the Fourth Industrial Revolution

play16:39

developmental system

play16:41

encourage code reuse or reusing code

play16:46

which have been used in other

play16:47

applications

play16:49

that's why it is also encouraged in

play16:51

distributed systems

play16:53

so these design patterns

play16:56

also create a set of standard models for

play16:59

systems design that help other

play17:01

developers see how their projects can

play17:03

interface with a given system

play17:05

so there are what we call creational

play17:07

Design patterns

play17:10

that provide a baseline when building a

play17:13

new object so structural patterns Define

play17:15

the overall structure of a solution

play17:18

while on the other hand behavioral

play17:21

patterns describe objects and how they

play17:24

communicate with each other okay so as

play17:28

we can see

play17:29

these different patterns have

play17:32

each specializations okay in terms of

play17:37

Designing different

play17:39

um components okay and different aspects

play17:42

of your system distributed systems

play17:45

design patterns or Design This is used

play17:47

when developing distributed systems and

play17:50

essentially collections of computers and

play17:53

data centers that act as one computer

play17:56

for the end user and that is how we are

play17:58

feeling it right now like you know the

play18:00

social networking sites AWS

play18:04

um it feels like we're using only one

play18:06

computer but you know these computers

play18:08

are located in many different places

play18:10

around the world

play18:11

so just like um Facebook it has

play18:15

it has many it has many data centers

play18:17

over 100 over 200 data centers all

play18:20

around the world and they can they can

play18:23

make us feel that as if we're using a

play18:25

single system it's because of this

play18:27

framework okay the distributed system

play18:30

and this distributed design patterns

play18:33

outline a software architecture for how

play18:35

the different nodes communicate with

play18:37

each other which nodes handle its task

play18:40

and flow for different tasks and that's

play18:43

the reason why distributed system is

play18:46

really strongly encouraged to be one of

play18:49

your framework in systems development

play18:52

and this pattern are widely used when

play18:54

designing distributed architecture those

play18:57

patterns that we have discussed the

play18:58

behavioral pattern the distributed

play19:01

systems design pattern

play19:03

creational design okay these patterns

play19:07

are encouraged

play19:09

okay and as I've said these are used for

play19:12

large-scale computing or large scale

play19:16

um

play19:17

systems web applications like cloud

play19:20

computing and microservice software

play19:23

systems

play19:25

so these are the types of distributed

play19:28

design patterns

play19:41

okay here object communication it

play19:44

describes

play19:46

the messaging protocol and permission

play19:49

ensure that system is secured from

play19:51

unauthorized access okay another type is

play19:55

event driven so you can choose with

play19:57

which type of distributed system you're

play19:59

going to consider based on use case Okay

play20:02

based on your different cases or on the

play20:05

type of system that you're going to

play20:06

develop event driven it it has patterns

play20:10

that describe the production detection

play20:12

consumption and response to system

play20:15

events

play20:18

okay so another component that is

play20:21

related to our

play20:25

okay here

play20:27

so these are the top five distributed

play20:29

systems command and query

play20:32

responsibility segregation or cqrs which

play20:36

focus on separating the read and write

play20:38

operations for distributed systems so

play20:40

this is another

play20:42

um one of the top five distributed

play20:44

system pattern

play20:46

this is one of the mostly used uh

play20:50

distributed system pattern what we call

play20:52

cqrs

play20:54

so it increases the scalability and

play20:56

security and this model uses commands to

play20:59

write data with persistent storage and

play21:01

queries to locate and fetch data okay

play21:06

it is handled by Command Center which

play21:08

receives requests from users and the

play21:11

command center then fetches the data and

play21:13

makes any necessary modification saves

play21:17

the data and notifies the read service

play21:20

so the read Service then updates the

play21:22

read model to show the change to the

play21:24

user

play21:26

what are the advantages of this type of

play21:29

pattern or cqrs it reduces systems

play21:32

complexity by delegating tasks it

play21:35

enforces a clear separation between

play21:37

business logic and validation it helps

play21:40

categorize process by their job

play21:42

it reduces the number of unexpected

play21:44

changes to shared data and reduces the

play21:47

number of entities that have modifying

play21:49

access to data how about disadvantages

play21:52

it requires constant back and forth

play21:54

communication between commands and read

play21:56

models can cause increased latency when

play21:59

sending High throughput queries and no

play22:02

means to communicate between service

play22:03

process so

play22:07

how about the use case

play22:09

okay this is our use case related to

play22:12

CQRS CQRS is best for data

play22:15

intensive applications just like SQL or

play22:17

NoSQL database management system it's

play22:20

also helpful for data heavy microservice

play22:22

architecture and it's great for handling

play22:25

stateful applications because the

play22:28

writer reader

play22:31

distinction helps with immutable States

play22:34

okay

play22:36

so in another type or another

play22:39

another

play22:41

um distributed system pattern

play22:44

uh next to cqrs okay here is CQRS

play22:51

okay yes okay here is two phase commit

play22:55

or 2PC

play22:58

so it is similar to cqrs in its

play23:00

transactional approach and Reliance on

play23:02

Central Command but partitions are

play23:04

processed by their type and what stage

play23:07

of completion they're in so the two

play23:10

phases are prepare and complete so

play23:13

that's what we call two phase commit all

play23:15

services in 2PC systems are locked by

play23:17

default and meaning they cannot send

play23:20

data so while locked services complete the

play23:23

prepare stage so they're ready to send

play23:26

once unlocked and the coordinator

play23:29

unlocks services one by one and requests

play23:32

its data

play23:34

if the service is not ready to submit its

play23:37

data then the coordinator moves on to

play23:39

another service and once all prepared

play23:43

data has been sent all services are unlocked

play23:46

to await new tasks from the coordinator

play23:50

so these are the two advantages are the

play23:53

advantages of 2PC very consistent and

play23:55

resistant to error due to lack of

play23:58

concurrent requests scalable and can

play24:02

handle big data pools as easily as it

play24:04

can handle data from single

play24:06

machine and allows for isolation and

play24:09

data sharing at the same time

play24:11

and these are the disadvantages

play24:13

non-fault tolerant prone to bottleneck

play24:16

and blocking due to synchronous nature

play24:21

excuse

play24:24

requires more resources than other

play24:26

design patterns

play24:29

okay

play24:31

so 2PC is best for distributed systems

play24:34

that deal with high high stakes

play24:37

transaction

play24:39

that favor accuracy over resource

play24:42

efficiency and it's resistant to error

play24:45

and easy to track mistakes when they

play24:47

occur even at scale so another type of

play24:51

um distributed system design pattern is

play24:54

Saga okay

play24:56

it is an asynchronous pattern that does

play24:58

not use a central controller and instead

play25:02

communicates entirely between services

play25:05

so this overcomes some of the

play25:07

disadvantages of previously covered

play25:09

synchronous pattern

play25:11

Saga uses event bus to allow services

play25:13

to communicate with each other in a

play25:17

micro service systems and the bus sends

play25:20

and receives requests between services

play25:22

and each participating service creates a

play25:26

local transaction so the participating

play25:28

Services then each emit an event for

play25:31

other services to receive and other

play25:35

services all listen for events and the

play25:37

first service to receive the event will

play25:40

perform the required action so you can

play25:43

see now

play25:44

um

play25:45

uh conceptually the flow of how Saga is

play25:49

working and if service fails to complete

play25:51

the action it's sent to other

play25:53

services

play25:54

and the structure of saga is similar to

play25:57

the 2PC design in that services are

play26:00

cycled if one cannot complete task

play26:02

however Saga removes the central control

play26:05

element to better manage the flow and

play26:07

reduce the number of back and forth

play26:09

communication

play26:10

okay so these are the advantages is

play26:13

individual service can handle much

play26:16

longer transaction and great for

play26:18

distributed system due to

play26:20

decentralization

play26:21

and reduces bottleneck

play26:25

um like peer-to-peer communication

play26:27

between each services

play26:29

disadvantages asynchronous autonomy

play26:32

makes it difficult to track with

play26:33

services

play26:35

which services are doing individual

play26:37

tasks and difficult to debug due to

play26:39

complex orchestration and less service

play26:41

isolation than the previous patterns

play26:45

okay so just like in AWS so AWS uses

play26:50

Saga based design in many functions like

play26:53

Step Functions and Lambda functions

play26:58

okay

play27:00

so that's all for that's all for our

play27:03

lecture regarding distributed systems

play27:07

design and uh I hope that this lecture

play27:11

has given you

play27:13

um an idea of how to develop

play27:18

um

play27:19

systems design on how how are you going

play27:23

to consider

play27:25

um designing a systems

play27:27

in a distributed design framework or

play27:31

distributed systems design framework or

play27:33

parallel Computing framework that uses

play27:36

um

play27:37

mapreduce and also considering no SQL

play27:41

platform so I hope that this lecture

play27:44

have given you

play27:46

um the chance to connect the dots on how

play27:49

really this big companies were able to

play27:52

develop

play27:54

um applications which are running on

play27:57

different servers and how each of these

play28:00

servers are communicating with each

play28:01

other so thank you very much and I hope

play28:05

that this lecture has given an idea on

play28:08

how to further develop

play28:10

strong systems scalable systems which

play28:14

could uh which could be able to handle a

play28:18

big data and even streaming system

play28:21

thank you very much

Related Tags
Distributed systems, System design, Scalability, MapReduce, NoSQL, CQRS pattern, Saga pattern, Big data processing, Cloud computing, Microservices