The NEED TO KNOW Info On Amazon's Software Development

Continuous Delivery
23 Aug 2023 · 14:03

Summary

TL;DR: In this video, Dave Farley discusses the importance of learning from successful companies like Amazon in software development. Amazon, which deploys changes to production more than 136,000 times a day, has adopted continuous delivery, organized around source, build, test, and prod phases. They practice trunk-based development and continuous integration, ensuring fast feedback and visibility. The video explores Amazon's approach to testing, including health checks, integration testing, and canary releasing, emphasizing the need for observability and rollback strategies to manage change effectively.

Takeaways

  • 🚀 Amazon deploys changes into production over 136,000 times per day, highlighting the importance of efficient software development practices.
  • 📈 Amazon's early technology was a constraint on growth, illustrating the need for scalable systems as companies expand.
  • 🔄 The shift from relational databases to a distributed model was a key change for Amazon, emphasizing the move towards event-based systems and eventual consistency.
  • 👥 Amazon's adoption of small teams, influenced by Jeff Bezos' 'two-pizza rule', led to the development of their service-oriented architecture.
  • 🔧 Amazon's continuous delivery model includes four fundamental parts: source, build, test, and prod, aligning with the deployment pipeline concept.
  • 🔄 Amazon practices trunk-based development and continuous integration, ensuring visibility of the system's true state at all times.
  • 🛠️ The commit phase in Amazon's model provides fast feedback to developers, crucial for a fine-grained development process.
  • 🔍 Amazon's testing phase includes health checks and integration testing to ensure the release candidate's readiness for production.
  • 🔒 Amazon's approach to microservices involves supporting multiple versions of APIs to manage changes between services.
  • 🌐 Amazon's strategy for dealing with potential race conditions is to optimize for rollbacks and limit the impact of failures through canary releasing and observability.
  • 📚 The script suggests learning from successful companies like Amazon, despite their unique challenges, to improve software development practices.

Q & A

  • What is the common mistake mentioned in software development circles?

    -The common mistake is dismissing successful companies like Google, Amazon, or Facebook, and not learning from their practices.

  • How many times per day does Amazon deploy changes into production?

    -Amazon deploys changes into production more than 136,000 times per day.

  • What was one of the main constraints on Amazon's growth for many years?

    -Amazon's technology, originally a fairly conventional relational-database-backed system, was the main constraint on the growth of the company for many years.

  • What was the famous initiative introduced by Jeff Bezos that impacted Amazon's development?

    -The famous initiative introduced by Jeff Bezos was the 'two-pizza team' concept, which led to the distributed service model and the genesis of Amazon Web Services.

  • What are the four fundamental parts of Amazon's version of continuous delivery?

    -The four fundamental parts of Amazon's version of continuous delivery are Source, Build, Test, and Prod.

  • How does Amazon ensure that their software is always releasable?

    -Amazon ensures that their software is always releasable by following a deployment pipeline consisting of commit, acceptance, and production phases, where the pipeline evaluates an independently deployable unit of software.

  • What does Amazon call the phase where they give fast feedback to developers following any change?

    -Amazon calls the phase where they give fast feedback to developers following any change the 'commit phase'.

  • What is Amazon's approach to development in terms of visibility into the system's true state?

    -Amazon practices trunk-based development and full-blown continuous integration, maximizing visibility into the true state of the system at all times.

  • How does Amazon handle testing of their release candidates?

    -Amazon handles testing of their release candidates by deploying them and performing health checks to confirm the system is ready before starting tests.

  • What is the purpose of integration testing, or 'Jazz' as Amazon calls it, in Amazon's deployment pipeline?

    -The purpose of integration testing, or 'Jazz', is to deploy the same bits and bytes representing the release candidate into a production-like test environment and evaluate it in realistic scenarios.

  • How does Amazon manage change between microservices?

    -Amazon microservices support multiple versions of all their APIs as a mechanism to manage change between them.

  • What strategy does Amazon use to cope with potential race conditions between teams?

    -Amazon optimizes for being great at rolling back changes and limits the blast radius of any failure through sophisticated forms of canary releasing and rolling out changes progressively.

Outlines

00:00

🚀 Learning from Amazon's Software Development

The speaker begins by discussing the common oversight of ignoring successful companies like Amazon, Google, and Facebook in software development. They argue that while these companies face unique challenges, there are valuable lessons to be learned from their practices. The speaker highlights Amazon's impressive software deployment pace, with over 136,000 deployments per day. Amazon initially faced scalability issues with their relational database system, which led to the adoption of a distributed service model, the precursor to Amazon Web Services. The speaker recalls an internal presentation by Amazon CTO Werner Vogels, which emphasized the challenges of web-scale computing and the need for a more complex system architecture. The talk inspired the speaker to consider how development should be structured to ensure the correctness of changes, leading to the concept of continuous delivery. The video also acknowledges the sponsors that support the channel's content on continuous delivery and software engineering.

05:03

🔄 Amazon's Continuous Delivery Model

The second paragraph delves into Amazon's adoption of continuous delivery, as described by Amazon engineer Clare Liguori. The speaker outlines the four fundamental parts of Amazon's continuous delivery model: source, build, test, and prod. They align these with their own terminology for deployment pipeline stages: commit, acceptance, and production phases. The speaker emphasizes the importance of version control in defining releasable units of software and the commit phase's role in providing rapid feedback to developers. Amazon's approach to development, which includes trunk-based development and continuous integration, is highlighted as a way to maintain visibility of the system's true state. The testing phase is detailed, with a focus on health checks, integration testing, and the evaluation of release candidates in a production-like environment. The speaker notes the similarities between Amazon's practices and their own recommendations for continuous delivery, indicating a convergence towards effective software development strategies.

10:03

🔍 Amazon's Advanced Testing Strategies

The final paragraph explores Amazon's advanced testing strategies, including one-box testing for independent team progress and compatibility checks with pre-production APIs. The speaker discusses Amazon's approach to managing multiple API versions as a change management mechanism and raises questions about potential race conditions in compatibility testing. They note Amazon's acceptance of the possibility of discovering issues later in the process and emphasize the company's focus on effective rollback strategies and limiting the impact of failures. The speaker also mentions the use of canary releasing and observability to manage changes and quickly identify problems. The paragraph concludes with a recommendation to read Clare Liguori's detailed post on Amazon's practices and a mention of a Patreon competition for members to host a Q&A session with the speaker.

Keywords

💡Unicorns

In the context of the video, 'unicorns' refers to successful and highly valued startup companies. The speaker suggests that it's a common mistake for software development circles to dismiss these companies, implying that there are valuable lessons to be learned from their success. The video aims to explore what can be learned from Amazon's software development practices.

💡Software Development

This is the overarching theme of the video. Software development is the process of creating software applications or systems. The video discusses how Amazon's approach to software development, including continuous delivery and microservices, has contributed to its success.

💡Continuous Delivery

Continuous Delivery is a software development practice where code changes are automatically built, tested, and prepared for a release to production. It's highlighted as a key strategy employed by Amazon, allowing them to deploy changes into production over 136,000 times per day. The video discusses how Amazon implements this practice.

💡Deployment Pipeline

A deployment pipeline is a series of steps that software goes through in the process of moving from development to production. The video describes how Amazon's pipeline consists of source, build, test, and prod phases, aligning with the concept of continuous delivery.
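
To make the idea concrete, here is a minimal sketch of a pipeline that runs the four stages in order and stops at the first failure. The stage names come from the video; the `Stage` class, `run_pipeline` function, and the stand-in checks are hypothetical illustrations, not Amazon's tooling.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    name: str
    # Hypothetical check: receives a candidate id, returns True on pass.
    check: Callable[[str], bool]


def run_pipeline(candidate: str, stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first failing stage.

    Returns (passed, names_of_stages_that_passed).
    """
    completed: List[str] = []
    for stage in stages:
        if not stage.check(candidate):
            return False, completed
        completed.append(stage.name)
    return True, completed


# Stand-in checks for illustration only; a real pipeline would build,
# deploy, and test the candidate here.
stages = [
    Stage("source", lambda c: True),
    Stage("build", lambda c: True),
    Stage("test", lambda c: not c.endswith("-broken")),
    Stage("prod", lambda c: True),
]
```

A candidate that fails the test stage never reaches prod, which is the property that lets a passing pipeline be treated as a definitive statement of releasability.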

💡Version Control

Version control is a system that records changes to a file or set of files over time so that developers can track every modification. The video mentions that Amazon ensures everything is under version control to define the versions of all releasable units of software.

💡Trunk-Based Development

Trunk-Based Development is a software development approach where developers merge their changes back into the main branch frequently, rather than using long-lived feature branches. The video notes that Amazon practices trunk-based development to maximize visibility into the system's true state.

💡Continuous Integration

Continuous Integration (CI) is the practice of merging all developers' working copies to a shared mainline several times a day. The video emphasizes Amazon's use of CI to maintain a high level of system visibility and to catch integration issues early.

💡Release Candidate

A release candidate is a packaged, versioned build that could be released to production, provided it passes the remaining pipeline stages. The video describes how Amazon packages and stores the artifact once all tests in the build (commit) phase pass, and how every later stage evaluates that same artifact rather than the source code.

💡Health Checks

Health checks in the context of software deployment are automated tests to confirm that a system is functioning correctly before starting more extensive testing. Amazon uses health checks to ensure the system is ready before proceeding with tests.
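
The sequence "deploy, confirm the system is ready, then test" can be sketched roughly as follows. The function names, the retry count, and the `probe` callable are assumptions made for illustration; the video does not describe how Amazon implements its health checks.

```python
import time
from typing import Callable


def wait_until_healthy(
    probe: Callable[[], bool], attempts: int = 5, delay_s: float = 0.0
) -> bool:
    """Poll a hypothetical health probe until it reports ready or attempts run out."""
    for _ in range(attempts):
        if probe():
            return True
        time.sleep(delay_s)
    return False


def deploy_then_test(probe: Callable[[], bool], run_tests: Callable[[], bool]) -> str:
    """Only start the tests once the deployed candidate reports healthy."""
    if not wait_until_healthy(probe):
        return "unhealthy"
    return "passed" if run_tests() else "failed"
```

Separating "unhealthy" from "failed" keeps a broken deployment from being misreported as a failing test.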

💡Microservices

Microservices is an architectural style that structures an application as a collection of loosely coupled services. The video discusses how Amazon's microservices support multiple versions of APIs to manage change effectively.
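
One way a service might keep several API versions alive at once is to register a handler per version and dispatch on the version the caller asks for. This is only a sketch: the order payload, the `v1`/`v2` shapes, and the registry are invented for illustration and are not taken from Amazon's services.

```python
from typing import Callable, Dict

# Hypothetical registry: API version string -> handler function.
HANDLERS: Dict[str, Callable[[dict], dict]] = {}


def handler(version: str):
    """Decorator that registers a handler under an API version."""
    def register(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
        HANDLERS[version] = fn
        return fn
    return register


@handler("v1")
def get_order_v1(order: dict) -> dict:
    # Older response shape: total expressed as a decimal amount.
    return {"id": order["id"], "total": order["total_cents"] / 100}


@handler("v2")
def get_order_v2(order: dict) -> dict:
    # Newer response shape: integer cents plus an explicit currency.
    return {
        "id": order["id"],
        "total_cents": order["total_cents"],
        "currency": order["currency"],
    }


def dispatch(version: str, order: dict) -> dict:
    """Serve the request with whichever version the caller depends on."""
    if version not in HANDLERS:
        raise ValueError(f"unsupported API version: {version}")
    return HANDLERS[version](order)
```

Existing consumers keep calling `v1` while new consumers adopt `v2`, so the owning team can change the API without coordinating a simultaneous release with every caller.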

💡Canary Releasing

Canary releasing is a technique where a new version of a service is slowly rolled out to a small subset of users before a full release. The video mentions Amazon's use of sophisticated canary releasing to limit the impact of potential failures.
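
A progressive rollout with automatic rollback might look something like the sketch below. The traffic percentages, the error-rate threshold, and the `error_rate_at` callable (standing in for real observability data) are all assumptions; the video does not give Amazon's actual rollout steps or thresholds.

```python
from typing import Callable, List, Sequence, Tuple


def progressive_rollout(
    steps: Sequence[int],
    error_rate_at: Callable[[int], float],
    max_error_rate: float = 0.01,
) -> Tuple[int, List[int]]:
    """Widen exposure step by step, rolling back on the first bad signal.

    `steps` is a non-empty sequence of traffic percentages. Returns
    (final_percent, history), where a final_percent of 0 means the change
    was rolled back.
    """
    history: List[int] = []
    for percent in steps:
        history.append(percent)
        # Hypothetical observability hook: error rate seen at this exposure.
        if error_rate_at(percent) > max_error_rate:
            history.append(0)  # roll back to no exposure
            return 0, history
    return steps[-1], history
```

Because each step exposes only part of the traffic, a problem detected at 10% never reaches the remaining 90%, which is what limiting the blast radius means in practice.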

Highlights

The importance of learning from successful companies like Amazon, despite their unique challenges.

Amazon's staggering pace of software deployment, with over 136,000 changes per day.

The evolution of Amazon's technology from a conventional relational database to a distributed service model.

The challenges of web-scale computing and how Amazon addressed them.

The introduction of small two-pizza teams and the genesis of Amazon Web Services.

The necessity of adopting a distributed model for system architecture to overcome scaling issues.

The shift towards event-based systems, asynchrony, and eventual consistency.

The structure of development to provide developers with insight into the correctness of their changes.

Amazon's early adoption of continuous delivery in software development.

The four fundamental parts of Amazon's continuous delivery model: Source, Build, Test, and Prod.

The practice of trunk-based development and continuous integration at Amazon.

The focus on fast, accurate feedback to developers following any change.

The concept of packaging and storing artifacts as release candidates.

The importance of health checks to confirm system readiness before starting tests.

The approach to integration testing, also known as Jazz testing at Amazon.

The strategy of deploying the same bits and bytes into a production-like test environment.

The concept of one-box testing to allow teams to make progress independently.

The management of API versioning and compatibility between microservices.

The approach to handling potential race conditions between teams during compatibility testing.

The emphasis on being excellent at rolling back changes and limiting the blast radius of failures.

The use of sophisticated canary releasing and observability to manage changes.

The independent verification and validation of the continuous delivery model by Amazon.

Transcripts

play00:00

I think that we make a common mistake in

play00:02

software development circles of

play00:03

dismissing the unicorns I suppose that

play00:06

we often make the mistake of too slowly

play00:07

following them too but that's probably

play00:10

the topic for a different episode one

play00:12

day

play00:13

I can't tell you how many times people

play00:14

have told me don't mention Google or

play00:17

Amazon or Facebook we aren't them

play00:20

I can understand that but also I think

play00:23

it's a little bit of a risky strategy to

play00:25

ignore the lessons of successful

play00:27

companies we can learn from them

play00:30

sure there are things that make the

play00:32

problems these web monsters face unique

play00:36

but there are also lessons that we can

play00:38

take away and they grew to the States

play00:41

because they were doing some things

play00:43

right

play00:44

so today I'm interested in what we can

play00:47

learn from how Amazon develops software

play00:53

[Music]

play00:57

hi I'm Dave Farley of continuous

play00:59

delivery welcome to my channel if you

play01:02

haven't been here before please do hit

play01:04

subscribe and if you enjoy the content

play01:06

today hit like as well Amazon are one of

play01:09

the most successful companies in the

play01:10

world and produce software at a

play01:13

staggering pace these days Amazon

play01:15

deploy change into production more

play01:18

than 136,000 times per day

play01:22

that's more than 1.5 times per second

play01:26

but it wasn't always like that Amazon

play01:29

began with a fairly conventional

play01:31

relational database backed system written

play01:33

in C++ they grew very quickly so

play01:37

quickly in fact that for many years

play01:38

their technology was the main constraint

play01:42

on the growth of the company

play01:44

I remember sitting in the audience at an

play01:47

internal presentation by Amazon CTO

play01:50

Werner Vogels he was describing the

play01:54

challenges of web-scale computing this

play01:57

was not long after the famous Jeff Bezos

play02:00

email that introduced small two-pizza

play02:02

teams and what became the distributed

play02:06

service model that was the genesis of

play02:09

Amazon Web Services

play02:11

in his presentation Werner described

play02:13

several of the challenges and two stuck

play02:16

very clearly in my mind they resonated

play02:19

because at the time I was working in a

play02:22

different context on both of these

play02:23

problems one was that relational

play02:26

databases don't scale

play02:28

my take on that is that the answer to

play02:30

this DB problem was to move to a

play02:33

generally distributed model for the

play02:34

system architecture overall and that

play02:37

tends to lead us into the Realms of

play02:38

event-based systems asynchrony and

play02:40

eventual consistency all stuff I was

play02:43

actively working on at the time

play02:46

the other idea was that if this more

play02:49

complex model of systems is what we

play02:51

really need then how do you structure

play02:53

development so that you can give

play02:55

developers insight into the correctness

play02:57

of their changes

play02:59

at this time I was part way through

play03:02

writing the continuous delivery book and

play03:04

so believed that I knew the answer to

play03:06

this too at the end of his talk Werner

play03:08

said if you know how to do any of

play03:11

this come and talk to me but I didn't

play03:13

because I was rather enjoying what I was

play03:15

working on at the time and there was a

play03:17

long queue of people waiting to speak to

play03:19

him presumably about other aspects of

play03:21

his talk but it looks like Amazon did

play03:24

okay anyway to me

play03:25

before we go any further let me say

play03:27

thank you to our sponsors we're

play03:29

fortunate to be sponsored by Equal

play03:31

Experts Tricentis and Transfic all of

play03:35

these companies offer products and

play03:36

services that are very well aligned with

play03:38

the topics that we discuss here on this

play03:40

channel every week so if you're looking

play03:42

for excellence in continuous delivery

play03:44

and software engineering please do click

play03:47

on the links in the description below

play03:48

and check them out Amazon were fairly

play03:51

early adopters of continuous delivery as

play03:53

a general approach to software

play03:55

development I recently came across a

play03:57

post by Amazon engineer Clare Liguori

play04:01

who writes about some of the detail of

play04:03

how they organize their work

play04:05

this isn't the team topologies part or

play04:07

the message-based event-driven micro

play04:09

service part this is the basics of

play04:12

working so that your software is always

play04:15

releasable and keeping it there

play04:16

continuous delivery Clare describes the

play04:19

Amazon version of continuous delivery as

play04:21

comprising four fundamental parts

play04:24

source build test and prod my

play04:28

terminology is slightly different but we

play04:30

are talking about exactly the same ideas

play04:32

I describe a deployment pipeline as

play04:35

consisting of commit, acceptance and

play04:38

production phases and define the scope

play04:41

of the pipeline as evaluating an

play04:43

independently deployable unit of

play04:45

software

play04:46

so I'd map Amazon's stuff onto my model

play04:49

rather like this

play04:52

Amazon lists the things they expect to

play04:54

be in the repository that the pipeline

play04:57

evaluates these are the things that

play04:59

define a releasable unit of software

play05:03

everything is under Version Control

play05:05

together

play05:06

so the Version Control defines the

play05:08

versions of all of these things that

play05:10

work and change together

play05:13

this eliminates any problems of

play05:15

dependency management by making the

play05:17

scope of the stuff in a repo the stuff

play05:20

that defines a releasable unit of

play05:22

software

play05:23

what Amazon calls build I'd call the

play05:27

commit phase

play05:29

the job of the commit phase is to give

play05:31

fast accurate feedback to developers

play05:33

following any change

play05:35

if all these tests pass then the Amazon

play05:38

developer can move on to work on new

play05:40

things

play05:41

at the end of the build or commit step

play05:44

if all of the tests pass they package

play05:46

and store the artifact I call this

play05:49

creating the release candidate but once

play05:52

again despite differences in terminology

play05:54

this is exactly how I define the job of

play05:57

the commit stage in continuous delivery

play06:00

all of these things are strongly focused

play06:03

on delivering great fast feedback on

play06:05

changes to the development team and so

play06:08

supporting the fine-grained development

play06:11

process there's no hiding change away on

play06:14

feature branches here or the change

play06:16

management theater of git flow Amazon

play06:19

practice trunk-based development and

play06:21

full-blown continuous integration all of

play06:24

the time this maximizes visibility for

play06:27

the development teams into the true

play06:28

state of the system that they're working

play06:30

on at all times

play06:33

the next phase in the deployment

play06:35

pipeline is what I call the acceptance

play06:37

phase but what Amazon just call testing

play06:40

inevitably I suppose I like my words

play06:43

better because there's testing in every

play06:46

phase but the goals of and the

play06:48

fine-grained detail of the approach are

play06:51

still identical to the model that I

play06:53

recommend

play06:54

the build or commit phase produced the

play06:57

artifact or release candidate as the

play07:00

result of all tests passing and from now

play07:03

on any further testing that we do will

play07:05

be evaluating things at the release

play07:07

candidate level

play07:08

rather than at the level of source code

play07:11

so any more testing from now on starts

play07:14

by deploying the release candidate

play07:17

and then checking that it's up and

play07:18

running and ready for use

play07:21

Amazon refers to this part as health

play07:23

checks which confirm that the system is

play07:25

ready before starting the tests

play07:28

here's the picture that I use in my

play07:30

training to describe what this process

play07:31

looks like they're almost identical

play07:35

I don't know if my work influenced

play07:37

Amazon but I didn't get this from them

play07:40

so either they got it from me or

play07:43

probably more likely this is an example

play07:46

of convergent evolution

play07:49

this approach works so well that when

play07:52

you apply a disciplined engineering

play07:55

approach to thinking about problems and

play07:57

solving them and trying different things

play08:00

keeping the things that really work and

play08:01

discarding the things that don't now you

play08:04

tend to end up in the same place

play08:06

even if you began from different

play08:08

starting positions in science this kind

play08:12

of thing is called reproducibility and

play08:15

that is one of the strongest assertions

play08:16

of the correctness of your findings that

play08:19

there is

play08:19

again there are some terminology

play08:21

differences but the process is pretty

play08:24

much identical once again if you'd like

play08:26

to learn more about this reproducible

play08:28

world-class approach to software

play08:30

development by the way check out my free

play08:32

introductory training course there's a

play08:33

link to that in the description below

play08:36

Amazon calls the next phase of testing

play08:38

integration testing Jazz and I called

play08:41

it acceptance testing but the goal is

play08:44

the same the aim is to deploy the same bits

play08:47

and bytes that represent our release

play08:49

candidate into a production-like test

play08:51

environment and then to evaluate it in

play08:54

realistic lifelike scenarios or as

play08:57

Amazon describe it

play08:58

these tests exercise the full stack end

play09:00

to end by calling real APIs running on

play09:03

real infrastructure

play09:05

fundamentally what's going on here is

play09:08

completely perfectly in line with the

play09:10

approach that we describe as continuous

play09:11

delivery

play09:13

I describe this phase of the pipeline as

play09:15

evaluating the releasability of our

play09:17

changes and the idea is that this

play09:19

evaluation is completely definitive if

play09:23

the pipeline says pass we're free to

play09:25

release the change into production with

play09:27

no further work to do if it says fail

play09:30

we're not and need to submit a new

play09:32

change to the pipeline to correct the

play09:34

mistake what determines releasability

play09:37

for Amazon might be different to what

play09:40

determines it for you or me but they're

play09:43

doing exactly the same thing here as I

play09:46

do when I form a project

play09:49

for the high performance systems that I

play09:51

used to build there's not enough about

play09:52

performance testing here Amazon focuses

play09:55

more explicitly on testing observability

play09:57

than we did when building high

play09:59

performance systems we bundled

play10:01

observability testing in with our

play10:03

regular acceptance testing presumably

play10:05

Amazon bundled performance testing into

play10:08

what they call load testing but these

play10:10

are differences due to the nature of the

play10:13

differences in the business the model of

play10:15

development is still identical Amazon

play10:18

talk about one-box testing which is

play10:20

aimed at allowing teams to make progress

play10:23

independently of one another

play10:25

microservices call the production

play10:27

version of the API from services owned

play10:29

by other teams but in the pipeline they

play10:32

also just check the compatibility with

play10:35

the pre-production version of those

play10:37

dependent APIs too

play10:39

so to be clear Amazon microservices

play10:42

support multiple versions of all of

play10:44

their APIs as a mechanism to manage

play10:47

change between them as I described in

play10:50

this video

play10:51

I'm not entirely clear how Amazon cope

play10:54

with the potential race conditions

play10:55

between teams in this approach to

play10:57

compatibility testing that is what

play11:00

happens if team A checks compatibility

play11:02

with team B's service just as team B is

play11:05

moving the pre-prod API to being the

play11:07

prod API for example

play11:10

at this scale there isn't going to be a

play11:12

single copy of the truth of the whole

play11:14

system anywhere other than in production

play11:17

this isn't a monolithic testing

play11:19

strategy so that means there are

play11:22

multiple copies of collections of

play11:24

related Services if I understand this

play11:27

correctly the way that Amazon cope with

play11:29

this is to accept that there's no

play11:32

perfect here and accept that for some

play11:35

things they might discover problems

play11:38

later in the process maybe even in

play11:40

production

play11:41

so what they do is to optimize to be

play11:44

great at rolling back changes and to

play11:46

limit the blast radius of any failure

play11:48

that does actually make it into

play11:50

production they accomplish this through

play11:52

sophisticated forms of canary releasing

play11:56

rolling out change progressively and

play11:58

having great observability on the

play12:01

assumption that if there is a problem it

play12:03

will be found before the change is

play12:05

rolled out to everyone this isn't the

play12:07

right answer for every kind of business

play12:09

but it's exactly right for Amazon I'd

play12:12

also assume that there are some other

play12:13

forms of contract testing somewhere in

play12:15

the picture of pre-release testing at

play12:18

least that's what I think I would do in

play12:20

their position

play12:21

to further reduce the chances of

play12:23

failures in production turning up even

play12:26

if I was good at containing them and

play12:27

recovering from them

play12:29

there's a lot more of really interesting

play12:32

detail in Claire's post and I recommend

play12:34

that you check it out there's a link to

play12:35

that in the description below too and

play12:38

there's a lot to learn from it but

play12:39

what I find really interesting is just

play12:43

how closely this tracks the model of

play12:45

continuous delivery that I present I

play12:48

guess that I'm not really surprised by

play12:50

all of this since I know that this stuff

play12:51

works better than anything else that

play12:53

we've found so far

play12:54

but I think that the level of detail

play12:56

with which Amazon's model for continuous

play12:58

delivery tracks what I've done and what

play13:01

I describe on this channel is an

play13:03

important independent verification and

play13:06

validation of this model of world-class

play13:09

software development thank you over on

play13:12

Patreon we host a regular Q&A show where

play13:16

you can send me specific questions and I

play13:18

answer them in a pre-recorded session

play13:20

once per month

play13:21

right now we're hosting a competition

play13:23

for our members where you can win the

play13:25

chance to host the Q&A show where you

play13:28

can ask me the questions submitted by

play13:30

members as well as throwing in as many

play13:32

as you like of your own if that's

play13:33

something that you're interested in then

play13:35

enter the competition by becoming a

play13:37

supporter on Patreon if you're not

play13:39

already there's a seven day free trial

play13:41

available

play13:42

once you are a member submit a question

play13:45

on our Discord server for the next Q&A

play13:47

show that's it the deadline for entry

play13:50

will be the 31st of August this year 2023

play13:53

so good luck to you

play13:57

[Music]
