The NEED TO KNOW Info On Amazon's Software Development
Summary
TL;DR: In this video, Dave Farley discusses the importance of learning from successful companies like Amazon in software development. Amazon, deploying changes over 136,000 times daily, has adopted continuous delivery, focusing on source, build, test, and prod phases. They practice trunk-based development and continuous integration, ensuring fast feedback and visibility. The video explores Amazon's approach to testing, including health checks, integration testing, and canary releasing, emphasizing the need for observability and rollback strategies to manage change effectively.
Takeaways
- Amazon deploys changes into production over 136,000 times per day, highlighting the importance of efficient software development practices.
- Amazon's early technology was a constraint on growth, illustrating the need for scalable systems as companies expand.
- The shift from relational databases to a distributed model was a key change for Amazon, emphasizing the move towards event-based systems and eventual consistency.
- Amazon's adoption of small teams, influenced by Jeff Bezos' 'two-pizza rule', led to the development of their service-oriented architecture.
- Amazon's continuous delivery model includes four fundamental parts: source, build, test, and prod, aligning with the deployment pipeline concept.
- Amazon practices trunk-based development and continuous integration, ensuring visibility of the system's true state at all times.
- The commit phase in Amazon's model provides fast feedback to developers, crucial for a fine-grained development process.
- Amazon's testing phase includes health checks and integration testing to ensure the release candidate's readiness for production.
- Amazon's approach to microservices involves supporting multiple versions of APIs to manage changes between services.
- Amazon's strategy for dealing with potential race conditions is to optimize for rollbacks and limit the impact of failures through canary releasing and observability.
- The script suggests learning from successful companies like Amazon, despite their unique challenges, to improve software development practices.
Q & A
What is the common mistake mentioned in software development circles?
-The common mistake is dismissing successful companies like Google, Amazon, or Facebook, and not learning from their practices.
How many times per day does Amazon deploy changes into production?
-Amazon deploys changes into production more than 136,000 times per day.
What was one of the main constraints on Amazon's growth for many years?
-Amazon's technology, particularly their relational-database-backed system, was the main constraint on the growth of the company for many years.
What was the famous initiative introduced by Jeff Bezos that impacted Amazon's development?
-The famous initiative introduced by Jeff Bezos was the 'two-pizza team' concept, which led to the distributed service model and the genesis of Amazon Web Services.
What are the four fundamental parts of Amazon's version of continuous delivery?
-The four fundamental parts of Amazon's version of continuous delivery are Source, Build, Test, and Prod.
How does Amazon ensure that their software is always releasable?
-Amazon ensures that their software is always releasable by running every change through a deployment pipeline (commit, acceptance, and production phases in the speaker's terminology) that evaluates an independently deployable unit of software.
What does Amazon call the phase where they give fast feedback to developers following any change?
-Amazon calls this phase 'build'; the speaker calls it the 'commit phase'. Its job is to give fast, accurate feedback to developers following any change.
What is Amazon's approach to development in terms of visibility into the system's true state?
-Amazon practices trunk-based development and full-blown continuous integration, maximizing visibility into the true state of the system at all times.
How does Amazon handle testing of their release candidates?
-Amazon handles testing of their release candidates by deploying them and performing health checks to confirm the system is ready before starting tests.
What is the purpose of integration testing (which the speaker calls acceptance testing) in Amazon's deployment pipeline?
-The purpose of integration testing is to deploy the same bits and bytes representing the release candidate into a production-like test environment and evaluate it in realistic scenarios.
How does Amazon manage change between microservices?
-Amazon microservices support multiple versions of all their APIs as a mechanism to manage change between them.
What strategy does Amazon use to cope with potential race conditions between teams?
-Amazon optimizes for being great at rolling back changes and limits the blast radius of any failure through sophisticated forms of canary releasing and rolling out changes progressively.
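The rollback and blast-radius strategy in the last answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `progressive_rollout`, the wave sizes, and the 1% error threshold are hypothetical and do not come from the video or from Amazon's tooling.

```python
from typing import Callable, List, Tuple


def progressive_rollout(
    waves: List[float],
    error_rate_at: Callable[[float], float],
    max_error_rate: float = 0.01,  # hypothetical threshold, not from the source
) -> Tuple[str, float]:
    """Widen exposure wave by wave and roll back on the first bad signal.

    `waves` is the fraction of traffic exposed at each step.
    `error_rate_at` stands in for observability: it returns the error
    rate observed while that fraction of traffic runs the new version.
    Returns the outcome and the fraction exposed when the decision was made.
    """
    if not waves:
        raise ValueError("at least one rollout wave is required")
    for fraction in waves:
        if error_rate_at(fraction) > max_error_rate:
            # The blast radius is limited to the traffic exposed so far.
            return ("rolled_back", fraction)
    return ("released", waves[-1])


WAVES = [0.01, 0.10, 0.50, 1.00]  # hypothetical canary waves

healthy = progressive_rollout(WAVES, lambda fraction: 0.001)
faulty = progressive_rollout(WAVES, lambda fraction: 0.05)
late_failure = progressive_rollout(
    WAVES, lambda fraction: 0.05 if fraction >= 0.5 else 0.0
)
```

In this sketch a faulty change is caught while only the first wave is exposed, which is the point of rolling out progressively: the earlier the signal arrives, the fewer users see the failure.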
Outlines
Learning from Amazon's Software Development
The speaker begins by discussing the common oversight of ignoring successful companies like Amazon, Google, and Facebook in software development. They argue that while these companies face unique challenges, there are valuable lessons to be learned from their practices. The speaker highlights Amazon's impressive software deployment pace, with over 136,000 deployments per day. Amazon initially faced scalability issues with their relational database system, which led to the adoption of a distributed service model, the precursor to Amazon Web Services. The speaker recalls an internal presentation by Amazon CTO Werner Vogels, which emphasized the challenges of web-scale computing and the need for a more complex system architecture. The talk inspired the speaker to consider how development should be structured to ensure the correctness of changes, leading to the concept of continuous delivery. The video also acknowledges the sponsors that support the channel's content on continuous delivery and software engineering.
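As a quick sanity check on the deployment rate quoted above, the arithmetic can be worked through directly. The only input taken from the video is the figure of 136,000 deployments per day; the variable names are illustrative.

```python
# Back-of-the-envelope check of the deployment rate quoted in the video.
DEPLOYS_PER_DAY = 136_000        # figure quoted by the speaker
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds

deploys_per_second = DEPLOYS_PER_DAY / SECONDS_PER_DAY
seconds_between_deploys = SECONDS_PER_DAY / DEPLOYS_PER_DAY

print(f"{deploys_per_second:.2f} deployments per second")
print(f"one deployment every {seconds_between_deploys:.2f} seconds")
```

This agrees with the speaker's remark in the transcript that the rate is more than 1.5 deployments per second.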
Amazon's Continuous Delivery Model
The second paragraph delves into Amazon's adoption of continuous delivery, as described by Amazon engineer Clare Liguori. The speaker outlines the four fundamental parts of Amazon's continuous delivery model: source, build, test, and prod. They align these with their own terminology of deployment pipeline stages: commit, acceptance, and production phases. The speaker emphasizes the importance of version control in defining releasable units of software and the commit phase's role in providing rapid feedback to developers. Amazon's approach to development, which includes trunk-based development and continuous integration, is highlighted as a way to maintain visibility of the system's true state. The testing phase is detailed, with a focus on health checks, integration testing, and the evaluation of release candidates in a production-like environment. The speaker notes the similarities between Amazon's practices and their own recommendations for continuous delivery, indicating a convergence towards effective software development strategies.
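The stage ordering described above can be sketched as a small Python model. It is a minimal illustration, assuming each stage is a function that returns pass or fail for a release candidate; the `Stage` class, the stage functions, and the candidate names are hypothetical and are not Amazon's actual pipeline API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    name: str
    run: Callable[[str], bool]  # takes a release-candidate id, True means pass


def run_pipeline(candidate: str, stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure."""
    log = []
    for stage in stages:
        passed = stage.run(candidate)
        log.append(f"{stage.name}: {'pass' if passed else 'fail'}")
        if not passed:
            break  # a failing candidate never reaches later stages
    return log


# Hypothetical stage implementations, for illustration only.
def build(candidate: str) -> bool:
    return True  # commit-phase tests passed, artifact packaged and stored


def health_check(candidate: str) -> bool:
    return True  # deployed candidate is up and ready before tests start


def integration_tests(candidate: str) -> bool:
    return candidate != "rc-broken"  # pretend one candidate has a defect


def deploy_to_prod(candidate: str) -> bool:
    return True


STAGES = [
    Stage("build", build),
    Stage("health-check", health_check),
    Stage("integration-tests", integration_tests),
    Stage("prod", deploy_to_prod),
]

good_log = run_pipeline("rc-42", STAGES)
bad_log = run_pipeline("rc-broken", STAGES)
```

The design choice worth noting is that every stage after the build evaluates the same release candidate, so a pass at the end of the pipeline refers to exactly the artifact that will be released.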
Amazon's Advanced Testing Strategies
The final paragraph explores Amazon's advanced testing strategies, including one-box testing for independent team progress and compatibility checks with pre-production APIs. The speaker discusses Amazon's approach to managing multiple API versions as a change management mechanism and raises questions about potential race conditions in compatibility testing. They note Amazon's acceptance of the possibility of discovering issues later in the process and emphasize the company's focus on effective rollback strategies and limiting the impact of failures. The speaker also mentions the use of canary releasing and observability to manage changes and quickly identify problems. The paragraph concludes with a recommendation to read Clare Liguori's detailed post on Amazon's practices and a mention of a Patreon competition for members to host a Q&A session with the speaker.
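The compatibility check against both production and pre-production API versions can be illustrated with a short Python sketch. It assumes each environment simply advertises the set of API versions it supports; the function names, the `team_b` data, and the version labels are hypothetical, and the video does not describe how Amazon actually implements this check.

```python
from typing import Dict, Set


def compatible(required_version: str, supported_versions: Set[str]) -> bool:
    """A consumer is compatible if the dependency still serves its version."""
    return required_version in supported_versions


def check_dependency(
    required_version: str, environments: Dict[str, Set[str]]
) -> Dict[str, bool]:
    """Check one consumer against every environment of a dependency."""
    return {
        env: compatible(required_version, versions)
        for env, versions in environments.items()
    }


# Hypothetical service owned by another team: production still serves v1,
# while pre-production has already dropped it and added v3.
team_b = {
    "prod": {"v1", "v2"},
    "pre-prod": {"v2", "v3"},
}

result_v2 = check_dependency("v2", team_b)
result_v1 = check_dependency("v1", team_b)
```

A consumer on `v1` works today but would break once the pre-production version is promoted, which is the kind of problem a pre-production compatibility check is meant to surface early. The sketch does not address the race condition the speaker raises, where the dependency is promoted while the check is running.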
Keywords
Unicorns
Software Development
Continuous Delivery
Deployment Pipeline
Version Control
Trunk-Based Development
Continuous Integration
Release Candidate
Health Checks
Microservices
Canary Releasing
Highlights
The importance of learning from successful companies like Amazon, despite their unique challenges.
Amazon's staggering pace of software deployment, with over 136,000 changes per day.
The evolution of Amazon's technology from a conventional relational database to a distributed service model.
The challenges of web-scale computing and how Amazon addressed them.
The introduction of small two-pizza teams and the genesis of Amazon Web Services.
The necessity of adopting a distributed model for system architecture to overcome scaling issues.
The shift towards event-based systems, asynchrony, and eventual consistency.
The structure of development to provide developers with insight into the correctness of their changes.
Amazon's early adoption of continuous delivery in software development.
The four fundamental parts of Amazon's continuous delivery model: Source, Build, Test, and Prod.
The practice of trunk-based development and continuous integration at Amazon.
The focus on fast, accurate feedback to developers following any change.
The concept of packaging and storing artifacts as release candidates.
The importance of health checks to confirm system readiness before starting tests.
The approach to integration testing, which the speaker calls acceptance testing.
The strategy of deploying the same bits and bytes into a production-like test environment.
The concept of one-box testing to allow teams to make progress independently.
The management of API versioning and compatibility between microservices.
The approach to handling potential race conditions between teams during compatibility testing.
The emphasis on being excellent at rolling back changes and limiting the blast radius of failures.
The use of sophisticated canary releasing and observability to manage changes.
The independent verification and validation of the continuous delivery model by Amazon.
Transcripts
I think that we make a common mistake in
software development circles of
dismissing the unicorns I suppose that
we often make the mistake of too slowly
following them too but that's probably
the topic for a different episode one
day
I can't tell you how many times people
have told me don't mention Google or
Amazon or Facebook we aren't them
I can understand that but also I think
it's a little bit of a risky strategy to
ignore the lessons of successful
companies we can learn from them
sure there are things that make the
problems these web monsters face unique
but there are also lessons that we can
take away and they grew to this state
because they were doing some things
right
so today I'm interested in what we can
learn from how Amazon develops software
[Music]
hi I'm Dave Farley of continuous
delivery welcome to my channel if you
haven't been here before please do hit
subscribe and if you enjoy the content
today hit like as well Amazon are one of
the most successful companies in the
world and produce software at a
staggering pace these days Amazon
deploy change into production more
than 136,000 times per day
that's more than 1.5 times per second
but it wasn't always like that Amazon
began with a fairly conventional
relational-database-backed system written
in C++ they grew very quickly so
quickly in fact that for many years
their technology was the main constraint
on the growth of the company
I remember sitting in the audience at an
internal presentation by Amazon CTO
Werner Vogels he was describing the
challenges of web-scale computing this
was not long after the famous Jeff Bezos
email that introduced small two-pizza
teams and what became the distributed
service model that was the genesis of
Amazon Web Services
in his presentation Werner described
several of the challenges and two stuck
very clearly in my mind they resonated
because at the time I was working in a
different context on both of these
problems one was that relational
databases don't scale
my take on that is that the answer to
this DB problem was to move to a
generally distributed model for the
system architecture overall and that
tends to lead us into the realms of
event-based systems asynchrony and
eventual consistency all stuff I was
actively working on at the time
the other idea was that if this more
complex model of systems is what we
really need then how do you structure
development so that you can give
developers insight into the correctness
of their changes
at this time I was part way through
writing the continuous delivery book and
so believed that I knew the answer to
this too at the end of his talk Werner
said if you know how to do any of
this come and talk to me but I didn't
because I was rather enjoying what I was
working on at the time and there was a
long queue of people waiting to speak to
him presumably about other aspects of
his talk but it looks like Amazon did
okay anyway to me
before we go any further let me say
thank you to our sponsors we're
fortunate to be sponsored by Equal
Experts Tricentis and TransFICC all of
these companies offer products and
services that are very well aligned with
the topics that we discuss here on this
channel every week so if you're looking
for excellence in continuous delivery
and software engineering please do click
on the links in the description below
and check them out Amazon were fairly
early adopters of continuous delivery as
a general approach to software
development I recently came across a
post by Amazon engineer Clare Liguori
who writes about some of the detail of
how they organize their work
this isn't the team topologies part or
the message-based event-driven micro
service part this is the basics of
working so that your software is always
releasable and keeping it there
continuous delivery Clare describes the
Amazon version of continuous delivery as
comprising four fundamental parts
source build test and prod my
terminology is slightly different but we
are talking about exactly the same ideas
I describe a deployment pipeline as
consisting of commit, acceptance and
production phases and define the scope
of the pipeline as evaluating an
independently deployable unit of
software
so I'd map Amazon's stuff onto my model
rather like this
Amazon lists the things they expect to
be in the repository that the pipeline
evaluates these are the things that
define a releasable unit of software
everything is under version control
together
so the version control defines the
versions of all of these things that
work and change together
this eliminates any problems of
dependency management by making the
scope of the stuff in a repo the stuff
that defines a releasable unit of
software
what Amazon calls build I'd call the
commit phase
the job of the commit phase is to give
fast accurate feedback to developers
following any change
if all these tests pass then the Amazon
developer can move on to work on new
things
at the end of the build or commit step
if all of the tests pass they package
and store the artifact I call this
creating the release candidate but once
again despite differences in terminology
this is exactly how I define the job of
the commit stage in continuous delivery
all of these things are strongly focused
on delivering great fast feedback on
changes to the development team and so
supporting the fine-grained development
process there's no hiding change away on
feature branches here or the change
management theater of git flow Amazon
practice trunk-based development and
full-blown continuous integration all of
the time this maximizes visibility for
the development teams into the true
state of the system that they're working
on at all times
the next phase in the deployment
pipeline is what I call the acceptance
phase but what Amazon just call testing
inevitably I suppose I like my words
better because there's testing in every
phase but the goals of and the
fine-grained detail of the approach are
still identical to the model that I
recommend
the build or commit phase produces the
artifact or release candidate as the
result of all tests passing and from now
on any further testing that we do will
be evaluating things at the release
candidate level
rather than at the level of source code
so any more testing from now on starts
by deploying the release candidate
and then checking that it's up and
running and ready for use
Amazon refers to this part as health
checks which confirm that the system is
ready before starting the tests
here's the picture that I use in my
training to describe what this process
looks like they're almost identical
I don't know if my work influenced
Amazon but I didn't get this from them
so either they got it from me or
probably more likely this is an example
of convergent evolution
this approach works so well that when
you apply a disciplined engineering
approach to thinking about problems and
solving them and trying different things
keeping the things that really work and
discarding the things that don't now you
tend to end up in the same place
even if you began from different
starting positions in science this kind
of thing is called reproducibility and
that is one of the strongest assertions
of the correctness of your findings that
there is
again there are some terminology
differences but the process is pretty
much identical once again if you'd like
to learn more about this reproducible
world-class approach to software
development by the way check out my free
introductory training course there's a
link to that in the description below
Amazon calls the next phase of testing
integration testing Jez and I called
it acceptance testing but the goal is
the same the aim is to deploy the same bits
and bytes that represent our release
candidate into a production-like test
environment and then to evaluate it in
realistic lifelike scenarios or as
Amazon describe it
these tests exercise the full stack end
to end by calling real APIs running on
real infrastructure
fundamentally what's going on here is
completely perfectly in line with the
approach that we describe as continuous
delivery
I describe this phase of the pipeline as
evaluating the releasability of our
changes and the idea is that this
evaluation is completely definitive if
the pipeline says pass we're free to
release the change into production with
no further work to do if it says fail
we're not and need to submit a new
change to the pipeline to correct the
mistake what determines releasability
for Amazon might be different to what
determines it for you or me but they're
doing exactly the same thing here as I
do when I form a project
for the high performance systems that I
used to build there's not enough about
performance testing here Amazon focuses
more explicitly on testing observability
than we did when building high
performance systems we bundled
observability testing in with our
regular acceptance testing presumably
Amazon bundle performance testing into
what they call load testing but these
are differences due to the nature of the
differences in the business the model of
development is still identical Amazon
talk about one-box testing which is
aimed at allowing teams to make progress
independently of one another
microservices call the production
version of the API from Services owned
by other teams but in the pipeline they
also just check the compatibility with
the pre-production version of those
dependent apis too
so to be clear Amazon microservices
support multiple versions of all of
their APIs as a mechanism to manage
change between them as I described in
this video
I'm not entirely clear how Amazon cope
with the potential race conditions
between teams in this approach to
compatibility testing that is what
happens if team a checks compatibility
with Team B's service just as Team B is
moving the pre-prod API to being the
prod API for example
at this scale there isn't going to be a
single copy of the truth of the whole
system anywhere other than in production
this isn't a monolithic testing
strategy so that means there are
multiple copies of collections of
related Services if I understand this
correctly the way that Amazon cope with
this is to accept that there's no
perfect here and accept that for some
things they might discover problems
later in the process maybe even in
production
so what they do is to optimize to be
great at rolling back changes and to
limit the blast radius of any failure
that does actually make it into
production they accomplish this through
sophisticated forms of canary releasing
rolling out change progressively and
having great observability on the
assumption that if there is a problem it
will be found before the change is
rolled out to everyone this isn't the
right answer for every kind of business
but it's exactly right for Amazon I'd
also assume that there are some other
forms of contract testing somewhere in
the picture of pre-release testing at
least that's what I think I would do in
their position
to further reduce the chances of
failures in production turning up even
if I was good at containing them and
recovering from them
there's a lot more of really interesting
detail in Clare's post and I recommend
that you check it out there's a link to
that in the description below too and
there's a lot to learn from it but
what I find really interesting is just
how closely this tracks the model of
continuous delivery that I present I
guess that I'm not really surprised by
all of this since I know that this stuff
works better than anything else that
we've found so far
but I think that the level of detail
with which Amazon's model for continuous
delivery tracks what I've done and what
I describe on this channel is an
important independent verification and
validation of this model of world-class
software development thank you over on
Patreon we host a regular Q&A show where
you can send me specific questions and I
answer them in a pre-recorded session
once per month
right now we're hosting a competition
for our members where you can win the
chance to host the Q&A show where you
can ask me the questions submitted by
members as well as throwing in as many
as you like of your own if that's
something that you're interested in then
enter the competition by becoming a
supporter on Patreon if you're not
already there's a seven day free trial
available
once you are a member submit a question
on our Discord server for the next Q&A
show that's it the deadline for entry
will be the 31st of August this year 2023
so good luck to you
[Music]