AWS re:Invent 2022 - A close look at AWS Fargate and AWS App Runner (CON406)
Summary
TLDR: Archana Srikanta, a principal engineer at AWS, discusses the evolution of AWS container services, from EC2 to App Runner, highlighting the shared responsibility model and the architectural advancements that enable higher abstraction services like Fargate and App Runner. She also delves into the security and availability considerations that underpin these services.
Takeaways
- 😀 Archana Srikanta, a principal engineer at AWS, has been instrumental in the development of container services including founding roles in App Runner and Fargate.
- 🚀 The evolution of AWS services from EC2 to App Runner has been driven by a desire to abstract complexity away from the user, allowing easier deployment and management of applications.
- 💡 AWS services are designed with a shared responsibility model, where the availability and security of applications are a joint responsibility between AWS and the customer.
- 🛠️ Elastic Beanstalk simplified the process of deploying applications by orchestrating various AWS services, reducing the need for customers to manage individual components.
- 📦 The rise of containerization led to the development of ECS, which abstracted the container orchestration control plane, making it easier for customers to run containers without managing the orchestration layer.
- 🌟 Fargate introduced a serverless container offering, removing the need for customers to manage the underlying EC2 instances and base layer software, further simplifying the deployment process.
- 🌐 App Runner is the latest service, focusing on web applications and abstracting even further by managing containers, load balancers, auto-scaling, and deployment pipelines.
- 🔒 Security is a key consideration, with AWS implementing strict controls such as security groups and private service endpoints to ensure multi-tenant isolation and prevent unauthorized access.
- 🔄 Availability is ensured through a cellular architecture within AWS, with multiple copies of services running across different availability zones to minimize the impact of any single point of failure.
- 🛡️ Firecracker, an open-source virtualization software by Amazon, is used in Fargate to create microVMs that provide fast startup times and strong isolation for containers.
- 🔑 AWS encourages the use of the highest abstraction services suitable for an application, leveraging the security and availability measures built into the platform, and only moving down the stack if necessary.
Q & A
What is Archana Srikanta's role and experience at AWS?
-Archana Srikanta is a principal engineer at AWS with over 11 years of tenure, a large part of which has been with the container services organization. She has worked on multiple container services and was part of the founding team for App Runner and Fargate.
What is the significance of App Runner and Fargate in AWS's container services?
-App Runner and Fargate are significant as they represent the evolution of AWS's container services. App Runner is the newest service offering the highest level of abstraction, while Fargate is a serverless container offering that abstracts away the underlying EC2 instances.
How does the architecture of newer AWS services like App Runner and Fargate build upon the older ones?
-The architecture of newer services like App Runner and Fargate has layered on top of the foundations laid by predecessor services. For instance, App Runner is built on top of Fargate, which in turn is built on top of ECS, showing a progression of abstraction and simplification.
What is the shared responsibility model on AWS, and how does it apply to the discussed services?
-The shared responsibility model on AWS is a concept where the availability and security posture of an application is a joint responsibility between the customer and AWS. Different aspects of the stack are owned by either party, and this model applies to all discussed services, with the division of responsibilities shifting as abstraction layers increase.
How did the evolution from EC2 to Elastic Beanstalk address customer concerns about managing infrastructure?
-The evolution to Elastic Beanstalk addressed customer concerns by providing a central orchestration plane that simplifies the process of managing and stitching together various AWS services. It automated the creation and provisioning of resources, reducing the complexity for customers running applications.
What is the role of Firecracker in the context of Fargate and container services?
-Firecracker is an open-source virtualization software project by Amazon that serves as a hypervisor specifically built for containers and functions. It is used in Fargate to spin up micro VMs, which are optimized for fast startup times and provide EC2 instance-level isolation between workloads.
How does App Runner simplify the process of running web applications compared to other services?
-App Runner simplifies the process by abstracting away the need to manage containers, load balancers, auto scaling groups, and deployment pipelines. Customers only need to focus on their application image, and App Runner handles the rest, providing a URL endpoint for HTTP requests that scales automatically.
What security measures are in place to ensure multi-tenancy isolation in App Runner and Fargate?
-Both App Runner and Fargate implement strict security measures such as using security groups to block task-to-task communication and ensuring that each task runs in its own micro VM with separate network interfaces. This maintains a high level of isolation between tenants and prevents unauthorized access or communication between tasks.
How does the ECS control plane ensure security and availability for its services?
-The ECS control plane ensures security through a cellular architecture that runs multiple copies of its stack within a region, with each service spread across different availability zones. This design minimizes the impact of any single point of failure and allows for regional independence, protecting against outages and software deployment errors.
What is the advice given for customers deciding which AWS service to use for their container applications?
-The advice given is to start with the highest abstraction service that meets their needs and only move down the stack if there are specific reasons why the higher-level services are not suitable. This approach allows customers to take advantage of the security and availability measures built into the higher abstraction services.
Outlines
😀 Introduction to AWS Container Services
Archana Srikanta, a principal engineer at AWS, introduces the session with her experience of over 11 years at AWS, mainly with container services. She outlines the journey from EC2 to App Runner, highlighting her involvement in founding services like App Runner and Fargate. The session aims to delve into the evolution of product ideas and the layered architecture of these services, emphasizing security and availability as key design influences. The use-case of a web application is presented to explore the application of AWS shared responsibility model across different services.
🚀 Evolution of AWS Compute Services
This paragraph discusses the evolution of AWS compute services, starting with EC2, which managed physical servers and virtualization software, leaving customers responsible for the VMs and associated software. Elastic Beanstalk was introduced to simplify the process by orchestrating resources through a CloudFormation template. The rise of container technology led to customers using Docker or similar runtimes, which while efficient on a single instance, posed challenges in orchestration at scale. This led to the emergence of container orchestrators like Mesos and Kubernetes, which were complex to manage, identifying a need for AWS to step in and offer solutions.
🛠️ The Emergence of ECS and Fargate
The paragraph explains the launch of ECS in 2015, which moved the container orchestration control plane to AWS's responsibility, simplifying the process for customers. However, customers still managed load balancing, auto scaling, and deployment pipelines. Fargate, introduced in 2017, further abstracted these responsibilities by offering a serverless container service, eliminating the need for customers to manage EC2 instances or base layer software, and allowing them to focus solely on their containerized applications.
🌐 App Runner: Simplified Web Application Deployment
App Runner is highlighted as a service designed to simplify the deployment of web applications. It abstracts away the complexities of container management, load balancing, auto scaling, and CI/CD pipelines. Customers can deploy applications directly from GitHub or prebuilt container images in ECR, with App Runner handling the build process, containerization, and service creation. The service provides a URL endpoint that scales automatically with traffic, abstracting away the underlying infrastructure and network configurations.
🔧 Under-the-Hood: App Runner and Fargate Architecture
The paragraph delves into the technical architecture behind App Runner and Fargate, discussing the use of Firecracker, an open-source virtualization software by Amazon, which creates micro VMs for container and function deployment. It explains how Firecracker optimizes startup times and maintains traditional VM-level isolation, ensuring security and efficient resource utilization. The architecture includes a service VPC, managed language runtimes, and a detailed network configuration involving ENIs for connectivity to the customer's VPC.
🔒 Security Considerations in AWS Services
This section focuses on the security measures implemented in AWS services, particularly for App Runner and Fargate. It discusses the use of security groups to prevent task-to-task communication and the introduction of private service endpoints for App Runner. The paragraph also covers Fargate's data plane security, detailing the isolation provided by Firecracker micro VMs and the strict controls in place to ensure multi-tenant isolation and secure communication with ECS control plane.
🌐 ECS Control Plane Security and Availability
The paragraph discusses the security and availability of the ECS control plane, emphasizing the importance of protecting the state of ECS and ensuring it is accessed and modified in a secure manner. It explains the use of instance roles for identity verification and the implementation of limits and throttles to ensure fairness and protect the service from potential misuse. The availability strategy includes a cellular architecture with multiple copies of the stack within a region, spread across AZs to minimize the impact of failures.
🛡️ Data Plane Security and Availability for Fargate and App Runner
The final paragraph addresses the security and availability of the data plane for both Fargate and App Runner. It describes the zonal VPCs used by Fargate to ensure that no communication is allowed between instances, and the use of separate network interfaces for each micro VM. For App Runner, it outlines the cellular architecture with components striped across multiple AZs for high availability. The paragraph concludes by emphasizing the importance of starting with the highest abstraction service for container applications and understanding the security and availability considerations built into these services.
📚 Conclusion and Recommendations
Archana concludes the session by advising users to start with the highest abstraction service for their container applications and only move down the stack if necessary. She encourages users to leverage the security and availability measures built into AWS services and to provide feedback for continuous improvement. The emphasis is on utilizing the higher abstraction services to take advantage of the work done by AWS teams to ensure a robust and secure deployment environment.
Keywords
💡AWS
💡Container Services
💡App Runner
💡Fargate
💡ECS
💡Shared Responsibility Model
💡Elastic Beanstalk
💡Firecracker
💡MicroVMs
💡Security Groups
💡Cellular Architecture
Highlights
Archana Srikanta, a principal engineer at AWS, shares insights on the evolution of container services at AWS.
App Runner and Fargate are highlighted as key services, with Archana being part of their founding teams.
The talk covers the progression from EC2 to App Runner, detailing how newer services are built on the foundations of older ones.
The architecture of these services is discussed, showing how they layer on top of each other, with newer services leveraging the base laid by predecessors.
Security and availability are emphasized as key tenets in the design of AWS services.
The shared responsibility model on AWS is explained, where the customer and AWS share the responsibility for the application's security and availability.
EC2 is described as the original compute service, with customers managing the VMs and associated software.
Elastic Beanstalk is introduced as a service that simplifies the management of resources by automating the orchestration.
The rise of container technology and its benefits, such as faster start times and better resource utilization, are discussed.
ECS (Elastic Container Service) is presented as a solution that moves the container orchestration control plane into AWS's responsibility, simplifying customer tasks.
Fargate is introduced as a serverless container offering, further abstracting the underlying instance management from the customer.
App Runner is described as focusing on web applications, abstracting even the container layer, making deployment as simple as making an API call.
The under-the-hood architecture of App Runner is detailed, including the use of VPC, managed language runtimes, and Fargate tasks.
Firecracker, an open-source virtualization software by Amazon, is explained as a key technology used in Fargate for running micro VMs.
The security measures in place for App Runner, Fargate, and ECS are discussed, including the use of security groups and private service endpoints.
The availability architecture of ECS, Fargate, and App Runner is described, emphasizing the cellular design and zonal distribution to ensure service resilience.
Archana recommends starting with the highest abstraction service for container applications and only moving down the stack if necessary.
The importance of leveraging the security and availability measures provided by AWS's higher abstraction services is emphasized.
Transcripts
- Thank you for joining this late afternoon session today.
My name is Archana Srikanta,
I'm a principal engineer at AWS.
I've been with AWS for over 11 years now,
and a large part of that tenure
has been with the container services org.
So I've actually had the good fortune of working
on multiple container services during that tenure.
And in fact,
I've actually rotated through all of these services
that we're gonna talk about today, at some point.
And, you know,
App Runner and Fargate are especially close to my heart
'cause I was one of the founding engineers,
part of the founding team for those services.
So I wanna start today by telling you a little bit
of the story of these services
and how the product ideas for these services kind of evolved
from one service to the next.
You know, starting from EC2,
which is our original compute service,
all the way to App Runner,
which is kind of the newest service on the block.
And then we'll pull the curtains back
and look at the under-the-hood architecture
for these services.
And you'll see that it's not just the product ideas
that, kind of, have built on top of each other,
but the actual architecture itself
has kind of layered on top of each other.
So newer services have been built on top of foundations
that we laid with predecessor services.
And for the under-the-hood part of this talk,
we'll go backwards.
So we'll start with App Runner,
which is kind of the highest abstraction service,
and then we'll see how that's built on top of Fargate,
and how Fargate kind of built on top of ECS.
And finally,
security and availability are kind of the key tenets
that we apply at Amazon across all AWS services,
across all Amazon services.
So, we'll go over how, for these architectures,
for these particular services,
how kind of security and availability played a role
in influencing the design.
All right, so for the product idea evolution,
we're gonna use this use-case of a web application.
So this is your standard kind of HTTP server
that listens on a socket
and responds to HTTP requests, right?
And generally when you think
of how you would run such an application, you have a VM,
you install an operating system on it,
you pick a language of your choice
to write the application in,
and then the app sits on top of it
and you're probably running multiple copies of this stack
for scale and redundancy.
So you put a load balancer in front of it,
you put an auto scaling group around it,
and you probably have some build deployment pipeline.
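As a concrete stand-in for that web application, here's a minimal HTTP server of the kind being described. This is just an illustrative sketch: the handler name, port choice, and response body are invented for the example, not anything from the talk.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    """Listens on a socket and answers every GET with a fixed body."""
    def do_GET(self):
        body = b"hello from the web app\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):  # silence per-request logging
        pass

# Bind to an ephemeral port and serve on a background thread, standing in
# for one copy of the stack that would sit behind the load balancer.
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

with urllib.request.urlopen(f"http://127.0.0.1:{server.server_port}/") as resp:
    print(resp.status, resp.read().decode().strip())  # → 200 hello from the web app
server.shutdown()
```

In practice you would run many copies of exactly this process, which is why the load balancer, auto scaling group, and deployment pipeline enter the picture.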
Now before we see how to run this on AWS, you know,
I wanna pause and talk a little bit
about the shared responsibility model on AWS.
And some of you might have heard about this already,
but the spirit of shared responsibilities
is that no matter what AWS service you use,
not just specific to the ones we're gonna talk about today,
in most cases the availability
and the security posture of your application
at the end of the day is a joint responsibility between you,
the customer, and us, AWS.
So there's gonna be parts of the stack that we own,
the responsibility, and there will be parts of the stack
that you will own the responsibility for.
And we're gonna use this lens of shared responsibility
to go through each of the services
in the context of that web application.
So with EC2, this is the original compute service.
EC2 basically took responsibility
for running the physical servers and data centers
and the virtualization software
that runs on these physical servers.
But you as the customer, you still own the VMs,
what we call EC2 instances.
You still own all the software
that runs inside these instances.
And you know, you own hooking up the load balancer,
the auto scaling group and,
and the build deployment pipeline around these instances.
Now you can use AWS managed services,
like application load balancer etcetera,
but kind of tying it all together
and making sure that it's configured properly
is still your responsibility at the end of the day.
And you know,
a lot of our EC2 customers came back and told us,
especially the ones that weren't, you know,
core infrastructure admin personas,
they said that this is still a lot of things
that you as the customer have to tie together.
It's a lot of things that you have to get right
and it's a lot of services that you have to learn
to run just a super simple web application.
So in 2011 we launched Elastic Beanstalk.
And what Beanstalk did is it said, okay,
you don't have to go to each of these individual services
and learn how to, kind of, stitch it all together.
We will do that for you
as a central kind of orchestration plane,
which is Beanstalk.
So you can just go to Elastic Beanstalk,
you can describe your application
and the environment in those terms.
And Beanstalk will basically create
a CloudFormation template behind the scenes
and, you know, deploy and provision all of these resources
behind the scenes in your account.
So the responsibility line here still doesn't shift
because these resources still end up running
in your account at the end of the day.
So you have full access to these resources,
you can go in and customize various aspects
of these resources if you wanna change things.
So in that sense, you as the customer
still own the responsibility for these components.
And then around 2013, 2014 on EC2,
this is before AWS had any container services available,
on EC2 we started seeing
that containers were starting to become popular.
And a lot of our customers
for this web application type use cases,
they were actually using container technology.
And what does that mean?
They would, you know,
install a container runtime like Docker
or Container D etcetera,
and they would basically decouple the app packaging
from the OS.
So instead of building a monolithic AMI,
they decoupled the app, you know,
layered it with the language runtime
and they would build a container image
and deploy it as containers on these instances.
And because containers provide some amount
of resource isolation,
you can actually co-locate multiple apps,
multiple copies of the same app or even multiple, you know,
different apps within the same instance.
So you get, you know,
all the wonderful benefits of containers,
which is, you know, fast start times,
better fleet utilization, etcetera, etcetera.
Now this looks fine if you're looking at one instance,
but if you're looking at, you know,
hundreds of instances and thousands of applications,
like many customers were doing,
the actual orchestration and placement logic
becomes a fairly complex software problem.
So when you know workload request comes in,
how do you find the right spot on the right instance
to actually go launch this application?
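That placement logic can be sketched naively. This is a toy first-fit placer to show the shape of the problem, not how any real orchestrator does it; the field names and capacities are invented:

```python
# Toy first-fit placement: pick the first instance with enough spare CPU
# and memory for an incoming task. Real orchestrators also weigh spread,
# affinity, AZ balance, and much more.
def place(task, instances):
    for inst in instances:
        if inst["free_cpu"] >= task["cpu"] and inst["free_mem"] >= task["mem"]:
            inst["free_cpu"] -= task["cpu"]
            inst["free_mem"] -= task["mem"]
            return inst["id"]
    return None  # no capacity anywhere: time to scale the fleet

fleet = [
    {"id": "i-aaa", "free_cpu": 1024, "free_mem": 2048},
    {"id": "i-bbb", "free_cpu": 4096, "free_mem": 8192},
]
print(place({"cpu": 2048, "mem": 4096}, fleet))  # → i-bbb
print(place({"cpu": 8192, "mem": 4096}, fleet))  # → None
```

Even this toy version hints at why the problem gets hard at scale: state about every instance must stay consistent while thousands of placements race against each other.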
And so we saw a lot of these container orchestrator projects
start to crop up, Mesos was a popular one,
Kubernetes is a big one today.
And a lot of these orchestrators
were basically large open source projects.
And what customers were doing is that, you know,
they would install an orchestrator-specific agent
on this instance,
and then the agent typically talks to a control plane,
which is a much larger, beefier piece of software
that has all the smarts for running the placement logic,
and you know, the execution of the container orchestration.
So these container orchestrators,
they're not easy pieces of software, right?
These are large services that you have to run
and customers were running these control planes themselves.
So they would actually launch more instances
and, you know, run these open source projects themselves.
And that was kind of the first opportunity we spotted
and it just felt wrong
that customers have to run more instances
to manage their existing instances.
So with ECS, which was released in 2015, we basically moved
the container orchestration control plane bit
down under the boundary
into the AWS side of the responsibility.
So on the EC2 instance, you install an ECS agent
and your instance is basically registered with our service
and then you can just speak our service APIs
to launch containers on your instances.
Now, you know, you still own, of course,
the load balancing and the auto scaling and CI/CD
because the instances are all still running in your account.
You still, kind of,
have to hook up all of these things together.
And actually some of the problems
got even more complicated with containers
because now, auto scaling for example,
you're not just auto scaling your instances
because you've decoupled the container from the instance.
You have to auto scale your instance fleet
and then you have to auto scale the containers
on top of that.
It's similar for the build and deployment pipeline:
there's software that goes on the instance
that you need a pipeline for.
There's software that goes in the container, etcetera.
So there was still a lot of stuff
that was in the customer side of this responsibility line.
Not to mention just the, you know, the OS patching,
the runtime patching, agent patching.
All of that non-trivial amount of work
for someone who just wants to run a web app.
So in 2017, what we did is we moved this line even higher,
with Fargate,
and Fargate was our serverless containers offering.
And what that means is that we said, you as the customer,
if you wanna run a container,
you don't have to ever launch an EC2 instance.
So we took responsibility of the underlying instance,
we took responsibility of all that base layer software
that's running on the instance.
We run a Fargate agent, it's a slightly modified version
of the open source ECS agent that we make available,
but we're gonna call it the Fargate agent.
So you as the customer,
you can really only speak in the currency of containers
and you don't have to worry about the instance layer at all.
There is still a certain aspect
of load balancing, auto scaling, and CI/CD
that you don't have to do at the instance level,
but at the containers level,
you still have to kind of hook everything up together.
So with App Runner, which, like I said,
one of our newer services,
what we did was, we said,
let's focus on this use case of web applications
and see for that specific vertical,
how can we make things even easier for you, the customer?
So we moved that responsibility line even higher
and we said, you know,
you don't have to run the containers even, so you know,
we'll talk about what this experience looks like,
but basically you don't have to run the containers
in your account, you don't have to run the load balancer,
you don't have to worry about the auto scaling groups,
you don't have to worry about deployment pipelines.
Really all you are responsible for is your application image
and all the software that goes in the application image.
So what does this experience look like with App Runner?
So your teams can either start with source code
directly in GitHub
or you can start with prebuilt container images in ECR.
But basically you have to give us permission
to access your artifacts.
So if it's source code in GitHub,
you have to create a connection object,
but if it's an image in ECR,
you have to create an IAM role that gives us permissions
and then we will pull it down,
and you just have to make one API call,
the create service API call,
and you get a URL in return,
against which your clients can start making http requests.
And like I said, you won't see the instances,
you won't see the Fargate tasks or the containers.
You won't see the load balancer,
you won't see the auto scaling group.
You just see this end point,
against which you can make requests
and everything magically scales
as you start to send more requests to that end point.
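To make that single API call concrete, here's a sketch of the request it takes. The field names follow the App Runner CreateService API as best I understand it; the service name, image URI, and role ARN are placeholders, not values from the talk:

```python
# Build the request for App Runner's create_service call, starting from a
# prebuilt image in ECR. All identifiers below are placeholder assumptions.
def build_create_service_request(service_name, image_uri, access_role_arn, port="8080"):
    return {
        "ServiceName": service_name,
        "SourceConfiguration": {
            "ImageRepository": {
                "ImageIdentifier": image_uri,
                "ImageRepositoryType": "ECR",
                "ImageConfiguration": {"Port": port},
            },
            # The IAM role that gives App Runner permission to pull the image.
            "AuthenticationConfiguration": {"AccessRoleArn": access_role_arn},
        },
    }

req = build_create_service_request(
    "my-web-app",
    "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-web-app:latest",
    "arn:aws:iam::123456789012:role/AppRunnerECRAccessRole",
)
# With credentials configured, boto3.client("apprunner").create_service(**req)
# returns a service whose ServiceUrl is the URL endpoint clients hit.
print(req["SourceConfiguration"]["ImageRepository"]["ImageRepositoryType"])  # → ECR
```

Everything else described above — the Fargate tasks, load balancer, and auto scaling — is provisioned on the far side of that one call.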
So what's going on under the hood of all this magic?
So, you know, it's not magic.
We have a VPC that we run behind the scenes.
We're gonna call it the App Runner service VPC.
And if you're starting with source code,
we basically have managed language run times
that we make available to you.
So you don't even have to worry about the language layer
of your application.
We will layer it onto the runtime that we provide
and then we'll pump it through a build process
and then we'll generate a container image for your app.
Of course, if you're starting with a container image,
we just copy it over into our account
and then we basically deploy these as Fargate tasks in our,
you know, App Runner owned service account.
Now these Fargate tasks have to have some networking.
So because they live in our service VPC,
their primary ENI is attached
to the App Runner service VPC,
but they also have a secondary network interface
that is attached to your VPC.
So the application that you bring to us,
if it needs to talk to a private database
or something in your VPC,
it uses the secondary network interface to do that.
So what happens when you actually send a request?
So when your clients send a request to that URL
that I talked about,
the URL basically gets resolved.
We use Route 53 behind the scenes,
to an NLB, a network load balancer, that we run in our account.
The NLB basically forwards it to an L7 request router
and the L7 request router will then forward it
to the Fargate tasks that we've spun up
for your service through that primary ENI
that we talked about.
So that's kind of the picture of what's going on
at the App Runner level.
So what's actually going on behind the scenes
of these Fargate tasks that we're launching?
So now we're going into, kind of,
the Fargate team's responsibility.
Before I talk about Fargate,
I want to introduce this technology
that we use called Firecracker.
Firecracker is an open source virtualization software project
by Amazon, and it's basically a hypervisor
that's purpose built for, specifically for,
containers and functions.
And Fargate uses it and Lambda uses it.
So what's actually going on here?
So you can take any bare metal server
and then instead of the hypervisor,
you basically install Firecracker
and Firecracker will spin up what we call micro VMs.
Now these are special,
they're not traditional VMs,
they're different from traditional VMs,
that's why they're called micro VMs
because what we've done, is we've basically ripped out
some of the, kind of, devices and things
from a traditional VM,
in order to optimize for the startup time.
So these are things that add latency to VM bootstrap.
And by doing that,
and these are devices that, you know,
especially for abstracted workloads like containers
and functions in the cloud,
a lot of these devices aren't used.
So we could kind of safely pare that down
to gain advantages on startup times.
So with these Firecracker micro VMs,
we were able to go from, you know,
traditional VM bootstrap time is, you know,
probably tens of seconds if not minutes
to basically sub-second bootstrap time.
So we can launch these micro VMs, basically,
just in time as those requests are coming in
to launch containers, right?
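To make that concrete: Firecracker is driven by a small REST API over a Unix socket, and booting a microVM is just a handful of calls. This is a hedged sketch — the endpoints follow Firecracker's public API, but the kernel and rootfs paths and the sizes are invented for illustration:

```python
# The sequence of Firecracker API calls (method, path, body) that configures
# and boots one microVM. Host paths and sizes are placeholder assumptions;
# in Fargate these calls are driven by glue code, not issued by hand.
microvm_boot_sequence = [
    ("PUT", "/machine-config", {"vcpu_count": 1, "mem_size_mib": 512}),
    ("PUT", "/boot-source", {
        "kernel_image_path": "/var/lib/fc/vmlinux",  # pared-down guest kernel
        "boot_args": "console=ttyS0 reboot=k panic=1",
    }),
    ("PUT", "/drives/rootfs", {
        "drive_id": "rootfs",
        "path_on_host": "/var/lib/fc/rootfs.ext4",
        "is_root_device": True,
        "is_read_only": False,
    }),
    # With the device model stripped down, this boot completes sub-second.
    ("PUT", "/actions", {"action_type": "InstanceStart"}),
]

for method, path, body in microvm_boot_sequence:
    print(method, path)
```

The point of the sketch is how little there is to configure: no BIOS, no PCI enumeration, just a kernel, a root drive, and a start action.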
And so that's how they're different from traditional VMs,
but how are they similar to traditional VMs?
So the thing about these micro VMs
is that the boundary between two micro VMs
that are running on the same bare metal server,
it's still using the same traditional, kind of,
VM level isolation.
So we can safely, you know,
co-locate multiple VMs on the same bare metal server
and you basically get, you know,
EC2 instance level isolation between workloads
that are running on the same server.
And as you can see in this picture, you know,
each micro VM has its own guest kernel.
So you know, if we put multiple Fargate tasks,
they're not sharing the guest OS
or the guest kernel at all.
So how has this been applied to Fargate?
So the Fargate team runs their own VPC,
we're gonna call that the Fargate service VPC.
And within the Fargate VPC
they run these bare metal instances.
These are EC2 bare metal instances,
they're publicly available, all of you can run it.
It's basically, you get the whole machine.
So you can actually install virtualization software
on these machines
unlike traditional other types of EC2 instances.
And because these bare metal instances are running
in the Fargate VPC, you know,
it has an ENI, elastic network interface,
that's in the Fargate service VPC.
So within this instance, like I said,
we install the Firecracker VMM and we install containerd
and we install this piece of, kind of,
glue code between Firecracker and containerd,
which is, basically when we call containerd to launch
the container, rather than using traditional cgroups
and namespaces to spin up the container,
it will actually turn around and speak the Firecracker APIs
to spin up the container within a micro VM.
So that's what that glue code is doing.
And then we run our Fargate agent
and then we basically have these micro VMs
that are running your actual application container
within it.
And then we have ENI at the micro VM layer
separate from the ENI for the actual bare metal instance.
They're dual homed.
So like we talked about before,
there's one ENI attached to this micro VM
that talks to the App Runner VPC,
and then there's a secondary ENI attached to the VM
that talks to your VPC.
So that's kind of the Fargate story.
So moving on to the ECS orchestration part of this story.
So ECS, like we talked about,
it's an orchestrator and its job is really to,
when we make a request of the ECS control plane
to launch a Fargate task,
its job is to find an appropriate slot,
on an appropriate bare metal instance
and speak to the Fargate agent
to actually make the launch happen.
So, the control plane itself
is actually a pretty sophisticated mesh
of multiple microservices,
and I'm not gonna go into a lot of details into each service
but those of you who've used ECS,
you might recognize some of the things there,
like task definitions, clusters, services,
these are all concepts in ECS
and we basically have a microservice
that owns each concept in ECS.
But the way this works,
is that when we launch all these bare metal instances
in the Fargate VPC and the Fargate agent bootstraps,
the Fargate agent basically speaks out
through that primary bare metal instance ENI
and registers the bare metal instance
with one of the microservices,
which we're gonna call the agent communication service.
That's the service
that owns all these incoming connections
from the actual worker nodes, right?
And then when App Runner calls ECS, you know,
App Runner is a client of ECS,
calls ECS to launch that Fargate task in the App Runner VPC.
Basically the placement algorithm in ECS lights up
and, you know, it picks a specific bare metal instance
and the agent communication service is called upon
to actually communicate with the Fargate agent
and send it the command and all the configuration
that the agent needs.
The agent then speaks to containerd,
which in turn speaks the Firecracker APIs,
and the micro VM is spun up
with your application container in it.
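The placement step described above can be sketched as a toy function. This is a hypothetical illustration, not the real ECS placement algorithm (which weighs many more signals); it just shows the core idea of scanning bare metal instances for one with enough free capacity and reserving the slot.

```python
def place_task(instances, task_cpu, task_mem):
    """Pick a bare metal instance with enough free capacity for the
    task, reserve the slot, and return the instance id (toy sketch)."""
    for inst in instances:
        if inst["free_cpu"] >= task_cpu and inst["free_mem"] >= task_mem:
            inst["free_cpu"] -= task_cpu
            inst["free_mem"] -= task_mem
            return inst["id"]
    return None  # no capacity; the real service would scale or retry


# Illustrative fleet state: units are arbitrary here.
instances = [
    {"id": "bm-1", "free_cpu": 2, "free_mem": 4},
    {"id": "bm-2", "free_cpu": 16, "free_mem": 64},
]
assert place_task(instances, 4, 8) == "bm-2"
```

Once a slot is chosen, the agent communication service relays the launch command and configuration to the Fargate agent on that instance.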
So how do security considerations layer on top of this?
So let's start with App Runner.
So, so far we've kind of looked at the case
of a single App Runner service
and how the logistics work.
But in reality, these VPCs that we're running,
they're hugely multi-tenant VPCs.
So we actually put, I mean I'm showing two here,
but we actually put, you know,
many, many different app services into this VPC
and we have some pretty strict controls
that we have to put in place to make sure that there's no,
you know,
undesired breakage from one tenant to the other.
So we use actually a lot of the controls
that are available to all of you, as customers of VPC.
So the first one is, we basically use security groups
for these tasks that don't allow
any task-to-task communication,
not even within tasks of the same service
or the same tenant.
There's no reason why these tasks should be talking
to each other at all.
So we completely block that communication.
The only communication that should be coming
into these tasks is the requests that are flowing in
from the load balancer and the request router.
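The ingress policy just described can be modeled in a few lines. This is a toy sketch, not real security group syntax; the group names are made up. It captures the rule: only the router and load balancer may reach a task, and task-to-task traffic is blocked even within one tenant.

```python
# Hypothetical security-group names for illustration only.
ALLOWED_INGRESS_SOURCES = {"request-router-sg", "load-balancer-sg"}


def ingress_allowed(source_sg):
    """Task security group: allow only the router/load balancer in;
    never another task, even one from the same service or tenant."""
    return source_sg in ALLOWED_INGRESS_SOURCES


assert ingress_allowed("request-router-sg")
assert not ingress_allowed("task-sg")  # task-to-task is fully blocked
```

An allow-list this narrow means a compromised task has no lateral path to its neighbors at the network layer.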
But as I mentioned, shared responsibility,
the secondary ENI that these Fargate tasks are attached to,
that kind of segues into your, the customer's, VPC,
the security groups on that
are the customer's responsibility to configure, you know,
what kind of outbound traffic you wanna allow
from your application that's running in App Runner.
And this is, kind of, a recently launched feature.
So this is on the inbound traffic side of things.
We just announced private service endpoints for App Runner.
And what this means is,
that instead of getting an App Runner service URL
that's publicly reachable over the internet,
you can now create a private endpoint in your VPC
to accept the incoming traffic.
And what we're doing is,
really we're creating a private link endpoint in your VPC.
And again, the security groups that you configure
on that sort of inbound connection in your VPC
would be your responsibility,
to configure exactly which clients within the VPC
can talk to your service.
All right, Fargate data plane security.
So zooming into this Fargate VPC, like we said,
we have these bare metal instances.
There obviously isn't just one bare metal instance.
We have many, many bare metal instances that we run per VPC.
So a similar kind of theme.
We don't allow any bare metal instance-to-instance
communication through that primary ENI,
no reason why instances should be talking to each other.
The only communication that's allowed is the communication
that the Fargate agent needs to perform
to the ECS control plane
and just any other kind of image pull
and things like that that the agent needs to perform.
So very controlled, the security groups there.
We do, like I mentioned,
one of the benefits of Firecracker
is that it allows us to safely place multiple tenants
on the same machine.
So we do actually place tasks for multiple tenants.
And these are not just App Runner workloads, these are,
you know, any public Fargate workloads.
We can safely co-locate them
on the same bare metal instance,
but we will only run one task per micro VM.
We will never put two tasks,
even if it's from the same customer, same account,
we will never put more than one Fargate task
in the same micro VM.
At Amazon, we don't trust the container boundary
to be safe enough for multi-tenant isolation,
which is the reason why we've, kind of, made this choice,
of just really maintaining the security posture of your task
and not putting multiple tasks within the same VM boundary.
There are, you know,
independent network interfaces for each of those micro VMs.
So your tasks are not kind of cross communicating
through the same network interface.
They have their dedicated ENI
that are connected to the customer VPC.
And as we said, you know,
the micro VM boundary gives you hardware isolation,
basically EC2 instance-like isolation.
And as you can see here, you know,
completely separate guest OSs and guest kernels
between the multi-tenant workloads.
So all of these things basically ensure
that we don't have any sideways breakage
from tenant to tenant, and that there's no downward breakage
to, kind of, the Fargate-layer software
running on the instance.
All right, so going into the ECS control plane security.
So for ECS, the, kind of,
protected resource is all the state that ECS is keeping
about the bare metal instances, about the Fargate tasks
that are running on the bare metal instances.
And a lot of, kind of, the security aspect of ECS
is making sure that that state is accessed
and modified in a properly authenticated way.
And there's basically two entry points to the control plane.
Many of these services are internal services,
but there's two entry points from the outside internet.
And the first one is this agent communication service.
So we talked about these, you know,
with Fargate, these bare metal instances
and the agent talking to the agent communication service
and you know,
this is fairly safe 'cause it's our agent,
it's running on our instances, so, you know,
we're all friends
and we don't generally expect any malicious activity
between Fargate and ECS.
But remember that the ECS control plane
is not just serving Fargate, right.
It's also serving the EC2 launch type,
which is customers running their own instances
as worker nodes and having these instances register
with the ECS control plane.
So the agent communication service
also fields these connections from all of these, you know,
customer-owned instances as well.
And that's because, for this EC2 launch type,
the agent is just a piece of software.
It's a piece of software that's running out there,
in the wild, on a customer instance,
it's just an open source piece of software really.
They can write their own custom version of the agent
and try and talk to our APIs.
So we actually don't consider the agent
to be within the trust boundary.
And the way we kind of enforce checks and balances here,
is that the instance role that's present on your instance,
that's the identity that the agent is going to use
to talk to our control plane.
And the agent communication service basically ensures
that you're only modifying
or reading state about other tasks or instances
in your own account and you can't read across accounts.
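The account-scoping check can be sketched like this. It is a hypothetical illustration of the rule, not the agent communication service's actual code: the caller's identity is the instance role's account, and any resource outside that account is rejected.

```python
class AccessDenied(Exception):
    """Raised when a caller touches state outside its own account."""


def authorize(instance_role_account, resource):
    """The agent authenticates as the instance role; it may only read
    or modify state about tasks/instances in its own account."""
    if resource["account"] != instance_role_account:
        raise AccessDenied("cross-account access blocked")
    return resource


task = {"account": "111111111111", "task_id": "t-1"}  # illustrative ids
assert authorize("111111111111", task)["task_id"] == "t-1"
try:
    authorize("222222222222", task)      # a different tenant's role
except AccessDenied:
    pass                                 # rejected, as expected
```

Because the check keys off the instance role rather than anything the agent claims, a customer-modified agent gains nothing by lying about its identity.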
The other entry point is the front end service of course.
So this is the one that fields all the other, kind of,
incoming APIs, task management, you know,
run task APIs and task definition APIs,
cluster API, service APIs.
And, you know, here again there's standard
IAM auth; we enforce IAM auth based on the calling actor.
And then limits and throttles
are another important piece of really just, kind of,
protecting fairness on the service
because unlike, you know,
some of those open source orchestrator projects
that we talked about,
those basically are run in a single tenant mode, right?
One customer installs their installation of, you know,
Kubernetes or Mesos and it only serves that customer.
ECS is basically a multi-tenant AWS service.
So we, you know,
a lot of times these limits and throttles are frustrating
to customers 'cause you have to keep coming to us
and, you know, getting them raised for your use case,
but it's really to protect you from each other really.
So we have to make sure
that all the resources behind the scenes are fairly used
across all of our customers.
So, you know,
that's really the spirit of limits and throttles
that are basically enforced, kind of,
right at the front door, at the front end service.
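A common way to enforce per-tenant fairness at a front door is a token bucket per account. The sketch below is a generic illustration of that technique, not ECS's actual throttling implementation; the rates and names are made up.

```python
import time


class TokenBucket:
    """Minimal token bucket: refills at `rate` tokens/sec up to `burst`."""
    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


buckets = {}


def throttle(account, rate=10, burst=3):
    """One bucket per account, so a noisy tenant only exhausts its own."""
    bucket = buckets.setdefault(account, TokenBucket(rate, burst))
    return bucket.allow()


# A rapid burst from one account gets throttled...
results = [throttle("acct-a") for _ in range(5)]
assert results.count(True) <= 4
# ...while another account is completely unaffected.
assert throttle("acct-b")
```

Keying the bucket on the calling account is what turns a throttle into a fairness mechanism: the shared backend is protected, and one tenant's burst cannot consume another tenant's headroom.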
All right, moving on to availability.
So now we're gonna start from ECS and go backwards.
So ECS control plane availability,
we talked about this kind of web of microservices,
and we basically don't just run one copy
and we basically don't just run one one copy
of that entire stack,
we run a copy of that stack for every region.
So there's kind of complete independence between regions.
There's no service in one region
that's trying to talk to a service in another region.
This is all in the spirit of, you know,
if a region is having an outage,
we don't want to have that impact spillover
to any other regions.
And I think AWS has 30 something regions, give or take.
So we're actually running like 30 copies of the stack
and it's not just infrastructure failures, right?
Even software deployments that we perform to these services
are, you know, rolled out region by region.
So any kind of software-related errors
that we might introduce are also kind of very controlled
to make sure that they don't hit multiple regions
at the same time.
So within a region,
we actually don't just run one copy of the stack per region
'cause we wanna do better, right?
When a region is having a problem,
it's not okay for us to take down the entire region
and everyone that's using that region.
So we actually have this notion of a cellular architecture
and we actually run multiple copies of this stack
within a region.
And what we do,
and this is completely transparent to you as the customer,
is we basically allocate, you know,
I wouldn't say it's arbitrary,
but there's an algorithm that basically when you, you know,
create your cluster or your tasks in ECS,
you get allocated to one of our cells
and then all of your APIs are then routed
to that particular cell
via this thin, kind of, cell router layer.
And like I said, completely transparent to you,
you don't get to pick your cells,
you don't see the cells or anything like that.
It's really just a knob for us.
Again, like I said, to reduce the blast radius
to be subregional.
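One simple way to build such a cell router is a stable hash of the customer resource. This is only a sketch of the general technique; the talk doesn't specify ECS's actual allocation algorithm, and the cell names here are invented.

```python
import hashlib

CELLS = ["cell-1", "cell-2", "cell-3"]  # illustrative cell names


def route_to_cell(cluster_arn):
    """Deterministically map a cluster to one cell, so every API call
    for that cluster lands in the same sub-regional stack."""
    digest = hashlib.sha256(cluster_arn.encode()).digest()
    return CELLS[int.from_bytes(digest[:4], "big") % len(CELLS)]


# Stable: the same cluster always routes to the same cell.
arn = "arn:aws:ecs:us-east-1:111111111111:cluster/app"
assert route_to_cell(arn) == route_to_cell(arn)
assert route_to_cell(arn) in CELLS
```

Because the mapping is deterministic and server-side, the customer never sees or picks a cell, yet an outage in one cell only touches the fraction of customers hashed into it.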
And if you look at a particular microservice, you know,
within this cellular control plane,
each service is actually spread across AZs.
So this is again like standard best practice
that we recommend for customers, we follow the same.
And really the idea there is,
if a single AZ is having an infrastructure failure,
we're basically losing just a third of each service
and our services are scaled up enough
in the other two AZs that we're able to, kind of,
fail over the traffic from that AZ to the other two AZs,
and we're able to operate with, basically,
no customer-facing impact for single-AZ failures.
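The "lose a third and fail over" reasoning implies a capacity rule worth making explicit. This back-of-envelope sketch (my framing, not a formula from the talk) shows why each AZ must normally run with headroom: with three AZs, each must stay at or below about two-thirds utilization so the surviving two can absorb a single-AZ failure.

```python
def max_safe_utilization(num_azs, azs_lost=1):
    """If `azs_lost` AZs fail, the surviving AZs must absorb the full
    load; each AZ must normally run below this fraction of capacity."""
    return (num_azs - azs_lost) / num_azs


# 3 AZs: each AZ capped at ~66% so two survivors can carry everything.
assert abs(max_safe_utilization(3) - 2 / 3) < 1e-9
# 2 AZs would require 50% headroom, which is why more AZs are cheaper
# per unit of fault tolerance.
assert max_safe_utilization(2) == 0.5
```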
So let's talk about the Fargate data plane availability.
So again, similar kind of concept,
that Fargate VPC that we talked about,
that has the bare metal instances.
We're not just running one of that per region,
we actually have zonal VPCs.
So we have single subnet VPCs
completely separated out zone by zone.
And like I said, it's not just about zonal failures,
but you know, we also think about,
you know, what if an operator is going into a VPC
and trying to perform some operations and,
you know, they fat-finger something,
we don't wanna affect more than one AZ at a time
or software deployments, like I mentioned,
you know, they're rolled out zone by zone
to make sure that if there's any problems there, you know,
we're not affecting multiple zones at the same time.
And again,
the whole zone being affected is not acceptable to us.
So we actually run multiple single subnet VPCs per AZ
and it's blast radius protection,
but it's also kind of our scaling mechanism.
So the idea here is that each VPC, we've kind of tried
and tested how much load a single VPC can take
and it's a fixed size.
So once we start to get close to capacity
for one of these fixed size cells,
we basically can just keep scaling out horizontally
by adding more of these VPCs per AZ.
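The fixed-size-cell scaling model can be sketched in a few lines. The capacity figure and threshold below are assumptions for illustration only; the talk says each VPC's load limit is "tried and tested" but gives no numbers.

```python
CELL_CAPACITY = 1000        # tested max load per VPC cell (assumed figure)
SCALE_OUT_THRESHOLD = 0.8   # start a new cell as existing ones near full


def pick_cell(cells):
    """Place one unit of load in an existing cell with headroom;
    otherwise scale out horizontally by adding a new fixed-size cell."""
    for i, load in enumerate(cells):
        if load < CELL_CAPACITY * SCALE_OUT_THRESHOLD:
            cells[i] += 1
            return i
    cells.append(1)          # a new single-subnet VPC in the AZ
    return len(cells) - 1


cells = [800, 800]           # both cells at the threshold
assert pick_cell(cells) == 2 # so a third cell is added
assert len(cells) == 3
```

Because each cell's maximum load is known in advance, capacity planning reduces to "add another identical cell," which is both the scaling mechanism and the blast-radius limit.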
The App Runner data plane availability.
So with App Runner, you know, it's a similar thread,
it's a cellular architecture with each component
within the cell being striped across multiple AZs.
So for the App Runner service VPC for a given region,
we run multiple VPCs per cell,
similar to, kind of, the ECS control plane picture.
Your App Runner service gets allocated
to one of the cells arbitrarily, transparent to you.
And if you look at kind of a single VPC,
within the VPC, basically every component, right?
The load balancer, the L7 request router,
the Fargate tasks, they're all striped across AZs.
So, you know, that's kind of bringing me
to the end of my talk.
And really the points I wanted to make here is
we have a pretty rich portfolio
for hosting container applications and, you know,
often it can be confusing to decide
which service you should use for your application,
but it's important to understand, kind of,
the different abstraction that each service
is providing for you.
And you know, my rule of thumb for these questions
that we get, is to always start
with the highest abstraction service
and then, you know, start your experiments there.
Really only move lower down the stack
if you find specific reasons
why the higher abstraction services don't work for you.
And of course like come back and tell us,
so we can tell you if that's, you know,
something that's coming up that we wanna support
or if it's a feature that we're like, no,
that's like just never gonna be built
into that abstraction layer.
So the lower abstraction layer is the right layer for you.
And you know, as you can see,
a lot of thought has been put in to kind of the security
and availability stance of these services
and especially for the higher abstraction services.
Like it has, you know,
all of that thought put into every single layer beneath it
and the lower down you go, like,
you basically start owning the security
and availability stance of those kind of layers beneath you.
And it's hard to put a dollar amount on that.
So, you know, really take advantage I guess,
of the value proposition
of all the work that we do behind the scenes,
of all the work that our teams do behind the scenes.
And, and like I said, you know,
try to use the higher abstraction services
as far as possible.
That's all I had.