AWS re:Invent 2022 - A close look at AWS Fargate and AWS App Runner (CON406)

AWS Events
30 Nov 2022 · 36:32

Summary

TL;DR: Archana Srikanta, a principal engineer at AWS, discusses the evolution of AWS container services, from EC2 to App Runner, highlighting the shared responsibility model and the architectural advancements that enable higher abstraction services like Fargate and App Runner. She also delves into the security and availability considerations that underpin these services.

Takeaways

  • 😀 Archana Srikanta, a principal engineer at AWS, has been instrumental in the development of AWS container services, including founding roles on the App Runner and Fargate teams.
  • 🚀 The evolution of AWS services from EC2 to App Runner has been driven by a desire to abstract complexity away from the user, allowing easier deployment and management of applications.
  • 💡 AWS services are designed with a shared responsibility model, where the availability and security of applications are a joint responsibility between AWS and the customer.
  • 🛠️ Elastic Beanstalk simplified the process of deploying applications by orchestrating various AWS services, reducing the need for customers to manage individual components.
  • 📦 The rise of containerization led to the development of ECS, which abstracted the container orchestration control plane, making it easier for customers to run containers without managing the orchestration layer.
  • 🌟 Fargate introduced a serverless container offering, removing the need for customers to manage the underlying EC2 instances and base layer software, further simplifying the deployment process.
  • 🌐 App Runner is the latest service, focusing on web applications and abstracting even further by managing containers, load balancers, auto-scaling, and deployment pipelines.
  • 🔒 Security is a key consideration, with AWS implementing strict controls such as security groups and private service endpoints to ensure multi-tenant isolation and prevent unauthorized access.
  • 🔄 Availability is ensured through a cellular architecture within AWS, with multiple copies of services running across different availability zones to minimize the impact of any single point of failure.
  • 🛡️ Firecracker, an open-source virtualization software by Amazon, is used in Fargate to create microVMs that provide fast startup times and strong isolation for containers.
  • 🔑 AWS encourages the use of the highest abstraction services suitable for an application, leveraging the security and availability measures built into the platform, and only moving down the stack if necessary.

Q & A

  • What is Archana Srikanta's role and experience at AWS?

    -Archana Srikanta is a principal engineer at AWS with over 11 years of tenure, a large part of which has been with the container services organization. She has worked on multiple container services and was part of the founding team for App Runner and Fargate.

  • What is the significance of App Runner and Fargate in AWS's container services?

    -App Runner and Fargate are significant as they represent the evolution of AWS's container services. App Runner is the newest service offering the highest level of abstraction, while Fargate is a serverless container offering that abstracts away the underlying EC2 instances.

  • How does the architecture of newer AWS services like App Runner and Fargate build upon the older ones?

    -The architecture of newer services like App Runner and Fargate has layered on top of the foundations laid by predecessor services. For instance, App Runner is built on top of Fargate, which in turn is built on top of ECS, showing a progression of abstraction and simplification.
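
To make the layering concrete, here is a minimal sketch (Python with boto3) of how any ECS client, App Runner's control plane included, could ask for a Fargate task with a single RunTask call. The cluster, task definition, subnet, and security group identifiers are placeholders, and App Runner's actual internal parameters are not public.

```python
import boto3

# Minimal sketch: launching a container on Fargate through the ECS API.
# All identifiers below are placeholders, not App Runner's real internals.
ecs = boto3.client("ecs")

response = ecs.run_task(
    cluster="demo-cluster",              # hypothetical cluster name
    launchType="FARGATE",                # no EC2 instances to manage
    taskDefinition="web-app:1",          # family:revision registered earlier
    count=1,
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-0123456789abcdef0"],
            "securityGroups": ["sg-0123456789abcdef0"],
            "assignPublicIp": "DISABLED",
        }
    },
)
print(response["tasks"][0]["taskArn"])
```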

  • What is the shared responsibility model on AWS, and how does it apply to the discussed services?

    -The shared responsibility model on AWS is a concept where the availability and security posture of an application is a joint responsibility between the customer and AWS. Different aspects of the stack are owned by either party, and this model applies to all discussed services, with the division of responsibilities shifting as abstraction layers increase.

  • How did the evolution from EC2 to Elastic Beanstalk address customer concerns about managing infrastructure?

    -The evolution to Elastic Beanstalk addressed customer concerns by providing a central orchestration plane that simplified the process of managing and stitching together various AWS services. It automated the creation and provisioning of resources behind the scenes, reducing the complexity for customers running applications.
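
As a rough illustration of that orchestration, the sketch below uses boto3 to create a Beanstalk application and environment; the names and the solution stack string are hypothetical examples, and Beanstalk then provisions the underlying instances, load balancer, and auto scaling group on your behalf.

```python
import boto3

# Sketch: two API calls stand in for the many resources Beanstalk provisions.
# The application name and solution stack string are hypothetical examples.
eb = boto3.client("elasticbeanstalk")

eb.create_application(ApplicationName="demo-web-app")

eb.create_environment(
    ApplicationName="demo-web-app",
    EnvironmentName="demo-web-app-prod",
    SolutionStackName="64bit Amazon Linux 2 v3.5.0 running Python 3.8",
    # Behind the scenes, Beanstalk expands this into a CloudFormation template
    # that creates the EC2 instances, load balancer, and auto scaling group.
)
```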

  • What is the role of Firecracker in the context of Fargate and container services?

    -Firecracker is an open-source virtualization software project by Amazon that serves as a hypervisor specifically built for containers and functions. It is used in Fargate to spin up micro VMs, which are optimized for fast startup times and provide EC2 instance-level isolation between workloads.
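
Firecracker itself is driven over a REST API on a Unix domain socket. The sketch below, using only the Python standard library, shows the general shape of configuring and starting one microVM; the socket, kernel, and rootfs paths are placeholders, and this is not Fargate's actual integration (Fargate goes through containerd glue code, as the talk explains).

```python
import http.client
import json
import socket

class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP over the Unix domain socket that Firecracker listens on."""
    def __init__(self, socket_path):
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(self.socket_path)
        self.sock = sock

def put(conn, endpoint, body):
    conn.request("PUT", endpoint, json.dumps(body),
                 {"Content-Type": "application/json"})
    conn.getresponse().read()  # drain the response so the socket can be reused

conn = UnixHTTPConnection("/tmp/firecracker.sock")  # placeholder socket path

# Small guest: the trimmed device model is what makes sub-second boot possible.
put(conn, "/machine-config", {"vcpu_count": 1, "mem_size_mib": 256})
put(conn, "/boot-source", {"kernel_image_path": "/images/vmlinux",
                           "boot_args": "console=ttyS0 reboot=k panic=1"})
put(conn, "/drives/rootfs", {"drive_id": "rootfs",
                             "path_on_host": "/images/rootfs.ext4",
                             "is_root_device": True,
                             "is_read_only": False})
put(conn, "/actions", {"action_type": "InstanceStart"})
```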

  • How does App Runner simplify the process of running web applications compared to other services?

    -App Runner simplifies the process by abstracting away the need to manage containers, load balancers, auto scaling groups, and deployment pipelines. Customers only need to focus on their application image, and App Runner handles the rest, providing a URL endpoint for HTTP requests that scales automatically.
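
A minimal sketch of that single call with boto3, assuming a prebuilt image already in ECR; the image URI, role ARN, and port are placeholders. The returned ServiceUrl is the endpoint clients send HTTP requests to.

```python
import boto3

# Sketch: one CreateService call; App Runner deploys, load balances, and
# auto scales behind the returned URL. All identifiers are placeholders.
apprunner = boto3.client("apprunner")

response = apprunner.create_service(
    ServiceName="demo-web-app",
    SourceConfiguration={
        "ImageRepository": {
            "ImageIdentifier": "123456789012.dkr.ecr.us-east-1.amazonaws.com/web:latest",
            "ImageRepositoryType": "ECR",
            "ImageConfiguration": {"Port": "8080"},
        },
        # Role that lets App Runner pull the image from your ECR repository.
        "AuthenticationConfiguration": {
            "AccessRoleArn": "arn:aws:iam::123456789012:role/AppRunnerECRAccessRole"
        },
        "AutoDeploymentsEnabled": True,
    },
)
print(response["Service"]["ServiceUrl"])  # endpoint for HTTP requests
```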

  • What security measures are in place to ensure multi-tenancy isolation in App Runner and Fargate?

    -Both App Runner and Fargate implement strict security measures such as using security groups to block task-to-task communication and ensuring that each task runs in its own micro VM with separate network interfaces. This maintains a high level of isolation between tenants and prevents unauthorized access or communication between tasks.
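
The task-to-task lockdown can be pictured with plain VPC primitives. Below is a minimal sketch, assuming a hypothetical request-router security group: the task security group's only inbound rule admits traffic from the router, and since security groups deny by default, tasks have no rule allowing them to reach each other.

```python
import boto3

# Sketch of the lockdown pattern: tasks accept traffic only from the router.
# VPC and security group IDs are placeholders.
ec2 = boto3.client("ec2")

task_sg = ec2.create_security_group(
    GroupName="tasks",
    Description="Fargate tasks: no task-to-task traffic",
    VpcId="vpc-0123456789abcdef0",
)["GroupId"]

# Security groups deny by default, so the absence of any rule referencing
# task_sg itself is what blocks task-to-task communication.
ec2.authorize_security_group_ingress(
    GroupId=task_sg,
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 8080,
        "ToPort": 8080,
        # Only the L7 request router's security group may connect.
        "UserIdGroupPairs": [{"GroupId": "sg-0fedcba9876543210"}],
    }],
)
```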

  • How does the ECS control plane ensure security and availability for its services?

    -The ECS control plane protects its state through authentication at its two external entry points: agents identify themselves with the instance role, and the front-end service enforces IAM-based authorization plus limits and throttles to ensure fairness across tenants. Availability comes from a cellular architecture that runs multiple copies of the stack within a region, with each service spread across different availability zones. This design minimizes the impact of any single point of failure and, combined with regional independence, protects against outages and software deployment errors.
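
AWS has not published its actual cell placement algorithm, but the general idea the talk describes, a sticky, customer-transparent assignment, can be sketched in a few lines: hash a stable key such as the account ID, and route every subsequent API call to the same cell.

```python
import hashlib

# Illustrative only: a stable hash assigns each account to one cell, and a
# thin router layer sends all of that account's API calls to that cell.
CELLS = ["cell-1", "cell-2", "cell-3"]  # hypothetical cells in one region

def assign_cell(account_id: str) -> str:
    digest = hashlib.sha256(account_id.encode("utf-8")).hexdigest()
    return CELLS[int(digest, 16) % len(CELLS)]

assert assign_cell("123456789012") == assign_cell("123456789012")  # sticky
```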

  • What is the advice given for customers deciding which AWS service to use for their container applications?

    -The advice given is to start with the highest abstraction service that meets their needs and only move down the stack if there are specific reasons why the higher-level services are not suitable. This approach allows customers to take advantage of the security and availability measures built into the higher abstraction services.

Outlines

00:00

😀 Introduction to AWS Container Services

Archana Srikanta, a principal engineer at AWS, introduces the session with her experience of over 11 years at AWS, mainly with container services. She outlines the journey from EC2 to App Runner, highlighting her involvement in founding services like App Runner and Fargate. The session aims to delve into the evolution of product ideas and the layered architecture of these services, emphasizing security and availability as key design influences. The use-case of a web application is presented to explore the application of AWS shared responsibility model across different services.

05:02

🚀 Evolution of AWS Compute Services

This paragraph discusses the evolution of AWS compute services, starting with EC2, which managed physical servers and virtualization software, leaving customers responsible for the VMs and associated software. Elastic Beanstalk was introduced to simplify the process by orchestrating resources through a CloudFormation template. The rise of container technology led customers to adopt Docker or similar runtimes, which, while efficient on a single instance, posed challenges in orchestration at scale. This led to the emergence of container orchestrators like Mesos and Kubernetes, which were complex to manage, creating an opportunity for AWS to step in and offer solutions.

10:03

🛠️ The Emergence of ECS and Fargate

The paragraph explains the launch of ECS in 2015, which moved the container orchestration control plane to AWS's responsibility, simplifying the process for customers. However, customers still managed load balancing, auto scaling, and deployment pipelines. Fargate, introduced in 2017, further abstracted these responsibilities by offering a serverless container service, eliminating the need for customers to manage EC2 instances or base layer software, and allowing them to focus solely on their containerized applications.

15:06

🌐 App Runner: Simplified Web Application Deployment

App Runner is highlighted as a service designed to simplify the deployment of web applications. It abstracts away the complexities of container management, load balancing, auto scaling, and CI/CD pipelines. Customers can deploy applications directly from GitHub or prebuilt container images in ECR, with App Runner handling the build process, containerization, and service creation. The service provides a URL endpoint that scales automatically with traffic, abstracting away the underlying infrastructure and network configurations.

20:08

🔧 Under-the-Hood: App Runner and Fargate Architecture

The paragraph delves into the technical architecture behind App Runner and Fargate, discussing the use of Firecracker, an open-source virtualization software by Amazon, which creates micro VMs for container and function deployment. It explains how Firecracker optimizes startup times and maintains traditional VM-level isolation, ensuring security and efficient resource utilization. The architecture includes a service VPC, managed language runtimes, and a detailed network configuration involving ENIs for connectivity to the customer's VPC.

25:09

🔒 Security Considerations in AWS Services

This section focuses on the security measures implemented in AWS services, particularly for App Runner and Fargate. It discusses the use of security groups to prevent task-to-task communication and the introduction of private service endpoints for App Runner. The paragraph also covers Fargate's data plane security, detailing the isolation provided by Firecracker micro VMs and the strict controls in place to ensure multi-tenant isolation and secure communication with ECS control plane.
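
The private endpoint setup mentioned above can be sketched with boto3; this assumes an interface VPC endpoint for App Runner already exists in your VPC, and the ARNs and IDs are placeholders.

```python
import boto3

# Sketch: attach a private (PrivateLink) ingress endpoint to an App Runner
# service so it is reachable only from inside your VPC. IDs are placeholders.
apprunner = boto3.client("apprunner")

apprunner.create_vpc_ingress_connection(
    ServiceArn="arn:aws:apprunner:us-east-1:123456789012:service/demo-web-app/abc123",
    VpcIngressConnectionName="demo-private-ingress",
    IngressVpcConfiguration={
        "VpcId": "vpc-0123456789abcdef0",
        "VpcEndpointId": "vpce-0123456789abcdef0",  # interface endpoint you created
    },
)
```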

30:10

🌐 ECS Control Plane Security and Availability

The paragraph discusses the security and availability of the ECS control plane, emphasizing the importance of protecting the state of ECS and ensuring it is accessed and modified in a secure manner. It explains the use of instance roles for identity verification and the implementation of limits and throttles to ensure fairness and protect the service from potential misuse. The availability strategy includes a cellular architecture with multiple copies of the stack within a region, spread across AZs to minimize the impact of failures.
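
The fairness half of that story, limits and throttles at the front door, is commonly implemented with something like a per-account token bucket. The sketch below is generic and illustrative, not ECS's actual throttling code.

```python
import time
from collections import defaultdict

# Illustrative per-account token bucket: each account may burst to `capacity`
# calls and sustain `rate` calls per second; excess calls are throttled.
class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller would receive a throttling error

buckets = defaultdict(lambda: TokenBucket(rate=5.0, capacity=10.0))

def handle_run_task(account_id: str) -> str:
    if not buckets[account_id].allow():
        return "ThrottlingException"  # protects tenants from each other
    return "ok"
```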

35:12

🛡️ Data Plane Security and Availability for Fargate and App Runner

The final paragraph addresses the security and availability of the data plane for both Fargate and App Runner. It describes Fargate's zonal, single-subnet VPCs that contain failures, the security groups that block all instance-to-instance communication, and the use of separate network interfaces for each microVM. For App Runner, it outlines the cellular architecture with components striped across multiple AZs for high availability. The paragraph concludes by emphasizing the importance of starting with the highest abstraction service for container applications and understanding the security and availability considerations built into these services.

📚 Conclusion and Recommendations

Archana concludes the session by advising users to start with the highest abstraction service for their container applications and only move down the stack if necessary. She encourages users to leverage the security and availability measures built into AWS services and to provide feedback for continuous improvement. The emphasis is on utilizing the higher abstraction services to take advantage of the work done by AWS teams to ensure a robust and secure deployment environment.

Keywords

💡AWS

AWS, or Amazon Web Services, is a subsidiary of Amazon that provides on-demand cloud computing platforms and APIs to individuals, companies, and governments. It is a key player in the field of cloud computing and is central to the script's narrative as it discusses various AWS services and their evolution.

💡Container Services

Container services refer to the tools and platforms that enable the use of containers for developing, deploying, and managing applications. In the script, the speaker discusses their experience with AWS's container services, highlighting how they have evolved over time.

💡App Runner

App Runner is a fully managed service provided by AWS that makes it easy for developers to quickly deploy containerized applications and APIs. The script presents App Runner as one of the newest services and one that the speaker helped found.

💡Fargate

AWS Fargate is a serverless compute engine for containers that works with Amazon Elastic Container Service (ECS) and Amazon Elastic Kubernetes Service (EKS). It is highlighted in the script as a serverless container offering, emphasizing its role in abstracting away the infrastructure management.

💡ECS

ECS, or Amazon Elastic Container Service, is a highly scalable, high-performance container orchestration service that makes it easy to run and manage Docker containers on AWS. The script discusses ECS as a foundational service upon which newer services like Fargate and App Runner are built.

💡Shared Responsibility Model

The shared responsibility model is a framework that outlines the security and compliance obligations shared between cloud service providers and their customers. In the script, it is used to explain how the security and availability of applications are a joint responsibility between AWS and its customers.

💡Elastic Beanstalk

AWS Elastic Beanstalk is an orchestration service that handles deployment details, from capacity provisioning and load balancing to auto-scaling and application health monitoring. The script mentions it as a service that simplifies the process of running applications by automating much of the infrastructure management.

💡Firecracker

Firecracker is an open-source virtualization software developed by AWS. It is designed to provide a lightweight virtualization option for container and function-based workloads. The script describes Firecracker's role in enabling Fargate's serverless container execution with fast startup times and strong isolation.

💡MicroVMs

MicroVMs, or micro virtual machines, are a concept introduced with Firecracker: virtual machines stripped down to the essentials to reduce startup time, optimized for container and function workloads. The script explains how microVMs are used in Fargate to provide fast and secure container execution environments.

💡Security Groups

Security groups in AWS act as a virtual firewall for instances to control inbound and outbound traffic. The script discusses the use of security groups to ensure that tasks running in App Runner and Fargate do not communicate with each other, enhancing the security posture of multi-tenant environments.

💡Cellular Architecture

Cellular architecture in the context of AWS refers to the practice of distributing components of a service across different cells or zones to enhance availability and reduce the impact of failures. The script mentions this concept in relation to the control plane of ECS and the data plane of Fargate and App Runner, emphasizing AWS's commitment to high availability.

Highlights

Archana Srikanta, a principal engineer at AWS, shares insights on the evolution of container services at AWS.

App Runner and Fargate are highlighted as key services, with Archana being part of their founding teams.

The talk covers the progression from EC2 to App Runner, detailing how newer services are built on the foundations of older ones.

The architecture of these services is discussed, showing how they layer on top of each other, with newer services leveraging the base laid by predecessors.

Security and availability are emphasized as key tenets in the design of AWS services.

The shared responsibility model on AWS is explained, where the customer and AWS share the responsibility for the application's security and availability.

EC2 is described as the original compute service, with customers managing the VMs and associated software.

Elastic Beanstalk is introduced as a service that simplifies the management of resources by automating the orchestration.

The rise of container technology and its benefits, such as faster start times and better resource utilization, are discussed.

ECS (Elastic Container Service) is presented as a solution that moves the container orchestration control plane into AWS's responsibility, simplifying customer tasks.

Fargate is introduced as a serverless container offering, further abstracting the underlying instance management from the customer.

App Runner is described as focusing on web applications, abstracting even the container layer, making deployment as simple as making an API call.

The under-the-hood architecture of App Runner is detailed, including the use of VPC, managed language runtimes, and Fargate tasks.

Firecracker, an open-source virtualization software by Amazon, is explained as a key technology used in Fargate for running micro VMs.

The security measures in place for App Runner, Fargate, and ECS are discussed, including the use of security groups and private service endpoints.

The availability architecture of ECS, Fargate, and App Runner is described, emphasizing the cellular design and zonal distribution to ensure service resilience.

Archana recommends starting with the highest abstraction service for container applications and only moving down the stack if necessary.

The importance of leveraging the security and availability measures provided by AWS's higher abstraction services is emphasized.

Transcripts

[00:00] - Thank you for joining this late afternoon session today. My name is Archana Srikanta, I'm a principal engineer at AWS. I've been with AWS for over 11 years now, and a large part of that tenure has been with the container services org. So I've actually had the good fortune of working on multiple container services during that tenure. And in fact, I've actually rotated through all of these services that we're gonna talk about today, at some point. And, you know, App Runner and Fargate are especially close to my heart 'cause I was one of the founding engineers, part of the founding team, for those services.

[00:37] So I wanna start today by telling you a little bit of the story of these services, and how the product ideas for these services kind of evolved from one service to the next. You know, starting from EC2, which is our original compute service, all the way to App Runner, which is kind of the newest service on the block. And then we'll pull the curtains back and look at the under-the-hood architecture for these services. And you'll see that it's not just the product ideas that, kind of, have built on top of each other, but the actual architecture itself has kind of layered on top of each other. So newer services have been built on top of foundations that we laid with predecessor services. And for the under-the-hood part of this talk, we'll go backwards. So we'll start with App Runner, which is kind of the highest abstraction service, and then we'll see how that's built on top of Fargate, and how Fargate kind of built on top of ECS.

[01:33] And finally, security and availability are kind of the key tenets that we apply at Amazon across all AWS services, across all Amazon services. So we'll go over how, for these architectures, for these particular services, security and availability played a role in influencing the design.

[02:01] All right, so for the product idea evolution, we're gonna use this use-case of a web application. So this is your standard kind of HTTP server that listens on a socket and responds to HTTP requests, right? And generally when you think of how you would run such an application: you have a VM, you install an operating system on it, you pick a language of your choice to write the application in, and then the app sits on top of it. And you're probably running multiple copies of this stack for scale and redundancy, so you put a load balancer in front of it, you put an auto scaling group around it, and you probably have some build and deployment pipeline.

[02:39] Now before we see how to run this on AWS, you know, I wanna pause and talk a little bit about the shared responsibility model on AWS. And some of you might have heard about this already, but the spirit of shared responsibility is that no matter what AWS service you use, not just specific to the ones we're gonna talk about today, in most cases the availability and the security posture of your application at the end of the day is a joint responsibility between you, the customer, and us, AWS. So there's gonna be parts of the stack that we own the responsibility for, and there will be parts of the stack that you will own the responsibility for. And we're gonna use this lens of shared responsibility to go through each of the services in the context of that web application.

[03:29] So with EC2, this is the original compute service. EC2 basically took responsibility for running the physical servers and data centers, and the virtualization software that runs on these physical servers. But you as the customer, you still own the VMs, what we call EC2 instances. You still own all the software that runs inside these instances. And you know, you own hooking up the load balancer, the auto scaling group, and the build and deployment pipeline around these instances. Now you can use AWS managed services, like Application Load Balancer etcetera, but kind of tying it all together and making sure that it's configured properly is still your responsibility at the end of the day.

[04:14] And you know, a lot of our EC2 customers came back and told us, especially the ones that weren't, you know, core infrastructure admin personas, they said that this is still a lot of things that you as the customer have to tie together. It's a lot of things that you have to get right, and it's a lot of services that you have to learn, to run just a super simple web application.

[04:38] So in 2011 we launched Elastic Beanstalk. And what Beanstalk did is it said: okay, you don't have to go to each of these individual services and learn how to, kind of, stitch it all together. We will do that for you, as a central kind of orchestration plane, which is Beanstalk. So you can just go to Elastic Beanstalk, you can describe your application and the environment in those terms, and Beanstalk will basically create a CloudFormation template behind the scenes and, you know, deploy and provision all of these resources behind the scenes in your account. So the responsibility line here still doesn't shift, because these resources still end up running in your account at the end of the day. So you have full access to these resources, you can go in and customize various aspects of these resources if you wanna change things. So in that sense, you as the customer still own the responsibility for these components.

[05:37] And then around 2013, 2014, this is before AWS had any container services available, on EC2 we started seeing that containers were starting to become popular. And a lot of our customers, for these web application type use cases, they were actually using container technology. And what does that mean? They would, you know, install a container runtime like Docker or containerd etcetera, and they would basically decouple the app packaging from the OS. So instead of building a monolithic AMI, they decoupled the app, you know, layered it with the language runtime, and they would build a container image and deploy it as containers on these instances. And because containers provide some amount of resource isolation, you can actually co-locate multiple apps, multiple copies of the same app or even multiple, you know, different apps within the same instance. So you get, you know, all the wonderful benefits of containers, which is, you know, fast start times, better fleet utilization, etcetera, etcetera.

[06:37] Now this looks fine if you're looking at one instance, but if you're looking at, you know, hundreds of instances and thousands of applications, like many customers were doing, the actual orchestration and placement logic becomes a fairly complex software problem. So when a workload request comes in, how do you find the right spot on the right instance to actually go launch this application? And so we saw a lot of these container orchestrator projects start to crop up. Mesos was a popular one, Kubernetes is a big one today. And a lot of these orchestrators were basically large open source projects. And what customers were doing is that, you know, they would install an orchestrator-specific agent on this instance, and then the agent typically talks to a control plane, which is a much larger, beefier piece of software that has all the smarts for running the placement logic and, you know, the execution of the container orchestration.

[07:40] So these container orchestrators, they're not easy pieces of software, right? These are large services that you have to run, and customers were running these control planes themselves. So they would actually launch more instances and, you know, run these open source projects themselves. And that was kind of the first opportunity we spotted, and it just felt wrong that customers have to run more instances to manage their existing instances.

[08:04] So with ECS, which was released in 2015, we basically moved the container orchestration control plane bit down under the boundary, into the AWS side of the responsibility. So on the EC2 instance, you install an ECS agent, and your instance is basically registered with our service, and then you can just speak our service APIs to launch containers on your instances. And then you still own, of course, the load balancing and the auto scaling and CI/CD, because the instances are all still running in your account. You still, kind of, have to hook up all of these things together.

[08:43] And actually some of the problems got even more complicated with containers. Because now, auto scaling for example, you're not just auto scaling your instances, because you've decoupled the container from the instance. You have to auto scale your instance fleet, and then you have to auto scale the containers on top of that. Similarly for the build and deployment pipeline: there's software that goes on the instance that you need a pipeline for, and there's software that goes in the container, etcetera. So there was still a lot of stuff that was in the customer side of this responsibility line. Not to mention just the, you know, the OS patching, the runtime patching, agent patching. All of that is a non-trivial amount of work for someone who just wants to run a web app.

[09:24] So in 2017, what we did is we moved this line even higher, with Fargate, and Fargate was our serverless containers offering. And what that means is that we said: you as the customer, if you wanna run a container, you don't have to ever launch an EC2 instance. So we took responsibility of the underlying instance, we took responsibility of all that base layer software that's running on the instance. We run a Fargate agent, it's a slightly modified version of the open source ECS agent that we make available, but we're gonna call it the Fargate agent. So you as the customer, you can really only speak in the currency of containers, and you don't have to worry about the instance layer at all. There is still a certain aspect of load balancing, auto scaling, and CI/CD that you don't have to do at the instance level, but at the containers level, you still have to kind of hook everything up together.

[10:20] So with App Runner, which, like I said, is one of our newer services, what we did was, we said: let's focus on this use case of web applications and see, for that specific vertical, how can we make things even easier for you, the customer? So we moved that responsibility line even higher, and we said, you know, you don't have to run the containers even. So, you know, we'll talk about what this experience looks like, but basically you don't have to run the containers in your account, you don't have to run the load balancer, you don't have to worry about the auto scaling groups, you don't have to worry about deployment pipelines. Really all you are responsible for is your application image and all the software that goes in the application image.

[11:05] So what does this experience look like with App Runner? So your teams can either start with source code directly in GitHub, or you can start with prebuilt container images in ECR. But basically you have to give us permission to access your artifacts. So if it's source code in GitHub, you have to create a connection object, but if it's an image in ECR, you have to create an IAM role that gives us permissions, and then we will pull it down. And you just have to make one API call, the CreateService API call, and you get a URL in return, against which your clients can start making HTTP requests. And like I said, you won't see the instances, you won't see the Fargate tasks or the containers. You won't see the load balancer, you won't see the auto scaling group. You just see this endpoint, against which you can make requests, and everything magically scales as you start to send more requests to that endpoint.

[12:03] So what's going on under the hood of all this magic? So, you know, it's not magic. We have a VPC that we run behind the scenes. We're gonna call it the App Runner service VPC. And if you're starting with source code, we basically have managed language runtimes that we make available to you. So you don't even have to worry about the language layer of your application. We will layer it onto the runtime that we provide, and then we'll pump it through a build process, and then we'll generate a container image for your app. Of course, if you're starting with a container image, we just copy it over into our account, and then we basically deploy these as Fargate tasks in our, you know, App Runner owned service account.

[12:52] Now these Fargate tasks have to have some networking. So, because they live in our service VPC, they have their primary ENI attached to the App Runner service VPC, but they also have a secondary network interface that is attached to your VPC. So the application that you bring to us, if it needs to talk to a private database or something in your VPC, it uses the secondary network interface to do that.

[13:20] So what happens when you actually send a request? So when your clients send a request to that URL that I talked about, the URL basically gets resolved (we use Route 53 behind the scenes) to an NLB, a Network Load Balancer, that we run in our account. The NLB basically forwards it to an L7 request router, and the L7 request router will then forward it to the Fargate tasks that we've spun up for your service, through that primary ENI that we talked about. So that's kind of the picture of what's going on at the App Runner level.

[14:00] So what's actually going on behind the scenes of these Fargate tasks that we're launching? So now we're going into, kind of, the Fargate team's responsibility. Before I talk about Fargate, I want to introduce this technology that we use called Firecracker. Firecracker is an open source virtualization software project by Amazon, and it's basically a hypervisor that's purpose built specifically for containers and functions. And Fargate uses it and Lambda uses it.

[14:34] So what's actually going on here? So you can take any bare metal server, and then instead of the hypervisor, you basically install Firecracker, and Firecracker will spin up what we call microVMs. Now these are special, they're not traditional VMs, they're different from traditional VMs, that's why they're called microVMs. Because what we've done, is we've basically ripped out some of the, kind of, devices and things from a traditional VM, in order to optimize for the startup time. So these are things that add latency to VM bootstrap. And these are devices that, you know, especially for abstracted workloads like containers and functions in the cloud, a lot of these devices aren't used. So we could kind of safely pare that down to gain advantages on startup times. So with these Firecracker microVMs, we were able to go from, you know, traditional VM bootstrap time, which is, you know, probably tens of seconds if not minutes, to basically sub-second bootstrap time. So we can launch these microVMs, basically, just in time as those requests are coming in to launch containers, right?

[15:43] And so that's how they're different from traditional VMs, but how are they similar to traditional VMs? So the thing about these microVMs is that the boundary between two microVMs that are running on the same bare metal server, it's still using the same traditional, kind of, VM level isolation. So we can safely, you know, co-locate multiple VMs on the same bare metal server, and you basically get, you know, EC2 instance level isolation between workloads that are running on the same server. And as you can see in this picture, you know, each microVM has its own guest kernel. So you know, if we put multiple Fargate tasks, they're not sharing the guest OS or the guest kernel at all.

[16:29] So how has this been applied to Fargate? So the Fargate team runs their own VPC, we're gonna call that the Fargate service VPC. And within the Fargate VPC they run these bare metal instances. These are EC2 bare metal instances, they're publicly available, all of you can run them. Basically, you get the whole machine, so you can actually install virtualization software on these machines, unlike other traditional types of EC2 instances. And because these bare metal instances are running in the Fargate VPC, you know, each has an ENI, an elastic network interface, that's in the Fargate service VPC.

[17:11] So within this instance, like I said, we install the Firecracker VMM, and we install containerd, and we install this piece of, kind of, glue code between Firecracker and containerd. Which is, basically, when we call containerd to launch the container, rather than using traditional cgroups and namespaces to spin up the container, it will actually turn around and speak the Firecracker APIs to spin up the container within a microVM. So that's what that glue code is doing. And then we run our Fargate agent, and then we basically have these microVMs that are running your actual application container within them.

[17:54] And then we have ENIs at the microVM layer, separate from the ENI for the actual bare metal instance. They're dual-homed. So like we talked about before, there's one ENI attached to this microVM that talks to the App Runner VPC, and then there's a secondary ENI attached to the VM that talks to your VPC. So that's kind of the Fargate story.

[18:18] So moving on to the ECS orchestration part of this story. So ECS, like we talked about, it's an orchestrator, and its job is really, when we make a request of the ECS control plane to launch a Fargate task, to find an appropriate slot on an appropriate bare metal instance, and speak to the Fargate agent to actually make the launch happen. So the control plane itself is actually a pretty sophisticated mesh of multiple microservices, and I'm not gonna go into a lot of details on each service, but those of you who've used ECS, you might recognize some of the things there, like task definitions, clusters, services. These are all concepts in ECS, and we basically have a microservice that owns each concept in ECS.

[19:08] But the way this works, is that when we launch all these bare metal instances in the Fargate VPC and the Fargate agent bootstraps, the Fargate agent basically speaks out through that primary bare metal instance ENI and registers the bare metal instance with one of the microservices, which we're gonna call the agent communication service. So that's the service that owns all these incoming connections from the actual worker nodes, right? And then when App Runner calls ECS (you know, App Runner is a client of ECS) to launch that Fargate task in the App Runner VPC, basically the placement algorithm in ECS lights up and, you know, it picks a specific bare metal instance, and the agent communication service is called upon to actually communicate with the Fargate agent and send it the command and all the configuration that the agent needs. The agent then speaks to Firecracker and containerd, and the microVM is spun up with your application container in it.

[20:18] So how do security considerations layer on top of this? So let's start with App Runner. So far we kind of looked at the case of a single service and how it works, a single App Runner service and how the logistics work. But in reality, these VPCs that we're running, they're hugely multi-tenant VPCs. So we actually put (I mean, I'm showing two here, but we actually put, you know) many, many different App Runner services into this VPC, and we have some pretty strict controls that we have to put in place to make sure that there's no, you know, undesired breakage from one tenant to the other. So we actually use a lot of the controls that are available to all of you, as customers of VPC.

[21:12] So the first one is, we basically use security groups for these tasks that don't allow any task-to-task communication, not even within tasks of the same service or the same tenant. There's no reason why these tasks should be talking to each other at all, so we completely block that communication. The only communication that should be coming into these tasks is the requests that are flowing in from the load balancer and the request router. But as I mentioned, shared responsibility: the secondary ENI that these Fargate tasks are attached to, the one that kind of segues into your, the customer's, VPC, the security groups on that are the customer's responsibility to configure, you know, what kind of outbound traffic you wanna allow from your application that's running in App Runner.

[22:08] And this is, kind of, a recently launched feature, on the inbound traffic side of things. We just announced private service endpoints for App Runner. And what this means is that instead of getting an App Runner service URL that's publicly reachable over the internet, you can now create a private endpoint in your VPC to accept the incoming traffic. And what we're doing is, really, we're creating a PrivateLink endpoint in your VPC. And again, the security groups that you configure on that sort of inbound connection in your VPC would be your responsibility, to configure exactly which clients within the VPC can talk to your service.

[22:53] All right, Fargate data plane security. So zooming into this Fargate VPC: like we said, we have these bare metal instances. There obviously isn't just one bare metal instance, we have many, many bare metal instances that we run per VPC. So, a similar kind of theme: we don't allow any bare metal instance-to-instance communication through that primary ENI. There's no reason why instances should be talking to each other. The only communication that's allowed is the communication that the Fargate agent needs to perform to the ECS control plane, and just any other kind of image pull and things like that that the agent needs to perform. So the security groups there are very controlled.

[23:46] Like I mentioned, one of the benefits of Firecracker is that it allows us to safely place multiple tenants on the same machine. So we do actually place tasks for multiple tenants, and these are not just App Runner workloads, these are, you know, any public Fargate workloads. We can safely co-locate them on the same bare metal instance, but we will only run one task per microVM. We will never put two tasks, even if it's from the same customer, same account, we will never put more than one Fargate task in the same microVM. At Amazon, we don't trust the container boundary to be safe enough for multi-tenant isolation, which is the reason why we've, kind of, made this choice of really maintaining the security posture of your task and not putting multiple tasks within the same VM boundary.

[24:42] There's, you know, independent network interfaces for each of those microVMs. So your tasks are not kind of cross-communicating through the same network interface. They have their dedicated ENIs that are connected to the customer VPC. And as we said, you know, the microVM boundary is hardware isolated, basically EC2 instance-like isolation. And as you can see here, you know, completely separate guest OSs and guest kernels between the multi-tenant workloads. So all of these things basically ensure that we don't have any sideways breakage from tenant to tenant, and that there's no downward breakage to, kind of, some of the Fargate layer software running on the instance.

[25:28] All right, so going into the ECS control plane security. So for ECS, the, kind of, protected resource is all the state that ECS is keeping about the bare metal instances, and about the Fargate tasks that are running on the bare metal instances. And a lot of, kind of, the security aspect of ECS is making sure that that state is accessed and modified in a properly authenticated way. And there's basically two entry points to the control plane. Many of these services are internal services, but there's two entry points from the outside internet.

[26:06] And the first one is this agent communication service. So we talked about these, you know, with Fargate, these bare metal instances and the agent talking to the agent communication service. And you know, this is fairly safe 'cause it's our agent, it's running on our instances, so, you know, we're all friends, and we don't generally expect any malicious activity between Fargate and ECS. But remember that the ECS control plane is not just serving Fargate, right? It's also serving the EC2 launch type, which is customers running their own instances as worker nodes and having these instances register with the ECS control plane. So the agent communication service also fields these connections from all of these, you know, customer owned instances as well. And that's the reason why, because the agent is just a piece of software for this EC2 launch type, a piece of software that's running out there, in the wild, on a customer instance (it's just an open source piece of software really), they can write their own custom version of the agent and try and talk to our APIs. So we actually don't consider the agent to be within the trust boundary. And the way we kind of enforce checks and balances here, is that the instance role that's present on your instance, that's the identity that the agent is going to use to talk to our control plane. And the agent communication service basically ensures that you're only modifying or reading state about other tasks or instances in your own account, and you can't read across accounts.

[27:48] The other entry point is the front end service, of course. So this is the one that fields all the other, kind of, incoming APIs: task management, you know, run task APIs and task definition APIs, cluster APIs, service APIs. And, you know, here there's the standard, again, IAM auth; we enforce IAM auth based on the calling actor. And then limits and throttles are another important piece of really just, kind of, protecting fairness on the service. Because unlike, you know, some of those open source orchestrator projects that we talked about, those basically are run in a single tenant mode, right? One customer installs their installation of, you know, Kubernetes or Mesos, and it only serves that customer. ECS is basically a multi-tenant AWS service. And, you know, a lot of times these limits and throttles are frustrating to customers, 'cause you have to keep coming to us and, you know, getting them raised for your use case, but it's really to protect you from each other, really. So we have to make sure that all the resources behind the scenes are fairly used across all of our customers. So, you know, that's really the spirit of limits and throttles, that are basically enforced, kind of, right at the front door, at the front end service.

[29:08] All right, moving on to availability. So now we're gonna start from ECS and go backwards. So, ECS control plane availability: we talked about this kind of web of microservices, and we basically don't just run one copy of that entire stack, we run a copy of that stack for every region. So there's kind of complete independence between regions. There's no service in one region that's trying to talk to a service in another region. This is all in the spirit of, you know, if a region is having an outage, we don't want to have that impact spill over to any other regions. And I think AWS has 30 something regions, give or take. So we're actually running like 30 copies of the stack. And it's not just infrastructure failures, right? Even software deployments that we perform to these services, we, you know, phase them out region by region. So any kind of software related errors that we might roll out are also kind of very controlled, to make sure that they don't hit multiple regions at the same time.

[30:13] So within a region, we actually don't just run one copy of the stack, 'cause we wanna do better, right? When a region is having a problem, it's not okay for us to take down the entire region and everyone that's using that region. So we actually have this notion of a cellular architecture, and we actually run multiple copies of this stack within a region. And what we do, and this is completely transparent to you as the customer, is we basically allocate... you know, I wouldn't say it's arbitrary, but there's an algorithm, where basically when you, you know, create your cluster or your tasks in ECS, you get allocated to one of our cells, and then all of your APIs are then routed to that particular cell via this thin, kind of, cell router layer. And like I said, completely transparent to you: you don't get to pick your cells, you don't see the cells or anything like that. It's really just a knob for us, again, like I said, to reduce the blast radius of impact to be sub-region.

[31:22] And if you look at a particular microservice, you know, within this cellular control plane, each service is actually spread across AZs. So this is again like standard best practice that we recommend for customers; we follow the same. And really the idea there is, if a single AZ is having an infrastructure failure, we're basically losing just a third of each service, and our services are scaled up enough in the other two AZs that we're able to, kind of, fail over the traffic from that AZ to the other two AZs, and we're able to operate with, basically, no customer facing impact for single AZ failures.

[32:09] So let's talk about the Fargate data plane availability. So again, a similar kind of concept. That Fargate VPC that we talked about, that has the bare metal instances? We're not just running one of those per region, we actually have zonal VPCs. So we have single subnet VPCs, completely separated out zone by zone. And like I said, it's not just about the zonal failures, but, you know, we also think about, you know, what if an operator is going into a VPC and trying to perform some operations and they, you know, fat finger something. We don't wanna affect more than one AZ at a time. Or software deployments, like I mentioned, you know, they're rolled out zone by zone, to make sure that if there's any problems there, you know, we're not affecting multiple zones at the same time.

[33:04] And again, a whole zone being affected is not acceptable to us. So we actually run multiple single subnet VPCs per AZ, and it's blast radius protection, but it's also kind of our scaling mechanism. So the idea here is that for each VPC, we've kind of tried and tested how much load a single VPC can take, and it's a fixed size. So once we start to get close to capacity for one of these fixed size cells, we basically can just keep scaling out horizontally by adding more of these VPCs per AZ.

[33:46] The App Runner data plane availability. So with App Runner, you know, it's a similar thread: it's a cellular architecture, with each component within the cell being striped across multiple AZs. So for the App Runner service VPC in a given region, we actually run multiple VPCs, one per cell, and similar to, kind of, the ECS control plane picture, your App Runner service gets allocated to one of the cells, arbitrarily, transparent to you. And if you look at kind of a single VPC, within the VPC, basically every component, right, the load balancer, the L7 request router, the Fargate tasks, they're all striped across AZs.

[34:42] So, you know, that's kind of bringing me to the end of my talk. And really the point I wanted to make here is: we have a pretty rich portfolio for hosting container applications, and, you know, often it can be confusing to decide which service you should use for your application. But it's important to understand, kind of, the different abstraction that each service is providing for you. And you know, my rule of thumb for these questions that we get, is to always start with the highest abstraction service, and then, you know, start your experiments there. Really only move down lower in the stack if you find specific reasons why the higher abstraction services don't work for you. And of course, like, come back and tell us, so we can tell you if that's, you know, something that's coming up that we wanna support, or if it's a feature that we're like, no, that's just never gonna be built into that abstraction layer, so the lower abstraction layer is the right layer for you.

[35:47] And you know, as you can see, a lot of thought has been put into kind of the security and availability stance of these services, and especially for the higher abstraction services. Like, they have, you know, all of that thought put into every single layer beneath them, and the lower down you go, like, you basically start owning the security and availability stance of those kind of layers beneath you. And it's hard to put a dollar amount on that. So, you know, really take advantage, I guess, of the value proposition of all the work that we do behind the scenes, of all the work that our teams do behind the scenes. And, like I said, you know, try to use the higher abstraction services as far as possible. That's all I had.
