Building a Multi-tenant SaaS solution on AWS
Summary
TL;DR: The presenter provides best practices for building a multi-tenant SaaS solution on AWS using serverless technologies. Key topics covered include SaaS architecture patterns like application and control planes, deployment models like silos vs resource pools, authentication, authorization and access control with Amazon Cognito and AWS IAM, API throttling with usage plans and API keys, dynamically injecting tenant IDs into IAM policies at runtime for tenant isolation, CI/CD pipelines for consistent deployments across environments, and more.
Takeaways
- 😊 The talk focuses on best practices for building a SaaS solution on AWS using serverless architecture
- 💡 The application plane contains the core IP/service offering, while the control plane manages operational aspects
- 📝 Registration and onboarding of new tenants is handled by a separate microservice
- 🔐 Authentication uses API Gateway authorizers, API Keys, Usage Plans and Cognito to handle multi-tenant security
- 🚦 Tenant isolation in the pool model is achieved using IAM dynamic policies injected at runtime
- ⚙️ The CI/CD pipeline ensures consistent deployment across shared and dedicated tenant resources
- 🌐 A tenant routing mechanism redirects requests to the appropriate API Gateway based on tenant ID
- 🎚 API Keys allow throttling incoming requests at the tenant level to prevent abuse
- 💵 Tiered pricing models influence provisioning and resource sharing strategies
- 🗄 Row-level security in DynamoDB helps restrict tenant data access in the pool model
Q & A
What are the two main components of a SaaS application architecture?
-The two main components are the application plane, which is the core IP and service offering, and the control plane, which manages deployment, security, metrics collection etc.
What are the two typical deployment models for the application plane?
-The two models are silo, where each tenant has dedicated infrastructure, and pool, where infrastructure is shared across tenants. The reference architecture uses a hybrid approach.
How does the application handle authentication and authorization?
-It uses Amazon Cognito for authentication, generating JWT tokens. The tokens are validated by a Lambda authorizer, which also handles throttling based on usage plans and applies authorization policies.
How is tenant isolation achieved?
-For pooled resources, dynamic IAM policies at runtime restrict access to only a tenant's data rows. For siloed resources, Lambda execution roles limit access to dedicated DB tables.
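As a rough illustration of the pooled case, the runtime flow can be sketched as: fill a policy template with the caller's tenant ID, then pass it as an STS session policy so the resulting credentials can only touch rows whose partition key starts with that tenant ID. The table name, ARN wildcard, and key format below are assumptions for illustration, not the reference solution's exact values.

```python
import json

# Hypothetical policy template: the dynamodb:LeadingKeys condition restricts
# access to items whose partition key begins with the tenant's ID.
POLICY_TEMPLATE = """{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["dynamodb:GetItem", "dynamodb:Query", "dynamodb:PutItem"],
    "Resource": ["arn:aws:dynamodb:*:*:table/Product"],
    "Condition": {
      "ForAllValues:StringLike": {"dynamodb:LeadingKeys": ["{tenant_id}-*"]}
    }
  }]
}"""

def scoped_policy(tenant_id: str) -> str:
    """Inject the caller's tenant ID into the policy template at request time."""
    return POLICY_TEMPLATE.replace("{tenant_id}", tenant_id)

# At runtime a Lambda would then do something like (not executed here):
# creds = boto3.client("sts").assume_role(
#     RoleArn=DYNAMIC_ROLE_ARN, RoleSessionName=tenant_id,
#     Policy=scoped_policy(tenant_id))["Credentials"]
```

Because the session policy can only narrow (never widen) the assumed role's permissions, a bug in the template cannot grant access beyond what the base role allows.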
How are infrastructure resources provisioned when onboarding new tenants?
-A registration service handles tenant onboarding workflows including creation of admin users in Cognito, allocating API keys, and conditionally invoking a provisioning service to deploy dedicated resources for high tier tenants.
What is the benefit of using Lambda layers?
-Lambda layers allow reuse of common logic around metrics, logs, and authentication across Lambda functions, avoiding code duplication.
How does the architecture support canary deployments?
-It uses CodePipeline to shift traffic between Lambda function versions. This allows testing a new version before routing all traffic to it.
How are costs attributed across tenants in a pooled deployment?
-There is a dedicated lab on implementing tagging and CloudWatch metrics to attribute resource utilization and costs back to specific tenants.
What mechanisms handle routing of tenants to the appropriate backend resources?
-Tenants provide their tenant name in the UI. An API looks up the relevant API Gateway URL, Cognito user pool etc. to route requests.
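A minimal sketch of that lookup, with made-up tenant entries and field names, might look like:

```python
# Hypothetical registry mapping tenant names to the infrastructure each
# tenant should be routed to (dedicated vs pooled resources).
TENANT_REGISTRY = {
    "acme": {"apiGatewayUrl": "https://abc123.execute-api.us-east-1.amazonaws.com/prod",
             "userPoolId": "us-east-1_AcmePool", "appClientId": "acme-client-id"},
    "globex": {"apiGatewayUrl": "https://xyz789.execute-api.us-east-1.amazonaws.com/prod",
               "userPoolId": "us-east-1_Pooled", "appClientId": "pooled-client-id"},
}

def resolve_tenant(tenant_name: str) -> dict:
    """Return the routing configuration for the tenant name entered in the UI."""
    config = TENANT_REGISTRY.get(tenant_name.lower())
    if config is None:
        raise KeyError(f"unknown tenant: {tenant_name}")
    return config
```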
How does the CI/CD pipeline ensure consistency across environments?
-It uses the tenant DB table as source of truth to deploy builds across all tenant stacks and pooled resources in one pass.
Outlines
🎤 Introducing the speaker and topic
The speaker introduces himself as a solutions architect at AWS, with 20 years of software development experience. He will be talking about best practices for building a multi-tenant SaaS solution using AWS serverless services.
📝 Defining SaaS and its importance
The speaker defines SaaS as a centralized, subscription-based application delivery model. He emphasizes the need for agility and operational excellence to keep customers satisfied. Both B2C and B2B SaaS models can leverage the discussed best practices.
💡 SaaS design principles and architecture
The speaker explains the high-level architecture of a SaaS application consisting of an application plane for the core multi-tenant app/IP, and a control plane to manage operational aspects. He further describes dedicated resource (silo model) vs shared resource (pool model) deployment approaches.
✨ Overview of deployed architecture and components
The deployed architecture consists of multiple web UIs for signup/admin, API Gateway with authorizers for authentication/authorization, various microservices for managing users/tenants, shared and dedicated Lambda functions and DynamoDB tables in a hybrid model.
👥 Onboarding tenants and users
The speaker explains how new tenants are onboarded by the registration service, which creates tenant admins and assigns resources based on tiers. User management configures users with appropriate metadata for access control.
🔐 Authentication, authorization and API throttling
End users authenticate via Cognito, which passes signed JWT tokens containing tenant/role metadata. The authorizer uses this to build allow/deny policies and apply API throttling based on tenant-specific API keys mapped during onboarding.
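A simplified sketch of such an authorizer's output, assuming hypothetical role names and the REST API authorizer response shape (where `usageIdentifierKey` lets API Gateway apply the tenant's usage plan), could be:

```python
def build_authorizer_response(principal_id, tenant_id, user_role, api_key, method_arn):
    """Sketch of an API Gateway Lambda authorizer response: read-only users
    are denied write methods, and the tenant's API key (looked up during
    onboarding, not sent by the client) is returned so API Gateway can
    throttle per tenant."""
    effect = "Deny" if user_role == "TenantReadOnly" and "/GET/" not in method_arn else "Allow"
    return {
        "principalId": principal_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{"Action": "execute-api:Invoke",
                           "Effect": effect, "Resource": method_arn}],
        },
        "usageIdentifierKey": api_key,  # ties the request to the tenant's usage plan
        "context": {"tenantId": tenant_id, "userRole": user_role},
    }
```

This only works when the API's key source is set to `AUTHORIZER` rather than `HEADER`, which matches the talk's point that the client never sends the API key itself.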
🛡 Enforcing tenant isolation using IAM
Tenant isolation is inherently enforced in the silo model. For the pool model, dynamic IAM policies injected at runtime restrict access to only rows belonging to the tenant making the request, enabling row-level security.
🚀 Automated CI/CD deployments
A centralized build pipeline takes code from source, builds it, runs tests, and deploys updates across all environments consistently using infrastructure details saved during tenant provisioning.
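A rough sketch of that fan-out step (the function names and the tier field are assumptions):

```python
def deploy_all(tenants, deploy_fn):
    """Deploy the central build artifact to the pooled stack first, then loop
    over the tenant table and redeploy every dedicated (platinum tier) tenant
    stack, so all environments receive the same build in one pass."""
    deploy_fn("pooled-stack")
    for t in tenants:
        if t.get("tier") == "platinum":
            deploy_fn(f"stack-{t['tenantId']}")
```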
📡 Routing requests to appropriate tenant resources
The speaker briefly mentions subdomain-based and lookup-based routing approaches to direct incoming requests to appropriate API Gateways, Cognito pools etc. based on tenant.
🎁 Helpful references for implementation
The speaker strongly recommends going through the hands-on workshop content on GitHub to gain a deeper understanding and leverage the extensive guidance provided around implementing such an architecture.
Keywords
💡SaaS
💡multi-tenancy
💡serverless
💡API Gateway
💡Lambda authorizer
💡DynamoDB
💡tenant isolation
💡CI/CD pipeline
💡hybrid model
💡serverless application model
Highlights
A SaaS application has two key components - the application plane that delivers the core service, and the control plane that manages it.
SaaS applications can deploy dedicated isolated resources per tenant (silo model) or share resources across tenants (pool model).
The onboarding process creates a tenant admin user, registers the tenant in a database, and conditionally provisions dedicated resources based on the tenant tier.
Custom user attributes in the identity provider associate each user with a tenant ID and user role for access control.
API keys throttle tenant API usage while authorizers validate tokens and build access policies based on user roles.
For resource pooling, IAM policies restrict each tenant's row-level access to shared databases at runtime.
Database access is controlled by dynamically injecting each tenant's ID into IAM policies to filter rows.
Primary keys are partitioned with a tenant ID prefix to enable row filtering while distributing data.
A pipeline builds and tests centrally before looping through a tenant database to consistently deploy across environments.
Tenants are routed to their dedicated resources via subdomains or by an API mapping tenant names to infrastructure.
For siloed resources, isolation is handled by restricting each Lambda and database to one tenant.
User pools logically separate groups of users to apply specific policies as needed.
Lambda authorizers cache token validation and access policies to reduce latency overhead.
Assuming cross-region IAM roles adds minimal latency, managed via caching.
For single table designs, execution role permissions remove the need for dynamic IAM policies.
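The tenant-ID-prefixed partition key highlighted above can be sketched as follows; the suffix scheme here is illustrative, not the reference solution's exact format:

```python
import uuid

def shard_key(tenant_id: str) -> str:
    """Build a tenant-prefixed partition key: the tenant ID leads the key
    (so dynamodb:LeadingKeys conditions can filter rows by tenant) while a
    random suffix spreads each tenant's items across partitions."""
    return f"{tenant_id}-{uuid.uuid4().hex[:4]}"
```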
Transcripts
first of all for inviting me here and um
you know I'm really excited to to
present this topic it's something very
near and dear to me
um in general just you know having you
know building software for about 20
years now
um so today I'll be talking about how to
build a SaaS solution on AWS using
serverless
so the idea here is to you know give you
some best practices and how to use
various AWS Services inside a reference
solution
um so if you are trying to build a SaaS
if you're trying to build a multi-tenant
solution using AWS serverless services
this is something you really want to you
know maybe leverage as a starting point
so before I begin just maybe a quick
introduction about myself I am one of
the solution Architects here at AWS I
work for a team called SaaS Factory and
by the name maybe you can imagine that
we help customers who are building SaaS
based solutions on AWS right I have been
writing code like for about 20 years now
I started my career as a Visual Basic
developer back then
um and then I three years ago I joined
AWS and and since then I've been working
as part of this stream uh I'm outside
work I have a family I have a wife and
two boys 15 and 10
um we live in Massachusetts and and
recently we also got a dog and she keeps
me busy sometimes all right so let me
begin
so I thought maybe it probably makes
sense you know before I jump into the
technical details to Define what
software as a service really means
um so if if you just you know look at
this definition which we have here is
it's more of a business delivery model
right so SaaS is a way of doing business
where you centrally host your
application and your customers come and
they subscribe to that application right
some examples if I have to give think of
Dropbox slack Salesforce these are some
of the examples pretty popular SaaS
platforms which you just go and
subscribe so as an example for Dropbox
you just you know buy some sort of
storage and you just pay on monthly
basis for the storage you don't have to
like buy a hard disk or install anything
on your machine right it's just some
sort of consumption based model you're
following right so today's discussion
is relevant for both B2C or business to
consumer or B2B or business to business
uh SaaS models although I would say that
it's um it will be more beneficial it
will be more beneficial if you are
working on a B2B kind of SaaS but you can
apply the principles and best practices
in both the cases
so I thought it is relevant
um to show this quote from Jeff Bezos
which he wrote in his 2016 letter to
shareholders and if you just pay
attention to this one line which he says
right that customers are always
beautifully wonderfully dissatisfied I
thought this was very relevant in this
context because if you're a SaaS provider
you'll have hundreds and maybe millions
of customers and tenants in your system
and and it becomes really important for
you to raise the bar in terms of agility
and operational excellence uh just
imagine like even like a small down time
of 10 to 15 minutes can have a severe
impact on your on your business right
um so it's very important to to work
backwards and be customer obsessed and I
thought this quote is very much relevant
for today's discussion
all right so now what it means to be SaaS
right so
you know before I jump into the
serverless and how to use serverless
services
I'll probably have a couple of slides
which talks about the high level
architecture patterns or design
principles that you need to understand
while building a SaaS application
and the first thing you need to
understand is
broadly you can Define your SaaS
application or design your SaaS
application into high level components
an application plane and a control plane
so the application plane is basically it's
actually your IP it's the service that
you're providing to your customers so in
this case in the examples I gave storage
as a service or a CRM platform or a
messaging service it's basically your IP
and that's pretty much your offering to
your customers and this is an a platform
that obviously is multi-tenant by
default and what does it mean what it
really means is you'll have multiple
businesses multiple customers multiple
users within those businesses who will
be subscribed to your SaaS platform and
be leveraging all at same time
on the other hand what we have is a
control plane and the controller plane
is pretty much what is managing this
entire application plane right
um this is basically your platform or
set of micro Services which are
responsible for onboarding a new tenant
as an example right managing the
security aspects of a SaaS solution
you'll have some sort of identity
management system
you will have some sort of micro
Services which will collect the logs and
metrics from a SaaS platform and
consolidate them together and and then
be able to visualize them right and then
you'll have some sort of microservices
which will allow you to provision a new
tenant or maybe provision the resources
of a new tenant inside your SaaS platform
so this control plane are basically set
of services which do not need to be
multi-tenant I mean they are just to
control the application plane or the
deployment and and operational aspects
of a SaaS solution
now within the application plane there
are two typical deployment models that
we see customers follow when they deploy
inside AWS
the first one which is pretty common is
you have dedicated resources for each
tenant so take an example where you have
tenant one who is trying to onboard into
your system and you provision maybe a
separate EKS cluster or an RDS database
instance for them all together right and
this model is what we call it a silo
model
basically you are just provisioning
separate infrastructure resources for
all of your tenants
now the other model is what we call it
as pool model and in this case as you
can imagine you have your existing or
your services or your resources shared
across all these tenants so imagine an
RDS cluster or maybe an EKS cluster you
know being accessed by and shared by all
the tenants and as you can imagine right
it's probably pretty evident that each
model has their pros and cons right so
in case of a silo model on the left side
you see you have little bit of better
compliance alignment so if you have
customers who are little bit concerned
about the compliance and security
aspects or maybe you have tighter SLAs
and you want to tune the resources
according to those tenants The Silo
model will be more suitable in those
cases whereas if you are trying to
maximize your resource utilization if
you're trying to make sure that you have
resources which are shared by multiple
users and customers and tenants which
gives you obviously better cost
efficiency you probably go
for a pool model
now having said that it is important as
a SaaS provider that you make sure that
proper tenant isolation and security
requirements are followed regardless of
what model you choose
in fact today the reference solution
that we will talk about has a hybrid
model in place where we will provision
some resources or some tenants in a silo
model and some tenants in a pool model
and we will actually see how you can
enforce the tenant isolation
authentication authorization in in both
both ways
so let me just you know jump on to the
reference solution that I was talking
about which we have built which kind of
uses all these design patterns
um and it kind of applies them together
to build a working solution
um I'm going to first talk about the AWS
services and features that we leverage
as part of this reference solution
um so what we did we we first leveraged
what we call the AWS Serverless
Application Model or SAM and we
deployed our resources using SAM so SAM
is basically an open source framework to
deploy your serverless resources on AWS
and then alongside we also leverage CDK
to deploy some more components mainly
for the devops footprint of our SaaS
application now the difference between
SAM and CDK is SAM is more like markup
you probably use YAML or JSON while CDK is
more of a programming language so basically
you can actually write a typescript and
deploy your resources the idea was to
show that they both can work in parallel
and hand in hand
um but in your case you might just do
CDK to deploy your resources but we
intentionally use both just to show how
the kind of work hand in hand
so then our API layer was built using
Amazon API Gateway and we specifically
leveraged the rest API capabilities of
API Gateway to build our API layer so
all our APIs are RESTful APIs
um they follow that get put post kind of
methods
then for the authentication and
authorization purposes we leverage a
feature called the Lambda authorizer
within API Gateway and I'll talk about a
little bit more detail on how that works
in subsequent slides
another feature that we use was usage
plans and API keys so typically in a SaaS
portal if you have multiple tenants who
are trying to access the system
you need to be really careful about
enabling some sort of throttling
mechanisms in your application right so
think about it that you're trying to
open up your apis to hundreds of
customers what if one customer just
accidentally maybe runs a script and is just
trying to bombard your APIs right so
there are chances that your resources
might get exhausted so in order to
enforce some sort of throttling
mechanisms we leverage this feature and
I'll show you how that works exactly as
well
then we leverage Amazon Cognito as our
identity management platform and we we
specifically leverage what we call
user pools as a feature within Cognito
to store our users
um so your users can come and register
themselves and then later can
authenticate themselves using a username
password based workflow
then for the compute layer we leveraged
AWS Lambda and specifically I will also
talk about how we leverage the fine-grained
access control using the AWS STS service
this is more relevant when you are
working in that pool model where a
single Lambda function is shared across
multiple tenants and users like how does
that fine-grained access control work
and then we leverage Lambda layers which
is another functionality a feature
inside Lambda service which basically
let you create reusable libraries and
share those reusable libraries across
multiple Lambda functions so in our case
specifically we we leverage metrics and
logging and authentication authorization
which was more like a reusable pattern
and build them using Lambda layers and
we then you know we basically then
leverage those layers across multiple
Lambda functions
and then finally I'm not going to talk
about this in more detail today but um
obviously at the end of this slide I'll
give you some links of of the reference
solution which is inside GitHub which
you can go and refer yourself but there
are more features like code Pipeline and
Canary deployments like how to do
traffic traffic shifting uh between your
version of Lambda so let's say you're
trying to deploy a new version of Lambda
function you might want to you know
slowly shift traffic towards that new
function
um just a way to you know automate the
canary deployment so that's another
feature that we implemented within this
reference solution
and then finally uh we use dynamodb as
our layer of storage and um dynamodb
again is a key Value Store provides you
a great way to you know store your data
in a serverless fashion
one more thing I wanted to mention at
this point is that the application plane
as I mentioned to you is deployed in a
hybrid model
so I'll show you how you actually
onboard your tenants into the system but
you will see that when you once you
onboard the tenant into the system the
basic and the standard tier tenants are
deployed in a pool model so basically
these tenants will be sharing the same
set of API Gateway Lambda functions and
dynamodb table
whereas when we create a new tenant as a
platinum tier tenant um we actually
deploy separate resources for those
tenants uh again you know the whole
concept of tiering not sure if you if
you are aware of that or not but
typically when you're building a SaaS
model you try to create different sort
of tiers for your for your tenants and
you kind of maybe sometimes provide more
um as you know maybe more functionality
to certain tenants provide better slas
to certain tenants so the idea here was
to show like how you can leverage those
tiering based strategies to you know
even influence your architecture in this
case
and then the control plane has been
built using four different microservices
all using Lambda functions
um registration is for registering a new
tenant tenant management
is for managing the tenants and user
management is basically a facade in
front of your Cognito so if you
need to you know use a different IDP
instead of Cognito you just need
to change and swap this user Management
Service
um and and you and the rest of your
application doesn't really get impacted
a lot and then finally we have a
microservice for provisioning resources
for these Platinum tier tenants
um and as you onboard
okay so let's now dive a little deep
um and and see you know how what you
really get when you deploy this Baseline
architecture right
so when you get the code from GitHub
you will see in the instructions that
you will be asked to deploy the
architecture using some sort of
deployment Scripts and when you deploy
that the first thing that you will see
you will get a you will probably get
three different web applications
so the first web application that you
will see here is um is the landing
signup application which we built using
angular 2 and the idea here was to
automate that whole onboarding
experience so your tenants can come and
register themselves into this
application
then at the center right here you see a
sample SaaS application and this is just
an example SaaS application we just
took like a very basic e-commerce use
case
um basically it's a very simple use
case of an order and a product service
um this will change depending upon your
needs right so you might have a totally
different use case but the sample SaaS
application will give you like an idea
of how to implement multi-tenancy within
your micro services
and then we also built an admin console
for SaaS providers again built using
angular 2 and in this case
um we basically you know allowed SaaS
providers to manage tenants or onboard
tenants
um and you know just be able to add more
users into the system
the authentication as I mentioned is
being managed through Cognito
then the next thing that that gets
deployed inside your AWS account is the
API Gateway and as I mentioned you will
get a Lambda authorizer which will be
responsible for the authentication and
authorization of your application
um you will get API keys and usage plans
feature built into this authorizer now
one thing you might notice which is
a little different and you know sometimes
can get confusing as well is that we are
not really passing the API key from the
UI right normally typically the way
developers think that they need to pass
an API key to the API from the client
which is obviously one way to do it but
in our case what we actually did we were
generating the API key and mapping to
the tenant as part of the onboarding
process
and our Lambda authorizer is basically
trying to map the API key depending upon
the tenant so I'll show you how that
works in little bit more detail but we
are not really passing API key from the
UI but rather just mapping the API key
inside the Lambda authorizer itself
and what that really allows us to do is
it allows us to throttle all this
incoming requests so just keep in mind
that this API key is by tenant and the
way a typical SaaS solution works you'll
have like a hierarchy right you'll have
a tenant like a like a business who buy
your SaaS solution and you will have
multiple users within those tenants
right now the the whole concept of usage
plans and API key in our case is at the
tenant level so if you have a user who
is trying to abuse the system what that
basically means is that that abuse is
limited within that tenant because now
what you're saying is that hey
this tenant can only access the API
let's say 100 times in a day just an
example or maybe a million times in a
day right uh and if it tries to go
beyond that this usage plan will not let
you do so
then the next thing that gets deployed
as part of the architecture is those
four microservices as I mentioned for
registration tenant management
provisioning a user management
and then we are also deploying The Pool
services for the application plane so in
this case as I mentioned we took a very
basic example of an order and a product
service so we are deploying this order
and product service which will be
leveraged by all the basic and standard
tier tenants so this is something that's
that's getting deployed up front and
then we are deploying some Lambda layers
for authentication metrics logging Etc
so this is the basic architecture that
gets deployed initially
um obviously there are some things that
get added or provisioned as you onboard
and more tenants um so that's kind of
that's something I'm going to you know
talk about next at how that really works
all right so let's maybe dive a little
deeper now so you know so far I've
talked about some basic concepts I've
talked about the high level architecture
now I'll probably talk about some more
deep Concepts and also maybe show you
some code as we go along and how those
things really work
so the first thing I wanted to talk
about is registering new tenants or
onboarding new tenants right so if you
remember
we had two different user interfaces
that we talked about that could be
leveraged to onboard your tenants right
now this could be
in your case you might only have one or
both so what we have seen is that if
you're trying to operate a B2B kind of
SaaS application sometimes you don't
expose your Landing or sign up
application publicly you just have an
admin console and you just use that to
to register your tenants so what you
will do is when you register a tenant
the first thing that happens is you
invoke the registration micro service
and in this case
um and this is by the way an open
endpoint right
um the registration microservice because this
is something you're exposing to your
public to your customers
and depending upon the tenant tier so if
it's a basic or a standard tier then you
follow a different workflow and if you
are like a platinum tier you follow a
different workflow right and the
workflow in this case is as I showed to
you like in couple of slides back for
for platinum tier tenants we are
deploying a whole set of AWS resources
separately in a silo model
so this service kind of orchestrates
some sort of workflow on how to
provision a new tenant and the first
thing that happens in this workflow it
it's basically creating a new
um user a new tenant admin user and this
new tenant admin user is then
responsible for onboarding more tenant
users later on
and as I mentioned depending upon the
tier of the tenant this user management
decides if I need to create a whole
separate user pool in Cognito or
should I just kind of group
all the users within a single user pool
and for those who don't know
how Amazon Cognito user pools work they
are basically a way to kind of group
your users right so so if you have a
tenant who wants some sort of settings
like multi-factor authentication
different password policies it makes
sense to have a silo user pool for them
right but if you have tenants who don't
really have any unique requirements you
can just pull them into a single user
pool so again based up in our case based
upon the tenant tier we are making that
decision at this point
the second thing that happens is
actually the creation of the tenant inside a
dynamodb table and this is where we store
the tenant configurations like tenant name
address billing information etc etc
and then finally if it is a silo tenant
if it is a platinum tier tenant that
requires separate AWS resources we
conditionally invoke this tenant
provisioning service to onboard those
tenants in a silo model and I'll
actually talk about a devops pipeline as
well later I hope we'll get enough time
to talk about that which which will show
you how you can you know automate that
whole onboarding process across
different tenants as well
so I I have some code here in the slides
itself
um you know instead of Shifting screens
I thought it would be easier if you just
you know look at some of the code
obviously you can go to the GitHub
repository and do a deep dive but um so
if you look at here this code snippet is
from the user Management Service and one
thing I wanted to highlight here is that
so obviously you're getting you know in
the event body you're getting the user
details you get the tenant ID as well
and by the way this tenant ID is being
generated by the registration service
which then invokes this service right so
you get all this information and then
you have this
Cognito client and you basically are
creating a user inside Cognito at this
point and this is where we are providing
all the user attributes and these two
attributes if I just you know maybe pull
your attention here these are two
attributes we call as custom attributes
um so for those who who understand how
oauth and what custom attributes really
means within um within the IDP
um this is a way to add some sort of
metadata to this user right um so in
this case we are saying that okay this
user has a user role let's say tenant
admin or tenant read-only user and
belongs to this particular tenant which
is being registered which was being
registered or or what tenant this user
really belongs to right so this is our
way of telling inside the IDP which
tenant the user belongs to and the way we
are enforcing this is by using custom
attributes right
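The custom-attribute payload that the user Management Service builds for Cognito can be sketched roughly as follows; the exact attribute names (`custom:tenantId`, `custom:userRole`) are assumptions about the naming, not copied from the repository:

```python
def build_user_attributes(email, tenant_id, user_role):
    """Sketch of the UserAttributes list an AdminCreateUser call would receive:
    custom attributes attach the tenant ID and role as metadata on the user."""
    return [
        {"Name": "email", "Value": email},
        {"Name": "custom:tenantId", "Value": tenant_id},   # assumed attribute name
        {"Name": "custom:userRole", "Value": user_role},   # assumed attribute name
    ]

# The service would then call (not executed here):
# boto3.client("cognito-idp").admin_create_user(
#     UserPoolId=pool_id, Username=email,
#     UserAttributes=build_user_attributes(email, tenant_id, user_role))
```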
and on the right side
um this small code snippet is from the
create tenant from the tenant Management
Service
and this is where you have all those
um tenant attributes like name address
email etc etc
um but I am also saving an attribute
called as API key here as well so what I
do is you know I'm generating this API
key as part of the registration process
itself and mapping that API key to the
tenant inside the tenant management
table and then you know we are also
saving what user pool and app client
this um
this tenant is supposed to
authenticate against so in this case you
know if you are generating a new user
pool per tenant for the for for The Silo
tenants
um this is where we are kind of saving
all the information so basically all the
tenant information
um is being saved inside this dynamodb
table
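A rough sketch of the tenant record being saved at this point, with field names assumed for illustration:

```python
def build_tenant_item(tenant_id, name, api_key, user_pool_id, app_client_id):
    """Sketch of the tenant record the talk describes: configuration plus the
    API key and Cognito details captured during onboarding, all persisted in
    the tenant management DynamoDB table."""
    return {
        "tenantId": tenant_id,
        "tenantName": name,
        "apiKey": api_key,            # generated during registration, mapped per tenant
        "userPoolId": user_pool_id,   # dedicated pool for silo tenants, shared otherwise
        "appClientId": app_client_id,
        "isActive": True,
    }
```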
so at this point um you know you have
onboarded a tenant into the system
right
um now you have a tenant admin who who
has access to your sample SAS
application
and the next thing your tenant admin or
your tenant user will do is it will try to
you know login into your application
into your SAS application so this is
actually your application which they
will be using in long term like it could
be like a storage as a service or some
sort of e-commerce platform right so
it's important to understand how the
authentication and authorization really
works in this whole multi-tenant
um complex system right
so let's say a tenant user is trying to
access the system the first thing
obviously you will do is you will ask
them to provide their username and
password credentials right
um and and basically in our case since
we are using Cognito we are just
redirecting that tenant to the Cognito
hosted UI which provides this nice
username password
um it's it's something which is already
built in you don't have to build this
username password UI right so they will
authenticate so once the authentication
is successful Cognito will send back a
jar token and in this dot token as I
showed to you you'll have tenant ID and
use the role as as custom attributes
right
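To make the custom-attributes point concrete, here is a small sketch of pulling those claims out of a JWT payload. Note this deliberately skips signature verification, which a real authorizer must do against Cognito's JWKS endpoint; the claim names are the usual Cognito `custom:` prefix form and are assumptions here:

```python
import base64
import json

def claims_from_jwt(token):
    """Decode the payload segment of a JWT without verifying the signature.
    In production the authorizer must first verify the token against
    Cognito's JWKS endpoint; this only illustrates reading the claims."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))

# claims = claims_from_jwt(id_token)
# tenant_id = claims["custom:tenantId"]
# user_role = claims["custom:userRole"]
```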
Now your UI, your sample SaaS application, will try to access the API Gateway, and the first thing that happens in this whole workflow is that you hit the authorizer. It will first of all make sure that your JWT token is valid: it will go back to the Cognito endpoint and confirm that your JWT token hasn't expired and is a valid token. Then it will build something called an authorizer policy, and this authorizer policy will do at least two things for now. First, based upon the user role, it decides whether this user is even authorized to access this endpoint. Let's say you have a read-only user: you don't want a read-only user to access your POST and PUT endpoints; you probably only want them to access GET. In that case the authorizer policy will prevent access based upon the user role. The second thing it will do is provide the API key which is stored inside the tenant management table and enable the throttling. In other words, it will say that this user or this tenant is only authorized to access your APIs, let's say, 100 times in a minute or 100 times in an hour, and so on.
Then, once this whole authorization process completes, once the JWT token is validated, once you make sure your quotas are well within the limits and the authorization is valid, you actually go ahead and call the relevant Lambda function, which is basically your compute layer. So this is the typical workflow you follow, and I think the important things to highlight here were the whole concept around multi-tenancy in terms of how you manage API keys and how you validate JWT tokens.
Now, just diving a little deeper into the code again: this is a small code snippet I took from the GitHub repository. This auth manager is basically a Python module, a Lambda layer actually. We have some methods inside the Lambda layer which take the user role and tell you whether you should allow or deny methods. In this case I'm denying certain endpoints based upon certain user roles and allowing certain endpoints based upon other user roles, and then you just build that policy and pass it to the authorizer policy as I was mentioning previously.
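As a rough illustration of that allow/deny logic (role names and the ARN layout are my assumptions, not the repository's actual module), the authorizer policy could be built like this:

```python
def policy_for_role(user_role, api_arn):
    """Build the authorizer response policy document: read-only users may
    only invoke GET, admins get full access. Role names and the
    execute-api ARN layout are illustrative assumptions."""
    if user_role == "TenantAdmin":
        statements = [{"Effect": "Allow", "Action": "execute-api:Invoke",
                       "Resource": f"{api_arn}/*/*"}]
    else:  # treat anything else as a read-only tenant user
        statements = [
            {"Effect": "Allow", "Action": "execute-api:Invoke",
             "Resource": f"{api_arn}/GET/*"},
            {"Effect": "Deny", "Action": "execute-api:Invoke",
             "Resource": [f"{api_arn}/POST/*", f"{api_arn}/PUT/*",
                          f"{api_arn}/DELETE/*"]},
        ]
    return {"Version": "2012-10-17", "Statement": statements}
```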
Now, tenant isolation is a separate topic. I just talked about how authentication and authorization work; the next thing I want to talk about is how you actually enforce tenant isolation in the system. What I really mean by that is: let's say you have data from multiple tenants and multiple users inside your DynamoDB table, or whatever data store of your choice. How do you make sure that one tenant is not able to access another tenant's data? Typically, going back maybe 15 years, I would have said I'll just write a WHERE clause: WHERE customer ID equals one, WHERE tenant ID equals ABC, and that's probably sufficient. But then we are handing over the whole security aspect of our system to the developers. We are saying it's up to the developers to write the code, make sure there are WHERE clauses everywhere, and then test them against all those security measures. So that way of isolating tenants is prone to a lot of errors and a lot of issues in general.
In AWS you have the concept of IAM, and I'll show you how you can apply it to enforce tenant isolation. First, in a silo model, assume there is a tenant trying to access the system. It has gone through all those authentication and authorization checks we just talked about, it's all good, and now it's time to access your Lambda function. In the silo model we deployed separate Lambda functions and separate DynamoDB tables per tenant, so our isolation story is very simple: all we are saying is that we associate this Lambda function with an execution role, and this execution role has access to only this one DynamoDB table. That makes the whole tenant isolation story much simpler.
But in the pool model, you can imagine, when you have a single Lambda function and a single DynamoDB table shared across all the tenants, this way of isolating may not be sufficient. So how do we solve that challenge? We introduce the concept of dynamic policies, and the way it works is that at runtime you inject a tenant ID into a dynamic policy. This policy, by the way, is stored in some sort of configuration. So what you are saying is: this policy allows you to perform these five actions on this particular DynamoDB table, but only on those rows whose primary key starts with the tenant ID. And in this case the tenant ID is basically a placeholder which gets injected at runtime.
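A hedged sketch of what such a dynamic policy template might look like. The table name and action list are placeholders; the `dynamodb:LeadingKeys` condition key is what restricts access to items whose partition key begins with the injected tenant ID:

```python
import json

# Illustrative template; the real one in the reference solution may differ.
POLICY_TEMPLATE = """{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:UpdateItem",
               "dynamodb:DeleteItem", "dynamodb:Query"],
    "Resource": "arn:aws:dynamodb:*:*:table/Product",
    "Condition": {
      "ForAllValues:StringLike": {"dynamodb:LeadingKeys": ["{tenant_id}-*"]}
    }
  }]
}"""

def scoped_policy(tenant_id):
    # inject the tenant ID placeholder at runtime
    return POLICY_TEMPLATE.replace("{tenant_id}", tenant_id)
```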
So let's look at the code and how that actually works. Imagine this is code I have in my Lambda authorizer, and this method, get_policy_for_user, takes two arguments: user role and tenant ID. By the way, these two arguments are already available to us as part of the JWT token. What we typically do is decide, for this user role, what kind of policy to apply: do I provide read-only access, or maybe write access as well? Then this tenant ID, which I get from the JWT token, gets dynamically injected into that policy, and then I'm leveraging this service called AWS STS, passing that policy to get back credentials which are now scoped to the tenant. In short, what we are trying to do is enforce row-level security inside DynamoDB.
So basically this is a way of implementing row-level security inside a DynamoDB table by leveraging AWS IAM policies, and these credentials, which are now scoped only to that tenant, are passed on to the Lambda function. When the Lambda function tries to access the DynamoDB table, it will use these credentials to access it. In a typical relational database you can think of it as a database user: database users normally have access to, let's say, only database one or database two, so you create separate users to give access to different databases, and that's how you enforce some sort of security. In this case we are leveraging the scoped dynamic policies to implement that kind of isolation in a pool model.
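The STS call described above could be sketched like this. The role ARN and duration are illustrative; the key point is that the dynamically built policy is passed as a session policy, which narrows the assumed role down to one tenant's rows:

```python
def session_name(tenant_id):
    # STS session names are limited to 64 characters
    return f"tenant-{tenant_id}"[:64]

def tenant_scoped_credentials(tenant_id, role_arn, policy_json):
    """Assume a broad role with the dynamic policy as a session policy;
    the returned keys can only touch this tenant's rows. role_arn and
    the 15-minute duration are illustrative assumptions."""
    import boto3  # imported lazily so the module loads without the SDK
    sts = boto3.client("sts")
    resp = sts.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name(tenant_id),
        Policy=policy_json,      # scoped-down session policy
        DurationSeconds=900,
    )
    c = resp["Credentials"]
    return c["AccessKeyId"], c["SecretAccessKey"], c["SessionToken"]
```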
As I mentioned, these credentials are passed to your Lambda function as part of the context from the Lambda authorizer. This code I actually took from the Lambda authorizer, and it is basically the entire authorizer policy I was talking about. There are a bunch of things I'm doing here: I'm passing the access and secret keys, and I'm also passing the API key, which is where you actually enforce that whole usage plan as well. So you pass all this information in your authorizer output. Now, in that whole authentication and authorization workflow, say you're trying to access the system in a pool model: your authorizer is able to call that IAM STS service I was talking about to get those runtime-acquired, tenant-scoped credentials and pass them to the Lambda function. When this Lambda function tries to access your DynamoDB table, as long as you have that reusable data access layer which passes along the access and secret keys, you can be assured that row-level security is enforced because of the IAM dynamic policy we talked about.
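For reference, a REST API Lambda authorizer response that forwards the scoped credentials and the tenant's API key might be shaped like this. Field values are illustrative, but `usageIdentifierKey` is the field API Gateway reads for the usage plan when the API key source is set to AUTHORIZER:

```python
def authorizer_response(principal_id, policy_doc, creds, api_key):
    """Shape of a REST API Lambda authorizer response: the policy gates
    the call, `context` forwards the tenant-scoped credentials to the
    backend Lambda, and `usageIdentifierKey` ties the request to the
    tenant's usage plan. Only str/num/bool values are allowed in context."""
    access_key, secret_key, session_token = creds
    return {
        "principalId": principal_id,
        "policyDocument": policy_doc,
        "context": {
            "accessKey": access_key,
            "secretKey": secret_key,
            "sessionToken": session_token,
        },
        "usageIdentifierKey": api_key,
    }
```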
Now, one more thing I wanted to highlight at this point is how you would partition your DynamoDB table. If you remember, the condition in the dynamic policy I was referring to said: only allow access where your partition key starts with the tenant ID. So in this case I have four different rows for the tenant, each with a random suffix. The reason I'm not simply using the tenant ID as the primary key is to avoid the hot-key issue: I'm distributing the data across multiple partitions, but I'm still able to apply that condition by using a starts-with style match. Basically, all I'm saying is that this policy is applicable where your primary key starts with the tenant ID. I totally understand this is a bit complex if you're not used to this kind of terminology, so I highly recommend that you go back and look at the GitHub repository, which I'll show you in a minute.
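The suffix idea can be sketched as simple write sharding. The shard count here is arbitrary, and the important property is that every generated key still begins with the tenant ID, so the starts-with IAM condition keeps matching:

```python
import random

SHARD_COUNT = 4  # illustrative; tune for each tenant's write volume

def sharded_key(tenant_id):
    """Spread a tenant's items over several partitions to avoid a hot
    key, while each key still starts with the tenant ID so a
    "tenantId-*" dynamic-policy condition continues to match."""
    return f"{tenant_id}-{random.randint(1, SHARD_COUNT)}"

def all_shard_keys(tenant_id):
    # to read a tenant's data back, query each shard key and merge
    return [f"{tenant_id}-{i}" for i in range(1, SHARD_COUNT + 1)]
```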
And then just one more thing I wanted to cover, and then we'll see if you have any questions: the whole CI/CD aspect of this SaaS solution. You can imagine that you have all these environments: a pool environment, plus environments for tenant-specific infrastructure, and in a typical SaaS environment our goal should be to deploy across all of them in a consistent fashion. In a SaaS environment you don't do one-off deployments; that's an anti-pattern. You deploy everything to all the tenants, and you try to keep your versions as consistent as possible. The mechanism we came up with was a build pipeline. This build pipeline was getting the source from the CodeCommit repository and building it into an S3 bucket; any tests you want to run, you can run right here. Then we were leveraging the table we created as part of tenant onboarding itself, and if you pay attention, what the start-deployment code block is doing is looping through this table and updating all the stacks in a consistent fashion. In other words, you get your source of truth from the code one time, you build it one time, and then you deploy across all these environments. This is a concept you can probably apply to any multi-tenant system. In the case of EKS you have somewhat more sophisticated open source tools to do this, but in the case of serverless we built this build pipeline and deployment pipeline concept, which you can follow.
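A minimal sketch of that deployment loop, assuming hypothetical stack-naming conventions and tenant-table attributes; the commented CloudFormation call shows where each stack update would happen:

```python
def stacks_to_update(tenant_rows):
    """Derive the CloudFormation stack names to update from the tenant
    table written at onboarding time: one shared pooled stack plus one
    stack per silo tenant. Names and attributes are assumptions."""
    names = ["saas-pooled-stack"]
    for row in tenant_rows:
        if row.get("tier") == "silo":
            names.append(f"saas-silo-{row['tenantId']}")
    return names

# for name in stacks_to_update(rows):
#     boto3.client("cloudformation").update_stack(
#         StackName=name, TemplateURL=artifact_url)
```

Because every stack is updated from the same one-time build artifact, all tenants stay on a consistent version.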
There are more concepts I could go deep into, but I'll probably stop, apart from a couple of minutes on the whole tenant routing mechanism. You have this whole problem of making sure you route your tenants to the appropriate API Gateway, to whatever resources were provisioned for that tenant, and the way routing typically works is that you have some way of identifying tenants in the UI. What we have seen customers doing is following a subdomain approach: tenant1.saasapplication.com, tenant2.saasapplication.com, and so on; they build subdomains to solve that challenge. In our case we were just asking the tenant to provide the tenant name, and then we had an API which took that tenant name and gave you back which API Gateway URL and which user pool this tenant should be pointing to. We then leveraged those settings to redirect to the relevant Cognito user pool, and then to the relevant API. This whole concept of routing is another complex piece you need to solve when you're building a multi-tenant hybrid SaaS application.
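The tenant-name lookup could be sketched like this; the endpoints, attribute names, and the dict standing in for the DynamoDB lookup are all assumptions for illustration:

```python
def routing_config(tenant_name, tenant_table):
    """Resolve a tenant name to the API Gateway URL and Cognito user
    pool it should use. `tenant_table` is a dict standing in for the
    DynamoDB lookup; silo tenants get their dedicated endpoints and
    everyone else falls back to the pooled ones."""
    pooled = {"apiGatewayUrl": "https://pooled.example.com",
              "userPoolId": "pooled-user-pool"}
    row = tenant_table.get(tenant_name)
    if row and row.get("tier") == "silo":
        return {"apiGatewayUrl": row["apiGatewayUrl"],
                "userPoolId": row["userPoolId"]}
    return pooled
```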
So this is how the final architecture looks, eventually. You have a CI/CD pipeline which takes care of deploying across this whole multi-tenant environment; I've talked about all these different components already. And your application services are deployed in a hybrid model, where you have pooled tenants and silos for certain tenants. Okay, so there's a lot to cover here, and there were a lot of concepts I talked about in the last 40 minutes. If you are trying to build a multi-tenant SaaS application using serverless, I would really encourage you to follow some of the workshops we have built around this and to dive deep into the solution in this GitHub repository. I think you'll benefit a lot from following some of the best practices and guidance we have tried to provide here. All right, thank you so much, and that's it from my side. Let's see if there are any questions.
There were definitely some questions, and folks, if you want to add more questions, we've got time to answer them. I'm grateful too, because I followed everything, but this is a little bit above my pay grade, so I'm glad for the questions. I've dealt with some of these services inside AWS but never had to build a multi-tenant SaaS application, even though I've worked for companies that have them; I've just not done it myself. Okay, so I'll start getting to these questions.
We've got one question here: when using a Lambda authorizer with a federated identity provider, what is the best way to cache the auth token for some time, instead of authorizing on every request, in a serverless way? Yeah, so the Lambda authorizer actually provides you caching; it's a built-in feature. All you have to do is enable the caching. In fact, in our case we were caching for five minutes. Whenever a new request comes in, you basically specify what key you want to use for caching, so it's cached on the JWT token. It won't necessarily go and make that validation call every time; it will figure out that the request is within that five-minute range and authorize based on the cached result for that particular JWT token. So it's a feature that's already built in. Okay, so basically they don't need to do anything, it's built in. Excellent, that's easy. All right, next question:
How much latency will be added by the additional STS assume-role logic? Is the strategy to cache it? Yeah, that's a great question, and thanks for asking. I ran some tests; I think it was about 200 milliseconds or something the last time I ran them. For that reason we're actually leveraging the Lambda authorizer to generate the STS credentials, and as I mentioned, we are caching the Lambda authorizer responses as well. Another approach you can take is to generate the STS credentials inside your microservices: you can take that whole STS concept and generate them inside the Lambda function itself. But the whole point of doing it inside the Lambda authorizer was to cache it and avoid that latency. There is definitely a small latency that gets added, though typically we have seen customers not really bothered about that 200 milliseconds or whatever; actually it's even less than that. I might have mixed up some numbers here, but it's definitely in the milliseconds. Okay, so it's not nothing, but not necessarily something to worry about. Honestly, if you have a very low latency application, it's something you should be aware of: make sure you test with the STS call and without it, and see what behavior your application shows. It's worth considering. Yeah, excellent.
Okay, so the next question: he says this is basically what we're doing at my current employer. The one thing it doesn't allow for us is the use of direct integrations like AppSync or Step Functions. Is there a way to limit the access of those integrations, and if not, is this something AWS is thinking about? So yes, you are kind of limited in this model: you have to introduce a Lambda function even for basic CRUD functionality, one way or the other. In the case of AppSync, from what I can remember, there is a way to do the direct integration as well. I don't remember off the top of my head how to do it; I did it at one point, so maybe as a follow-up I can try to find what I did a while back. But eventually, if you enable direct integration, you will have to somehow generate and pass the access and secret keys as part of the direct integration as well. There is a way to do it, but you have to build that inside your CloudFormation, so it's a complex solution which I have worked out in the past. My advice would be to leverage a Lambda function just to avoid that complexity. So the short answer is you'll be better off with a Lambda function in between; it provides you much more flexibility. Okay.
And then Jason commented that 200 milliseconds sounds like you're going from a region that is not us-east-1 to the default us-east-1 for the STS calls. Yeah, again, I need to go back and check my numbers; I ran the tests about a year ago, honestly, and there was some lag. You're right: typically between us-east-1 and maybe us-west that's pretty much the kind of latency you'd see. There's definitely an overhead to generate an STS token, but as I mentioned, if you use a Lambda authorizer and you cache it, that latency becomes manageable, because now you're not doing it on every call but rather every five to ten minutes. Okay.
Okay, we have one more question. If anybody wants to throw in a last-minute question, I think we have a couple more minutes, so feel free to put it in the chat or in the ask-a-question module while we answer this one: how can we make the dynamic policy work with a DynamoDB single-table design? What would be the strategy for the partition key for multi-tenant with single-table design?
Well, you don't even have to worry about the dynamic policy for a single dedicated table, because as I mentioned, you can just specify in your execution role which table you want to access. Let me just go back; maybe I can show you. Oh yeah, sure, I'll share your screen again. Yeah, so basically you don't need this whole condition block if it's just table-level isolation. I think I had a slide on that; right, this one. You just associate the execution role with the Lambda function itself and say that this Lambda function is only allowed to access table one, the tenant-one order table. There's no need for you to define fine-grained access; it's much simpler. You just say your Lambda execution role has access to only this DynamoDB table and is not allowed to access any other table; it's all built into the Lambda execution role itself. Maybe just to clarify, there are two different concepts we are talking about here. There's the concept of the Lambda execution role: when you create a new Lambda function, you have to specify what permissions it has, and that is governed by the execution role. What I'm trying to say is that when you provide the permissions as part of the execution role itself, it's a much simpler model, whereas the dynamic policy lives inside your Lambda function: it's basically a runtime construct, and you can't apply that as part of the execution role. So it gets much simpler in a silo model, just because of the execution role.
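That silo-model execution role can be sketched as a plain IAM policy document generated per tenant stack at deploy time; the action list and ARN below are illustrative assumptions:

```python
def silo_execution_policy(table_arn):
    """In the silo model no runtime dynamic policy is needed: the Lambda
    execution role simply allows access to that tenant's table and
    nothing else. The table ARN is supplied per tenant stack."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:PutItem",
                       "dynamodb:UpdateItem", "dynamodb:DeleteItem",
                       "dynamodb:Query"],
            "Resource": table_arn,
        }],
    }
```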
Okay.
All right, so I think that's all the questions we have. I shared the GitHub repo, because folks can basically take this entire application, modify it, build upon it, and deploy their own. In fact, exactly, if you share the screen again, yes, I'll just show this. So this is the workshop link I mentioned. This workshop actually does a good job of breaking everything out into steps; this is a complicated thing to understand and absorb in one go, so the workshop takes you through the whole journey. It first covers shared services, then how to add multi-tenancy, then the dynamic policy, then it applies the tiering-based concept, and then tenant routing. In fact, recently we also added lab 7 into this workshop, which talks about how you can attribute cost in a pool model. And all those links I mentioned here at the end. Yeah, Nancy was asking if we can paste those into the chat. If you're on the crowdcast chat and can paste them, I can go there, sure. Okay, so Nancy, we'll get you those links; otherwise you can maybe even screenshot this right now and pick them up there as well. And it might be worth, if you don't mind sharing the slide deck, sending it to me, and I'll post it with the presentation on the site as well. I know you can't click them off the screen, as she mentioned, but we will share them. Well, I think that's it for our time. This was actually fabulous; I know there was a lot, you covered a lot of stuff in a very short time, and I appreciate you working to cram it all in. Sorry about that, I know. Yeah, please follow the links and you'll be fine if you want to dive deep further. Okay, I will. So folks, I will share those links as soon as I have them, and once Annabelle sends me the deck, if you go to cfe.dev and go to this event, I will post a link to the PDF as well.