What Can I Get You? An Introduction to Dynamic Resource Allocation - Freddy Rolland & Adrian Chiris
Summary
TL;DR: In this video, software engineers Freddy Rolland and Adrian Chiris from NVIDIA's cloud operations team discuss Dynamic Resource Allocation (DRA), a new Kubernetes API for resource management. They cover the various resources available to workloads, the limitations of the device plugin framework, and the Container Device Interface (CDI). The talk walks through Kubernetes resource allocation, including CPU, memory, storage, and device plugin resources. They explain DRA's benefits, such as sharing resources, handling unlimited resources, and providing configuration flexibility. The presentation also outlines the process of building a DRA driver and the role of CDI in exposing devices to containers, and concludes with a Q&A session.
Takeaways
- 😀 Dynamic Resource Allocation (DRA) is a new API for requesting resources in Kubernetes; the speakers' team at NVIDIA works on enabling networking technologies in Kubernetes.
- 🔧 Kubernetes can allocate various resources for different workloads, including CPU, memory, storage, and device plugin resources like GPUs.
- 📈 The device plugin framework has limitations, such as inability to share resources and lack of advanced configuration options.
- 🚀 The DRA API addresses these limitations by providing a more flexible and vendor-independent approach to resource allocation.
- 💾 Storage options in Kubernetes include scratch space for temporary data and persistent storage solutions like NFS mounts and CSI (Container Storage Interface).
- 🔌 Device plugins are necessary for utilizing specialized hardware within Kubernetes, but they have constraints that DRA aims to overcome.
- 🔄 The DRA API introduces concepts like ResourceClass, ResourceClaim, and ResourceClaimTemplate, providing more control and flexibility.
- 📝 The allocation process in DRA can occur immediately or be delayed until a pod referencing the resource claim is created, influencing pod scheduling.
- 🛠️ Implementing a DRA driver involves defining a name, CRDs, coordination mechanisms, and providing implementations for the controller and node plugin.
- 🔗 CDI (Container Device Interface) is a specification for exposing devices to containers, which is utilized by container runtimes like containerd and CRI-O.
Q & A
What is Dynamic Resource Allocation (DRA) in Kubernetes?
-Dynamic Resource Allocation (DRA) is a new API for requesting resources in Kubernetes, allowing for more flexible and efficient allocation of resources such as GPUs or network devices to workloads.
Why is there a need for Device Plugins in Kubernetes?
-Device Plugins are needed in Kubernetes because Kubernetes does not natively support specialized hardware like GPUs or network interfaces. Device Plugins help to utilize these resources within Kubernetes workloads.
What limitations does the Device Plugin framework have?
-The Device Plugin framework has limitations such as not supporting shared resources, difficulty in handling unlimited resources, and a lack of support for advanced configurations for different instances of the same resource.
What is Container Storage Interface (CSI) and how does it relate to storage in Kubernetes?
-Container Storage Interface (CSI) is a standard for exposing storage systems to containerized workloads in Kubernetes. It allows storage vendors to implement their own plugins for provisioning and managing storage, separate from the Kubernetes core code.
How does the Dynamic Resource Allocation (DRA) solve the issues with the Device Plugin framework?
-DRA solves the issues with the Device Plugin framework by providing a more flexible and vendor-controlled approach to resource allocation, allowing for shared resources, no requirement for pre-defining resource limits, and advanced configurations for each resource instance.
What is the role of the centralized controller in a DRA resource driver?
-The centralized controller in a DRA resource driver coordinates with the Kubernetes scheduler to decide which nodes can service incoming resource claims, allocates resources, and handles allocation and deallocation requests.
What are the two allocation modes used in DRA?
-The two allocation modes used in DRA are immediate allocation, where the resource is allocated immediately upon resource claim creation, and delayed allocation, also known as wait for first consumer, where the allocation is delayed until a pod referencing the claim is created.
How does the Kubernetes scheduler integrate with DRA during the scheduling process?
-The Kubernetes scheduler integrates with DRA by considering resource claims as part of the pod scheduling decision. It creates a pod scheduling context to coordinate with the centralized controller to determine suitable nodes for the pod based on resource availability.
What is the Container Device Interface (CDI) and its significance in DRA?
-Container Device Interface (CDI) is a specification that describes how a device should be exposed to a container. It is significant in DRA as it provides a standardized way to export devices to containers, allowing for better integration with the container runtime.
What are the key components required to implement a DRA driver?
-To implement a DRA driver, you need to define a name for your driver, create custom resource definitions (CRDs), establish communication between the controller and the node plugin, provide a default implementation of your resource class, and implement both the controller and the node plugin with the necessary business logic.
Outlines
💻 Introduction to Dynamic Resource Allocation in Kubernetes
Freddy Rolland and Adrian Chiris, software engineers at NVIDIA, introduce the topic of Dynamic Resource Allocation (DRA) in Kubernetes. They discuss the importance of DRA, a new API for requesting resources within Kubernetes. The agenda for the talk includes an overview of available resources for workloads, the workings and limitations of the device plugin framework, an exploration of the Container Storage Interface (CSI), and a deep dive into the DRA flows. They also touch upon the Container Device Interface (CDI), a container runtime mechanism required by DRA drivers. The paragraph sets the stage for a detailed discussion on how Kubernetes handles different types of workloads, especially those requiring specialized hardware like GPUs or networking capabilities.
🔌 Understanding Kubernetes Resources and Device Plugins
The paragraph delves into the types of resources available in Kubernetes, such as CPU, memory, and storage, and how they are allocated to workloads. It explains the role of the kubelet in reporting node status, which includes both built-in resources like CPU and memory, and device plugin resources like GPUs. The concept of 'requests' and 'limits' in Kubernetes is introduced, which helps the scheduler to place containers on nodes with sufficient resources. The paragraph also discusses the evolution from basic storage options to more advanced and flexible solutions like CSI, which allows storage vendors to implement their own plugins without being tied to Kubernetes release cycles. Additionally, it addresses the limitations of the device plugin framework, such as the inability to share resources and the lack of support for advanced configurations.
🚀 The Emergence of the Resource API (DRA) in Kubernetes
This section introduces the DRA API as a solution to the limitations of the device plugin framework. DRA, which debuted in Kubernetes 1.26, allows for more flexible and advanced resource management. It is designed to give vendors full control over resource management, similar to how CSI works for storage. The paragraph explains the components of DRA, including the resource class, the resource claim, and the use of Custom Resource Definitions (CRDs) to allow for vendor-specific parameters. It also discusses the difference between resource claim templates and resource claims, and how a template creates a new resource claim with each reference. The paragraph highlights the benefits of DRA, such as the ability to share resources between workloads, solve the issue of unlimited resources, and provide more flexibility in resource configuration.
🔄 Deep Dive into Resource Sharing and Allocation Modes in DRA
The paragraph explores how DRA enables resource sharing between different containers within the same pod or across different pods. It emphasizes the importance of the resource claim's name in facilitating sharing and mentions that the DRA driver implementer must mark a resource as shareable for such configurations to work. The discussion then moves to the two allocation modes in DRA: immediate allocation, where resources are allocated as soon as a resource claim is created, and delayed allocation, which waits until a pod references the claim before allocating resources. The paragraph provides a detailed explanation of the flow of events in both allocation modes, including the interactions between the centralized controller, the node-local kubelet plugin, and the Kubernetes scheduler.
🛠️ Building a DRA Resource Driver for Kubernetes
This section provides an overview of the process of creating a DRA resource driver, which involves defining a name for the driver, creating CRDs, and determining the communication method between the controller and the plugin. It outlines the key components of a DRA resource driver: a centralized controller, a node-local kubelet plugin, and a set of CRDs. The paragraph explains the responsibilities of each component and the two allocation modes, immediate and delayed. It also discusses the driver interface in the controller, which includes methods for getting class and claim parameters, allocating resources, handling unsuitable nodes, and preparing and unpreparing resources. The paragraph concludes with a brief mention of CDI, which is used to expose devices to containers, and provides a reference to an example driver that serves as a starting point for developers looking to create their own DRA drivers.
⚙️ Driver Interface and CDI in DRA Resource Drivers
The final paragraph focuses on the driver interface within the controller and the role of Container Device Interface (CDI) in DRA resource drivers. It describes the methods of the driver interface, such as getting class and claim parameters, allocating resources, and handling resource deallocation and preparation. The paragraph also explains the importance of CDI, which is a specification for describing how a device should be exposed to a container. It mentions that CDI is consumed by the container runtime to export devices to containers. The paragraph concludes with a list of resources for further reference and a summary of the key points covered in the presentation.
Mindmap
Keywords
💡Kubernetes
💡Dynamic Resource Allocation (DRA)
💡Device Plugin
💡Container Storage Interface (CSI)
💡Persistent Volume Claim (PVC)
💡Node Status
💡Resource Class
💡Container Device Interface (CDI)
💡Resource Claim
💡gRPC
Highlights
Introduction to Dynamic Resource Allocation (DRA) in Kubernetes by Nvidia software engineers.
DRA is a new API for requesting resources in Kubernetes, enhancing resource management.
Explaining the different resources available for workloads in Kubernetes.
Discussion on how to request resources such as CPU, memory, and storage.
Overview of the device plugin framework and its role in exposing specialized hardware to Kubernetes.
Limitations of the Device Plugin framework and its impact on resource allocation.
Introduction to Container Storage Interface (CSI) and its advantages over in-tree storage volume plugins.
Deep dive into the DRA flows and the steps to build a custom DRA driver.
Explanation of Container Device Interface (CDI) and its necessity for device drivers.
The variety of resources that can be allocated to workloads, including specialized hardware.
How Kubelet reports node status and manages resource allocation.
The process of allocating CPU, memory, and ephemeral storage in Kubernetes.
Storage options in Kubernetes, including scratch space and persistent storage.
The evolution from in-tree volume plugins to CSI for storage management.
The concept of dynamic provisioning in Kubernetes and its benefits.
The necessity and functionality of Device Plugins for specialized hardware utilization.
The issues with the current Device Plugin framework, such as lack of shared resources and configuration limitations.
Introduction to DRA and its main APIs as a solution to the limitations of device plugins.
The anatomy of a DRA resource driver and its components.
Explaining the allocation modes in DRA: immediate allocation and wait-for-first-consumer.
How to implement a resource driver for DRA, including defining CRDs and driver interface.
Resources and tools available for developing DRA drivers, including example drivers and helper packages.
Conclusion and Q&A session wrapping up the discussion on DRA in Kubernetes.
Transcripts
So, I'm Freddy Rolland, and with me is Adrian Chiris. We are software engineers at NVIDIA, part of the cloud operations team in the networking business unit. Our day-to-day work is to enable networking technologies in Kubernetes. Today we'll talk about Dynamic Resource Allocation, also known as DRA. It is a new API for requesting resources in Kubernetes.
Okay, so let's take a look at the agenda. First we'll go over the different resources available for your workload and how you actually request them. Then we'll talk about the device plugin: how it works and what its limitations are. After that we'll go over DRA and its main APIs, then deep dive into the DRA flows, and we'll also go over the steps needed to build your own DRA driver. Lastly we'll cover CDI, the Container Device Interface, which is part of the container runtime and is required by DRA drivers.

Okay, let's start.
Kubernetes is all about running workloads inside containers, right? But not every workload has the same requirements. For example, if you have a CNF application like a router or a firewall, you have some very specific networking requirements, and if you're using DPDK for this application you'll use hugepages. In AI, for example, GPUs are required both for training and inference. In training you will need multiple GPUs across multiple nodes, and you may require some fast networking in order to efficiently transfer data between them, maybe using GPUDirect RDMA.
So what are the resources that we can allocate to our workload? First we have the regular ones: CPU, memory, hugepages. Then we have storage-related resources, and finally we have the device plugin resources. What are device plugin resources? For example, nvidia.com/gpu.
Okay, so where do we see these resources? In the node status we actually have two sections: the first one is capacity and the second one is allocatable. Capacity is the whole pool of resources that we have on this specific node, and allocatable is what is still available to schedule future workloads. Kubelet is in charge of reporting the node status, and it is also in charge of recording the available resources. In the first part you see what we can call the built-in resources, like CPU and memory, and in the second part we have some examples of device plugin resources.
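As a rough sketch of what the speakers are describing (not taken from their slides; all values are illustrative), the two sections in a node's status look like this:

```yaml
# Illustrative excerpt of `kubectl get node <name> -o yaml` output.
status:
  capacity:                 # everything the node physically has
    cpu: "64"
    memory: 263927444Ki
    nvidia.com/gpu: "8"     # a device plugin resource
  allocatable:              # what is available to pods
    cpu: "63"
    memory: 261830164Ki
    nvidia.com/gpu: "8"
```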
Okay, next.
Here is an example of allocating CPU, memory, and ephemeral storage. Under the spec of your pod, on each container, you have two sections: requests and limits. The scheduler will look at the requests part and search for a node that has enough resources to actually satisfy this request, and based on that it will decide where this pod will eventually be scheduled.
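A minimal sketch of such a pod spec (image name and quantities are illustrative, not from the talk):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
  - name: app
    image: nginx                 # illustrative image
    resources:
      requests:                  # what the scheduler uses to pick a node
        cpu: 500m
        memory: 256Mi
        ephemeral-storage: 1Gi
      limits:                    # hard caps enforced at runtime
        cpu: "1"
        memory: 512Mi
        ephemeral-storage: 2Gi
```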
In storage we have several options. First we have ephemeral storage, which some call scratch space. For example, if you want to download some large file or keep some state in local files, you can use this one. But you need to understand that it is not persisted: if your pod is restarted, all your data will be lost.
Regarding persistent storage, we have a few options. The first is what we call the in-tree storage volume plugins. In this example we have an NFS mount: you can just specify the NFS server and the other needed parameters, and you'll get the mount inside your pod. Why is it called in-tree? Because the implementation of these volume plugins is part of the Kubernetes core code, and that was actually not very convenient for storage vendors: this code is tightly coupled with the cadence of Kubernetes releases, so if they have a bug fix or want to release a new feature, they need to wait for the next release. As an evolution of the in-tree volume plugins we got CSI, the Container Storage Interface. It actually gave the storage vendors full freedom to implement their own plugins and release on their own cadence, so they can fix bugs and add features.
They just need to implement the APIs that were defined by CSI. So what do we have in CSI? We have a storage class. In the storage class you have a name, and you have the CSI driver that will eventually provision and expose these volumes to your pod. In addition, you have the possibility to add a bunch of parameters. These parameters are free-style, meaning you can put whatever you want there, but they are very limited in structure because they are just a string-to-string key map.
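A sketch of what such a StorageClass looks like; the provisioner name and parameter keys here are hypothetical, chosen only to show the string-to-string shape of the map:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd               # name referenced later by PVCs
provisioner: csi.example.com   # hypothetical CSI driver name
parameters:                    # free-form string-to-string map
  type: ssd
  replication: "2"
```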
Next we have the persistent volume claim. In the volume claim you specify some parameters, for example the access mode and size, and most importantly you can also specify the storage class name, which actually states which provider will eventually provision your volume. Dynamic Resource Allocation takes its main approach from this API: it takes the paradigm of a storage class and a claim and extends it to any resource, not only storage.
Okay, so how do you actually request the volume inside the pod? You have a volumes section under the spec, and there you can say which PVC you want to have in your workload. In this case the PVC was already created beforehand.
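A minimal sketch of this pattern (pod, image, and claim names are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
  - name: app
    image: nginx                  # illustrative image
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: my-claim         # PVC created beforehand
```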
Okay, the next method that we have is the device plugin. Why do we need the device plugin? Sometimes your node has specialized hardware. For example, here we have a BlueField DPU, an A100 GPU, and a ConnectX-6 NIC, and we want to be able to utilize this hardware inside your workload. As we saw, Kubernetes does not natively support specialized hardware; there is only a limited set of resources it is aware of. So here comes the device plugin, to help us actually utilize these resources. So how does it work?
The device plugin is a kubelet plugin, meaning it runs on the node. It will first register itself with kubelet, saying, "this is the resource that I'm handling," and then it will expose a gRPC interface to kubelet. The most important method here is ListAndWatch: kubelet asks the plugin for a list of the available resources, and it is a streaming API, so if there is a change in the status, the device plugin can update kubelet with the change.
The second important method is Allocate. Allocate is called by kubelet just before creating the pod, and it basically gives kubelet a list of instructions to be passed on to the container runtime, explaining exactly what needs to be done to be able to access this resource.
Okay, so as I mentioned, we can also see these resources in the node status. Here we have two examples: one is a GPU, and the second is an SR-IOV resource. And how do you actually request them inside your pod? Under resources you have the requests section, and the resource name goes like domain/name. Here we are requesting one GPU and one SR-IOV resource. As you can see, this interface is what you could call countable: a resource is just a number.
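A sketch of such a request; the SR-IOV resource name is hypothetical, and note that for these extended resources Kubernetes requires requests and limits to be equal:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
  - name: app
    image: cuda-app:latest           # illustrative image
    resources:
      requests:
        nvidia.com/gpu: 1            # <domain>/<resource name>, just a count
        example.com/sriov-nic: 1     # hypothetical SR-IOV resource name
      limits:
        nvidia.com/gpu: 1            # must equal the request
        example.com/sriov-nic: 1
```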
So what are the issues with the device plugin framework? First of all, you cannot have shared resources. Let's say, for example, you have a GPU that is able to work with different workloads at the same time. Using a device plugin, you cannot do that. Why? Because a resource doesn't have a name, it's just a number, so if you would like to request one that has already been allocated, you don't have the possibility to do that.
The second point is unlimited resources. If you are familiar, for example, with KubeVirt, which runs VMs inside Kubernetes: they have a device plugin for KVM, and it has a count of 1000. That really doesn't make sense, because KVM is not a limited resource, it's just a capability of the CPU. But since they want to use all the things that come with the device plugin framework, they still need to advertise a count. So it's kind of a hack; the number doesn't actually have any meaning.
Third, you don't have the possibility to do advanced configuration. Let's say, for example, that you have two GPUs and you want a different configuration on each of them. The device plugin framework doesn't have the possibility to do that; everything will be configured the same way.
So here comes DRA, to actually answer all of these issues that we mentioned. What is DRA? It is a new way of requesting resources in Kubernetes. It started in 1.26. You will need a container runtime that supports CDI, the Container Device Interface; you can see here the versions of containerd and CRI-O that already have this support. It is still in alpha, meaning that if you want to start trying it, you need to enable a feature gate.
The idea behind it is actually to give an alternative to the device plugin framework that we mentioned earlier. Similar to CSI, the goal is to give full control to the vendors: as we mentioned, storage vendors can release on their own cadence, and we want the same for other resources. And it actually takes the same approach: if you remember, we had a storage class; now we have a resource class.
We have a resource claim, so the idea is similar, but in addition we have some things that are a bit better. For each resource class you can have a CRD, defined by the vendor, that serves as the class parameters. If you remember, we had the map of strings in the storage class; now the vendor of the DRA driver has full freedom to put whatever they want into the parameters. They can be really much more complex than what we had before. In addition, the resource claim has the same thing: you can point to a vendor-defined CRD with a lot of parameters for each resource claim. And we also have a resource claim template, which we will explain in a few slides.
Okay, so first of all, how does the spec of the pod change? As an end user, what do you need to do? It's a little bit more verbose, but we need to keep in mind that it gives us a lot more flexibility when using these resources. On the left we have the device plugin configuration with the count that we mentioned earlier: we want two GPUs. In the new way, you have a new section under resources called claims, where you give a list of names, the names of the resource claims that you want to use. Then you also have a new pod-level section called resourceClaims, where you need to configure, for each claim that you want to use, what its source is. In this example it is a resource claim template, which is configured on the right, and each time we reference this resource claim template, a new resource claim is created with the spec defined in the resource claim template. So the idea is that every time you use a resource claim template, a new resource claim is created; it does not reuse an existing one. And lastly, we can see that in the spec we have a reference to the resource class.
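A sketch of what this looks like, based on the alpha resource.k8s.io API of that era; the class, template, and image names are hypothetical:

```yaml
apiVersion: resource.k8s.io/v1alpha2   # alpha API group at the time of the talk
kind: ResourceClaimTemplate
metadata:
  name: gpu-template
spec:
  spec:
    resourceClassName: gpu.example.com # hypothetical resource class
---
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
  - name: app
    image: cuda-app:latest             # illustrative image
    resources:
      claims:
      - name: gpu                      # refers to the pod-level resourceClaims entry
  resourceClaims:
  - name: gpu
    source:
      resourceClaimTemplateName: gpu-template  # a new claim is created per reference
```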
Okay, let's take a look at the resource class. First of all, all the examples here are from an existing DRA driver for GPUs implemented by Kevin Klues from NVIDIA. He also gave a great talk about it with Alexey Fomenko from Intel; you can check it out from the last KubeCon, and we'll give a link at the end. The resource class defines, first of all, the name of the resource, and then the DRA driver that will actually be bound to this resource class. Like the storage class, it is created by the cluster admin.
Okay, next: we mentioned that we also have the possibility to have parameters for the resource class. How do we do that? We just configure a reference, in the form of API group, kind, and name, to a CRD that the DRA driver implements, and then you can have your specific parameters. In this example, we want GPUs that are not shareable.
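As a sketch of this pattern (the driver name and vendor CRD shown here are hypothetical, not the actual driver's types):

```yaml
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClass
metadata:
  name: gpu.example.com
driverName: gpu.resource.example.com   # hypothetical DRA driver bound to this class
parametersRef:                         # API group / kind / name of a vendor CRD
  apiGroup: gpu.resource.example.com
  kind: GpuClassParameters             # hypothetical vendor-defined CRD
  name: default
---
apiVersion: gpu.resource.example.com/v1alpha1
kind: GpuClassParameters
metadata:
  name: default
spec:
  shareable: false                     # class-level, vendor-defined parameter
```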
So we have a resource claim template and a resource claim. What is the difference? As I mentioned earlier, a resource claim template creates a new resource claim each time it is referenced, while a resource claim reference always refers to the exact same object.
All right, so now: we mentioned that the resource claim can also have parameters, and that gives us a lot of possibilities. In this example we have a GPU selector in the resource claim parameters, meaning we actually want either a default GPU or a V100 with less than 16 GB of memory. You can imagine that there's a lot of flexibility here: you can configure resources of the same type but with a different configuration on each instance.
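A sketch of the shape of claim parameters; the vendor CRD, its fields, and the selector syntax are all hypothetical stand-ins for whatever the driver defines:

```yaml
apiVersion: gpu.resource.example.com/v1alpha1
kind: GpuClaimParameters               # hypothetical vendor-defined CRD
metadata:
  name: v100-selector
spec:
  count: 1
  selector: "product == 'V100'"        # hypothetical per-claim selector syntax
---
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaim
metadata:
  name: gpu-claim
spec:
  resourceClassName: gpu.example.com   # hypothetical class
  parametersRef:                       # per-claim, vendor-defined parameters
    apiGroup: gpu.resource.example.com
    kind: GpuClaimParameters
    name: v100-selector
```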
Okay, next: how can we actually share resources between workloads? Here is an example within the same pod: different containers just point to the same claim. Since we now have a name, it's quite easy: we have the claim name, and in the resourceClaims section you define the source. Here you pre-create your resource claim, and then you can refer to it from two different containers in the example. The same goes for sharing between different pods: again, you use the name of the pre-created resource claim. One thing to mention: the DRA driver implementer needs to specify in the resource claim that this resource is actually shareable; otherwise the scheduler won't allow this kind of configuration.
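The cross-pod sharing pattern can be sketched like this (names are illustrative; whether sharing is allowed depends on the driver marking the allocation as shareable):

```yaml
# A pre-created claim, referenced by name from multiple pods.
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaim
metadata:
  name: shared-gpu
spec:
  resourceClassName: gpu.example.com   # hypothetical class
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-a
spec:
  containers:
  - name: app
    image: cuda-app:latest             # illustrative image
    resources:
      claims:
      - name: gpu
  resourceClaims:
  - name: gpu
    source:
      resourceClaimName: shared-gpu    # the same claim object, not a template
# pod-b would reference resourceClaimName: shared-gpu in the same way.
```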
So we saw that DRA comes and solves the sharing issue that we mentioned, as we just saw. It also solves the unlimited-resources issue, because you don't have to actually expose the number of resources that you want to support; it's not required, and you can easily implement a DRA driver that doesn't have any limits. And the last one is a lot more flexibility regarding configuration: each instance of the same resource can easily have a different configuration. Now Adrian will take us into a deeper dive on the different flows.
All right, thanks Freddy for providing us an overview of DRA. We're a bit short on time, but let's try to make it. I will go through some high-level flows here to understand what happens a bit under the hood with DRA, then we'll see what is required to implement a resource driver and some helpers for that, and then hopefully we'll have some time for questions.
All right, so what is the anatomy of a DRA resource driver? Essentially it is composed of two coordinating components: a centralized controller, which runs with high availability, and a node-local kubelet plugin running as a DaemonSet. We also have a set of CRDs, as Freddy explained. The centralized controller coordinates with the Kubernetes scheduler to decide which nodes an incoming resource claim can be serviced on. It allocates the resource claim once the scheduler picks the node, and it is also in charge of deallocation. The kubelet plugin is essentially in charge of all the node-local operations: it publishes the node-local state to the central controller, it performs any resource preparation requests from kubelet, as we'll see later, and it also performs unprepare requests. As for the CRDs, each resource driver can define its own driver-specific resource class parameters and resource claim parameters, plus additional CRDs which can optionally be added, for example to store global state or per-node state to keep track of allocated resources.
In regards to the allocation modes, there are two. One is immediate allocation, which means the allocation happens immediately for the resource claim: once the resource claim is created, the DRA driver will allocate the resource on a specific node, and then a pod which references the claim will get scheduled onto that node. Delayed allocation, also known as wait for first consumer, delays the allocation of the resource claim until a pod references it. At that point the resource availability is considered as part of the pod scheduling, in the sense that the entire resource request of the pod, CPUs, device plugin resources, and other claims, is taken into consideration in the scheduling decision. We'll see how this happens.
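The two modes are selected on the claim itself; a sketch based on the alpha API of that era (class name is hypothetical):

```yaml
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaim
metadata:
  name: gpu-claim-now
spec:
  resourceClassName: gpu.example.com
  allocationMode: Immediate              # allocate as soon as the claim is created
---
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaim
metadata:
  name: gpu-claim-later
spec:
  resourceClassName: gpu.example.com
  allocationMode: WaitForFirstConsumer   # allocate only when a pod references it
```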
Right, let's dig into the immediate flow. The flow is the same at the beginning: the admin will deploy the DRA resource driver, the kubelet plugin, and the CRDs, and will define a resource class. A user will create a resource claim for that resource class. At that point the centralized controller picks it up and proceeds with the allocation of this resource: it will allocate it on some node in the cluster. Once it's allocated, it will essentially update the resource claim status with a resource handle. This contains an opaque string blob, which is passed through the system by the kubelet plugin back to the DRA driver, as well as setting the node on which the resource was allocated. At that point a user will create a pod which references that resource claim, and the Kubernetes scheduler will kick in here: it inspects the pod, sees that it references a resource claim, and proceeds to schedule this pod onto the node where the resource was allocated.
It's a long process, right? So once the node was selected, kubelet picks it up. It will again see that this pod references a resource claim, and it will call the kubelet plugin via gRPC, passing in the claim information. The kubelet plugin will perform the preparation needed and return a set of CDI device identifiers, which we'll discuss at the end. These are then passed to the container runtime, and the container spins up exposing the devices.
all right that was the immediate
allocation now we'll see like we'll sort
of complete the picture for the delay
the location
the initial flow is essentially the same
right the admin will deploy whatever is
needed the user will create the resource
claim
at that point yeah one thing to note is
that the centralized control does not
kick in again it's wait for first
consumer it will not kick in the the
user will create a pod referencing in
the resource claim at that point
uh the kubernetes schedule picks that up
and now
um it essentially looks at the Pod looks
at the resource claim it creates an
object called pod scheduling context
this object is used to coordinate
operation between different dra drivers
and the kubernetes scheduler for the pod
it will set a set of potential nodes
essentially these are node where where
the Pod may run on
and on the other hand the central
controller will read those potential
nodes and we'll sort of try to narrow
down the list
by updating this object with a set of unsuitable nodes, a subset of nodes which this pod should not be scheduled on. This operation is repeated for all resource drivers until a scheduling decision is made. Once that decision is made, the Kubernetes scheduler will update the PodSchedulingContext with the selected node, so a node was chosen. At that point, the centralized controller will pick up the selected node and will proceed with the allocation onto that node, same as in immediate allocation.
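The negotiation flows through the PodSchedulingContext object. Here is a rough sketch of what that object could look like mid-negotiation, with field names taken from the v1alpha2 API and node names invented:

```yaml
apiVersion: resource.k8s.io/v1alpha2
kind: PodSchedulingContext
metadata:
  name: gpu-pod                  # same name/namespace as the pod
spec:
  potentialNodes:                # written by the kube-scheduler
  - node-a
  - node-b
  - node-c
status:
  resourceClaims:
  - name: gpu                    # one entry per claim, written by the driver
    unsuitableNodes:
    - node-c                     # driver: claim cannot be satisfied here
# Once the negotiation converges, the scheduler sets
#   spec.selectedNode: node-a
# and the central controller allocates for that node.
```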
So this was a quick rundown of the two allocation modes and how they work with Kubernetes. Now let's discuss, at a high level, how you would write a DRA driver.
Essentially, what you need: first, of course, define a name for your driver. Define the CRDs which are to be referenced as the resource class and resource claim parameters; essentially, these are custom parameters for your resource, which may be global or per resource allocation. You decide how the controller and the plugin are going to coordinate and communicate: is it per-node CRDs, is it gRPC, some database, a combination of the two? The key concept here is that you essentially need to represent the following: the set of available resources in the cluster or on the node, the set of allocated resources, and the set of prepared resources.
In addition, you will need to provide a default implementation of your resource class, to be distributed with your driver so users can use it. And then, of course, there's the implementation of the controller and the implementation of the kubelet plugin. Both of them include some boilerplate code to interact with the Kubernetes APIs, in the controller's case, or with the kubelet, as well as, of course, the business logic for the two.
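To make the "available / allocated / prepared" bookkeeping concrete, here is a minimal, self-contained Go sketch. All type and method names are invented for illustration; a real driver would back this state with CRDs or a database, as just discussed:

```go
package main

import (
	"errors"
	"fmt"
)

// inventory tracks the three sets a DRA driver must represent:
// resources that exist, resources bound to a claim, and resources
// prepared on the node for a running pod.
type inventory struct {
	available map[string]bool   // device name -> exists
	allocated map[string]string // device name -> claim UID
	prepared  map[string]bool   // device name -> prepared for a pod
}

func newInventory(devices ...string) *inventory {
	inv := &inventory{
		available: map[string]bool{},
		allocated: map[string]string{},
		prepared:  map[string]bool{},
	}
	for _, d := range devices {
		inv.available[d] = true
	}
	return inv
}

// allocate binds the first free device to a claim.
func (inv *inventory) allocate(claimUID string) (string, error) {
	for d := range inv.available {
		if _, busy := inv.allocated[d]; !busy {
			inv.allocated[d] = claimUID
			return d, nil
		}
	}
	return "", errors.New("no free device")
}

// free releases whatever device the claim holds.
func (inv *inventory) free(claimUID string) {
	for d, uid := range inv.allocated {
		if uid == claimUID {
			delete(inv.allocated, d)
			delete(inv.prepared, d)
		}
	}
}

func main() {
	inv := newInventory("gpu0", "gpu1")
	d, _ := inv.allocate("claim-123")
	fmt.Println("allocated:", d)
}
```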
OK, so this was a long list. To help you do all that, there are a bunch of packages provided by the Kubernetes ecosystem. The first one is the controller package from the dynamic resource allocation project, which implements most of the boilerplate code to interact with the Kubernetes DRA API objects. It defines a driver interface which you need to implement (we'll go over that), and once you implement it, you provide it to the New method, you get a controller, and you just call Run.
I'm oversimplifying a bit, but that's how it works at a high level. For the kubelet part, there is an implementation of the registration with the kubelet over gRPC, so the registration is already provided for you. You just need to provide the gRPC implementation for the node server, the gRPC server which will allocate and deallocate resources, and again call a Run method there as well. The gRPC service is defined in the kubelet APIs in the Kubernetes project.
That's it for the Kubernetes part. There are also a bunch of CDI helpers, so you can reference them later; essentially, they will help you create the CDI device specification to be used later on by the container runtime. And, I think most importantly, there is the DRA example driver, which is fully functional on top of mock GPUs. You just need a kind cluster to bring it up, and there is a pretty good README with step-by-step instructions on how to run it. You can inspect the different parts there; it serves as a reference implementation which you can take as a starting point and extend or rewrite.
All right. Regarding the driver interface in the controller: it has a couple of methods, and we'll quickly go over them. There are GetClassParameters and GetClaimParameters. Nothing too fancy here: we discussed the vendor-specific CRDs for the class and claim parameters, and these are the getters for them. They will return the specific instance of the vendor CRD.
Then there is the Allocate call, which will essentially perform the allocation of a resource. Notice the selected node field: it is empty in the case of immediate allocation, where you need to choose your own node, and it will have a value in the case of delayed allocation, because of the whole PodSchedulingContext flow which we went through. You will get the claim, the claim parameters, the resource class and the class parameters, and you need to return an allocation result. This struct will eventually contain that string blob with the information on the allocated resource, as well as the node where the resource is available.
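The selected-node behavior can be sketched like this. The types and the helper are simplified inventions; the real controller package passes richer Kubernetes objects:

```go
package main

import "fmt"

// allocationResult mirrors the idea of the allocation result: an opaque
// resource handle (the "string blob") plus the node where the resource
// is available.
type allocationResult struct {
	resourceHandle string
	node           string
}

// allocate handles both modes: with immediate allocation, selectedNode
// is empty and the driver must pick a node itself; with delayed
// allocation, the scheduler has already negotiated selectedNode via the
// PodSchedulingContext.
func allocate(claimUID, selectedNode string, pickNode func() (string, error)) (*allocationResult, error) {
	node := selectedNode
	if node == "" { // immediate allocation: driver chooses
		n, err := pickNode()
		if err != nil {
			return nil, err
		}
		node = n
	}
	return &allocationResult{
		resourceHandle: fmt.Sprintf(`{"claim":%q,"node":%q}`, claimUID, node),
		node:           node,
	}, nil
}

func main() {
	res, err := allocate("uid-1", "", func() (string, error) { return "node-a", nil })
	if err != nil {
		panic(err)
	}
	fmt.Println(res.node, res.resourceHandle)
}
```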
The Deallocate call essentially deallocates the resource. It's called when the resource claim is deleted, and it should essentially free the resources which were created for this claim.
UnsuitableNodes: this one gets called during the wait-for-first-consumer flow, where we need to negotiate with the scheduler about which nodes the pod can be scheduled on. It accepts the potential nodes, and it needs to update the unsuitable nodes for each claim in the passed-in claim allocation objects. Again, as I discussed before: you update the struct with the nodes you don't want the pod to be scheduled on.
For the node part, there are NodePrepareResource and NodeUnprepareResource. These, again, run on each node, called by the kubelet plugin. NodePrepareResource will prepare the resource: it will generate a CDI device specification and return the CDI device IDs. And of course you will get the resource handle in the request, which is that string blob we talked about earlier. One thing to note: the call must be idempotent, and you have under 10 seconds to finish the call, currently at least, with Kubernetes.
NodeUnprepareResource does the opposite of NodePrepareResource. I didn't mention it before, but the first one gets called when a pod referencing the claim is created; this one gets called when the pod is deleted, and you need to perform cleanup for the resource. Again, this call must be idempotent as well.
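Idempotency here usually means remembering what has already been prepared, so a retried call returns the same CDI device IDs instead of preparing twice. A minimal sketch, with invented types; the CDI IDs follow the real `vendor/class=name` format but the vendor name is made up:

```go
package main

import "fmt"

// nodePlugin records prepared claims so NodePrepareResource can be
// retried safely (the kubelet may call it more than once).
type nodePlugin struct {
	prepared map[string][]string // claim UID -> CDI device IDs
}

// prepareResource is idempotent: a second call for the same claim
// returns the recorded CDI IDs without redoing the work. It must also
// finish quickly (currently under 10 seconds per call).
func (p *nodePlugin) prepareResource(claimUID, resourceHandle string) []string {
	if ids, ok := p.prepared[claimUID]; ok {
		return ids // already prepared: return the same answer
	}
	// A real driver would parse resourceHandle and write a CDI spec
	// file here; we just fabricate one device ID.
	ids := []string{fmt.Sprintf("example.com/gpu=dev-%s", claimUID)}
	p.prepared[claimUID] = ids
	return ids
}

// unprepareResource is the cleanup counterpart, also safe to retry.
func (p *nodePlugin) unprepareResource(claimUID string) {
	delete(p.prepared, claimUID)
}

func main() {
	p := &nodePlugin{prepared: map[string][]string{}}
	fmt.Println(p.prepareResource("uid-1", `{"device":"gpu0"}`))
}
```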
And let's talk a little bit about what CDI is; we mentioned it a couple of times before. CDI stands for Container Device Interface. It's essentially a JSON-formatted specification which describes how a device should be exposed to a container. It contains information such as the device nodes which need to be exposed (char devices, for example), environment variables, host mounts, and hooks that need to be run. It's sort of a standardized way to expose devices to containers, and it's consumed by the container runtimes, like containerd and CRI-O. And that's an example of a CDI device specification.
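The slide itself isn't reproduced here, but a CDI device specification looks roughly like this. The vendor, device name, paths, and env variable are invented for illustration; the field names follow the CDI spec format:

```json
{
  "cdiVersion": "0.5.0",
  "kind": "example.com/gpu",
  "devices": [
    {
      "name": "gpu0",
      "containerEdits": {
        "deviceNodes": [
          { "path": "/dev/gpu0" }
        ],
        "env": ["EXAMPLE_VISIBLE_DEVICES=0"],
        "mounts": [
          { "hostPath": "/usr/lib/example", "containerPath": "/usr/lib/example" }
        ]
      }
    }
  ]
}
```

A runtime that receives the CDI device ID `example.com/gpu=gpu0` resolves it against such a spec and applies the listed edits to the container.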
It just contains what I said; you can dig into it later. And the next slide is just links to a couple of resources which we mentioned throughout this presentation, so it's all here and you can reference it later. And with that, I think we are done, 12 seconds to go.
thank you
[Applause]