Belajar Kubernetes (Learning Kubernetes) - 49 Horizontal Pod Autoscaler

Programmer Zaman Now

5 Aug 2020 | 15:52

Summary

TLDR: This video explains Kubernetes scaling concepts, focusing on both vertical and horizontal autoscaling. Vertical scaling involves upgrading resources for individual pods, while horizontal scaling adds more pods to distribute the load. Horizontal Pod Autoscaling (HPA) is emphasized as a cost-effective, automated way to manage application scaling based on metrics like CPU and memory usage. The video walks through setting up HPA in Kubernetes, demonstrating how it automatically adjusts pod replicas to meet demand. Key considerations for configuring HPA, including resource thresholds and potential pitfalls with memory usage, are also covered.

Takeaways

😀 Vertical scaling involves upgrading the resources (CPU, RAM) of a single instance, but is limited by the capacity of the instance and can be costly.
😀 Horizontal scaling adds more instances (pods) to distribute the load, offering better flexibility and cost-effectiveness; the Horizontal Pod Autoscaler (HPA) automates it.
😀 Vertical scaling can be expensive, as increasing CPU or memory often costs more than simply adding more pods in horizontal scaling.
😀 HPA automatically adjusts the number of pods based on usage metrics (like CPU or memory), helping to scale applications efficiently.
😀 HPA is an essential feature for applications in Kubernetes, particularly for managing high traffic or heavy loads in production environments.
😀 Vertical Pod Autoscaling (VPA) automatically adjusts the resources of individual pods, but at the time of the video it was still in beta and not available on every cloud provider.
😀 Horizontal scaling is not capped by the capacity of a single instance the way vertical scaling is, which makes it more scalable in the long term.
😀 The key benefit of horizontal scaling is that it distributes workload evenly across multiple pods, preventing overload on a single instance.
😀 When using HPA, it's important to set minimum and maximum replica limits to ensure pods are scaled appropriately without overloading the system.
😀 Care should be taken when using memory-based scaling with HPA, especially with languages like Java where memory usage may not decrease promptly after high usage.
😀 To implement HPA, a metrics server is required to track pod metrics, and HPA adjusts pod numbers based on predefined thresholds like CPU or memory utilization.

Q & A

What is horizontal scaling in Kubernetes?
-Horizontal scaling in Kubernetes means changing the number of pods in a deployment, increasing or decreasing them based on demand. This distributes the workload across multiple pods, improving performance and resource usage. When the adjustment happens automatically, it is handled by the Horizontal Pod Autoscaler.
What is vertical scaling, and how does it differ from horizontal scaling?
-Vertical scaling involves upgrading the resources (like CPU and memory) of an existing pod, such as moving from 1 CPU core to 2 CPU cores. It is limited by a maximum resource allocation: once you reach that limit, you cannot scale any further. Horizontal scaling, on the other hand, adds more pods to handle the load and is not bound by the capacity of a single instance.
What are the limitations of vertical scaling?
-The main limitation of vertical scaling is that it is restricted by the available resources on the node. For example, if a node has 10GB of RAM, it cannot scale beyond that. Additionally, upgrading vertically can be expensive because higher CPU and memory configurations are typically more costly.
What is the role of Horizontal Pod Autoscaler (HPA) in Kubernetes?
-The Horizontal Pod Autoscaler (HPA) automatically adjusts the number of pods in a deployment based on metrics like CPU or memory usage. When resource usage exceeds a certain threshold, HPA increases the number of pods, and when usage decreases, it reduces the pods, helping to manage resource efficiency and application performance.
How does HPA determine when to scale the pods?
-HPA periodically reads metrics, such as CPU or memory usage, from the metrics server. If usage rises above the configured target, HPA adds pods; if it falls below the target, HPA removes pods, always keeping the replica count within the configured minimum and maximum.
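
The scaling decision can be sketched in a few lines. The formula below follows the one given in the Kubernetes documentation (desired replicas = ceil(current replicas × current metric / target metric)), with the default 10% tolerance; the function name, parameters, and sample numbers are illustrative assumptions, not taken from the video.

```python
import math

def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas, max_replicas, tolerance=0.1):
    """Approximate the HPA replica calculation (illustrative sketch only)."""
    ratio = current_utilization / target_utilization
    # Close enough to the target: leave the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Never go below the minimum or above the maximum replica count.
    return max(min_replicas, min(max_replicas, desired))

print(desired_replicas(2, 90, 50, 1, 10))   # scale up: 4
print(desired_replicas(4, 20, 50, 1, 10))   # scale down: 2
print(desired_replicas(5, 200, 50, 1, 10))  # capped at the maximum: 10
```

The real controller also accounts for pods that are not yet ready and applies a stabilization window before scaling down, which this sketch leaves out.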
What is the function of the metrics server in Kubernetes?
-The metrics server in Kubernetes collects resource usage data from the pods, such as CPU and memory usage. It serves this data to HPA, which uses the metrics to decide whether to scale the number of pods up or down based on resource utilization.
Why is horizontal scaling considered more cost-effective than vertical scaling?
-Horizontal scaling is typically more cost-effective than vertical scaling because instances with larger CPU or memory configurations tend to be more expensive. In contrast, adding more small nodes or pods often costs less.
What is vertical pod autoscaler (VPA), and how does it differ from HPA?
-The vertical pod autoscaler (VPA) automatically adjusts the resource requests and limits of a pod based on its usage. It differs from HPA in that VPA modifies the resource configuration of individual pods (e.g., increasing memory or CPU), whereas HPA scales the number of pods.
What challenges can arise when using memory utilization for scaling?
-Memory utilization can be a problematic scaling metric because some applications, such as those written in Java, may not release memory promptly after a period of high usage. The memory metric then stays high even after the load has passed, which can trigger unnecessary scale-ups or prevent HPA from scaling back down.
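
A rough illustration of the effect, using the replica formula from the Kubernetes documentation and ignoring tolerance and stabilization: after a traffic spike, a CPU metric that falls back lets HPA shrink the deployment, while a memory metric that stays near its target keeps every pod running. The function name and all numbers here are hypothetical.

```python
import math

def replicas_after_load(current_replicas, utilization, target):
    """desired = ceil(current * utilization / target); simplified sketch."""
    return math.ceil(current_replicas * utilization / target)

# Hypothetical state once the spike is over: 4 pods, 70% target.
cpu_replicas = replicas_after_load(4, 20, 70)     # CPU dropped to 20%
memory_replicas = replicas_after_load(4, 65, 70)  # memory still held at 65%

print(cpu_replicas)     # 2: the deployment can shrink
print(memory_replicas)  # 4: no scale-down, memory never came back down
```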
How can Kubernetes deployments be configured to use HPA for scaling?
-To use HPA for scaling, you create a Horizontal Pod Autoscaler object that points at the deployment and sets a minimum and maximum replica count, along with a target for CPU or memory utilization. When usage moves away from the target, HPA scales the number of pods up or down automatically, staying within the defined limits.
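
As a sketch of what such an object looks like, the snippet below builds an HPA manifest as a Python dictionary and prints it as JSON. It assumes the `autoscaling/v2` API (stable since Kubernetes 1.23; clusters from the time of the video would have used an earlier beta version). The helper name, the names `web-hpa` and `web`, and the values 3, 5 and 70 are placeholders, not the ones used in the video.

```python
import json

def build_hpa_manifest(name, deployment, min_replicas, max_replicas,
                       cpu_target_percent):
    """Return a HorizontalPodAutoscaler manifest as a plain dict (sketch)."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            # The workload whose replica count HPA will manage.
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            # Scale on average CPU utilization, as a percentage of requests.
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {
                        "type": "Utilization",
                        "averageUtilization": cpu_target_percent,
                    },
                },
            }],
        },
    }

manifest = build_hpa_manifest("web-hpa", "web", 3, 5, 70)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since kubectl accepts JSON as well as YAML. Utilization targets are measured against the resource requests of the pods, so the deployment's containers need CPU requests set for this to work.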