AWS Batch on EKS
Summary
TL;DR: In this episode of 'Containers from the Couch', AWS developers Jeremy Cowan, Sai Vennam, Angel Pizarro, and Jason Rupert discuss the new capabilities of AWS Batch on EKS. They explore the fully managed service for running batch jobs on Kubernetes, its history, and the benefits of using it for high-throughput workloads. The conversation covers topics like job queues, compute environments, and the integration of AWS Batch with EKS, providing insights into how customers can leverage this service for their batch processing needs.
Takeaways
- AWS Batch is a fully managed service designed for running batch jobs and high-throughput workloads like genomics, financial risk analysis, and AI/ML training.
- AWS Batch supports running jobs on EC2 instances with ECS and ECS Fargate, and has recently introduced support for EKS clusters.
- The motivation for offering AWS Batch on EKS is to leverage the scalability and just-in-time allocation of cloud resources for batch workloads, which differs from traditional on-prem batch processing.
- AWS Batch uses its own scheduler and scaling system, rather than Kubernetes' default scheduler, to efficiently manage compute resources for job processing.
- AWS Batch is optimized for both maximal throughput and cost-efficiency, balancing these two factors when scaling compute resources for jobs.
- Users are responsible for creating their EKS clusters, while AWS Batch manages the compute environment within the EKS cluster to run batch workloads.
- AWS Batch supports job dependencies and array jobs, allowing complex workflows such as MapReduce to be executed as a series of batch jobs.
- AWS Batch does not use Karpenter for scaling compute instances, opting instead for its own managed scaling approach that integrates with EKS.
- AWS Batch allows customers to provide a launch template for their nodes, enabling customization and adherence to hardened image policies.
- AWS Batch integrates with monitoring tools like Prometheus and Grafana, allowing users to track resource usage and job performance within EKS.
- AWS Batch on EKS is in its early stages, with plans to add features like persistent volume support and multi-node parallel job types based on customer feedback.
Q & A
What is the role of Jeremy Cowan in the episode?
-Jeremy Cowan is a developer advocate at AWS and the host of the episode, facilitating the discussion about AWS Batch and its integration with Kubernetes.
What is the main topic of discussion in this episode of 'Containers from the Couch'?
-The main topic is running batch jobs on Kubernetes with AWS Batch, exploring the new solutions and integrations offered by AWS for managing batch workloads on EKS.
What is AWS Batch and why was it created?
-AWS Batch is a fully managed service designed for running batch jobs at scale. It was created to handle high-scale workloads such as genomics, financial risk analysis, AI, or ML training, by providing a compute scheduler for these batch workloads.
How does AWS Batch differ from traditional on-premises batch processing?
-AWS Batch leverages the scalability of the cloud and just-in-time allocation of resources, making scheduling more flexible and efficient compared to the capped resources and shared infrastructure of traditional on-premises batch processing.
What is the significance of the term 'compute environment' in AWS Batch?
-A compute environment in AWS Batch represents the types and amounts of resources available for jobs. It defines the minimum and maximum number of CPUs a cluster could have and specifies the target container platform, such as ECS, EC2, Fargate, or an EKS cluster.
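As an illustration (not from the episode), a minimal EC2-backed compute environment might be described with a payload like the following. Field names follow the AWS Batch `CreateComputeEnvironment` API; the environment name, role ARN, subnet, and security group values are placeholders:

```python
# Sketch of a CreateComputeEnvironment request payload for AWS Batch.
# All names, ARNs, subnets, and security groups below are placeholders.
compute_environment = {
    "computeEnvironmentName": "batch-ce",
    "type": "MANAGED",  # AWS Batch manages scaling of the underlying compute
    "state": "ENABLED",
    "computeResources": {
        "type": "EC2",            # target platform: EC2 instances running ECS
        "minvCpus": 0,            # allows the environment to scale down to zero
        "maxvCpus": 256,          # upper bound on aggregate cluster size
        "instanceTypes": ["optimal"],  # let Batch pick suitable instance types
        "subnets": ["subnet-11111111"],
        "securityGroupIds": ["sg-11111111"],
        "instanceRole": "arn:aws:iam::123456789012:instance-profile/ecsInstanceRole",
    },
}
# This dict would typically be passed to boto3, e.g.:
#   boto3.client("batch").create_compute_environment(**compute_environment)
```

The `minvCpus`/`maxvCpus` pair is what encodes the "minimum and maximum number of CPUs a cluster could have" mentioned above.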
What is a 'job queue' in the context of AWS Batch?
-A job queue is a central resource in AWS Batch where all work is submitted. It holds information about the jobs, such as the number of CPUs and memory requirements, and is connected to one or more compute environments to manage the execution of these jobs.
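A hedged sketch of the corresponding `CreateJobQueue` payload, assuming the compute environment name `batch-ce` from above (all names are placeholders):

```python
# Sketch of a CreateJobQueue request payload. The queue fans jobs out to
# one or more compute environments in priority order.
job_queue = {
    "jobQueueName": "batch-queue",
    "state": "ENABLED",
    "priority": 1,  # relative scheduling priority among queues
    "computeEnvironmentOrder": [
        # Batch tries the first environment before spilling to later ones,
        # which is how a queue connects to "one or more" environments.
        {"order": 1, "computeEnvironment": "batch-ce"},
    ],
}
```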
How does AWS Batch handle scaling for batch workloads?
-AWS Batch uses workload-aware scaling to make aggregate decisions on how to scale up or down compute resources based on job queues, requirements of the jobs, and cost considerations, optimizing for both throughput and cost efficiency.
Why did AWS choose to integrate AWS Batch with EKS instead of creating a separate control plane?
-AWS chose to integrate with EKS to leverage existing customer infrastructure and preferences. Many customers were already using EKS for their workloads, and integrating AWS Batch allows them to manage batch workloads within the same environment they are familiar with.
What is the role of 'jobs' and 'job dependencies' in AWS Batch?
-Jobs in AWS Batch are individual tasks that need to be executed, and job dependencies define the order in which these jobs should run. For example, a reduce function might depend on the completion of all map jobs in a MapReduce architecture.
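To make the MapReduce example concrete, here is a sketch of how the pattern might be expressed as two `SubmitJob` payloads, using an array job for the map phase and a dependency for the reduce phase. The job definition names, queue name, and job ID are placeholders:

```python
# Sketch: map phase as an array job (one child job per index).
map_job = {
    "jobName": "map",
    "jobQueue": "batch-queue",
    "jobDefinition": "map-job-def",
    "arrayProperties": {"size": 10000},  # 10,000 children from one API call
}

# Sketch: the reduce job depends on the array job's parent ID, so it will
# not start until all children of the map job have completed.
reduce_job = {
    "jobName": "reduce",
    "jobQueue": "batch-queue",
    "jobDefinition": "reduce-job-def",
    "dependsOn": [{"jobId": "<map-job-id-returned-by-submit_job>"}],
}
```

In practice the `jobId` for the dependency comes from the response of the first `submit_job` call.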
How does AWS Batch support the execution of multi-node parallel jobs?
-AWS Batch supports multi-node parallel jobs through a job type designed for ECS. While this feature is not yet available for EKS, it is under consideration for future releases to accommodate machine learning workloads and other use cases that require this design pattern.
What kind of access does AWS Batch require to an EKS cluster?
-AWS Batch requires the ARN of the EKS cluster and the service role to access the cluster. It uses these to integrate with the EKS API and manage the scaling of nodes for batch workloads within the customer's VPC.
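As an illustration, an EKS-type compute environment carries the cluster ARN and a Kubernetes namespace in its `eksConfiguration`. The sketch below follows the AWS Batch API shape; the ARNs, subnet, security group, and namespace are placeholders:

```python
# Sketch of an EKS-targeted compute environment. Batch uses the cluster ARN
# to talk to the EKS API and creates job pods in the given namespace.
eks_compute_environment = {
    "computeEnvironmentName": "batch-eks-ce",
    "type": "MANAGED",
    "eksConfiguration": {
        "eksClusterArn": "arn:aws:eks:us-east-1:123456789012:cluster/my-cluster",
        "kubernetesNamespace": "batch-jobs",  # namespace Batch submits pods into
    },
    "computeResources": {
        "type": "EC2",
        "minvCpus": 0,
        "maxvCpus": 128,
        "instanceTypes": ["optimal"],
        "subnets": ["subnet-11111111"],       # nodes launch in the customer's VPC
        "securityGroupIds": ["sg-11111111"],
        "instanceRole": "arn:aws:iam::123456789012:instance-profile/eksInstanceRole",
    },
}
```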
How can customers provide feedback or get support for AWS Batch?
-Customers can provide feedback or seek support through various channels, including reaching out via the 'Contact Us' page on the AWS website or engaging with AWS representatives on social media platforms like Twitter.
Outlines
Introduction to AWS Batch on Kubernetes
The episode begins with an introduction to the panelists, Jeremy Cowan, Sai Vennam, Angel Pizarro, and Jason Rupert, who are all AWS employees with various roles in developer advocacy and engineering. They discuss the new capabilities of AWS Batch, a fully managed service for running batch jobs that now supports Kubernetes via EKS in addition to ECS. The conversation sets the stage for a deeper dive into the service's features, motivations, and the team's experiences in bringing this solution to market.
AWS Batch Service and its Evolution
This section delves into the history and functionality of AWS Batch, a service designed for running batch workloads at scale. The panelists discuss the service's inception, its evolution from supporting EC2 instances to ECS and Fargate, and now its expansion to EKS. They highlight the importance of workload-aware scaling and the intelligent coupling of scheduling and scaling operations for efficiency. The conversation also touches on the learnings from the service's operational experience and how they've informed the development of AWS Batch on EKS.
Understanding Batch Workloads and Concepts
The panel clarifies key concepts related to batch workloads, such as jobs, job queues, and compute environments. They explain how these elements interact within the AWS Batch service and the importance of workload-aware scaling. The discussion also introduces the idea of using AWS Batch in conjunction with workflow systems like Step Functions, Apache Airflow, and others, emphasizing the flexibility of AWS Batch in various use cases.
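As a hedged illustration of the workflow-system integration mentioned above: AWS Step Functions can run a Batch job as a single task state and wait for it to finish via the `batch:submitJob.sync` service integration. The state below is sketched as a Python dict of the Amazon States Language JSON; the job, queue, and definition names are placeholders:

```python
# Sketch of a Step Functions task state that submits one AWS Batch job
# and waits for it to complete (the ".sync" integration pattern).
submit_batch_state = {
    "Type": "Task",
    "Resource": "arn:aws:states:::batch:submitJob.sync",
    "Parameters": {
        "JobName": "training-step",        # placeholder names
        "JobQueue": "batch-queue",
        "JobDefinition": "training-job-def",
    },
    "End": True,
}
```

This is the "leaf node" pattern described in the episode: the workflow engine sequences steps, while Batch sizes and runs the compute for each one.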
AWS Batch's Approach to Compute Scaling
The conversation shifts to AWS Batch's approach to compute scaling, particularly in contrast to other Kubernetes scaling solutions like Karpenter. The panelists explain that AWS Batch does not use Karpenter and instead relies on its own managed service for scaling compute instances. They discuss the rationale behind this decision, emphasizing the service's operational learnings and the need for tight integration between scheduling and scaling for optimal performance.
Launching AWS Batch on EKS
The panelists announce the launch of AWS Batch on EKS, detailing how it works with existing EKS clusters. They describe the process of integrating AWS Batch's scheduling and orchestration management planes into an EKS cluster, allowing for the submission and execution of batch jobs on Kubernetes nodes managed by AWS Batch. The explanation includes a visual demonstration of the workflow and the components involved.
Technical Deep Dive into AWS Batch on EKS
This section provides a technical deep dive into the specifics of running AWS Batch on EKS. The panelists discuss the types of workloads suitable for batch processing, the differences between batch jobs and long-running microservices, and the importance of data locality for cost and performance. They also address questions about multi-node job support and the demo showcases the practical aspects of using AWS Batch with EKS.
AWS Batch Compute Environment Management
The panelists explain the concept of a compute environment in AWS Batch, which is akin to a managed node group in EKS. They discuss the responsibilities of the end user in creating and maintaining the EKS cluster, while AWS Batch manages the data plane for running batch workloads. The conversation also covers the ability to scale the compute environment to zero and the integration of monitoring tools like Prometheus and Grafana.
AWS Batch's Support for Custom AMIs and Security
Addressing a question from the audience, the panelists discuss AWS Batch's support for custom AMIs and hardened images for EKS nodes. They explain how customers can provide a launch template that AWS Batch will use to launch instances, ensuring that the customer's security and compliance requirements are met. The discussion also touches on the managed service's ability to detect and handle errors in the scaling process.
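A sketch of how a customer-owned launch template, for example one referencing a hardened AMI, is attached to a compute environment's `computeResources`. Field names follow the AWS Batch API; the template name and network values are placeholders:

```python
# Sketch: compute resources that point Batch at a customer launch template,
# so nodes are launched from the customer's hardened image and settings.
compute_resources_with_lt = {
    "type": "EC2",
    "minvCpus": 0,
    "maxvCpus": 64,
    "instanceTypes": ["optimal"],
    "subnets": ["subnet-11111111"],
    "securityGroupIds": ["sg-11111111"],
    "launchTemplate": {
        "launchTemplateName": "hardened-nodes",  # placeholder template name
        "version": "$Latest",                    # or pin a specific version
    },
}
```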
AWS Batch's Integration with EKS and Future Plans
The panelists discuss the motivations behind integrating AWS Batch with EKS, leveraging existing customer preferences and the widespread use of EKS. They also share future plans for AWS Batch, including potential support for persistent volumes and multi-node parallel jobs. Additionally, they mention the importance of customer feedback in shaping the service's roadmap and improving the user experience.
Closing Remarks and Feedback Channels
In the final section, the panelists wrap up the discussion with closing remarks, expressing excitement about the potential applications of AWS Batch on EKS. They encourage audience members to provide feedback through various channels, including social media and AWS contact forms. The panel also highlights upcoming workshops and sessions at the re:Invent conference, offering opportunities for further learning and engagement with the AWS Batch service.
Mindmap
Keywords
AWS
AWS Batch
Containers from the Couch
EKS (Elastic Kubernetes Service)
Batch Jobs
Compute Environment
Job Queue
Scalability
Workload
HPC Services
ECS (Elastic Container Service)
KubeCon
Highlights
Introduction of a new episode of 'Containers from the Couch' with Jeremy Cowan, a developer advocate at AWS.
Guest introductions including Sai Vennam, a developer advocate on the EKS product team, and new guests from AWS to discuss batch jobs on Kubernetes with AWS Batch.
Angel Pizarro, a principal developer advocate on the HPC Services team, shares his background in research computing and life sciences.
Jason Rupert, a principal engineer on AWS Batch, discusses his role in building the product on ECS and EKS.
The motivation behind offering AWS Batch for EKS is explained, highlighting the service's history and its focus on batch workloads.
Explanation of AWS Batch as a fully managed service for running batch and parallel jobs on EC2 instances with ECS and ECS Fargate.
Discussion on the unique approach of AWS Batch in managing compute resources for batch workloads, differentiating it from traditional on-prem infrastructure.
Introduction of AWS Batch's central resource concepts, such as compute environments and job queues.
The relationship between jobs, job queues, and running jobs with batch is clarified with a visual aid.
Historical context of batch processing and its relevance to modern industries, including finance and science, is provided.
AWS Batch's integration with workflow systems like Step Functions, Apache Airflow, and domain-specific workflow languages in life sciences and genomics.
Addressing the question of whether AWS Batch uses Karpenter for scaling compute instances and the explanation of its custom approach.
The announcement of AWS Batch support for EKS clusters and how it works with existing EKS clusters.
Demo of AWS Batch on EKS, showcasing the process of submitting jobs, scaling clusters, and monitoring with Grafana and Prometheus.
Exploration of the types of workloads suitable for AWS Batch, differentiating them from long-running microservices.
Discussion on AWS Batch's support for multi-node jobs and the distinction between ECS and EKS in this regard.
The managed service aspect of AWS Batch, emphasizing the abstraction of complexity and the focus on undifferentiated heavy lifting for batch workloads.
Questions from the audience about using AWS Batch on EKS with hardened images and the response regarding custom AMIs and launch templates.
Insights into the future of AWS Batch on EKS, including potential features like persistent volume support and multi-node parallel job types.
Conclusion and invitation for feedback from users, emphasizing the importance of community engagement for the development of AWS Batch.
Transcripts
[Music]
here we go
hello and welcome to another episode of
containers from the couch I'm Jeremy
Cowan I am a developer Advocate at AWS
and joining me today are several guests
we have my colleague Sai Vennam here Sai
you want to quickly introduce yourself
hey folks Sai Vennam here you may have
seen me on containers from the couch
before I'm a developer advocate on the
EKS product team and today excited to
have our guests uh from from our sister
teams at AWS Batch on the show uh to
talk about a new solution they have uh
I'll pass it to them to introduce them
so
hey folks I'll go first uh my name is
Angel Pizarro I'm a principal developer
Advocate on the HPC Services team I uh
have a background in research Computing
and specifically in the Life Sciences so
most of the workloads and customers that
I work with are looking at a mixture of
running microservices for some
management but also a lot of their
workloads are more along what we call
batch workloads or high throughput
workloads which we'll talk about and get
into later today
hi everyone my name is Jason Rupert I am
a principal engineer on AWS batch I've
been with a batch for about six years
helping them originally build
our product on ECS
and now I help them build on eks as well
great and as you probably could infer
from our guest introductions today we're
going to be talking about running batch
jobs on kubernetes with AWS batch now
AWS batch has been in the market for a
while now it has support for running
jobs on ec2 instances with ECS and ECS
fargate
um what uh can you tell us a little
about the the batch service itself and
the motivation behind offering it for
eks
uh sure I can I can take that one
um so
the
okay so let me take a step back uh batch
is sort of a fully managed service for
running batch workloads like we said it
runs on an elastic container service uh
it has since its Inception uh we we took
this really interesting Tech of it
um actually let me step back even
further
there the way that folks run batch
workloads and and workloads that are
sort of uh High scale uh either for you
know workloads like like where I come
from from genomics or you're a Financial
Risk analysis or you're an AI or ml
training person you typically want to do
the same item of work uh sometimes
hundreds sometimes millions of times and
for that you typically want to send all
of those processes into something that's
called a scheduler right a basically a
compute scheduler and it back in the day
when folks are running their on-prem
infrastructure uh they were basically
capped as a shared resource uh on that
on-prem infrastructure so that scheduler
would need to balance out its compute
resources a lot across a lot of
different groups that needed a lot of
work done right
enter in the cloud uh where we have a
lot of scalability and just in time
allocation of resources scheduling
becomes a different thing for batch
workloads and that was sort of the the
original Inception of of batch and Jason
was there so Jason maybe you can give us
a little bit of the history of what
happened there
yeah I think a common theme was you know
how can we do undifferentiated heavy
lifting and remove that from the
customers that may have been you know
running a queue and in work for
work items on ec2 compute and so
um at the time we built we we you know
containers obviously had been taking off
over the years and and so we had desired
you know hey folks like to package in
containers so we'll support that we'll
run it on ECS ECS is also managed AWS
service we're building to manage the AWS
service and we'll overlay the two with
ec2 to to orchestrate these workloads
put them in jobs put the jobs turn them
into ECS tasks and and and run them as a
container on on these workloads and and
one thing that a theme that'll probably
come up a lot in this in this
conversation today is is that what we
learned is is that we we do the workload
aware uh
scaling for the compute so we're we're
looking at the job cues the jobs the the
the requirements of the jobs and making
aggregate decisions on how to scale up
and down compute uh for those jobs and
and that's something that we originally
launched with and has been a big part of
the service
and uh we
um you know over that time we very much
learned that if you want to schedule and
scale and be workload aware you know
those two intelligent operations they
they need to be sort of I would say
coupled together at least the
intelligence of them I mean interface
wise and and subservice wise how we
architect it internally they're they're
nice interfaces in between them in their
separate they're separate sub Services
running them but they really need to
know what each other is doing to be
efficient and optimal at that and so
um that's what we we built then and and
uh we uh you know approached the eks
support
um to in the same way
um I'll stop there and we can uh yeah
you you introduced a couple of terms
there that uh those folks who may be new
to batch workloads might be unfamiliar
with like jobs and job cues can you you
quickly explain what what those are and
how how they relate to like running
running jobs with batch yeah you know if
you can if you can share my screen for I
just have a quick slide and it's pretty
easy to point stuff out at um and
hopefully my my uh Mouse will go over uh
basically batch has for for uh Central
resource Concepts uh and we can start
out from you know the lowest layer at
the compute first where you have a and
work our way backwards uh a compute
environment this is sort of the
representation of the types and amounts
of resources that you want to make
available to your job essentially is
defining what's the minimum and maximum
number of CPUs a cluster could have it
also says what's the target container
platform so this is where you would
specify your target being ECS and ec2 or
fargate or an eks cluster you would
Define a compute environment I've got a
demo later on that'll show how these
things are are connected together
and then stepping away from that you
have a job queue that's where you submit
all your work right and that job queue
is connected to one or more compute
environments so uh you can imagine the
job queue is really holding all of that
information that that Jason uh was
mentioning you know how many jobs for
each job how many CPUs and how much
memory does it use a GPU uh what
architecture are you looking at AMD or
Intel right so it's looking at um it's
looking at the aggregate requirements
and sending that over to the compute
environment and then the compute
environment decides how to scale in
order to do two things uh one what's the
maximal throughput that we can do for
the things that are in the queue right
now at what uh what cost right so batch
is optimized for for both of those to
sort of balance out throughput uh and
and cost and then yeah go ahead sorry I
just wanted to interrupt real quick and
say you know I'm looking at at these
terms here on this page in this
architecture and a lot of these terms
are so reminiscent to me of of batch
processing you know I'd say kind of back
in the day uh you know companies would
set aside time nightly to run batch jobs
and have a limited set of resources and
an environment to work with right they'd
have a Mainframe and it was so critical
to use this uh nightly period to use
that compute environment to run jobs and
now I'm looking at this architecture in
here a lot of these terms are are kind
of reminiscent of that you know like
things like compute environment that
that's like you know the Mainframe back
in the day and we have jobs and job cues
and it seems like a lot of the tech here
is really reminiscent of uh you know
batch processing you know that Banks had
done back in the day right that's
exactly right and we have Banks doing
batch processing today on batch you know
it's like batch and Turtles all the way
down uh but yeah that's that's exactly
where where the requirements came from
is that folks really still need to do
um you know batch processing is a thing
that's that's not just in in boring uh
you know boring Industries like like
Finance they're also in science they're
also in uh uh you know other other
Industries at large so you know you have
these very similar concepts of like your
cue or the compute that you run on or
your job templates your job scripts
right that you run again and again uh
and all of that's in in AWS batch today
and that's
um one one thing we'll talk about later
is because these are general concepts
there's like a whole host of
um whole host of other types of
schedulers other other resource
allocators and then a layer above that
there's like workflow systems and
workflow systems use something like
batch as the leaf node of an execution
right so you would think of something
like a workflow that manages the full
life cycle of a machine learning a model
training workflow and each individual
step might need a different number of
CPUs or different uh different
architecture type and accelerators and
you don't want to put all of that into
one job you actually want to you know
just in that one specific in that one
specific part of that workflow Define it
tightly so you have the exact amount of
compute you need for it and then tie it
together at that higher level and
batches that that leaf node where it's
it's taking the requirements for each
individual step so batch could work with
a workflow engine
um like step functions for example it
does today and in fact other other other
very popular workflow engines have
plugins for batch as well including
things like Apache air flow or flight
there's a lot of sort of domain specific
workflow language that add so you're
talking about especially in the Life
Sciences and genomics
domain specific workflow languages that
talk to AWS batch
now there have been a couple of
questions that have come in since we've
been talking and awesome Angela you and
I uh anticipated this question earlier
when we were talking offline so the
first question is around whether batch
uses Karpenter for uh for for scaling
the compute instances I think you took a
different approach with batch
Jason you want to take that out yeah I
will yeah we we did uh
um so to to directly answer the question
no batch does not use Karpenter under
the hood and and uh working back from
that batch does it require you to
install any controller or Uh custom
resource definition or in an operator uh
just to get the the basic the
functionality out of it um we are in
overlay on top of ECS so we we've taken
a little bit of a a different approach
to this to the system I would say then
maybe
um uh the kubernetes community might
might see some of the other projects
that are out there that do batch and we
did that uh uh based on how how our
service how our managed service made
sense
um as a continuation of an overlay that
was already working on ECS and and could
on eks and so what made sense for for
our service and uh was to to have this
overlay approach and one of the big
driving factors in that was because we
run our own our own scheduler uh
essentially I mean the the kube
scheduler is a little bit involved but we
mostly we mostly bypass that scheduler
and we we're doing this because we have
that existing scheduler and scaling
system in batch now and we got to reuse
a lot of that in in many in many years
of learning from that and and so
um we you know inspect the job to decide
where we um we scale nodes based on that
those are added those nodes are added to
a kubernetes cluster then we place the
the um work onto those nodes and the
work when the work is done we scale them
down and and again that goes back to the
theme that you really can't if you want
to do this kind of workload right at the
the scaling in the um scheduling need to
be coupled together
um they need to kind of know what each
other is doing what they're planning on
doing and so
um compare that with Karpenter and
Karpenter itself it also has some of
those built in as an open source
framework and it certainly is great at
what it's doing and we just you know for
our managed service we we
um that was the model that fit for us
best because of those reasons yeah and I
think the key word there is manage
service right uh when you look at
Karpenter today it's it's deployed
within your cluster as a service that
you're running
um batch you know it's just there as
an API endpoint you send jobs to that
will integrate with your with your eks
cluster
um and there you know so Jason mentioned
you know bringing things that we've
learned are operational
um
experience right like like Andy
Jassy used to say before he went to the
other side of the wall uh there's
there's no compression algorithm for for
experience
um and our scaling and operations team
really has six years of it of running on
on on AWS uh Global infrastructure and I
think we're about to uh publish our our
current biggest public run was uh over a
million uh vCPUs concurrently for
single workload that was running across
uh six regions or so
uh we beat that by four to five times
and are about to beat it again so huge
huge workloads that that we're scaling
across regions and everything else those
are separate batch instances in in each
region but still you know you can look
at the the operations and management of
what's going on with the fleets across
the entire regions uh and that's
if you want to do that uh yourself
within your clusters you have that
option of a carpenter right which is
great
um we feel that the managed service has
as as a managed service we have a lot to
offer to relieve that undifferentiated
heavy lifting and to add to that just
you know those large runs those are on
on batch on ECS right now and and we're
we're you know this is our first you
know release of this batch on eks and
we're also working to try to to build up
that that same scalability you know with
ECS or with eks as well and and so we're
we're we're we're sort of humbled by
learning learning what we are we've been
leaning like as you said we're a system
sister team to the kubernetes uh team at
Amazon and we learn a lot from them and
so we're going to continue to you know
iterate on that and and apply things
that we've learned in the past plus
things that we're learning about
kubernetes as we as we move forward
based on you know what customers are
giving us feedback and what we're
learning or ourselves
so we've we've hinted about this a lot
hopefully I think we've answered the
question about Carpenter but we've been
kind of beating around the bush here we
mentioned it a couple times look batch
has been supported on ECS uh for a while
now and obviously from the title of this
show and what we've been talking about
uh we've got something new to talk about
here today right yeah that we we've
released recently uh so so let's let's
talk about that what do we have here
that that our users can start doing
today
right uh so so what we released at
kubecon uh thanks for that Sai I just I
just took it as given because I've been
talking about this for two weeks since
KubeCon uh but yeah so we released
support uh for eks clusters
um with batch
um and the way that this works is that
we're working with existing eks clusters
uh to hook in our our
scheduling and orchestration uh
management planes into that eks cluster
so there's a there's a nice GIF that I
did for the for the blog post if you
want to um so this is what we had here
actually let me see if this this works
let's see if this works hold on because
this walking through it will be good do
you guys see the the first initial sort
of Animation there great awesome this is
how it works right so imagine you have
your existing eks cluster uh you have
your eks management plane with the uh
with the main host and etcd and you
might have some workloads that are
stretching across azs in eks managed
node groups uh they can be scaled with
the cluster autoscaler or
Karpenter or another technology
um you have a data scientist come in and
they start uh sort of allocating
um they see that they have a compute
environment a job queue that's attached
to that eks cluster
so they start submitting jobs into that
job queue uh the the compute
environment sees that it has jobs ready
to go so it stands up its own batch
managed Auto scale group this is outside
of what's being controlled by the eks
management node and then starts
launching instances but since these
instances are pre-configured with the
eks optimized Ami and we tell it what
where the uh where the main node is
kubelet starts doing its thing says hey
this node is ready for daemon sets daemon
sets that you've configured correctly
with the proper tolerations are placed on
the instance and then it gets an
instance ready signal at which point in
time batch starts placing jobs directly
on the instances that it launched right
so it knew about the pods that were that
were going to be created that's part of
the job definition is the specification
of what container to run and all of the
the Pod limits and requests uh
requirements uh and then places those
pods on there today we're using sort of
node name
as the mechanism for placing those pods
on exactly we might shift that uh later
as we learn more about kubernetes and
really stress how fast we can put pods
onto the onto an instance but you know
generally you just keep scaling blah
blah blah you don't need to see this uh
but you just keep scaling until there's
no more work and you can have multiple
uh uh uh compute environments there
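(As an aside, a job definition like the one described here, a container image plus pod requests and limits, might look roughly like the following as a `RegisterJobDefinition` payload. Field names follow the AWS Batch EKS API; the image, command, and resource values are placeholders.)

```python
# Sketch of an EKS job definition: the pod spec Batch uses when it places a
# job onto a node it launched. Image and resource values are placeholders.
eks_job_definition = {
    "jobDefinitionName": "eks-hello",
    "type": "container",
    "eksProperties": {
        "podProperties": {
            "containers": [
                {
                    "image": "public.ecr.aws/amazonlinux/amazonlinux:2",
                    "command": ["echo", "hello from batch"],
                    "resources": {
                        # requests/limits are what Batch's workload-aware
                        # scaling aggregates when it sizes the fleet
                        "requests": {"cpu": "1", "memory": "1024Mi"},
                        "limits": {"cpu": "1", "memory": "1024Mi"},
                    },
                }
            ]
        }
    },
}
```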
great so we've been we've been talking
about uh batch for a while
um is it possible for us to get a look
at it sure sure why not
um where are we I need to bring up my
thingy here oh here's the yeah I did it
as a GIF, fun. So, very quickly, there was a question I want to address before we jump into the demo: can I build simplified MapReduce-type architectures? Angel, I want to take a crack at this question first, and then you let me know if I'm right. I'm guessing the idea here is that Batch is a way to run jobs, as long as you can simplify the work down to a job, and MapReduce is a framework for dealing with large structures of data. If you can reduce that down to a job, then yes, you can run it on Batch on ECS or EKS; there's really nothing limiting you there, right? Well, yeah, there's nothing limiting you, but of course the devil's in the details. If it's a single process and you just want it running on a single instance, you just say, give me something with a couple hundred CPUs, and it'll do it on that one instance as a threaded model. If you're working with something really large and you need to access individual objects in S3 and do some mapping and sorting, then you can take advantage of job dependencies. Two advanced aspects that Batch provides are, first, array jobs: do this thing 10,000 times, for instance, and you get an index for each individual job from that 10,000-wide array, all from a single API request. You can use that index, which is set as an environment variable, to say: I am index 10, grab the tenth chunk of data to do the map step with, then go about your business and output it. And at the tail end, for the reduce function, you would define a dependency on the map job you ran: don't start until I have all of the results from that map process ready for the reduce step. So you can do it at the level of Batch. The other way, as we mentioned before, is workflow systems, which would do that automatically: they wouldn't send that second reduce job until it was ready to send, because they had the results coming back from the map.
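The map side of the pattern described above can be sketched in a few lines. AWS Batch sets the `AWS_BATCH_JOB_ARRAY_INDEX` environment variable in each child of an array job; the partitioning scheme and function name here are hypothetical.

```python
import os

# Each child of a Batch array job receives AWS_BATCH_JOB_ARRAY_INDEX (0-based).
# Hypothetical map step: use the index to pick this worker's slice of the input.
def my_chunk(items: list, total_children: int) -> list:
    index = int(os.environ.get("AWS_BATCH_JOB_ARRAY_INDEX", "0"))
    # simple striped partitioning: child i takes items i, i+n, i+2n, ...
    return items[index::total_children]

os.environ["AWS_BATCH_JOB_ARRAY_INDEX"] = "2"   # simulate child #2 locally
work = my_chunk(list(range(10)), total_children=4)
print(work)  # child 2 of 4 processes items [2, 6]
```

The reduce job would then be submitted with a dependency on the array job, so it only runs once every child has written its output.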
Right, that makes sense, and I'll bring your demo back up here. But one thing I just want to point out: the types of workloads we're talking about, we keep calling them jobs. They're running on Kubernetes, and as that architecture showed, each one is a container. Generally on this show, when we talk Kubernetes, we think pods, we think long-running workloads, we think applications, back-end services, front-end services. That's different from what we're going to be demoing today and what Batch was really made to handle, right? Yeah, it is, and maybe I'll just put this up real quick. This is what you were describing: the difference is that you have services in a microservices world that need to span availability zones and have replicas, and they might have response times in the milliseconds; they run indefinitely. Batch is different. For one, you want these things packed onto instances for throughput and cost savings, and for a data-heavy workload you also benefit a lot from having those data resources local to the AZ, so spreading across AZs is not always the most optimal pattern for batch jobs. Right, and I think that makes sense: high availability is not a thing here, and I like that you put that there, because you're really just concerned about getting the job done.
Right, exactly. Now we did get a question kind of related to this: does AWS Batch support a multi-node option? We'll get to the second part of the question in a second. This is asking about cases where you might have a process with a main node and sub-processes doing work, and they might need to communicate with each other. In the strongly coupled, high performance computing world these are MPI, or message passing interface, jobs; you can also think of task workers coordinated by a main node. For Batch on ECS today there is a multi-node parallel job type. That's not there for EKS yet, but we're really trying to get it out the door as fast as possible, because we know a lot of machine learning workloads leverage that design pattern, which actually comes from the HPC world. Got it.
And for the second part of that question, how about we revisit it? I know we've got a demo we want to show; maybe we can cover it as part of the demo. Absolutely. So here's your EKS cluster. I have some Grafana metrics coming in through Prometheus. You can see we have some pods here; I defined a namespace for Batch nodes. As for the nodes themselves that are up today, you have a specific node group: these are managed by EKS. And just for the demo's sake, I do have a node here that was added by the Batch compute environment, which I'll show in a second, just so that jobs start right away and we're not waiting the minute or so for it to spin up. But you can set the minimum cluster size to zero for Batch. And just to be clear about these managed node groups: yes, the groups are managed by EKS, but the user that wants to use Batch on EKS has to actually create these managed node groups, and they're kind of responsible for them?
No, no, so let me back up a step. When you're defining... here we go, let me go over here. This is a compute environment that's up today, but say you want to create one in the console: you go ahead and say, here's the Kubernetes service, you name it, blah blah blah, you give it the instance role (for my cluster I forget which one it is, but we'll just pick one real quick) and the EKS cluster, and then you give it the namespace in which to launch. I'm not going to start this, because I have one ready, but the gist is that Batch now knows where your EKS cluster is. So imagine this thing we just created: it knows the instance role to use for nodes and pods, and it will go ahead and create those auto scaling groups if there isn't one defined already for the compute environment. So it's managing that completely. The things that you as an EKS cluster owner need to give Batch are two: the ARN for the cluster and the service role. Those are the two things. And then this API endpoint needs to be public. Right now it's authenticated access through IAM; we use the service account mappings and ConfigMap authorization that you use for every other AWS service to integrate with EKS. But we use that public endpoint to find out more about the cluster, like for instance what Kubernetes version it is, store that information, and then use it as we need to for launches.
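Those two pieces of input, the cluster ARN and the namespace, show up in the CreateComputeEnvironment API as the `eksConfiguration` block. Here is a hedged sketch of such a request; the ARNs, names, subnets, and security group are placeholders, and the field names should be checked against the AWS Batch API reference.

```python
# A sketch of a CreateComputeEnvironment request for an EKS compute
# environment. All names and ARNs are placeholders.
request = {
    "computeEnvironmentName": "my-eks-ce",
    "type": "MANAGED",
    "state": "ENABLED",
    "serviceRole": "arn:aws:iam::111122223333:role/AWSBatchServiceRole",
    "eksConfiguration": {
        # the two things the cluster owner must provide to Batch:
        "eksClusterArn": "arn:aws:eks:us-east-1:111122223333:cluster/my-cluster",
        "kubernetesNamespace": "my-batch-namespace",
    },
    "computeResources": {
        "type": "EC2",
        "minvCpus": 0,
        "maxvCpus": 128,
        "instanceTypes": ["m6i"],
        "subnets": ["subnet-aaaa1111"],
        "securityGroupIds": ["sg-bbbb2222"],
        "instanceRole": "arn:aws:iam::111122223333:instance-profile/eksInstanceRole",
    },
}

# To actually create it (requires AWS credentials):
# import boto3
# boto3.client("batch").create_compute_environment(**request)
```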
So, to add on there, you could think of Batch's compute environment as Batch's version of the managed node group. Right. We don't use the term managed node group, just to try to avoid confusion, but if you have your EKS cluster, you might have self-managed nodes, maybe your own ASGs, or maybe a managed node group running some microservices, maybe the DNS service. And if you add Batch to it, by adding the compute environment, Batch is going to come along and manage its own nodes for the jobs in the job queue. It does that as a managed orchestration, and they'll look like self-managed nodes in EKS right now, but they're managed by Batch, and Batch will add and remove them through ASGs. You can look under the covers and see that: if you go to the auto scaling console, or the EC2 console, or the EKS console, you're obviously going to see those resources. But hopefully you don't have to look at them much, because Batch is doing its job. So you can think of Batch's compute environment as Batch's version of a managed node
group. Now let me summarize: the end user is responsible for creating an EKS cluster, and they need to pass that cluster, the ARN and the role that provides access, to Batch. Batch then creates the, sorry, the data plane, the node group that'll actually run Batch workloads. So while the end user is still responsible for the cluster itself, they're not responsible for the data plane where the workloads actually run. And if the user wanted, they could feasibly create other managed node groups in that same cluster and do other things in it as well. It's just the one node group that Batch is using that's really for Batch to handle and manage, and that complexity is abstracted from you, because Batch is handling the management of it. Yeah, and in fact the cluster I have right now has a managed node group, the one running the Prometheus and Grafana services you see on the left side there. And then we start to see why it makes sense that you took this approach of having the user create the cluster: there's still some management capability left to the user, while you still abstract the complexity of the data
plane itself. We've got a question here that kind of tees into this: what was the motivation to use EKS for Batch, instead of just using Batch-managed resources? Well, it's because a lot of customers have been choosing EKS for their workloads and standardizing on it; it's really as simple as that. A lot of them came to us, through the teams that were supporting them, asking for batch workload support. The scaling and operation of batch workloads on Kubernetes is really very different from microservices, for the reasons we covered before, and a central tenet of Batch is to remove that undifferentiated heavy lifting for batch workloads specifically. So we felt we could offer a lot to our customers today that are trying to run these workloads on Kubernetes, and we'll see what the feedback is once folks really start hammering at it.
Awesome, and we're getting really great questions here. I don't want to derail from the demo, but one more question that I think is kind of critical: is the data plane running in the customer's VPC? So, by data plane do you mean the node group of the EKS cluster? Yeah, the node group. I'll take this one. Again, we're running our version of a node group, and we are launching the resources into the customer's VPC. We don't have direct access to the customer's jobs or resources; we're orchestrating bringing the node up and then submitting a pod to the Kubernetes API server, which will then run on that node. But it is in the customer's VPC, the nodes, that's correct. Exactly. So we have a couple more great questions, but I want to put those on hold. Let's get to the demo, and I think the demo will probably address the questions about Grafana and Prometheus and the scraping. Sure.
Where are we... this is not the one I want... right, compute environment. You'll see some things here; let me refresh this. So, I set a minimum of four vCPUs; the desired state here is basically what Batch wants the cluster to go to, and a maximum of 128 across a fleet of instances. I set the m6i family, and these are just the subnets and security group to launch with. Like somebody just said, yes, it is launching in your VPC and with your resources. What we see on our management plane is the job request: the job definition template and the things you're providing to the API, so that we can make those scheduling and scaling decisions.
A couple more things that aren't really shown well, but they're here: you did not specify this on creation of the compute environment, but we inspect the cluster and keep track of what version of Kubernetes it is. So if you upgrade your cluster to a newer version of Kubernetes, the compute environment is doing continual checks of whether it's still a valid target for us to launch into, and if those two versions diverge, this status will go to INVALID. So when you update a cluster, you have a follow-on responsibility to tell Batch: I've updated my cluster, here's the new version, through an UpdateComputeEnvironment call, and then we'll check it and it'll go valid again. Is that so you can provision instances using the right AMI? Yeah, that's right. And we do support divergence, just like a managed node group: if you update your EKS control plane to a version, you then go update your managed node group to that version in EKS. It's the same thing with Batch's compute environment. You can say, okay, I want to go from 1.22 to 1.23, start using that version, and we'll pick out the EKS optimized AMI for the right instance type. That's how we use that. Right, so
here's what we have on our node today. Here's the Batch dashboard. I have a couple of job definitions here. Basically there's this one, a simple Python program that I have in a private ECR repository, that's calculating pi. If you look at the job definition, it uses a pod property, a service account name, that has access to both S3 and ECR; that's different from the default, because this is a private repository. It only does a thousand iterations, so it takes about 10 seconds.
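The demo's actual program isn't shown, but a hypothetical stand-in for a pi-calculating container entrypoint might look like this: a Leibniz-series estimate whose iteration count comes from an environment variable supplied by the job definition. The `ITERATIONS` variable name is an assumption for illustration.

```python
import os

# Hypothetical stand-in for the demo's pi job: estimate pi with the Leibniz
# series (pi/4 = 1 - 1/3 + 1/5 - ...), with the iteration count supplied
# through the job definition's environment.
def estimate_pi(iterations: int) -> float:
    total = 0.0
    for k in range(iterations):
        total += (-1.0) ** k / (2 * k + 1)
    return 4.0 * total

iterations = int(os.environ.get("ITERATIONS", "1000"))
print(f"pi ~= {estimate_pi(iterations):.6f} after {iterations} iterations")
```

At a thousand iterations this converges to roughly three decimal places, which is why each job finishes in seconds.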
So we could go ahead and submit this job, and call it Jobby Job McFace or whatever; you submit it to the Batch queue. This is the job dependencies feature I was referring to before: if you have a dependent job, you can set it in the console, and you can also do it through our APIs.
You can optionally override some things. If you want this to be a much higher priority than other things in the queue, you can set a higher priority, like: I need this thing now, in the first available slot, so put in a positive priority and it jumps ahead. There are also job attempts: if there's a retry, like a Spot reclamation, maybe you want to retry that. And a retry strategy: if there was something wrong with my application, like my container just had bad data going into it, actually don't retry; I really only want to retry if I get interrupted by Spot. So you have these different conditions.
You can also override the command, and the best one is... where are we, next page... sorry, go back here, I forgot to erase those. I'm going to do this a hundred times. Maybe I'll do it a thousand times. Sure.
So it's going to do this thing a thousand times, and the reason I'm doing that is so I can scale the cluster, and you can see the CPU usage coming out on the other end after a bit. So, submitted. This is what you get: information about how many are runnable, how many are starting, successes and failures, and we'll check back, but it's essentially as easy as that. Now this thing is pulling the image off of ECR, and it's going to be launching instances; these are the job details right here. If you go to the compute environment we had before, you'll see the desired vCPUs haven't moved yet, because with batch processing you don't want immediate scaling like you would with microservices that really need fast response times. You can be a little bit lax about how quickly you provision resources; it's a trade-off. If you want jobs to start right away, you can have warm capacity, or maybe a certain percentage of warm capacity. Or you can let Batch scale up and down, if you're willing to take lower cost and a little bit of delay on startup. Right, and you can see that pod memory usage is starting to go up in Grafana right here. Basically you can do a pre-warm by setting desired capacity higher (you'll see Batch update this in a minute or two) or minimum vCPUs higher, and it'll immediately start launching instances. But that's it; it's as simple as that.
These aren't very long jobs; they take about 10 seconds each, so you see 63 of them are already completed before we've even scaled out. And where is Batch getting that information? Is it querying the Kubernetes API to get the status of the jobs?
Yeah, I'll take that one. What we do, and this is in our getting started guide, which should be linked here, is the customer provides RBAC permissions for Batch to basically watch pods and nodes. With those permissions, we take a job from the job queue and turn it into a pod, and then we watch the state transitions it goes through in the Kubernetes cluster and update the job on our side from that.
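The watch-based translation implies a mapping from Kubernetes pod phases to Batch job statuses. The real mapping is internal to AWS Batch; this is only an illustrative guess at its shape.

```python
# Illustrative only: the kind of pod-phase -> job-status translation the
# watch-based design implies. The actual mapping is internal to AWS Batch.
POD_PHASE_TO_JOB_STATUS = {
    "Pending": "STARTING",
    "Running": "RUNNING",
    "Succeeded": "SUCCEEDED",
    "Failed": "FAILED",
}

def job_status_for(pod_phase: str) -> str:
    # jobs with no pod yet remain RUNNABLE in the queue
    return POD_PHASE_TO_JOB_STATUS.get(pod_phase, "RUNNABLE")

print(job_status_for("Running"))    # RUNNING
print(job_status_for("Succeeded"))  # SUCCEEDED
```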
Okay. And let's say things go horribly wrong and you need to troubleshoot; is there a way for me to view the logs? I've got you right here. These are the log groups going into CloudWatch Logs. I deployed the Fluent Bit log collector as a daemon set; you can use whatever log aggregator you want on the back end. Okay, so when you were mentioning earlier how you can schedule daemon sets onto instances provisioned through Batch, this would be a good example of where you might want to run something like Fluent Bit. Exactly. And if we go here and say kubectl (I refuse to say "kube-cuddle") get nodes, all of them, you can see what's running across these nodes, and kubectl get pods across all namespaces...
you'll see there's a bunch of stuff happening here, and here are the Batch nodes. What were we doing... where's the Fluent Bit... here are the Fluent Bit pods across all the instances, including this one. There were four currently in the node group, and then, I guess 68 seconds ago, this guy right here is new: it just got added by Batch, and Fluent Bit got deployed to it before we started running jobs on it. That's the auto scaling group bringing up the additional node, and then obviously... exactly, it's just launched another instance here, which you can see showed up. Wow, okay. And if you look at the compute environment here, it had a desired vCPUs; let's see what it does... now it's saying, I want my full fleet, because I realize there's a bunch of work in the queue, so please scale up as much as you can. I don't have that many resources available in this specific account, so it shouldn't scale much beyond what we have, but it's saying: give me as much as you can, as fast
as possible. So, Angel, could I scale a node group to zero and then depend on Batch to scale it up when I need it? Absolutely. That's the minimum vCPUs that you set. Again, I bumped it up just so we wouldn't have to wait those two minutes for jobs to start, but you can set it to zero and it'll scale to zero, and you won't have any Batch nodes when you have no Batch jobs. And just to add on to that, for clarity: that's for the Batch workloads in the job queue. If you have web services or microservices running as Deployments or ReplicaSets on your cluster, Batch is not scaling for those. You can use Karpenter, or the Cluster Autoscaler with the managed node groups that EKS provides, for those. Batch is very focused on playing nicely within the cluster: launching Batch nodes for Batch workloads. Batch will not place its pods on nodes that don't belong to Batch, at least in this first version. Maybe someday we'll support that, but the current version is not stepping outside of the allocation that it's doing.
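The scale-to-zero versus pre-warm choice described above maps to the compute environment's minimum and desired vCPUs, which can be changed after creation with UpdateComputeEnvironment. This is a hedged sketch; the environment name is a placeholder, and the calls are commented out since they need AWS credentials.

```python
# Sketch: scale-to-zero vs. pre-warming via UpdateComputeEnvironment.
# "my-eks-ce" is a placeholder compute environment name.
scale_to_zero = {
    "computeEnvironment": "my-eks-ce",
    # no Batch nodes at all when the job queue is empty
    "computeResources": {"minvCpus": 0},
}

pre_warm = {
    "computeEnvironment": "my-eks-ce",
    # keep warm capacity so jobs start immediately, at extra cost
    "computeResources": {"minvCpus": 16},
}

# import boto3
# boto3.client("batch").update_compute_environment(**scale_to_zero)
```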
There you go. Here I just searched for "batch", but you can search on the job ID itself, and it will bring up the specific log for the container that had an error and show you a bunch of information in CloudWatch. Nice.
Very cool. This could be a good time to go through some questions; we've got a lot of great ones coming up in chat. One question here from Tesseract (I love that tag): they're asking about using Batch on EKS, and implying that you'd have to run EC2 instances with Amazon-owned AMIs in the EKS cluster. That's a non-starter for them: they only run hardened images for EKS nodes, and that policy would be violated by running an unknown AWS Batch AMI. I think you probably have a good answer to this one, right, guys?
Yeah, I'll take this one, because I do want to expand on it a little. We do support custom customer AMIs as overrides, and we support that through launch templates. A customer can provide us a launch template, which really allows them to customize their nodes: they can override user data, which we merge into ours to make sure things keep functioning right for our service. So they provide a launch template, we take it and make what we call a managed launch template out of it, which is a merge of their configuration and our configuration, and we launch instances with that. That is how a customer can provide an AMI, and as long as the node joins the cluster and Kubernetes is healthy and doing its thing, that should work. And we have some mechanisms to help customers out, too. This is where the six years of Batch come into focus: if you look at some auto scaling systems, when things aren't working they'll just leave resources around, and you can burn through some money if you're not paying attention. We've iterated over the years, so say you have a bad AMI or a bad configuration: we'll let it run for a little bit so you can debug it, but we'll eventually scale it back down and invalidate the compute environment, giving you a message saying, hey, we noticed something is wrong with the cluster, or with the compute environment; maybe it's the launch template, maybe it's a network config, maybe it's RBAC. So, yeah, I'll stop there. Yeah,
and this is really where that managed service comes into play: we do stop scaling and sending jobs, because we have seen errors like this before. Triggering things to stop scaling because you're seeing repeated errors is something you would otherwise have to implement yourself if you were deploying your own solution.
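The hardened-image setup described above boils down to two fields in the compute environment's compute resources: a customer launch template and an AMI override. This is a hedged sketch; the template name, AMI ID, subnets, and security group are placeholders, and the field names should be verified against the AWS Batch API reference.

```python
# Sketch: pointing a compute environment at a customer launch template and
# a hardened AMI. Per the discussion above, Batch merges this template with
# its own settings into a "managed" launch template before launching nodes.
compute_resources = {
    "type": "EC2",
    "minvCpus": 0,
    "maxvCpus": 128,
    "subnets": ["subnet-aaaa1111"],
    "securityGroupIds": ["sg-bbbb2222"],
    "launchTemplate": {
        # customer-owned template: user data here is merged with Batch's
        "launchTemplateName": "hardened-eks-nodes",
        "version": "$Latest",
    },
    "ec2Configuration": [
        # override the default EKS optimized AMI with a hardened one
        {"imageType": "EKS_AL2", "imageIdOverride": "ami-0123456789abcdef0"}
    ],
}
```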
Awesome. And folks, by the way, let's keep pounding these guys with questions; we've definitely got 10 more minutes, so if you have any more, feel free to drop them in chat. Another very interesting one here from Clever Maya: what they don't get is why not create a Batch control plane, instead of these integrations? I'm guessing they're referring to the integration with EKS specifically. I want to answer this quickly and argue that there is a Batch control plane, and I think you are abstracting a lot of the complexity, from what I've seen. But I know we asked this a little earlier; maybe a bit more detail on why the EKS integration is being built the way it is today?
Is this question really asking why we aren't providing a resource provider that you deploy within EKS? I guess... okay, managed service; it really boils back down to that. Maybe, Jason, you can take this one. Well, I think you could answer it in two ways. As already put forth, we do have a managed control plane that is hidden. Now, if the question is about EKS's control plane and us creating the cluster on your behalf, that might be what's being asked. We chose in our first version not to do that, given that customers already have pretty opinionated ways of doing Kubernetes, and we didn't want to step out of the bounds of batch workloads and into what the customer's organization and compliance are doing for their clusters; we just want to work nicely within that. Although we will certainly consider feedback on whether we should also be creating clusters for customers. But the control plane part, like scaling nodes for the batch workload, we're handling that; it is hidden from you. You are giving us configurations, defining constraints for those scaling operations within what our resource calls a compute environment, and so that part is our control plane. I'll leave it there; I think those are a couple of ways it could be answered.
Absolutely, and honestly we got an answer here from one of our guests as well: the EKS integration is nice because we can leverage Kubernetes capabilities and open source tools. Here, I'm guessing, we're using Prometheus and Grafana to scrape metrics, which is really only possible because Batch gives you access into that cluster to configure and manage these tools. And of course, I think Angel said this in the beginning: with so many AWS customers using EKS, it made sense to offer Batch as an avenue for them to use the service. Let's keep going down the questions. Folks want to know how they can get started: are there any blogs or workshops available, or sessions at the upcoming re:Invent? Anything we can share here?
Yeah, absolutely. I think you have links to the docs for the getting started guide. There's also a self-paced workshop with a few simple examples, and we're going to be giving that workshop at re:Invent. It's an embargoed session right now, but it should be out this week, so look for CMP335; it should be in the catalog in the next couple of days. Jason and I also have a chalk talk session, so if you want to ask Jason some direct and more pointed questions, come to, I believe, session CON309. And there's a general Batch talk as well. So there's the workshop I'm at, the chalk talk session, and a general Batch breakout session at re:Invent. In terms of what you can do self-paced, our documentation should be a good place to go. There's a post on the AWS News Blog, Jeff Barr's blog, about the feature, with pointers to the workshop and the documentation, so that might be the quickest way to get to everything. Nice. And what's
next for AWS Batch? What are you looking at doing in the future with EKS? You talked about potentially provisioning whole clusters instead of relying on a customer's. Yeah, the provisioning-whole-clusters idea is something we really need to be careful with, because it was a core design choice to leverage existing clusters in customers' accounts, in that shared responsibility model. If we see enough feedback that we should revisit it, that's actually a major feature, and it wouldn't come out anytime soon, because by managing Kubernetes clusters we'd essentially become EKS, right? We'd really want to do that carefully, and if we did, I think it would be more of a collaboration with the other service teams at AWS. Things we are looking at closely, though, are the managed multi-node parallel workflows, and also watching what early customers try out and find problems with, to get that back onto our roadmap. One thing we don't support today is persistent volumes in the way that Kubernetes wants. There is a way, through launch templates, to mount parallel file systems or other things that matter for batch workloads, and then do host volume mounts through the pod, but that's sub-optimal and not really the way Kubernetes folks are used to working. So we're looking at persistent volume support as a near-term feature release. And we're also going to learn a lot from our customers; we want to work backwards from them and hear what feedback they have after they've tried it out. We know some things we want to work on; Angel touched on that. Another area is making it easier to get started: we're looking to have a pull request to add the integration into eksctl, so that customers can set up their RBAC more easily for the integration. So helping adoption, helping people get started, is an obvious one we'd like to improve. And then, as people use it, what do they like and not like and want to see added, and what's potentially a blocker or something making it harder to use. Yeah.
What is the best way for folks to give feedback? Angel, I see you've got your Twitter handle there; I don't know if that's the best way, or if you have another. It is, actually. I mean, until Twitter is not a thing, which, you know, we don't know... until Twitter is not a thing, definitely there. Or otherwise the Contact Us page; you'd be surprised how quickly somebody from AWS will get back to you when you submit something through a contact form. Great. Well, I want to thank our guests, Angel and Jason, for joining us today to tell us all about AWS Batch on EKS. I think it's going to be interesting to see what customers do with it over the next few months; I'm certainly eager to hear the feedback and what's to come. So thanks, everybody, for joining, and we'll talk to you soon. Thanks for joining. Thank you for having us. Thanks, bye.