032 Job Scheduling in MapReduce in Hadoop
Summary
TL;DR: This video explains the job-scheduling mechanisms in the MapReduce framework, covering the schedulers used in both MapReduce 1 and MapReduce 2. It describes three primary types: First-In-First-Out (FIFO), the Capacity Scheduler, and the Fair Scheduler. FIFO is the simplest but can delay small, high-priority jobs. The Capacity Scheduler supports multi-user job distribution, ensuring fair resource allocation between organizations, while the Fair Scheduler prioritizes parallel job processing. The video highlights their pros and cons, their suitability for various use cases, and their impact on cluster resource management.
Takeaways
- 😀 MapReduce job scheduling is crucial for efficient resource management in distributed systems like Hadoop.
- 😀 There are multiple scheduling schemes in MapReduce, including First-In-First-Out (FIFO), Capacity Scheduler, and Fair Scheduler.
- 😀 In MapReduce 1, the default scheduling scheme is FIFO, while MapReduce 2 uses Capacity and Fair Schedulers, with Capacity Scheduler as default.
- 😀 FIFO scheduling processes jobs in the order they are submitted, which can lead to long waiting times for small, high-priority jobs.
- 😀 FIFO scheduling in MapReduce 1 was slightly improved by adding priority levels (very high, high, normal, low, and very low) to address the issue of long waiting times.
- 😀 Preemption is not supported in FIFO, meaning longer jobs can prevent shorter, high-priority jobs from executing in a timely manner.
- 😀 Capacity Scheduler in MapReduce 2 allows resource allocation based on user or group-specific queues, which can be rented out for different organizations.
- 😀 The Capacity Scheduler supports both hard (fixed) and elastic (soft-limit) resource allocations and can be configured for different levels of rigidity based on organizational needs.
- 😀 Jobs in the Capacity Scheduler are allocated resources based on the capacity defined for their respective organization or group, promoting efficient cluster utilization.
- 😀 The Fair Scheduler aims to improve job processing by picking jobs from queues and processing them in parallel, ensuring a better user experience for all jobs in the queue.
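As a concrete illustration of the MapReduce 2 / YARN scheduler choice mentioned above, the active scheduler is selected in `yarn-site.xml` via the `yarn.resourcemanager.scheduler.class` property. This is a minimal sketch, not a complete cluster configuration:

```xml
<!-- yarn-site.xml: selects the scheduler used by YARN's ResourceManager.
     CapacityScheduler is the default in typical Hadoop 2.x distributions. -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
```

To switch to the Fair Scheduler, the value becomes `org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler` instead.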
Q & A
What is the primary function of the job scheduler in the MapReduce framework?
-The primary function of the job scheduler in the MapReduce framework is to allocate resources and manage the execution of jobs submitted by multiple users on a Hadoop distributed network.
What are the three types of job scheduling schemes available in MapReduce 1?
-The three types of job scheduling schemes available in MapReduce 1 are First-In-First-Out (FIFO), Fair Scheduler, and Capacity Scheduler.
Which scheduler is the default in MapReduce 1?
-The default scheduler in MapReduce 1 is the First-In-First-Out (FIFO) scheduler.
How does the FIFO scheduling scheme work?
-In the FIFO scheduling scheme, jobs are executed in the order they are submitted. The first job submitted gets executed first, potentially delaying smaller or high-priority jobs if a larger job is running.
What problem arises with the FIFO scheduling scheme?
-The problem with the FIFO scheduling scheme is that small, high-priority jobs may have to wait for a long time if a larger job was submitted before, leading to long delays.
What enhancement was introduced in FIFO scheduling to improve its performance?
-FIFO scheduling was enhanced by introducing a priority scheme, where jobs can be categorized into different priority levels such as very high, high, normal, low, and very low, helping smaller jobs move up the queue.
What is the Capacity Scheduler and what does it focus on?
-The Capacity Scheduler is the default scheduler in MapReduce 2, focusing on multi-user scheduling by dividing the cluster's resources into queues for different organizations or users, allowing them to request specific resources for their needs.
How does the Capacity Scheduler manage resources for multiple organizations?
-The Capacity Scheduler allows organizations to rent a portion of the cluster's resources, which are allocated based on the organization's needs. It can be configured for elastic or hard allocations, depending on the requirements.
How is the Capacity Scheduler different from the FIFO scheduler in terms of job management?
-Unlike FIFO, the Capacity Scheduler divides resources into queues for different organizations, and each job is processed according to the availability of resources in the respective organization's queue, not based purely on submission order.
What are some of the features of the Capacity Scheduler?
-The Capacity Scheduler includes features like capacity guarantees, elasticity, and security, which can be customized by administrators to suit specific needs.
What are the main differences between the Capacity Scheduler and the Fair Scheduler?
-Both schedulers divide resources into queues, but the Fair Scheduler uses a more flexible approach by allowing jobs from different queues to be processed in parallel, aiming to provide a better user experience by reducing waiting times for smaller jobs.
What is the current status of the Fair Scheduler at the time of this video?
-At the time of the video, the Fair Scheduler is still in the beta stage, with ongoing work to improve its functionality.