Hadoop Tutorial - The YARN

Learning Journal
20 Apr 2017 09:11

Summary

TLDR: In this video tutorial, we explore the role of YARN (Yet Another Resource Negotiator) within the Hadoop ecosystem. YARN manages cluster resources, allowing multiple computation engines like Apache Spark and MapReduce to operate on the same cluster. It acts as a middle layer between HDFS and computation frameworks, providing efficient resource allocation and job scheduling. We also look at how YARN's flexibility enables better scalability and elasticity for big data applications. While developers may not interact directly with YARN, understanding its function is crucial for working with modern Hadoop systems and distributed computing environments.

Takeaways

  • 😀 YARN (Yet Another Resource Negotiator) is a core component of Hadoop 2.x that manages resources and job scheduling across a cluster, taking over these duties from the original MapReduce engine.
  • 😀 YARN enables Hadoop to support multiple distributed computation frameworks (e.g., Apache Spark, Apache Tez), allowing diverse applications to run on the same cluster.
  • 😀 The primary purpose of YARN is to handle cluster resource management (CPU, memory, disk I/O, network bandwidth) and provide APIs for requesting resources, not to support individual application developers.
  • 😀 YARN is not aimed at developers building applications but at teams working on new execution engines that can utilize YARN's resource management features.
  • 😀 Apache Slider is an incubating project aimed at integrating NoSQL databases (e.g., HBase, Accumulo) into YARN-managed Hadoop clusters, helping to avoid the need for separate clusters.
  • 😀 YARN's resource management is divided into two main components: Resource Manager (the master service) and Node Manager (the slave service). The Resource Manager handles resource allocation, while Node Managers manage containers on individual nodes.
  • 😀 Containers in YARN are bounded allocations of a node's resources (e.g., CPU, memory) in which tasks run in isolation. The first container launched for a job hosts the Application Master, which is responsible for managing the job's execution.
  • 😀 When an application is submitted, the Resource Manager allocates a container for the Application Master and asks a Node Manager to start it. The Application Master then requests further containers and oversees task execution within them.
  • 😀 YARN enables elasticity by allowing clusters to scale up or down on demand, similar to the capabilities offered by cloud providers. This flexibility is key for managing workloads efficiently.
  • 😀 The use of MapReduce is declining in favor of more advanced computation engines (like Apache Spark), signaling a shift away from the traditional MapReduce framework for distributed processing.
  • 😀 While MapReduce is in decline, HDFS (Hadoop Distributed File System) remains a crucial component of Hadoop, providing reliable and scalable storage for big data applications.
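
The division of labor between the Resource Manager and the Node Managers described above can be sketched as a toy model. This is not YARN's API: the class names, the first-fit placement rule, the container ID format, and the default Application Master size are all assumptions made for illustration, and real YARN schedulers are far more elaborate.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Toy stand-in for a per-node Node Manager; it only tracks free capacity."""
    name: str
    free_memory_mb: int
    free_vcores: int
    containers: list = field(default_factory=list)

    def can_fit(self, memory_mb: int, vcores: int) -> bool:
        return self.free_memory_mb >= memory_mb and self.free_vcores >= vcores

    def start_container(self, container_id: str, memory_mb: int, vcores: int) -> None:
        # Reserve the resources and record the running container.
        self.free_memory_mb -= memory_mb
        self.free_vcores -= vcores
        self.containers.append(container_id)


class ResourceManager:
    """Toy stand-in for the master service: picks a node for each container."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._next_id = 1

    def allocate(self, memory_mb: int, vcores: int):
        # Hypothetical first-fit placement: use the first node with room.
        for node in self.nodes:
            if node.can_fit(memory_mb, vcores):
                container_id = f"container_{self._next_id:04d}"
                self._next_id += 1
                node.start_container(container_id, memory_mb, vcores)
                return container_id, node.name
        return None  # no node has enough free capacity

    def submit_application(self, am_memory_mb: int = 1024, am_vcores: int = 1):
        # The first container of an application hosts its Application Master.
        return self.allocate(am_memory_mb, am_vcores)


nodes = [NodeManager("node-1", 2048, 2), NodeManager("node-2", 4096, 4)]
rm = ResourceManager(nodes)
am = rm.submit_application()   # placed on node-1
task = rm.allocate(2048, 2)    # node-1 has only 1024 MB left, so node-2 is used
print(am, task)
```

The point of the sketch is the separation of concerns: the master decides where a container goes, while each node only accounts for what is running locally.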

Q & A

  • What is YARN in the context of Hadoop?

    -YARN, or Yet Another Resource Negotiator, is a resource management and job scheduling component in the Hadoop ecosystem. It sits between HDFS (storage layer) and execution engines like MapReduce, managing the allocation of resources like CPU, memory, and network bandwidth across a cluster.

  • How did the need for YARN arise in the Hadoop ecosystem?

    -Initially, Hadoop had just two components: HDFS for storage and MapReduce for computation. However, as data processing needs grew, it became clear that MapReduce alone wasn't sufficient. YARN was introduced to manage resources for various computation engines, allowing multiple distributed applications to run on the same Hadoop cluster.

  • What role does YARN play in managing Hadoop clusters?

    -YARN manages the resources and job scheduling for applications running on a Hadoop cluster. It allocates resources such as CPU, memory, and bandwidth, and schedules tasks for different computation frameworks, like MapReduce, Apache Spark, and Apache Tez.

  • What are the main components of YARN in a Hadoop cluster?

    -YARN has two main components: the Resource Manager, which is the master service responsible for resource allocation, and the Node Manager, a slave service that runs on each node in the cluster and manages containers, which are isolated environments for tasks.

  • How does YARN handle resource allocation for a job?

    -When an application is submitted to YARN, the Resource Manager allocates a container and asks a Node Manager to launch it. This first container runs the Application Master, which takes over the responsibility of executing and monitoring the job.

  • What is the function of the Application Master in YARN?

    -The Application Master is responsible for managing the execution of an application within a Hadoop cluster. It requests additional resources (containers) from the Resource Manager, launches tasks on those containers, and monitors the progress of those tasks.
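
The request, launch, and monitor cycle in this answer might look roughly like the sketch below. `FakeResourceManager`, `ApplicationMaster.run`, and the task names are hypothetical and do not correspond to YARN's client libraries. Tasks are assumed to finish instantly, and container 1 is assumed to already host the Application Master itself.

```python
class FakeResourceManager:
    """Hypothetical stub that can lend out at most `max_containers` at once."""

    def __init__(self, max_containers: int):
        self.free = max_containers
        self._next_id = 2  # assumption: container 1 already hosts the AM

    def request_containers(self, count: int) -> list:
        # Grant as many containers as are currently free, up to `count`.
        granted = min(count, self.free)
        self.free -= granted
        ids = [f"container_{self._next_id + i:04d}" for i in range(granted)]
        self._next_id += granted
        return ids

    def release_containers(self, ids: list) -> None:
        self.free += len(ids)


class ApplicationMaster:
    """Toy Application Master: requests containers, runs tasks, tracks progress."""

    def __init__(self, rm: FakeResourceManager, tasks: list):
        self.rm = rm
        self.total = len(tasks)
        self.pending = list(tasks)
        self.completed = []  # (task, container_id) pairs

    def progress(self) -> float:
        return len(self.completed) / self.total if self.total else 1.0

    def run(self) -> int:
        rounds = 0
        while self.pending:
            containers = self.rm.request_containers(len(self.pending))
            if not containers:
                raise RuntimeError("no containers available")
            for container_id in containers:
                # "Launch" one task per granted container (simulated as instant).
                task = self.pending.pop(0)
                self.completed.append((task, container_id))
            # Hand the containers back once their tasks are done.
            self.rm.release_containers(containers)
            rounds += 1
        return rounds


rm = FakeResourceManager(max_containers=2)
am = ApplicationMaster(rm, ["map-0", "map-1", "map-2"])
rounds = am.run()
print(rounds, am.progress(), am.completed)
```

With only two containers available at a time, the three tasks take two rounds, which illustrates why the Application Master, not the Resource Manager, tracks per-job progress.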

  • What are containers in the context of YARN?

    -Containers are bounded allocations of a node's resources, such as CPU and memory, that provide isolated environments for running tasks. The Resource Manager grants them, Node Managers launch and monitor them on Hadoop nodes, and on Linux their isolation can be enforced with control groups (cgroups).

  • How does YARN provide high availability?

    -YARN offers high availability for its Resource Manager. In a production environment, there is usually one active Resource Manager and one standby. If the active Resource Manager fails, the standby takes over, ensuring continuous operation.
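
The active/standby behavior can be illustrated with a small simulation. The `RmInstance` type and the `failover` function are hypothetical names. The health flag is set by hand here, whereas a real deployment detects failures and coordinates the switch through an external mechanism that this sketch does not model.

```python
from dataclasses import dataclass


@dataclass
class RmInstance:
    """Hypothetical record for one Resource Manager process."""
    name: str
    state: str           # "active" or "standby"
    healthy: bool = True


def failover(instances: list) -> RmInstance:
    """Return the active instance, promoting a healthy standby if needed."""
    active = next((rm for rm in instances if rm.state == "active"), None)
    if active is not None and active.healthy:
        return active
    if active is not None:
        active.state = "standby"  # demote the failed instance
    for rm in instances:
        if rm.healthy:
            rm.state = "active"   # promote the first healthy instance
            return rm
    raise RuntimeError("no healthy Resource Manager available")


rm1 = RmInstance("rm1", "active")
rm2 = RmInstance("rm2", "standby")
pair = [rm1, rm2]

first = failover(pair)    # rm1 is healthy, so nothing changes
rm1.healthy = False       # simulate a crash of the active instance
second = failover(pair)   # rm2 is promoted
print(first.name, second.name)
```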

  • How is Apache Slider related to YARN?

    -Apache Slider is an incubating project aimed at bringing NoSQL databases, like HBase and Accumulo, under the management of YARN. It allows these databases to run in a Hadoop cluster managed by YARN, eliminating the need for separate clusters for different systems.

  • Why is the use of MapReduce in Hadoop declining?

    -The use of MapReduce is declining because newer, more efficient frameworks like Apache Spark offer better performance and flexibility for distributed computation. As a result, the Hadoop ecosystem is moving towards newer computation engines that leverage YARN for resource management.

Related Tags
YARN, Hadoop, Big Data, Resource Management, Apache Spark, MapReduce, Cluster Management, NoSQL, Apache Slider, Data Processing, Cloud Computing