noc19-cs33-Introduction-Big Data Computing

IIT KANPUR-NPTEL

12 Dec 2018 · 06:23

Summary

TLDR: Dr. Rajiv Misra from IIT Patna introduces an 8-week NPTEL MOOC course on Big Data Computing. The course covers core concepts of Big Data technologies such as Hadoop, Apache Spark, and NoSQL databases, focusing on volume, velocity, and variety challenges in modern data management. Students will learn about scalable data storage, real-time streaming, machine learning with Spark MLlib, and graph processing with Spark GraphX. The course includes eight assignments and a final certification exam, making it ideal for engineering and science students seeking cutting-edge knowledge in Big Data computing.

Takeaways

📚 This is an 8-week NPTEL MOOC course on Big Data Computing, offered by Dr. Rajiv Misra from IIT Patna.
🌐 Big Data is rapidly growing in today’s digital world, with data generated by logistics, financial services, healthcare, retail, IoT, and social networks.
⚡ Big Data presents challenges in volume, velocity, and variety, making it hard for traditional systems to manage.
🚀 The leading technology in Big Data analytics is Hadoop, which provides scalable, distributed computing.
📊 This course covers the Hadoop ecosystem, MapReduce, Apache Spark, large-scale data storage, and NoSQL databases like Apache Cassandra and HBase.
🤖 Scalable machine learning with Spark MLlib and large-scale graph processing using Spark GraphX are included in the curriculum.
🎥 The course emphasizes Big Data streaming platforms like Apache Spark Streaming and Apache Kafka Streams for real-time data processing.
💾 Students will explore Big Data storage solutions for handling large amounts of unstructured data, such as videos and complex simulations.
🎓 The course is suitable for undergraduate and postgraduate students, engineers, and scientists wanting to learn Big Data technologies.
📈 There will be 8 assignments throughout the course and a final certification exam, with an emphasis on hands-on learning of Big Data concepts.

Q & A

What is the primary focus of the course on Big Data Computing?
-The course focuses on Big Data and cloud computing, covering key technologies such as Hadoop, Apache Spark, and scalable machine learning. It provides an in-depth understanding of Big Data computing concepts.
Why is Big Data considered a challenge for organizations today?
-Big Data is challenging because of its characteristics of volume (large amounts of data), velocity (fast data flow), and variety (different data types such as images, videos, and unstructured data), which traditional technologies cannot handle effectively.
What are some examples of industries generating Big Data?
-Industries such as logistics, financial services, healthcare, and retail, as well as start-ups, are generating Big Data. Additionally, the Internet of Things (IoT) and social networks contribute significantly to Big Data generation.
What role does Hadoop play in Big Data computing?
-Hadoop is an open-source framework used for reliable, scalable, and distributed computing in Big Data analytics. It forms the core of many Big Data technologies.
What are some technologies covered in this course for real-time Big Data streaming?
-The course covers technologies like Apache Spark Streaming and Apache Kafka Streams for real-time Big Data streaming.
What machine learning frameworks are introduced in this course?
-The course introduces scalable machine learning frameworks such as Apache Spark MLlib and explores deep learning as a framework for Big Data analytics.
What is the purpose of using Spark GraphX in the course?
-Spark GraphX is introduced for large-scale graph processing, which is crucial for analyzing and processing complex graph data structures in Big Data applications.
Who is the target audience for this course on Big Data Computing?
-The course is suitable for undergraduate (UG) and postgraduate (PG) students, as well as practicing engineers and scientists interested in learning advanced techniques and applications in Big Data computing.
What topics are covered in the course related to Big Data platforms and storage?
-The course covers Big Data platforms, Hadoop ecosystem distribution, and technologies for large-scale data storage, including the Hadoop Distributed File System (HDFS).
How is the course structured in terms of assignments and evaluation?
-The course includes 8 assignments spread across its duration, followed by a final certification exam. The course is designed to be interactive, with concepts of Big Data computing introduced progressively.