How to query S3 data from Athena using SQL | AWS Athena Hands On Tutorial | Create Athena Tables
Summary
TLDR: This video tutorial demonstrates how to use Amazon Athena to query data stored in an S3 bucket with SQL. The presenter uploads a Netflix TV shows and movies dataset to S3, creates a table with an AWS Glue crawler, and then runs sample queries in Athena. The video highlights Athena's ability to analyze data directly in S3 without first loading it into a traditional database, showcasing its efficiency and ease of use.
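The upload step can also be scripted. The following is a minimal Python sketch, assuming the boto3 SDK and configured AWS credentials. The bucket name, local file name, and folder prefix in the usage comment are hypothetical placeholders, not values from the video. The function uploads the CSV under a key prefix and returns the folder URI, because an Athena table's location is a prefix rather than a single file and the crawler is usually pointed at that same folder. The S3 client is passed in as a parameter, so the function itself does not import boto3.

```python
import os


def upload_dataset(s3_client, local_path: str, bucket: str, prefix: str) -> str:
    """Upload a local file under an S3 "folder" (key prefix) and return the folder URI.

    s3_client is expected to behave like boto3.client("s3"), whose
    upload_file(Filename, Bucket, Key) method performs the upload.
    """
    folder = prefix.strip("/")  # S3 has no real folders, only key prefixes
    key = f"{folder}/{os.path.basename(local_path)}"
    s3_client.upload_file(local_path, bucket, key)
    # Athena table locations (and usually crawler targets) are prefixes, not files.
    return f"s3://{bucket}/{folder}/"


# Usage sketch (hypothetical bucket and file names; needs boto3 and AWS credentials):
#   import boto3
#   uri = upload_dataset(boto3.client("s3"), "netflix_titles.csv",
#                        "my-athena-demo-bucket", "netflix-data")
#   # uri == "s3://my-athena-demo-bucket/netflix-data/"
```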
Takeaways
- 💻 Amazon Athena lets you run standard SQL queries directly against data stored in S3; it is an interactive, serverless service, so there is no database server to set up.
- 🗂️ The example dataset used in the video is a Netflix TV shows and movies dataset downloaded from Kaggle, containing information like show ID, title, and director.
- 📂 The first step is uploading the dataset to an S3 bucket, where a folder named 'Netflix data' is created for this purpose.
- 🔍 An AWS Glue crawler is used to automatically scan the file in S3, infer its schema, and register a corresponding table in the AWS Glue Data Catalog, where Athena can query it.
- 🔧 The video demonstrates how to create a Glue Crawler, configure it, and run it to generate the table needed for querying.
- 🛠️ If you don't have the necessary IAM role, Glue can automatically create one with the required permissions for scanning and table creation.
- 🏛️ A new database named 'Netflix DB' is created to hold the table generated by the Glue crawler; it lives in the Glue Data Catalog and appears in Athena's database list.
- 📊 The video shows how to query the newly created table using SQL, including setting up the query output location in S3.
- 🎬 You can run specific queries in Athena, such as finding movies directed by a particular person or filtering content based on country.
- 📈 Amazon Athena allows you to query data directly from S3 without needing to move it to a traditional database, making data analysis more efficient.
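The crawler steps in the takeaways above (create the database, configure the crawler with an IAM role, run it) map onto AWS Glue API calls. Below is a rough Python sketch, assuming boto3's Glue client. The crawler name, role ARN, database name, and bucket in the usage comment are hypothetical, and the role is assumed to already exist with Glue and S3 read permissions (in the video, the console creates one). The function polls get_crawler until the crawler is back in the READY state with a completed crawl, then returns that crawl's status.

```python
import time


def run_crawler(glue_client, name, role_arn, database, s3_path,
                poll_seconds=10.0, sleep=time.sleep):
    """Create a Glue database and crawler for one S3 prefix, run it, and wait.

    glue_client is expected to behave like boto3.client("glue").
    Returns the status of the finished crawl, e.g. "SUCCEEDED" or "FAILED".
    """
    # Data Catalog database that will hold the inferred table
    # (Glue raises AlreadyExistsException if it already exists).
    glue_client.create_database(DatabaseInput={"Name": database})
    glue_client.create_crawler(
        Name=name,
        Role=role_arn,  # IAM role the crawler assumes to read S3 and write the catalog
        DatabaseName=database,
        Targets={"S3Targets": [{"Path": s3_path}]},
    )
    glue_client.start_crawler(Name=name)
    while True:
        crawler = glue_client.get_crawler(Name=name)["Crawler"]
        # A new crawler has no LastCrawl until its first run finishes and it
        # returns to READY (it passes through RUNNING and STOPPING first).
        if crawler["State"] == "READY" and "LastCrawl" in crawler:
            return crawler["LastCrawl"]["Status"]
        sleep(poll_seconds)


# Usage sketch (hypothetical names; needs boto3, AWS credentials, and an existing role):
#   import boto3
#   status = run_crawler(
#       boto3.client("glue"), "netflix-crawler",
#       "arn:aws:iam::123456789012:role/service-role/AWSGlueServiceRole-netflix",
#       "netflixdb", "s3://my-athena-demo-bucket/netflix-data/")
```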
Q & A
- What is Amazon Athena and how does it work?
  Amazon Athena is an interactive query service for running SQL queries on data stored in Amazon S3. A table definition in a data catalog (here, the AWS Glue Data Catalog) describes the schema and S3 location of the data, and Athena reads the files in place when you query that table with standard SQL.
- Where is the sample dataset for the video from?
  The sample dataset comes from Kaggle and contains information about Netflix TV shows and movies.
- What is the structure of the Netflix dataset?
  It has a simple schema with fields such as show ID, type, title, director, and other general information about each TV show or movie.
- How do you upload the dataset to an S3 bucket?
  Select the bucket, create a folder (e.g., 'Netflix data'), add the CSV file, and click Upload.
- What is the purpose of creating a table in the catalog?
  The table defines the schema of the data and where it is stored in S3, which is what lets Athena query those files with SQL.
- What is a crawler in AWS Glue?
  A Glue crawler is a tool that automatically scans data in a data store, infers its schema, and creates a metadata table for it in the Glue Data Catalog.
- How does AWS Glue help in creating a table for the data?
  The crawler scans the data, infers the schema, and creates a table in the Data Catalog, which can then be queried from Athena.
- What is an IAM role in AWS, and why does the crawler need one?
  An IAM role is an identity with attached permission policies that a user or AWS service can assume. The crawler assumes a role that grants it permission to read the S3 folder and to create the table in the Data Catalog.
- How do you run a crawler in AWS Glue?
  Create a crawler, specify the source data store, choose the IAM role, select the database that will hold the resulting table, and then click 'Run crawler'.
- Why configure a query output location in Athena?
  Athena writes the results of each query to S3. The output location tells it which bucket and prefix to use, which also makes the results easy to retrieve and analyze later.
- Can you provide an example of a SQL query that could be run on the Netflix dataset?
  For example, "SELECT * FROM netflixdb.netflix_data WHERE director = 'Vikram';" returns every title whose director field is exactly 'Vikram'; adding "AND type = 'Movie'" limits the results to movies.
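Once the table exists, a query like the one above can also be run programmatically. This is a minimal sketch, assuming boto3's Athena client, the database and table names from the example query, and a hypothetical results prefix. It passes the output location with each query, polls until the query leaves the QUEUED/RUNNING states, and returns the first page of results. The director name goes through a small, hypothetical quoting helper that doubles embedded single quotes, so a value such as O'Brien cannot break out of the SQL string.

```python
import time


def sql_string_literal(value: str) -> str:
    """Render a Python string as a SQL string literal, doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def run_athena_query(athena_client, sql, database, output_location,
                     poll_seconds=1.0, sleep=time.sleep):
    """Run one Athena query, wait for it, and return the first page of rows.

    athena_client is expected to behave like boto3.client("athena").
    Each row is a list of strings (None for NULL cells). For SELECT queries
    the first row holds the column names.
    """
    execution_id = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        # S3 prefix where Athena writes the result file for this query
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    while True:
        status = athena_client.get_query_execution(
            QueryExecutionId=execution_id)["QueryExecution"]["Status"]
        state = status["State"]
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELLED"):
            reason = status.get("StateChangeReason", "")
            raise RuntimeError(f"Athena query {execution_id} {state}: {reason}")
        sleep(poll_seconds)  # still QUEUED or RUNNING

    # Only the first page; larger results need NextToken pagination.
    result = athena_client.get_query_results(QueryExecutionId=execution_id)
    return [[cell.get("VarCharValue") for cell in row["Data"]]
            for row in result["ResultSet"]["Rows"]]


# Usage sketch (hypothetical results bucket; needs boto3 and AWS credentials):
#   import boto3
#   sql = ("SELECT title, type FROM netflix_data WHERE director = "
#          + sql_string_literal("Vikram"))
#   rows = run_athena_query(boto3.client("athena"), sql, "netflixdb",
#                           "s3://my-athena-demo-bucket/athena-results/")
```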