How I would learn Data Engineering (if I could start over)
Summary
TLDR: In this video, Jay, a data engineer and filmmaker, shares a step-by-step guide to relearning data engineering from scratch. He emphasizes mastering SQL, Python, and the command line, and introduces essential concepts such as data storage, orchestration, and batch and stream processing. Jay suggests focusing on core concepts and building a portfolio through project work, and he highlights how rewarding a career in data engineering can be.
Takeaways
- Data is pivotal in the 21st century, with major companies processing vast amounts of information daily.
- Data engineers are in high demand, with an average salary of about $130,000; the video describes data engineering as one of the most sought-after skills in the U.S.
- The speaker, Jay, a data engineer and filmmaker, shares a step-by-step approach to relearning data engineering.
- Data engineering can be overwhelming because of its buzzwords and because the role is less clearly defined than traditional tech roles.
- Essential skills for data engineers include expertise in relational SQL, Python, and workflow scheduling tools like Airflow.
- SQL's underrated transaction properties make it crucial for databases and data lakes, and advanced topics like GROUP BY and window functions are important for interviews.
- Python is recommended for learning data concepts because it is open source and has a wide range of third-party libraries.
- Virtual environments in Python are vital for system reproducibility in data engineering.
- The command line is instrumental for data engineers in moving files and connecting the steps of a data pipeline.
- The speaker suggests focusing on learning concepts and building projects, and recommends specific resources for learning Python, SQL, and the command line.
- Data storage and orchestration are fundamental to data engineering, with object stores and relational databases being the key storage types.
- Batch and stream processing differentiate data engineers from traditional software engineers, with tools like Apache Spark for large datasets and Kafka for real-time data.
- Understanding data concepts matters more than mastering specific tools, because tools keep evolving while the concepts remain relevant.
- Practicing through project work and building a portfolio is emphasized as the route to success in data engineering.
- A mindset of persistence and finding joy in the learning process is encouraged, as data engineering can be hard work but also rewarding.
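To make the GROUP BY and window function takeaway concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `orders` table and its sample rows are hypothetical and do not come from the video, and the window query assumes the bundled SQLite is version 3.25 or newer.

```python
import sqlite3

# Hypothetical sample data: (customer, order amount).
ORDERS = [("ana", 30), ("ana", 50), ("ben", 20), ("ben", 70), ("ben", 10)]


def run_queries():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", ORDERS)

    # GROUP BY collapses each customer's rows into one summary row.
    totals = conn.execute(
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()

    # A window function keeps every row and adds a per-customer rank.
    # Requires SQLite 3.25+ (assumption about the local build).
    ranked = conn.execute(
        """
        SELECT customer, amount,
               RANK() OVER (PARTITION BY customer ORDER BY amount DESC) AS rnk
        FROM orders
        ORDER BY customer, rnk
        """
    ).fetchall()
    conn.close()
    return totals, ranked


totals, ranked = run_queries()
```

The difference to notice is that `totals` has one row per customer, while `ranked` still has one row per order, each annotated with its rank within that customer's orders.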
Q & A
Why is data engineering an important skill in the 21st century?
-Data engineering is crucial because large companies process vast amounts of data daily. The video also describes it as one of the highest-demand skills in America, with a high average salary that reflects its value in the job market.
What is the average salary for a data engineer according to the video?
-The video mentions that the average salary for a data engineer is around $130,000.
What are some common job requirements for a data engineer as seen in the LinkedIn job description example?
-The job description example requires expertise with relational SQL, Python, and workflow scheduling tools like Airflow.
Why is SQL considered underrated and important for data engineers?
-SQL is underrated because it has attractive transaction properties that make it fast and easy to use, and it is a common interface for databases and data lakes.
What programming language is recommended to start learning for data engineering roles?
-Python is recommended because it is simple, has many third-party libraries, and lets learners focus on data concepts rather than general computer science.
How does the video suggest learning about data storage and orchestration?
-The video suggests learning about data storage by understanding object stores and relational databases, and about orchestration through concepts like ETL, data provenance, and using tools like Apache Airflow.
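As a rough illustration of the ETL concept mentioned in this answer, the sketch below chains extract, transform, and load steps as plain Python functions. The sample CSV, the function names, and the in-memory `warehouse` list are all hypothetical. An orchestrator such as Apache Airflow would typically schedule steps like these as separate tasks, but no Airflow API is used here.

```python
import csv
import io

# Hypothetical raw input; one row has a bad amount on purpose.
RAW_CSV = "name,amount\nana,30\nben,not_a_number\ncara,45\n"


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize names, cast amounts, skip rows that fail to parse."""
    clean = []
    for row in rows:
        try:
            clean.append({"name": row["name"].title(), "amount": int(row["amount"])})
        except ValueError:
            continue  # drop rows whose amount is not an integer
    return clean


def load(rows, target):
    """Load: append rows to the target store and report how many were written."""
    target.extend(rows)
    return len(rows)


def run_pipeline(text, target):
    # Each step depends on the previous one, which is the ordering
    # an orchestrator would enforce between tasks.
    return load(transform(extract(text)), target)


warehouse = []  # stand-in for a real database or object store
loaded = run_pipeline(RAW_CSV, warehouse)
```

Keeping each step as its own function is what makes it possible to retry, schedule, or monitor the steps independently once a real orchestrator is introduced.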
What is the significance of batch and stream processing in data engineering?
-Batch and stream processing differentiate data engineers from traditional software engineers, allowing them to handle large datasets and real-time data feeds efficiently.
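The contrast between the two modes can be sketched in plain Python, without Spark or Kafka. The function names and sample readings are hypothetical: the batch version needs the whole dataset before it can answer, while the streaming version emits an updated result as each record arrives.

```python
def batch_average(readings):
    """Batch: the full dataset is available, so compute one result at the end."""
    return sum(readings) / len(readings)


def stream_averages(readings):
    """Stream: consume records one at a time and yield a running average."""
    count = 0
    total = 0.0
    for value in readings:
        count += 1
        total += value
        yield total / count


READINGS = [10, 20, 30, 40]  # hypothetical sensor readings

batch_result = batch_average(READINGS)
# iter() mimics an unbounded feed: the generator never asks for its length.
stream_results = list(stream_averages(iter(READINGS)))
```

The streaming version keeps only a count and a running total in memory, which is the property that lets the same idea scale to feeds that never end.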
What is the video's stance on learning specific big data tools for data engineering?
-The video suggests focusing on understanding concepts rather than specific tools, as tools are constantly evolving, and the concepts learned can be applied to various tools as needed.
What resources are suggested for learning Python, SQL, and command lines?
-The video recommends resources such as O'Reilly textbooks, Codecademy for SQL, learnenough.com tutorials, and Stack Overflow for problem-solving.
What is the recommended mindset for someone starting to learn data engineering according to the video?
-The recommended mindset is to practice through project work, focus on building a portfolio, stick to one learning material that resonates with you, and understand that learning data engineering will be hard work but rewarding.
What is the video's advice on dealing with the overwhelming amount of resources available for learning data engineering?
-The video advises finding the one resource that resonates with you most and sticking with it, to avoid feeling lost among the vast number of options.