Being A Data Engineer: Expectations vs Reality
Summary
TLDR: In this video, Ben Rogojan, known as the Seattle Data Guy, shares his journey and the realities of being a data engineer. He contrasts his initial expectations (using advanced tools like Hadoop and Spark) with the actual industry practice of relying on older, low-code solutions like SSIS. He discusses the frequent involvement in data migrations, the behind-the-scenes nature of the role, and the unexpectedly high salary potential. This video offers valuable insights for aspiring data engineers about the true landscape of the profession.
Takeaways
- 😀 The speaker, Ben Rogojan, shares his initial expectations and the reality of being a data engineer.
- 🔧 He initially thought that data engineers extensively use tools like Hadoop, Spark, and Kafka, but found that many companies still use older technologies and low-code experiences.
- 📈 He expected to write a lot of MapReduce jobs, but discovered that many data engineering tasks involve SQL interfaces and ETL tools like SSIS.
- 🔄 The speaker emphasizes the commonality of migrations in data engineering roles, which often involve moving data or systems from one technology to another.
- 💡 He learned that using complex systems like distributed systems and streaming analytics is not as simple as it seems and requires significant maintenance.
- 🏢 Companies often migrate their tech stacks to find better solutions or to save costs, which can be a significant part of a data engineer's job.
- 🤔 Ben initially thought data engineers would get more recognition, but found that the role is often behind the scenes and less celebrated than data scientists.
- 💼 The speaker points out that data engineering roles are more abundant than data scientist positions, indicating a higher demand for data engineers.
- 💰 Contrary to his initial salary expectations, Ben found that there are opportunities for data engineers to earn well above his initial projections if they find the right company and role.
- 📊 He suggests that the salary for data engineers can vary widely depending on the company and location, with some companies willing to pay a premium for skilled data engineers.
- 📈 Ben encourages viewers to share their own expectations versus reality experiences in the comments, indicating an interest in community insights and shared experiences.
Q & A
What were Ben Rogojan's initial expectations when he decided to become a data engineer?
-Ben Rogojan expected to be using advanced tools like Hadoop, Spark, and Kafka, and to be writing a lot of MapReduce jobs and coding for real-time analytics and complex data systems.
What did Ben find out about the use of tools in data engineering roles?
-He discovered that many companies still use older technologies and some even rely on drag-and-drop tools for ETL and data pipeline development, rather than the complex systems he initially expected.
What was Ben's experience with SQL Server Integration Services (SSIS) in his first job?
-At his first job, Ben used a lot of SSIS, which involved drag-and-drop coding rather than writing code, and was very focused on Microsoft technologies.
Why do companies sometimes abstract away complex data systems?
-Companies abstract away complex data systems to make it easier to find personnel with the necessary skills, as SQL is more commonly known than direct interaction with systems like Hadoop.
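To make the abstraction point concrete, here is a minimal sketch that computes the same aggregation two ways: with hand-written map and reduce phases, and with a single SQL query. Everything in it is an illustrative assumption rather than something shown in the video: the `events` data, the function names, and the use of Python's built-in `sqlite3` module as a stand-in for the SQL interface a company might place in front of a system like Hadoop.

```python
import sqlite3
from collections import defaultdict

# Hypothetical event records: (user_id, bytes_transferred).
events = [("alice", 120), ("bob", 300), ("alice", 80), ("carol", 50), ("bob", 25)]


def map_phase(records):
    """Emit one (key, value) pair per record, as a mapper would."""
    for user_id, size in records:
        yield user_id, size


def reduce_phase(pairs):
    """Group pairs by key and sum the values, as a reducer would."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def totals_via_sql(records):
    """Same aggregation as one SQL query (sqlite3 is only a stand-in engine)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE events (user_id TEXT, bytes INTEGER)")
        conn.executemany("INSERT INTO events VALUES (?, ?)", records)
        rows = conn.execute(
            "SELECT user_id, SUM(bytes) FROM events GROUP BY user_id"
        ).fetchall()
    finally:
        conn.close()
    return dict(rows)


hand_written = reduce_phase(map_phase(events))
from_sql = totals_via_sql(events)
```

Both paths produce the same per-user totals, but the SQL version only requires knowing `GROUP BY`, which is the hiring argument Ben makes: the engine behind the query can be complex while the skill needed to use it stays common.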
What was one of the unexpected aspects of data engineering work that Ben learned about?
-Ben learned that a significant part of a data engineer's work involves migrations, such as code, design, or system migrations, which can be quite common in the industry.
How often did Ben find himself working on migrations during his career?
-Ben found that he worked on migrations at least once every two years, indicating that it's a regular part of the data engineering role.
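The figures Ben gives in the transcript can be checked with simple arithmetic. The sketch below assumes his numbers as stated (one roughly six-month migration every two years, or a year to a year and a half over four years); the function name `migration_share` is hypothetical.

```python
def migration_share(months_on_migrations, window_years):
    """Fraction of a time window spent on migration projects."""
    return months_on_migrations / (window_years * 12)


# Ben's estimate: a year to a year and a half of migration work in four years.
low = migration_share(12, 4)
high = migration_share(18, 4)
```

Under these assumptions the share falls between a quarter and three eighths of his working time, which is consistent with the roughly 30% he mentions later in the video.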
What is the general perception of data engineers compared to data scientists?
-Data engineers tend to get less fanfare and recognition compared to data scientists, whose work is often more tangible and noticeable to executives and management.
What was Ben's initial salary expectation as a data engineer five years into his career?
-Ben initially expected to make around $100k to $120k in salary about five years into his career as a data engineer.
What did Ben find out about salary opportunities for data engineers?
-Ben found that there are many opportunities where data engineers can earn far more than his initial expectation, especially if they find the right company and role.
What advice does Ben give to those looking to become a data engineer based on his experiences?
-Ben advises that understanding the realities of the role, including the prevalence of migrations and the behind-the-scenes nature of the work, is important for those considering a career in data engineering.
Outlines
😀 Expectations vs. Reality of Becoming a Data Engineer
In this paragraph, Ben Rogojan, the Seattle Data Guy, introduces the topic of his video: the expectations he had when becoming a data engineer and the reality he encountered. He discusses the misconception that all data engineers work with cutting-edge tools like Hadoop, Spark, and Kafka, and the reality that many companies still use older technologies or even drag-and-drop tools for ETL and data pipeline development. He shares his experience with SQL Server Integration Services (SSIS) and the abstraction of complex systems like Hadoop behind SQL interfaces, noting that SQL skills are easier to hire for than direct interaction with complex data storage systems. Ben also touches on the maintenance and management challenges associated with distributed systems and streaming analytics, and on why a data engineer should understand those costs before adopting such systems.
😔 The Underappreciated Role of Data Engineers
This paragraph addresses the reality that data engineers often receive less recognition and fanfare compared to data scientists. Despite the crucial role data engineers play in preparing and aggregating data for analysis, their work is often behind the scenes and not as visible to executives and management. Ben points out that data engineering roles are more numerous than data scientist positions but do not receive the same level of attention or prestige. He suggests that the tangible and noticeable outcomes of data scientists' work might be the reason for this disparity in recognition.
💼 The Financial Reality of a Data Engineer's Career
In the final paragraph, Ben discusses the financial expectations and reality of a data engineer's career. He initially expected to earn around $100,000 to $120,000 five years into his career but found that there are many opportunities that can offer much higher salaries, especially if one finds the right company and role. Ben emphasizes that companies are increasingly willing to pay well for skilled data engineers, recognizing their importance in processing, cleaning, and organizing data for future use. He also invites viewers to share their own experiences with job expectations and realities in the comments section and encourages them to like the video to support the channel.
Keywords
💡Data Engineer
💡Hadoop
💡Spark
💡Kafka
💡MapReduce
💡SSIS
💡ETL
💡Migration
💡Data Lake
💡Data Scientist
💡Salary Expectations
Highlights
Ben Rogojan, the Seattle Data Guy, shares his expectations and realities of becoming a data engineer.
Initial assumption was that data engineers primarily use advanced tools like Hadoop, Spark, and Kafka for real-time analytics.
Reality often involves using older technologies and drag-and-drop tools for ETL and data pipeline development.
Many companies abstract complex data systems away, allowing access through SQL interfaces.
SQL skills are far more common than the skills needed to interact directly with complex data storage systems like Hadoop.
Interviewers look for understanding of the maintenance and challenges associated with distributed systems and streaming analytics.
Data engineers often spend a significant portion of their time on migrations, including code, design, and system migrations.
Migration projects are common as companies seek the best tech stack or a fresh start on old problems.
Data engineers typically receive less recognition compared to data scientists, whose work is more tangible to executives.
Data engineering roles are more behind the scenes, serving as a core layer between applications and data analysts.
Data engineers' work is crucial but often goes unnoticed, as it's the foundation for data analysis and insights.
Contrary to expectations, data engineering salaries can exceed the initial assumption of $100-120k with the right company and role.
The demand for skilled data engineers is high, with companies willing to pay well for the right candidate.
Salary ranges for data engineers can vary widely depending on location and company.
Ben encourages viewers to share their own expectations versus reality experiences in the comments.
The video concludes with a call to action for likes and engagement to help the YouTube algorithm promote the content.
Transcripts
Hey there guys, welcome back to another video with me, Ben Rogojan, aka the Seattle Data Guy. Today I want to talk about some of the expectations I had when I decided to become a data engineer, and some of the things that I've realized, and the reality of the matter of being a data engineer. So we're going to cover a few concepts of the things that I learned versus the things that I thought I would experience and do, as far as the work I'd be doing as a data engineer, some of the tools I might be using, and other kind of lessons I learned along the way.
So let's kind of go over this. We're going to go over a few points to hopefully help some of you who are maybe looking to become a data engineer understand what your role is and what you will be doing.
All right, so my first kind of realization when I became a data engineer was I assumed that everyone was using tools like Hadoop, Spark, Kafka, and every other kind of fancy tool that you've read about. You know, streaming, real-time analytics, all these various complex components. And I assumed I'd have to be writing a lot of MapReduce jobs and coding a lot. What I found is the answer is, it kind of really depends. But honestly, a lot of companies are still utilizing a lot of older technologies, and in some cases are using more drag-and-drop tools to develop a lot of their ETLs and data pipelines.
For example, at my first job we didn't use any form of really code. We used a lot of SSIS, which is very familiar to anyone who has worked in a Microsoft shop, where you're doing SQL Server, SSIS, and a couple other components that are very Microsoft focused. Of course, there are some differences if you're on Azure; then you've got different kind of ETL tools. But overall that's where I started. It wasn't even really code, it was a lot of drag-and-drop code. And there are really a lot of other tools that are similar to that, where it's kind of a low-code experience. And even at my following jobs, you'd often find that things like Hadoop or more complex data systems were often abstracted away, and you might just be accessing it through a SQL interface.
Honestly, this makes a lot of sense, because when you think about the skill level of SQL versus the skill level of directly interacting with something like Hadoop or a more complex data storage system, it's much harder to find a person that fits that role. And thus using some sort of abstracted system makes a lot of sense, because then you don't need to hire a software engineer to essentially do work that's more suited for a data engineer.
In fact, I remember with one of my many interviews where the interviewer asked about things that I wanted to learn over the next few years. I talked about wanting to learn about distributed systems and streaming analytics, and that's one reason I was excited to work for this company, as I assumed they were using these systems very heavily. The interviewer actually got a pretty serious look on their face when I responded with this, and then kind of turned the question around and asked me, "Well, are there any negatives to these systems?" I think what they were looking for in this question was trying to see and understand if I understood that using these systems is not as easy as just deciding to use these systems. There's a lot of maintenance and time required to manage and upkeep these systems. And so just using these systems because you think they're interesting, or because you think they're new, or however you feel as far as these tools go, doesn't warrant actually using these tools. So I think that was something I learned very quickly as far as realities versus expectations.
And so for my next expectation, I kind of assumed a lot of what data engineers do is build new tables and build new data pipelines. Now, this is kind of half right. You will do a lot of that. But the other thing no one told me about: a lot of your work as a data engineer often involves some sort of migration. Either you're doing a code migration, you're doing a design migration, or you're doing a system migration. And I swear I've done one of these at least every two years, where it's a six-month project. So that means, you know, I've spent at least a year or a year and a half in the last four years doing some form of migration.
It's oddly quite common that you'll find yourself doing migrations, as companies try to find the best tech stack that works for them, or possibly they're just trying to get a fresh start on an old problem. So you'll do a lot of data migrations of one form or another. Maybe somebody's switching from Oracle to BigQuery for their data warehouse, or some other similar switch. There are really tons of different ways migration can happen. Again, you've got your ETL pipelines that you might want to migrate, you've got your data warehouse technology that you might want to migrate, you've got the design of the data warehouse you might want to migrate or move into, often, what they call an EDW. Now, data lakes have kind of moved away from a fad, and I think they're kind of fading as the concept of data lakes are coming out, but there was migrations to, like, data lakes for a while there. There are just so many forms of migrations that can happen, because as companies learn about new technologies, they seem to want to jump on it right away.
This isn't to say there isn't value in a migration. They can often save money for a company: as they switch from very expensive database providers to possibly an open source system, it might be worth it. There are also possibly performance boosts that they're looking for, or just easier-to-access layers of data. Again, there are tons of reasons companies try to do migrations.
But this is honestly where you might find yourself spending 30% of your work, depending what kind of company you join. If you're joining a startup, you're likely building tons of new pipelines. If you're joining a company that's been around for a while, there's a likelihood you will do at least one migration in a two-year time span.
Okay, so for my next expectation, I don't think it was so much an expectation as it was just a reality check.
When it comes down to it, data engineers get very little fanfare. And at the end of the day, data engineering roles far outnumber those of data scientists when you look at something like Indeed or other sites, because companies are still trying to wrangle their data. Despite this, there is no real fanfare for data engineers. You rarely see articles talking about the sexiest job of the 21st century being data engineering, or anything similar to that. So you'll often find instead that a lot of the fanfare still kind of goes into the data scientist area, because their work is a little more tangible and noticeable by executives and by management. They're often the ones actually creating the analysis off of the data that you've built. So at the end of the day, regardless of what you do, it's kind of behind the scenes. Our work is kind of more this core layer that no one really sees, and we're just this middleman between applications and the data scientists and analysts who are going to try to actually make insights off the data that we're pulling in and aggregating. So if you're expecting to have a lot of people talk about your role, or talk about the work that you've done, it's going to be a little more challenging, just because you tend to be behind the scenes, and it's just the nature of the role.
Now for my final expectation. I honestly assumed when I became a data engineer I'd probably make somewhere in the range of 100 or 120k about five years into my career. And oddly enough, I found that there are a lot of opportunities that far exceed that range, as long as you find the right company and the right role. And I think this is becoming more and more prevalent as companies are finding that hiring solid data engineers is a very hard role to fill. And I'm going to throw up a few examples of salaries that different companies are kind of putting from different sites, whether it be Glassdoor, PayScale, etc., so you kind of get a good idea as far as the range of salaries. Again, it's going to range pretty widely depending where you live and depending what company you work for. But there are a lot of companies willing to pay a data engineer pretty well, because they know in order for data to often be useful, it needs to be processed, cleaned, and put into some sort of system that can be utilized in the future.
And with that, I really appreciate all of you guys. If you enjoyed this video, please take a moment to smash that like button. It helps me understand what videos are great, and it also probably helps the YouTube algorithm understand what to show other people. Let me know if you have any expectations versus reality for your job in the comments below. And other than that, I will see you next time. Thank you and goodbye.