Being A Data Engineer: Expectations vs Reality

Seattle Data Guy
2 Apr 2021
07:27

Summary

TLDR: In this video, Ben Rogue John, known as the Seattle Data Guy, shares his journey and the realities of being a data engineer. He contrasts his initial expectations of working with advanced tools like Hadoop and Spark with the actual industry practice of relying on older, low-code solutions like SSIS. He discusses how often the job involves migrations, the behind-the-scenes nature of the role, and the unexpectedly high salary potential. The video offers useful insights for aspiring data engineers about the true landscape of the profession.

Takeaways

  • 😀 The speaker, Ben Rogue John, shares his initial expectations and the reality of being a data engineer.
  • 🔧 He initially thought that data engineers extensively use tools like Hadoop, Spark, and Kafka, but found that many companies still use older technologies and low-code, drag-and-drop tools.
  • 📈 He expected to write a lot of MapReduce jobs, but discovered that many data engineering tasks involve SQL interfaces and ETL tools like SSIS.
  • 🔄 The speaker emphasizes the commonality of migrations in data engineering roles, which often involve moving data or systems from one technology to another.
  • 💡 He learned that using complex systems like distributed systems and streaming analytics is not as simple as it seems and requires significant maintenance.
  • 🏢 Companies often migrate their tech stacks to find better solutions or to save costs, which can be a significant part of a data engineer's job.
  • 🤔 Ben initially thought data engineers would get more recognition, but found that the role is often behind the scenes and less celebrated than data scientists.
  • 💼 The speaker points out that data engineering roles are more abundant than data scientist positions, indicating a higher demand for data engineers.
  • 💰 Contrary to his initial salary expectations, Ben found that there are opportunities for data engineers to earn well above his initial projections if they find the right company and role.
  • 📊 He suggests that the salary for data engineers can vary widely depending on the company and location, with some companies willing to pay a premium for skilled data engineers.
  • 📈 Ben encourages viewers to share their own expectations versus reality experiences in the comments, indicating an interest in community insights and shared experiences.

Q & A

  • What were Ben Rogue John's initial expectations when he decided to become a data engineer?

    -Ben Rogue John expected to be using advanced tools like Hadoop, Spark, and Kafka, and to be writing a lot of MapReduce jobs and coding for real-time analytics and complex data systems.

  • What did Ben find out about the use of tools in data engineering roles?

    -He discovered that many companies still use older technologies and some even rely on drag-and-drop tools for ETL and data pipeline development, rather than the complex systems he initially expected.

  • What was Ben's experience with SQL Server Integration Services (SSIS) in his first job?

    -At his first job, Ben used a lot of SSIS, which involved drag-and-drop coding rather than writing code, and was very focused on Microsoft technologies.

  • Why do companies sometimes abstract away complex data systems?

    -Companies abstract away complex data systems to make it easier to find personnel with the necessary skills, as SQL is more commonly known than direct interaction with systems like Hadoop.

  • What was one of the unexpected aspects of data engineering work that Ben learned about?

    -Ben learned that a significant part of a data engineer's work involves migrations, such as code, design, or system migrations, which can be quite common in the industry.

  • How often did Ben find himself working on migrations during his career?

    -Ben found that he worked on migrations at least once every two years, indicating that it's a regular part of the data engineering role.

  • What is the general perception of data engineers compared to data scientists?

    -Data engineers tend to get less fanfare and recognition compared to data scientists, whose work is often more tangible and noticeable to executives and management.

  • What was Ben's initial salary expectation as a data engineer five years into his career?

    -Ben initially expected to make around 100 to 120k in salary about five years into his career as a data engineer.

  • What did Ben find out about salary opportunities for data engineers?

    -Ben found that there are many opportunities where data engineers can earn far more than his initial expectation, especially if they find the right company and role.

  • What advice does Ben give to those looking to become a data engineer based on his experiences?

    -Ben advises that understanding the realities of the role, including the prevalence of migrations and the behind-the-scenes nature of the work, is important for those considering a career in data engineering.

Outlines

00:00

😀 Expectations vs. Reality of Becoming a Data Engineer

In this paragraph, Ben Rogue John, the Seattle Data Guy, introduces the topic of his video: the expectations he had when becoming a data engineer and the reality he encountered. He discusses the misconception that all data engineers work with cutting-edge tools like Hadoop, Spark, and Kafka, and the reality that many companies still use older technologies or even drag-and-drop tools for ETL and data pipeline development. He shares his experience with SQL Server SSIS and the abstraction of complex systems like Hadoop, emphasizing the importance of SQL skills over direct interaction with complex data storage systems. Ben also touches on the maintenance and management challenges associated with distributed systems and streaming analytics, highlighting the need for a deeper understanding of these systems' implications in a data engineer's role.

05:00

😔 The Underappreciated Role of Data Engineers

This paragraph addresses the reality that data engineers often receive less recognition and fanfare compared to data scientists. Despite the crucial role data engineers play in preparing and aggregating data for analysis, their work is often behind the scenes and not as visible to executives and management. Ben points out that data engineering roles are more numerous than data scientist positions but do not receive the same level of attention or prestige. He suggests that the tangible and noticeable outcomes of data scientists' work might be the reason for this disparity in recognition.

💼 The Financial Reality of a Data Engineer's Career

In the final paragraph, Ben discusses the financial expectations and reality of a data engineer's career. He initially expected to earn around $100,000 to $120,000 five years into his career but found that there are many opportunities that can offer much higher salaries, especially if one finds the right company and role. Ben emphasizes that companies are increasingly willing to pay well for skilled data engineers, recognizing their importance in processing, cleaning, and organizing data for future use. He also invites viewers to share their own experiences with job expectations and realities in the comments section and encourages them to like the video to support the channel.

Keywords

💡Data Engineer

A data engineer is a professional who specializes in designing, building, and maintaining systems for efficient storage and retrieval of data. In the video, the term is central to the discussion as the speaker, Ben Rogue John, reflects on his expectations and experiences as a data engineer. He mentions the reality of the role, which often involves more traditional tools and less of the cutting-edge technologies he initially anticipated.

💡Hadoop

Hadoop is an open-source framework used for distributed storage and processing of large data sets. The speaker initially expected to work extensively with tools like Hadoop, but found that many companies still rely on older technologies or even drag-and-drop interfaces for ETL processes. Hadoop represents the complex systems that are sometimes abstracted away in the industry.

💡Spark

Apache Spark is an open-source distributed general-purpose cluster-computing framework known for its in-memory computing capabilities. It is mentioned as one of the 'fancy tools' the speaker thought he would be using, but the reality was that many companies were not as advanced in their technology stack as he had expected.

💡Kafka

Apache Kafka is a distributed streaming platform used for building real-time data pipelines and streaming apps. It is cited as an example of the advanced technologies the speaker assumed would be part of his daily work, but which he found were not as universally utilized as he had thought.

💡MapReduce

MapReduce is a programming model and an associated implementation for processing and generating large datasets. The speaker mentions that he expected to write a lot of MapReduce jobs, indicating his initial belief that data engineering would involve more complex coding tasks than what he actually encountered.
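To make the programming model concrete, here is a minimal single-process sketch of the classic word-count example. It is not taken from the video and does not use Hadoop; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and only illustrate the map, shuffle, and reduce steps that a real framework would distribute across machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an in-memory list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Illustrative input only; a real job would read from distributed storage.
counts = word_count(["spark kafka hadoop", "hadoop sql", "sql SQL"])
print(counts)
```

In a real cluster the map and reduce functions run in parallel on many nodes, which is part of the maintenance overhead the speaker describes.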

💡SSIS

SQL Server Integration Services (SSIS) is a platform for building enterprise-level data integration and data transformation solutions. The speaker found that his first job involved using SSIS, which is a more traditional, drag-and-drop tool, rather than the complex coding tasks he expected.

💡ETL

ETL stands for Extract, Transform, Load. It refers to a process that extracts data from one or more sources, transforms it (for example by cleaning or reshaping it), and loads it into a destination such as a data warehouse. The video discusses how the speaker's work involved a lot of ETL processes, often using more traditional tools rather than the latest technologies.
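As a rough illustration of the three steps, the sketch below runs a tiny ETL job with only the Python standard library. Everything in it is an assumption made for the example: the sample CSV, the `orders` table, and the `extract`, `transform`, and `load` function names are hypothetical and are not described in the video, where the speaker's first job used SSIS instead of code.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; a real pipeline would read a file or an API.
RAW_CSV = """order_id,amount,country
1, 19.99 ,us
2,,us
3, 5.00 ,de
"""


def extract(text):
    """Extract: parse the raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with no amount, fix types, normalize country codes."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), float(amount), row["country"].strip().upper())
        )
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database stands in for a warehouse
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT order_id, amount, country FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

Drag-and-drop tools express the same extract, transform, and load stages as boxes on a canvas rather than as functions.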

💡Migration

In the context of the video, migration refers to the process of moving data from one system or structure to another. The speaker was surprised to find that a significant part of his work involved migrations, such as moving from one database system to another, which was not what he initially expected.
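A very small sketch of what a data migration can look like in code is shown below: rows are copied in batches from a source database to a target database, and the row counts are compared afterwards. This is an assumption-laden illustration, not the speaker's process. The `events` table, the `migrate_events` function, and the use of two in-memory SQLite databases are all hypothetical stand-ins for real systems such as the Oracle-to-BigQuery switch mentioned in the video.

```python
import sqlite3


def migrate_events(source, target, batch_size=2):
    """Copy the hypothetical `events` table in batches, then verify row counts."""
    target.execute(
        "CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, name TEXT)"
    )
    cursor = source.execute("SELECT id, name FROM events ORDER BY id")
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        target.executemany("INSERT INTO events (id, name) VALUES (?, ?)", batch)
    target.commit()

    # Basic validation: both sides should hold the same number of rows.
    src_count = source.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    dst_count = target.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    if src_count != dst_count:
        raise RuntimeError(f"row count mismatch: {src_count} != {dst_count}")
    return dst_count


# Set up a toy source system with three rows.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, name TEXT)")
source.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "signup"), (2, "login"), (3, "purchase")],
)
source.commit()

target = sqlite3.connect(":memory:")
migrated = migrate_events(source, target)
print(migrated)
```

Real migrations add schema translation, type mapping, and cutover planning on top of this copy-and-verify loop, which helps explain why the speaker describes them as multi-month projects.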

💡Data Lake

A data lake is a system or storage repository that holds a vast amount of raw data in its native format until it is needed. The speaker mentions that data lakes were once a trend but are now fading, indicating a shift in how companies store and process large volumes of data.

💡Data Scientist

A data scientist is a professional who uses statistical analysis, machine learning, and artificial intelligence techniques to extract insights and knowledge from data. The video points out that data engineers often work behind the scenes, in contrast to data scientists, whose work is more visible and celebrated.

💡Salary Expectations

The speaker discusses his initial salary expectations as a data engineer and finds that there are opportunities to earn more than he anticipated, especially if one finds the right company and role. This reflects the growing importance and value companies place on skilled data engineers.

Highlights

Ben Rogue John, the Seattle Data Guy, shares his expectations and realities of becoming a data engineer.

His initial assumption was that data engineers primarily use advanced tools like Hadoop, Spark, and Kafka for real-time analytics.

The reality often involves older technologies and drag-and-drop tools for ETL and data pipeline development.

Many companies abstract complex data systems away, allowing access through SQL interfaces.

SQL skills are more common than the skills needed to interact directly with complex data storage systems like Hadoop.

Interviewers look for understanding of the maintenance and challenges associated with distributed systems and streaming analytics.

Data engineers often spend a significant portion of their time on migrations, including code, design, and system migrations.

Migration projects are common as companies seek the best tech stack or a fresh start on old problems.

Data engineers typically receive less recognition compared to data scientists, whose work is more tangible to executives.

Data engineering roles are more behind the scenes, serving as a core layer between applications and data analysts.

Data engineers' work is crucial but often goes unnoticed, as it's the foundation for data analysis and insights.

Contrary to expectations, data engineering salaries can exceed the initial assumption of $100-120k with the right company and role.

The demand for skilled data engineers is high, with companies willing to pay well for the right candidate.

Salary ranges for data engineers can vary widely depending on location and company.

Ben encourages viewers to share their own expectations versus reality experiences in the comments.

The video concludes with a call to action for likes and engagement to help the YouTube algorithm promote the content.

Transcripts

00:00

Hey there guys, welcome back to another video with me, Ben Rogue John, aka the Seattle Data Guy. Today I want to talk about some of the expectations I had when I decided to become a data engineer, and some of the things that I've realized, and the reality of the matter of being a data engineer. So we're going to cover a few concepts of the things that I learned versus the things that I thought I would experience and do, as far as the work I'd be doing as a data engineer, some of the tools I might be using, and other kind of lessons I learned along the way. So let's kind of go over this. We're going to go over a few points to hopefully help some of you who are maybe looking to become a data engineer understand what your role is and what you will be doing.

00:34

All right, so my first kind of realization when I became a data engineer was I assumed that everyone was using tools like Hadoop, Spark, Kafka, and every other kind of fancy tool that you've read about, you know, streaming, real-time analytics, all these various complex components. And I assumed I'd have to be writing a lot of MapReduce jobs and coding a lot. What I found is the answer is it kind of really depends, but honestly a lot of companies are still utilizing a lot of older technologies and in some cases are using more drag-and-drop tools to develop a lot of their ETLs and data pipelines.

01:09

For example, at my first job we didn't use any form of really code. We used a lot of SSIS, which is very familiar to anyone who has worked in a Microsoft shop, where you're doing SQL Server, SSIS, and a couple other components that are very Microsoft focused. Of course there are some differences: if you're on Azure then you've got different kind of ETL tools. But overall that's where I started. It wasn't even really code, it was a lot of drag-and-drop code, and there are really a lot of other tools that are similar to that, where it's kind of a low-code experience.

01:36

And even at my following jobs you'd often find that things like Hadoop or more complex data systems were often abstracted away, and you might just be accessing it through a SQL interface. Honestly this makes a lot of sense, because when you think about the skill level of SQL versus the skill level of directly interacting with something like Hadoop or a more complex data storage system, it's much harder to find a person that fits that role. And thus using some sort of abstracted system makes a lot of sense, because then you don't need to hire a software engineer to essentially do work that's more suited for a data engineer.

02:06

In fact, I remember with one of my many interviews where the interviewer asked about things that I wanted to learn over the next few years. I talked about wanting to learn about distributed systems and streaming analytics, and that's one reason I was excited to work for this company, as I assumed they were using these systems very heavily. The interviewer actually got a pretty serious look on their face when I responded with this, and then kind of turned the question around and asked me, "Well, are there any negatives to these systems?" I think what they were looking for in this question was trying to see and understand if I understood that using these systems is not as easy as just deciding to use these systems. There's a lot of maintenance and time required to manage and upkeep these systems, and so just using these systems because you think they're interesting, or because you think they're new, or however you feel as far as these tools go, doesn't warrant actually using these tools. So I think that was something I learned very quickly as far as realities versus expectations.

03:03

And so for my next expectation, I kind of assumed a lot of what data engineers do is build new tables and build new data pipelines. Now this is kind of half right. You will do a lot of that, but the other thing no one told me about: a lot of your work as a data engineer often involves some sort of migration. Either you're doing a code migration, you're doing a design migration, you're doing a system migration. And I swear I've done one of these at least every two years, where it's a six-month project. So that means, you know, I've spent at least a year or a year and a half in the last four years doing some form of migration.

03:40

It's oddly quite common that you'll find yourself doing migrations as companies try to find the best tech stack that works for them, or possibly they're just trying to get a fresh start on an old problem. So you'll do a lot of data migrations of one form or another. Maybe somebody's switching from Oracle to BigQuery for their data warehouse, or some other similar switch. There are really tons of different ways migration can happen. Again, you've got your ETL pipelines that you might want to migrate, you've got your data warehouse technology that you might want to migrate, you've got the design of the data warehouse you might want to migrate or move into often what they call an EDW. Now, data lakes have kind of moved away from a fad, and I think they're kind of fading as the concept of data lakes are coming out, but there was migrations to, like, data lakes for a while there. There are just so many forms of migrations that can happen, because as companies learn about new technologies they seem to want to jump on it right away.

04:28

This isn't to say there isn't value in a migration. They can often save money for a company: as they switch from very expensive database providers to possibly an open source system, it might be worth it. There are also possibly performance boosts that they're looking for, or just easier-to-access layers of data. Again, there are tons of reasons companies try to do migrations, but this is honestly where you might find yourself spending 30% of your work, depending what kind of company you join. If you're joining a startup, you're likely building tons of new pipelines. If you're joining a company that's been around for a while, there's a likelihood you will do at least one migration in a two-year time span.

05:00

Okay, so for my next expectation, I don't think it was so much an expectation as it was just a reality check. When it comes down to it, data engineers get very little fanfare. And at the end of the day, data engineering roles far outnumber those of data scientists when you look at something like Indeed or other sites, because companies are still trying to wrangle their data. Despite this, there is no real fanfare for data engineers. You rarely see articles talking about the sexiest job of the 21st century being data engineering, or anything similar to that. So you'll often find instead that a lot of the fanfare still kind of goes into the data scientist area, because their work is a little more tangible and noticeable by executives and by management. They're often the ones actually creating the analysis off of the data that you've built.

05:45

So at the end of the day, regardless of what you do, it's kind of behind the scenes. Our work is kind of more this core layer that no one really sees, and we're just this middleman between applications and the data scientists and analysts who are going to try to actually make insights off the data that we're pulling in and aggregating. So if you're expecting to have a lot of people talk about your role, or talk about the work that you've done, it's going to be a little more challenging, just because you tend to be behind the scenes, and it's just the nature of the role.

06:14

Now for my final expectation: I honestly assumed when I became a data engineer I'd probably make somewhere in the range of 100 or 120k about five years into my career. And oddly enough, I found that there are a lot of opportunities that far exceed that range, as long as you find the right company and the right role. And I think this is becoming more and more prevalent as companies are finding that hiring solid data engineers is a very hard role to fill. And I'm going to throw up a few examples of salaries that different companies are kind of putting from different sites, whether it be Glassdoor and PayScale etc., so you kind of get a good idea as far as the range of salaries. Again, it's going to range pretty widely depending where you live and depending what company you work for, but there are a lot of companies willing to pay a data engineer pretty well, because they know in order for data to often be useful it needs to be processed, cleaned, and put into some sort of system that can be utilized in the future.

07:05

And with that, I really appreciate all of you guys. If you enjoyed this video, please take a moment to smash that like button. It helps me understand what videos are great, and it also probably helps the YouTube algorithm understand what to show other people. Let me know if you have any expectations versus reality for your job in the comments below, and other than that I will see you next time. Thank you and goodbye.

