DP-203: 01 - Introduction
Summary
TL;DR: In this engaging YouTube series, the host, Spirit, guides viewers on a journey to become an Azure Data Engineer and prepares them for the DP-203 exam. With over 18 years of experience in data engineering and multiple certifications, Spirit promises an in-depth and passionate approach to learning. The course is free, with no hidden costs, and assumes that learners have some hands-on experience and basic knowledge of Azure. The content is structured around the natural lifecycle of data, covering everything from data ingestion to transformation and modeling. Spirit emphasizes the importance of taking notes and provides resources such as GitHub links for diagrams. The series also touches on the challenges faced by data engineers, including data source connectivity, authentication, and transformation requirements. The host's real-life example of automating his wife's book sales data retrieval showcases the practical application of data engineering concepts. The series is designed to be informative, interactive, and enjoyable, with a commitment to answering viewer questions in future episodes.
Takeaways
- 🎓 The course is designed to prepare individuals to become Azure data engineers and to pass the DP-203 exam.
- 💼 The instructor has over 18 years of experience in data engineering and holds multiple certifications, ensuring a high-quality learning experience.
- 📈 The course is free of charge, with no hidden costs, making it accessible to a wide audience.
- 📚 Learners are expected to have hands-on experience and to practice the topics covered in the course.
- 🔑 For those without an Azure subscription, a free trial is recommended and a link is provided in the video description.
- 📈 The course aims to go beyond exam requirements, delving deeper into important topics for a comprehensive understanding.
- 📒 It is advised to take notes during the course, using tools like OneNote, Excel, or physical notes to retain information.
- 🖥️ The instructor will provide sketches and diagrams to explain concepts, which will be available on GitHub.
- 📅 New episodes will be released at least twice a month, with an option to subscribe for updates.
- 🤔 The course encourages questions and interaction, with the instructor committing to answering in future episodes.
- 📈 Data engineering involves challenges such as data source identification, authentication, transformation, and analysis, which will be covered in the course.
Q & A
What is the primary goal of the YouTube series presented by Spirit?
-The primary goal of the series is to help viewers become Azure data engineers and prepare them to pass the DP-203 exam.
Why should one choose this course over other available courses?
-This course is special because it is taught by an experienced professional with over 18 years in data engineering, multiple certifications, and positive feedback from previous trainings. Additionally, it is completely free with no hidden costs.
What is the importance of having hands-on experience in Azure for this course?
-Hands-on experience is crucial as the course assumes that learners will practice the discussed topics, which is essential for truly understanding and mastering the material.
What does the instructor recommend for those who do not have an Azure subscription?
-The instructor recommends using a free trial subscription, which should be sufficient for the training purposes of the course.
Why does the instructor suggest taking notes during the course?
-Taking notes is advised because it helps to reinforce learning, especially when dealing with similar-sounding services and features within Azure.
How often does the instructor plan to release new episodes of the series?
-The instructor plans to release new episodes at least twice a month, with the possibility of more frequent uploads.
What is the real-life example used to explain data engineering in the script?
-The example involves automating the process of checking book sales for the instructor's wife, who is a writer. This involves data extraction, transformation, and analysis from various sources including a publisher's website, an Excel file, and the Facebook Marketing API.
What is the difference between batch processing and streaming in the context of data solutions?
-Batch processing involves processing data in chunks or batches, often during off-peak hours, while streaming involves the continuous processing of data as it is generated or received in real-time.
Which part of the data lifecycle does a data engineer typically handle?
-A data engineer typically handles everything between data sources and data modeling/serving, which includes data ingestion, transformation, and storage.
What is the recommended approach for keeping track of the different services and features within Azure?
-The instructor recommends taking notes using a tool like OneNote, Excel, Word, a mind map, or even physical notes to keep track of the various services and features.
How can one access the detailed study guide for the DP-203 exam?
-One can access the detailed study guide by searching for 'DP-203' in a browser, which will lead to the Microsoft Learn page containing the study guide.
What is the current inclusion status of Microsoft Fabric in the DP-203 exam?
-As of the time of the script, Microsoft Fabric is not yet included in the DP-203 exam, but it is expected to be added in the future.
Outlines
📚 Introduction to the Azure Data Engineering Course
The speaker, Spirit, introduces the YouTube series designed to aid in becoming an Azure data engineer and preparing for the associated exams. The course stands out due to the speaker's 18 years of experience, multiple certifications, positive feedback from previous training sessions, and the passion for the subject. The course is free, and the audience is expected to have some hands-on experience with Azure. The speaker also mentions the importance of taking notes and provides resources for those unfamiliar with Azure fundamentals. The course aims to delve deeper into topics beyond exam requirements, emphasizing their importance in the field of data engineering.
🔍 Challenges in Automating Sales Data Retrieval
The speaker shares a personal anecdote about automating the process of checking book sales for his writer wife. Challenges included the lack of an API for the publisher's website, requiring a workaround to log in and parse the sales data from a table. The data was aggregated, making it unsuitable for analysis, so the speaker transformed it into daily sales figures. The data also required parsing to separate combined information into distinct columns. The speaker also discusses the need to account for data resets on the publisher's website and incorporating historical sales data from an Excel file. Additionally, the Facebook marketing API was used to gather ad insights to correlate with sales data.
📈 Data Engineering Tasks and Exam Preparation
The speaker outlines the various challenges and tasks a data engineer might face, such as identifying data sources, authenticating, and understanding the type of data and its scope. The transformation of data, joining datasets, and detecting changes are also highlighted. The speaker refers to the Microsoft site for the most accurate information on exam requirements and skills needed to pass the DP-203 exam. The course will cover these topics, but it will be organized in a way that follows the natural lifecycle of data, which may differ from the official exam outline.
📊 Batch vs. Streaming Processing in BI Solutions
The speaker explains the concept of batch processing in business intelligence (BI) solutions, where data is processed in intervals, typically overnight. This allows for reports to be prepared for employees each day. The speaker contrasts this with streaming solutions, where data is processed in real-time as it is generated, such as with IoT devices. The focus is on batch processing, with streaming to be discussed later. The speaker outlines the common steps in data processing for BI solutions, starting from data sources and ending with reports for end-users.
🛠️ Data Engineering Responsibilities in the BI Process
The speaker details the responsibilities of a data engineer in the BI process, which include everything from data ingestion from sources to data modeling and serving. The data engineer's role ends where the data analyst's begins, with the latter focusing on creating reports. The speaker emphasizes the importance of the data lifecycle, from ingestion to transformation, and how the course will follow this lifecycle. The speaker also recommends self-paced learning resources for further exam preparation and concludes with a teaser for the next topic to be covered in the series.
Keywords
💡Azure Data Engineer
💡Data Ingestion
💡Data Transformation
💡Data Modeling
💡Batch Processing
💡Data Sources
💡Data Storage Layer
💡Data Analyst
💡Power BI
💡Microsoft Certification
💡Data Lifecycle
Highlights
The YouTube series aims to guide viewers to become Azure data engineers and prepare for Azure Data Engineer Associate exams.
The presenter has over 18 years of experience in data engineering and holds multiple certifications.
The course is free, with no hidden costs, and promises an engaging learning experience.
Learners are expected to have hands-on experience and can use a free Azure trial subscription for practice.
The course will delve deeper into topics beyond exam requirements to foster a comprehensive understanding of data engineering.
Basic knowledge of Azure fundamentals is assumed; additional resources are provided for beginners.
The presenter advises taking notes and provides a GitHub link for diagrams and sketches that aid in explaining complex topics.
New episodes of the series will be released at least twice a month.
The presenter encourages questions and engagement, promising to address them in future episodes.
Data engineering is exemplified by automating the process of checking book sales for the presenter's writer wife.
Challenges faced include working with aggregated data, parsing information, and handling data resets.
The solution integrates data from various sources, including a website, an Excel file, and the Facebook Marketing API.
The outcome is an automated email notification system and Power BI reports for visual data analysis.
The course will cover the entire data lifecycle, from sourcing to transformation and analysis.
Microsoft's official study guide for the DP-203 exam is recommended for detailed skill requirements.
The course structure will follow the natural lifecycle of data, making it easier for learners to understand.
Microsoft Fabric is not currently included in the exam, but the course will be updated if it is added in the future.
Batch processing is distinguished from streaming, with the course initially focusing on batch solutions.
The responsibilities of a data engineer include everything between data sourcing and data modeling/serving.
The course will utilize real-life scenarios and examples to illustrate data engineering concepts.
Transcripts
Hey there, this is Spirit, and in this YouTube series I will help you become an Azure data engineer, as well as prepare you to pass the DP-203 exam. So let's get started.

Now, if I were you, I would ask me a question like this: there are a lot of other courses out there, right? So what makes this one so special that I should choose it? And I'm glad you asked.

First of all, I've been in IT for over 18 years, and most of that time I spent on various data engineering related tasks, and trust me, I've seen some wild stuff. Secondly, I hold multiple Azure certifications. So, simply put, I just know my stuff. Then, I have conducted a lot of trainings and sessions and got very positive feedback about them, which makes me believe that I really do know how to conduct them in an interesting way. And I believe that it's much better to learn from someone who is really passionate about the topic than from someone for whom it is just a nine-to-five job. And finally, this course is completely free: you won't have to pay for anything, there are no hidden costs, no strings attached. And one more thing: I simply love teaching and sharing knowledge, so for sure I will have a lot of fun preparing these videos, and hopefully you will have a lot of fun watching them.
Alright, having said that, let's talk about some general assumptions about this course and about you as the audience.

You should be able to pass the exam with my help, but please be aware that you need some hands-on experience: you need to practice the stuff that I will be talking about. And if you don't have your own Azure subscription yet, no problem, just use the free Azure trial subscription; it should be enough for this training, and in the video description there is a link that will help you create it.

Then, my goal in this course is not only to help you pass the exam but rather to help you become Azure data engineers. It means that in some areas I will go deeper into topics than is really necessary from the exam point of view, but I believe those topics are really important, and that's why I'm doing this.
Next, I assume that you have some basic knowledge about Azure, that you know its fundamentals. If not, then stop right here and go watch Adam Marczak's great playlist about Azure fundamentals; the link is in the video description.
Then, I will cover a lot of Azure services and Azure features during this course, and some of them have quite similar names, like Data Lake, Data Factory, Databricks, and those names might start to mix in your head. My advice to you guys: take notes. Really, just take notes. Personally, I use OneNote to manage my notes, but you can use whatever tool works for you, whether it's Excel, Word, OneNote, a mind map, or even some physical notes. But please, take notes.

Next, I will be drawing a lot, sketching diagrams that will help me explain various topics, and all of these drawings will be available on my GitHub; the link to it you can find in the video description.

I will do my best to upload new episodes of this series at least twice a month, maybe even more often, we will see. Anyway, if you don't want to miss any of those episodes, you know what to do, right? And lastly, I love questions, so if you have any, just post a comment under those videos and I will try to answer them in future episodes.
Alright, so what is data engineering? I believe that the best way to explain it is to show a real-life example, so let's do this.

I have a wife who is a writer; she writes books as a hobby. She has written four books so far, and she has a publisher whose website she can log in to and check her sales. It actually became her everyday ritual to log in every morning to check if there was any new sale. When I saw this, I realized: hey, that's a great opportunity for me to automate this process and prove to my wife that I really do know something about those computers. And I did it, but there were some challenges along the road that I had to solve.

So first, let's take a look at the website on which my wife can check her sales. Obviously, she first has to log in, then she has to go to a specific sub-site, and there, in the middle, in a table, we can see her sales. Easy, right? Not really.

First of all, the publisher has this website, but there is no API exposed that I could connect to and query to get this data. Instead, I had to find a workaround in which I would just pretend that I'm logged in as my wife, then go to this sub-site, and finally parse the content of that table, roughly like the sketch below.
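Here is a minimal sketch of that workaround, assuming a hypothetical publisher site with a form-based login and an HTML sales table. The URLs, form fields, and CSS selector are made up for illustration; the real site is not named in the video.

```python
import requests
from bs4 import BeautifulSoup

LOGIN_URL = "https://publisher.example.com/login"      # hypothetical
SALES_URL = "https://publisher.example.com/my/sales"   # hypothetical

with requests.Session() as session:
    # Log in the way a browser would, so the session cookie is kept.
    session.post(LOGIN_URL, data={"user": "author", "password": "secret"})
    # Fetch the sub-site that contains the sales table.
    html = session.get(SALES_URL).text

soup = BeautifulSoup(html, "html.parser")
rows = []
for tr in soup.select("table#sales tr")[1:]:           # skip the header row
    rows.append([td.get_text(strip=True) for td in tr.find_all("td")])

print(rows)  # raw, still-aggregated sales rows, to be transformed later
```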
Then, if we take a look at the data in this table, you will notice that it is already aggregated, which is a great thing for authors who want to check their sales, but it doesn't really work if you would like to run some analysis on this data. Let me explain what I mean. For example, the second row means that a single copy of a particular book was sold through Google Play. OK, but it is already aggregated data, so let's say that tomorrow my wife sells another 10 copies of this book; then the next day I would see 11 here as the value, right? 1 plus 10 gives 11. So instead of having those aggregated values, I wanted to have daily sales, which would allow me to slice and dice the data later on in reports. Fortunately, if you think about it, it's quite easy to convert aggregated data into daily sales: the only thing you have to do is take today's aggregated values and subtract yesterday's values, and there you go, you've got the sales from a single day. So that was an example of the transformations that were required on this data set; a sketch of this follows below.
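A minimal sketch of that transformation in pandas, turning cumulative snapshots into daily sales. The column names are assumptions, not the publisher's real schema.

```python
import pandas as pd

yesterday = pd.DataFrame({
    "book": ["Book A"], "channel": ["Google Play"], "copies_total": [1]})
today = pd.DataFrame({
    "book": ["Book A"], "channel": ["Google Play"], "copies_total": [11]})

merged = today.merge(
    yesterday, on=["book", "channel"], how="left",
    suffixes=("_today", "_yesterday"))

# A book appearing for the first time has no yesterday value, so treat it as 0.
merged["copies_total_yesterday"] = merged["copies_total_yesterday"].fillna(0)
merged["daily_sales"] = (
    merged["copies_total_today"] - merged["copies_total_yesterday"])

print(merged[["book", "channel", "daily_sales"]])  # 11 - 1 = 10 copies today
```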
Then, if we take a look at some columns, you will notice that they don't contain atomic values. For example, the first column stores two types of information: the first indicates the sale channel, the shop through which the sale was made, in this case Google Play; the other is the last sale date from the given source. So what I had to do was parse this single column into two separate ones, to be able to filter the data later by a specific data source. And we have something similar in the second column, but this time with three types of information stored: the first is the book title; the second is the book format, like ebook, audiobook, or paper; and finally, in the case of ebooks, we've got the file format, whether it's EPUB, MOBI, or PDF. So again, this data had to be parsed into separate columns, as in the sketch below.
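A sketch of splitting those non-atomic columns into atomic ones. The raw cell formats below are guesses at what the scraped values might look like, not the site's actual layout.

```python
import pandas as pd

raw = pd.DataFrame({
    # sale channel and last sale date crammed into one cell (assumed format)
    "channel_info": ["Google Play (2023-05-14)"],
    # title, format, and ebook file format crammed into another (assumed format)
    "book_info": ["My Book / ebook / epub"],
})

# Split "channel (date)" into two atomic columns.
raw[["sale_channel", "last_sale_date"]] = raw["channel_info"].str.extract(
    r"^(.*?)\s*\((.*)\)$")

# Split "title / format / file format" into three atomic columns.
raw[["title", "book_format", "ebook_format"]] = (
    raw["book_info"].str.split(" / ", expand=True))

print(raw[["sale_channel", "last_sale_date",
           "title", "book_format", "ebook_format"]])
```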
The last two columns are quite easy: one tells us how many copies of a given book were sold through a given sale channel, and the last one tells us how much money my wife got from those sales. And as you can see from the numbers, writing books is not a very lucrative business.
Then there was yet another issue with this data set: the whole table might get reset from time to time. This is how it works: every so often my wife generates an invoice to the publisher and then gets paid. Whenever that happens, all of the data in this table disappears, and new values start to accumulate from scratch. For me, it means that I have to be aware of this fact and adjust my calculations; a sketch of that adjustment follows below. It also means that sometimes your data source is not a system, not a file, not an API, but a human, like in this case.
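A sketch of handling the reset, under the assumption that a reset can be detected whenever today's cumulative value is lower than yesterday's; in that case everything accumulated today counts as new sales.

```python
def daily_sales(today_total: int, yesterday_total: int) -> int:
    """Daily sales from two cumulative snapshots, reset-aware."""
    if today_total < yesterday_total:
        # Counters were wiped after invoicing; today's total is all new sales.
        return today_total
    return today_total - yesterday_total

assert daily_sales(11, 1) == 10    # normal day
assert daily_sales(3, 251) == 3    # the table was reset, then 3 copies sold
```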
And speaking of data sources, there was yet another data source I had to include in this solution. My wife had an Excel file that she used to store and track her historical sales, and she asked me to include it in those reports. As you can guess, the format and structure of this Excel file were completely different from the data on the website. But it's quite a common scenario that we have some historical data stored somewhere that we have to process.
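A sketch of folding such a historical Excel file into the same shape as the scraped data. The sheet layout and column names are assumptions; in the real solution the first frame would come from pd.read_excel("historical_sales.xlsx"), but it is inlined here so the sketch runs on its own.

```python
import pandas as pd

# Stand-in for pd.read_excel(...) with a made-up historical layout.
history = pd.DataFrame({
    "Title": ["Book A"], "Store": ["Local bookstore"],
    "Copies": [5], "Date": ["2021-01-10"]})

# Rename the historical columns to match the schema of the scraped data.
history = history.rename(columns={
    "Title": "title", "Store": "sale_channel",
    "Copies": "daily_sales", "Date": "sale_date"})

# Stand-in for the output of the earlier daily-sales transformation.
daily = pd.DataFrame({
    "title": ["Book A"], "sale_channel": ["Google Play"],
    "daily_sales": [10], "sale_date": ["2023-05-14"]})

combined = pd.concat([history, daily], ignore_index=True)
print(combined)
```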
There is one more data source that I could use in this case: the Facebook Marketing API. Basically, my wife created some ads on Facebook to promote her books, and Facebook exposes this API through which we can get a lot of insights about those ads, like the number of views or the number of clicks. I could grab this data and correlate it with the sales to see if those ads made any difference.
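A sketch of that correlation step. The real solution pulled impressions and clicks from the Facebook Marketing API; here both data sets are inlined dummies so the sketch is self-contained.

```python
import pandas as pd

ads = pd.DataFrame({
    "date": pd.to_datetime(["2023-05-01", "2023-05-02", "2023-05-03"]),
    "clicks": [12, 40, 5]})
sales = pd.DataFrame({
    "date": pd.to_datetime(["2023-05-01", "2023-05-02", "2023-05-03"]),
    "copies": [1, 6, 0]})

joined = ads.merge(sales, on="date")
# A rough signal: do days with more ad clicks see more copies sold?
print(joined["clicks"].corr(joined["copies"]))
```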
Now, once I processed all of this data, I was able to detect if anything was sold, and if it was, then I sent a nice email to my wife and to me with a clear indication of what was sold and how much money she got. This mail was sent automatically by my solution, so my wife no longer had to log in and check it manually; it was all done automatically.
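A sketch of that notification step, using only the Python standard library. The SMTP host and addresses are placeholders.

```python
import smtplib
from email.message import EmailMessage

def notify(sold_rows: list[dict]) -> None:
    """Send a summary mail, but only if something was actually sold."""
    if not sold_rows:
        return  # nothing sold today, so no mail
    msg = EmailMessage()
    msg["Subject"] = f"{len(sold_rows)} new book sale(s)!"
    msg["From"] = "sales-bot@example.com"
    msg["To"] = "author@example.com, me@example.com"
    msg.set_content("\n".join(
        f"{row['title']} via {row['sale_channel']}: {row['daily_sales']} copies"
        for row in sold_rows))
    with smtplib.SMTP("smtp.example.com") as smtp:
        smtp.send_message(msg)
```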
And finally, I prepared some Power BI reports that helped her analyze her sales in a visual way. And I know, I admit, those reports are ugly; I am not a Power BI developer and I don't have front-end skills, but they do the job. But anyway, the first question my wife asked when I presented the solution to her was: "can I export it to Excel?" Yeah. And again, that's a very common scenario: end users would like to have data exportable to Excel. But still, my wife was impressed by this solution. Mission accomplished.
Anyway, looking at this example I just provided, you can clearly see that there are a lot of challenges for data engineers, a lot of questions you have to ask yourself. For example: what data sources are there? How do I connect to them, how do I authenticate? What type of data source is it: an API, a file, a database? What data is stored in a given data source, and what does it actually mean? Do I get just a subset of the data, let's say from a particular date range, or the whole timeline? Can I define that time range? Then: what transformations have to be applied to those data sets? How do I join data sets together? How do I detect changes between them? And a lot more. But don't worry, we'll cover all of this stuff during this course.
Now, if we think about the exam and the skills that are required to pass it, the best source of information is the Microsoft site. If we type "DP-203" into a browser, the first link is from Microsoft Learn, so that's the one we want to check, and there we can see the DP-203 study guide; that's our link. On this site you have very detailed information about the skills you should have: the audience profile, and then detailed requirements split into different sections, like "design and implement data storage", which has something about a partition strategy and the data exploration layer; then we've got developing data processing: ingesting and transforming data, batch processing, something about streaming, and so on. So make sure that you review this list.
However, what I don't like about this list is that it doesn't really correspond to the natural lifecycle of our data, so my course will be organized in a different way than this table, to make it easier for you to learn. Let me show you how it looks; let me just turn on my drawing machine, and then I will proceed.

And one question you might have: is Microsoft Fabric, this shiny new thing from Microsoft, included in the exam? The answer is no, not yet. For sure it will be added in the future, but it hasn't happened yet, and when it happens, I will just update this course or add some separate videos about Microsoft Fabric.
Alright, so let's jump into the whiteboard, and let me draw something I call the basic BI flow. This basic BI flow is quite common in many BI solutions, and I know that every BI solution is different, but there are some common steps that we usually have to implement.

When we talk about BI solutions, we first have to split them into two areas: batch solutions and streaming ones. Right now I will focus on batch solutions; I'll get back to streaming later on. So what are batch solutions, what is batch processing? Basically, it means that we are processing data in batches, let's say once a day, usually during the night. In this nightly processing we just grab all the data that was generated during the day, process it, and generate the reports, so when employees come to work the next day, they have the data prepared. Then, depending on our requirements, we might process the data once a day, twice a day, or maybe more often; for example, if we've got employees in different time zones, we might process the data in a batch way twice a day.

Streaming, on the other hand, is completely different, because here we've got a constant flow of events, and we've got to process them as they are delivered. For example, we might have some heartbeat sensors in hospitals, or some IoT devices in factories that measure air temperature. But that's a different type of scenario, and we'll get back to streaming later on.
So, batch solutions. Let me draw this basic BI flow that I was talking about earlier. Basically, when we start processing our data, when we think about data solutions, we start with some data sources. Quite often these are just files: CSV files, text files, Excel files, and so on. We might have some databases, like SQL Server, Oracle, or MySQL, and they might be located on premises in some data center, or in the cloud, like in Google, Amazon, or Azure. We might also have some APIs to which we've got to connect, like the Facebook API, the Google Ads API, and so on. So let me call all of this stuff simply "data sources"; this is the input to our BI solutions.

Then, on the other side of our solution, we've got reports, because that's what our end users would like to see. We might have different line charts that show, I don't know, sales maybe; we might have some key performance indicators that show in a visual way whether everything is fine, for example a green arrow pointing up that shows we are fine; we might have good old-fashioned tables that just show a lot of numbers, like detailed sales; and we might have some pie charts, which are just yet another way to display data visually. All of this stuff is simply "reports". And then we've got the users who would like to view those reports.
And now, basically, it is our task to fill the gap between data sources and reports. If we think about the common steps in data processing, it starts with data ingestion, with extracting the data, so let me call this stage "ingest", or we can call it "extract": we just want to grab the data from the source and store it somewhere. And if we want to store the data somewhere, it means that we need some data storage layer that is flexible enough to store different types of data: files, data from databases, data from APIs, and so on.

Then, as I showed in the real-life example, data that we get from the source very often requires some transformations, so that's another stage that we usually have to implement: we've got to transform our data. And the transformed data has to be saved again somewhere; it might be the same storage layer that we used previously, or it might be something completely different. It depends. Then, what we might need to do is model our data and serve it to our reports. A rough sketch of these stages as code follows below.
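A minimal sketch of the basic BI flow stages as plain functions, just to make the vocabulary concrete; in Azure each stage maps onto services covered later in the course. The function bodies are placeholders, not a real implementation.

```python
def read(source: str) -> dict:
    """Placeholder reader; real ones would handle files, databases, and APIs."""
    return {"source": source, "rows": []}

def ingest(sources: list[str]) -> list[dict]:
    """Extract raw data from each source and land it in the storage layer."""
    return [read(src) for src in sources]

def transform(raw_batches: list[dict]) -> list[dict]:
    """Clean, parse, and join the raw data into analysis-ready tables."""
    return raw_batches  # placeholder

def model_and_serve(tables: list[dict]) -> None:
    """Shape the data (e.g. a star schema) and expose it to reporting tools."""
    ...

# Nightly batch run: ingest -> transform -> model and serve.
model_and_serve(transform(ingest(["sales.csv", "crm_db", "ads_api"])))
```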
And now, which parts of this whole process are the data engineer's responsibility? Basically, it is like this: everything between the data sources and modeling and serving is the responsibility of the data engineer. The remaining part, which mainly consists of creating reports, is the task of a data analyst, and Microsoft has a separate certification and a separate exam for data analysts; here we are covering the data engineering path. Now, in real life there will quite often be some overlap between the data engineer and the data analyst: a data engineer might be involved in creating reports, and a data analyst might be involved in transforming the data. But anyway, in this course we will focus on those early stages, like ingesting the data, transforming it, and so on. And when it comes to the course structure, it will follow this natural lifecycle of data: starting from ingesting it from the source, through transforming it, and so on. So that's how our course will go.
And that's basically it. One more thing about the exam: if you would like to get more information or more lessons about it, there is a great way to learn at your own pace: the self-paced learning paths on Microsoft Learn. I highly recommend at least taking a look at them, especially in the areas where you feel you need some improvement.

Alright, so that's it. It was a lot of fun creating this video, I'm really excited to start working on this series, and hopefully it will be a good one. See you next time, when we'll talk about a cool service that we can use to store the data that we want to ingest from data sources. That's it for today; take care and see you soon.