DP-203: 01 - Introduction

Tybul on Azure
16 Aug 2023 · 26:45

Summary

TL;DR: In this engaging YouTube series, the host, Tybul, guides viewers on a journey to become an Azure Data Engineer and prepares them for the DP-203 exam. With over 18 years of experience in data engineering and multiple certifications, Tybul promises an in-depth and passionate approach to learning. The course is free, with no hidden costs, and assumes that learners have some hands-on experience and basic knowledge of Azure. The content is structured around the natural lifecycle of data, covering everything from data ingestion to transformation and modeling. Tybul emphasizes the importance of taking notes and provides resources such as GitHub links for diagrams. The series also touches on the challenges faced by data engineers, including data source connectivity, authentication, and transformation requirements. The host's real-life example of automating his wife's book sales data retrieval showcases the practical application of data engineering concepts. The series is designed to be informative, interactive, and enjoyable, with a commitment to answering viewer questions in future episodes.

Takeaways

  • 🎓 The course is designed to prepare individuals to become Azure data engineers and to pass the related exams.
  • 💼 The instructor has over 18 years of experience in data engineering and holds multiple certifications, ensuring a high-quality learning experience.
  • 📈 The course is free of charge, with no hidden costs, making it accessible to a wide audience.
  • 📚 Learners are expected to have hands-on experience and to practice the topics covered in the course.
  • 🔑 For those without an Azure subscription, a free trial is recommended and a link is provided in the video description.
  • 📈 The course aims to go beyond exam requirements, delving deeper into important topics for a comprehensive understanding.
  • 📒 It is advised to take notes during the course, using tools like OneNote, Excel, or physical notes to retain information.
  • 🖥️ The instructor will provide sketches and diagrams to explain concepts, which will be available on GitHub.
  • 📅 New episodes will be released at least twice a month, with an option to subscribe for updates.
  • 🤔 The course encourages questions and interaction, with the instructor committing to answering in future episodes.
  • 📈 Data engineering involves challenges such as data source identification, authentication, transformation, and analysis, which will be covered in the course.

Q & A

  • What is the primary goal of the YouTube series presented by Tybul?

    -The primary goal of the series is to help viewers become Azure data engineers and prepare them to pass the DP-203 exam.

  • Why should one choose this course over other available courses?

    -This course is special because it is taught by an experienced professional with over 18 years in data engineering, multiple certifications, and positive feedback from previous trainings. Additionally, it is completely free with no hidden costs.

  • What is the importance of having hands-on experience in Azure for this course?

    -Hands-on experience is crucial as the course assumes that learners will practice the discussed topics, which is essential for truly understanding and mastering the material.

  • What does the instructor recommend for those who do not have an Azure subscription?

    -The instructor recommends using a free trial subscription, which should be sufficient for the training purposes of the course.

  • Why does the instructor suggest taking notes during the course?

    -Taking notes is advised because it helps to reinforce learning, especially when dealing with similar-sounding services and features within Azure.

  • How often does the instructor plan to release new episodes of the series?

    -The instructor plans to release new episodes at least twice a month, with the possibility of more frequent uploads.

  • What is the real-life example used to explain data engineering in the script?

    -The example involves automating the process of checking book sales for the instructor's wife, who is a writer. This involves data extraction, transformation, and analysis from various sources including a publisher's website, an Excel file, and the Facebook marketing API.

  • What is the difference between batch processing and streaming in the context of data solutions?

    -Batch processing involves processing data in chunks or batches, often during off-peak hours, while streaming involves the continuous processing of data as it is generated or received in real-time.

  • Which part of the data lifecycle does a data engineer typically handle?

    -A data engineer typically handles everything between data sources and data modeling/serving, which includes data ingestion, transformation, and storage.

  • What is the recommended approach for keeping track of the different services and features within Azure?

    -The instructor recommends taking notes using a tool like OneNote, Excel, Word, a mind map, or even physical notes to keep track of the various services and features.

  • How can one access the detailed study guide for the DP-203 exam?

    -One can access the detailed study guide by searching for 'DP-203' in a browser, which will lead to the Microsoft Learn page containing the study guide.

  • What is the current inclusion status of Microsoft Fabric in the DP-203 exam?

    -As of the time of the script, Microsoft Fabric is not yet included in the DP-203 exam, but it is expected to be added in the future.

Outlines

00:00

📚 Introduction to the Azure Data Engineering Course

The speaker, Tybul, introduces the YouTube series designed to help viewers become Azure data engineers and prepare for the associated exam. The course stands out due to the speaker's 18 years of experience, multiple certifications, positive feedback from previous training sessions, and passion for the subject. The course is free, and the audience is expected to have some hands-on experience with Azure. The speaker also mentions the importance of taking notes and provides resources for those unfamiliar with Azure fundamentals. The course aims to delve deeper into topics beyond exam requirements, emphasizing their importance in the field of data engineering.

05:02

🔍 Challenges in Automating Sales Data Retrieval

The speaker shares a personal anecdote about automating the process of checking book sales for his writer wife. Challenges included the lack of an API for the publisher's website, requiring a workaround to log in and parse the sales data from a table. The data was aggregated, making it unsuitable for analysis, so the speaker transformed it into daily sales figures. The data also required parsing to separate combined information into distinct columns. The speaker also discusses the need to account for data resets on the publisher's website and incorporating historical sales data from an Excel file. Additionally, the Facebook marketing API was used to gather ad insights to correlate with sales data.

10:04

📈 Data Engineering Tasks and Exam Preparation

The speaker outlines the various challenges and tasks a data engineer might face, such as identifying data sources, authenticating, and understanding the type of data and its scope. The transformation of data, joining datasets, and detecting changes are also highlighted. The speaker refers to the Microsoft site for the most accurate information on exam requirements and skills needed to pass the DP-203 exam. The course will cover these topics, but it will be organized in a way that follows the natural lifecycle of data, which may differ from the official exam outline.

15:04

📊 Batch vs. Streaming Processing in BI Solutions

The speaker explains the concept of batch processing in business intelligence (BI) solutions, where data is processed in intervals, typically overnight. This allows for reports to be prepared for employees each day. The speaker contrasts this with streaming solutions, where data is processed in real-time as it is generated, such as with IoT devices. The focus is on batch processing, with streaming to be discussed later. The speaker outlines the common steps in data processing for BI solutions, starting from data sources and ending with reports for end-users.

20:08

🛠️ Data Engineering Responsibilities in the BI Process

The speaker details the responsibilities of a data engineer in the BI process, which include everything from data ingestion from sources to data modeling and serving. The data engineer's role ends where the data analyst's begins, with the latter focusing on creating reports. The speaker emphasizes the importance of the data lifecycle, from ingestion to transformation, and how the course will follow this lifecycle. The speaker also recommends self-paced learning resources for further exam preparation and concludes with a teaser for the next topic to be covered in the series.

Keywords

💡Azure Data Engineer

An Azure Data Engineer is a professional who specializes in designing and building solutions for processing and managing data within the Microsoft Azure cloud platform. In the video, the speaker aims to help viewers become proficient in this role and prepare for the related certification exams.

💡Data Ingestion

Data ingestion is the process of collecting and importing data from various sources into a system for further processing or analysis. It is a fundamental step in data engineering, as illustrated by the speaker's real-life example of extracting sales data from a website for further analysis.

💡Data Transformation

Data transformation involves converting or modifying raw data into a more useful format for analysis or processing. The script mentions this concept when discussing how aggregated sales data needs to be converted into daily sales figures for more detailed reporting.

💡Data Modeling

Data modeling is the process of creating a representation of data structures to facilitate the efficient organization and use of data. It is a key task for data engineers, as it enables the creation of reports and analysis tools that make sense of the data.

💡Batch Processing

Batch processing refers to the handling of data in groups or 'batches', often at specific intervals such as nightly. This is contrasted with streaming processing, which deals with data in real-time as it is generated. The video script discusses batch processing in the context of preparing reports for employees.

💡Data Sources

Data sources are the origins of data, which can include files, databases, APIs, and more. The speaker emphasizes the importance of understanding various data sources and how to connect and authenticate with them, as part of the data engineering process.

💡Data Storage Layer

A data storage layer is a system or service that is capable of storing different types of data. It is a critical component in data engineering, as it provides the infrastructure needed to hold the data before and after it is transformed and processed.

💡Data Analyst

A data analyst is a professional who works with data to extract insights and support decision-making. While distinct from a data engineer, there can be overlap in their roles. In the video, the speaker differentiates between the responsibilities of a data engineer and a data analyst.

💡Power BI

Power BI is a business analytics service by Microsoft that provides interactive visualizations and business intelligence capabilities. The speaker mentions creating Power BI reports to help analyze sales data visually, demonstrating its use in data engineering projects.

💡Microsoft Certification

Microsoft Certification is a professional certification program offered by Microsoft to validate skills and expertise in various Microsoft technologies. The video's theme revolves around preparing for such a certification, specifically for Azure data engineering.

💡Data Lifecycle

The data lifecycle refers to the stages a piece of data goes through from creation to retirement. The speaker organizes the course content around this concept, covering how data engineers manage data from ingestion through to reporting and analysis.

Highlights

The YouTube series aims to guide viewers to become Azure data engineers and prepare for Azure Data Engineer Associate exams.

The presenter has over 18 years of experience in data engineering and holds multiple certifications.

The course is free, with no hidden costs, and promises an engaging learning experience.

Learners are expected to have hands-on experience and can use a free Azure trial subscription for practice.

The course will delve deeper into topics beyond exam requirements to foster a comprehensive understanding of data engineering.

Basic knowledge of Azure fundamentals is assumed; additional resources are provided for beginners.

The presenter advises taking notes and provides a GitHub link for diagrams and sketches that aid in explaining complex topics.

New episodes of the series will be released at least twice a month.

The presenter encourages questions and engagement, promising to address them in future episodes.

Data engineering is exemplified by automating the process of checking book sales for the presenter's writer wife.

Challenges faced include working with aggregated data, parsing information, and handling data resets.

The solution integrates data from various sources, including a website, an Excel file, and the Facebook Marketing API.

The outcome is an automated email notification system and Power BI reports for visual data analysis.

The course will cover the entire data lifecycle, from sourcing to transformation and analysis.

Microsoft's official study guide for the DP-203 exam is recommended for detailed skill requirements.

The course structure will follow the natural lifecycle of data, making it easier for learners to understand.

Microsoft Fabric is not currently included in the exam, but the course will be updated if it is added in the future.

Batch processing is distinguished from streaming, with the course initially focusing on batch solutions.

The responsibilities of a data engineer include everything between data sourcing and data modeling/serving.

The course will utilize real-life scenarios and examples to illustrate data engineering concepts.

Transcripts

00:00

Hey there, this is Tybul, and in this YouTube series I will help you become an Azure data engineer and prepare you to pass the DP-203 exam, so let's get started. Now, if I were you, I would ask a question like this: there are a lot of other courses out there, so what makes this one so special that I should choose it? I'm glad you asked. First of all, I've been in IT for over 18 years and spent most of that time on various data-engineering-related tasks, and trust me, I've seen some wild stuff. Secondly, I hold multiple Azure certifications, so simply put, I know my stuff. I have also conducted a lot of trainings and sessions and received very positive feedback about them, which makes me believe that I really do know how to run them in an interesting way. And I believe it's much better to learn from someone who is really passionate about the topic than from someone for whom it is just a nine-to-five job. Finally, this course is completely free: you won't have to pay for anything, there are no hidden costs, no strings attached. One more thing: I love teaching and sharing knowledge, so for sure I will have a lot of fun preparing these videos, and hopefully you will have a lot of fun watching them.

01:30

Alright, having said that, let's talk about some general assumptions about this course and about you as the audience. You should be able to pass the exam with my help, but please be aware that you need some hands-on experience: you need to practice the things I will be talking about. If you don't have your own Azure subscription yet, no problem, just use a free Azure trial subscription; it should be enough for this training, and in the video description there is a link that will help you create it. My goal for this course is not only to help you pass the exam, but rather to help you become Azure data engineers. That means in some areas I will go deeper into topics than is strictly necessary from the exam point of view, because I believe those topics are really important. Next, I assume you have some basic knowledge about Azure and know its fundamentals; if not, stop right here and go watch Adam Marczak's great playlist about Azure fundamentals, linked in the video description. I will cover a lot of Azure services and features during this course, and some of them have quite similar names, like Data Lake, Data Factory and Databricks, so those names might start to mix in your head. My advice to you is: take notes, really, just take notes. Personally I use OneNote to manage my notes, but you can use whatever tool works for you, whether it's Excel, Word, OneNote, a mind map, or even physical notes, but please take notes. Next, I will be drawing a lot, sketching diagrams that help me explain various topics, and all of these drawings will be available on my GitHub; the link is in the video description. I will do my best to upload new episodes of this series at least twice a month, maybe even more often, we will see; anyway, if you don't want to miss any of those episodes, you know what to do. And lastly, I love questions, so if you have any, just post a comment under these videos and I will try to answer them in future episodes.

04:26

Alright, so what is data engineering? I believe the best way to explain it is to show a real-life example, so let's do this. I have a wife who is a writer; she writes books as a hobby and has written four so far. She has a publisher whose website lets her log in and check her sales, and it became her everyday ritual to log in every morning to check whether there was any new sale. When I saw this, I realized it was a great opportunity for me to automate the process and prove to my wife that I really do know something about those computers. And I did it, but there were some challenges along the way that I had to solve. First, let's take a look at the website on which my wife can check her sales: she has to log in, go to a subsite, and then in a table in the middle of the page she can see her sales. Easy, right? Not really. First of all, the publisher has this website, but there is no API exposed that I could connect to and query to get the data. Instead, I had to find a workaround in which I pretend to be logged in as my wife, go to that subsite, and finally parse the content of the table. Then, if we look at the data in this table, you will notice that it is already aggregated, which is great for authors who want to check their sales, but it doesn't really work if you want to run some analysis on the data. Let me explain what I mean: for example, the second row means that a single copy of a particular book was sold through Google Play. But this is aggregated data, so let's say that tomorrow my wife sells another 10 copies of this book; the next day I would see 11 as the value (1 plus 10 gives 11). Instead of those aggregated values, I wanted daily sales, which would allow me to slice and dice the data later on in reports. Fortunately, if you think about it, it's quite easy to convert the aggregated data into daily sales: the only thing you have to do is take today's aggregated values and subtract yesterday's values, and there you go, you have the sales from a single day. That was an example of the transformations required on this data set.
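
As a rough illustration of that transformation (a sketch, not the actual solution from the video), the cumulative totals scraped from the website could be turned into daily figures with a simple subtraction; the column names below are hypothetical:

```python
import pandas as pd

# Hypothetical daily snapshots of the publisher's cumulative sales table.
snapshots = pd.DataFrame({
    "date": pd.to_datetime(["2023-08-14", "2023-08-15", "2023-08-16"]),
    "channel": ["Google Play", "Google Play", "Google Play"],
    "title": ["Book One", "Book One", "Book One"],
    "cumulative_copies": [1, 11, 14],   # running total shown on the website
})

snapshots = snapshots.sort_values("date")
# Daily sales = today's cumulative value minus yesterday's cumulative value.
snapshots["daily_copies"] = (
    snapshots.groupby(["channel", "title"])["cumulative_copies"]
    .diff()
    .fillna(snapshots["cumulative_copies"])  # first snapshot has no previous value
)

print(snapshots[["date", "channel", "title", "daily_copies"]])
```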

08:05

Then, if we take a look at some columns, you will notice that they don't contain atomic values. For example, the first column stores two types of information: the sales channel or shop through which the sale was made (in this case Google Play), and the last sale date from that source. So what I had to do was parse this single column into two separate ones, to be able to filter the data by a specific data source later on. We have something similar in the second column, but this time three types of information are stored: the book title, the book format (ebook, audiobook or paper), and, in the case of ebooks, the file format, whether it's ePub, Mobi or PDF. So again, this data had to be parsed into separate columns. The last two columns are quite easy: one tells us how many copies of a given book were sold through a given sales channel, and the last one tells us how much money my wife got from those sales. As you can see from the numbers, writing books is not a very lucrative business.
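
A minimal sketch of that kind of parsing, assuming the combined columns look roughly like the made-up strings below (the real separators and column names on the publisher's site may differ):

```python
import pandas as pd

# Hypothetical raw rows scraped from the sales table.
raw = pd.DataFrame({
    "channel_info": ["Google Play (2023-08-16)", "Google Play (2023-08-15)"],
    "book_info": ["Book One / ebook / ePub", "Book Two / ebook / PDF"],
})

# Split the combined "channel + last sale date" column into two atomic columns.
raw[["sale_channel", "last_sale_date"]] = raw["channel_info"].str.extract(
    r"^(.*?)\s*\((\d{4}-\d{2}-\d{2})\)$"
)

# Split the combined "title / format / ebook format" column into three columns.
parts = raw["book_info"].str.split("/", expand=True)
raw[["title", "book_format", "ebook_format"]] = parts.apply(lambda col: col.str.strip())

print(raw[["sale_channel", "last_sale_date", "title", "book_format", "ebook_format"]])
```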

09:51

Then there was yet another issue with this data set: the whole table might get reset from time to time. This is how it works: every so often my wife issues an invoice to the publisher and gets paid, and whenever that happens, all of the data in the table disappears and new values start to accumulate from scratch. For me it means that I have to be aware of this fact and adjust my calculations. It also means that sometimes your data source is not a system, not a file, not an API, but a human, like in this case.
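
Extending the earlier subtraction idea, here is a small sketch of how such a reset could be tolerated, under the assumption that after a reset the counter simply starts again from zero:

```python
def daily_sales(today_total: int, yesterday_total: int) -> int:
    """Convert cumulative totals into a daily figure, tolerating table resets.

    If today's total is lower than yesterday's, we assume the publisher wiped the
    table after an invoice, so the counter restarted and today's value is itself
    the day's sales.
    """
    if today_total < yesterday_total:
        return today_total            # counter was reset, it is counting from scratch
    return today_total - yesterday_total


# Example: 14 -> 16 means 2 copies sold; 16 -> 3 means a reset followed by 3 new sales.
print(daily_sales(16, 14))  # 2
print(daily_sales(3, 16))   # 3
```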

10:47

Speaking of data sources, there was yet another one I had to include in this solution. My wife had an Excel file that she used to store and track her historical sales, and she asked me to include it in the reports. As you can guess, the format and structure of this Excel file were completely different from the data on the website, but it's quite a common scenario that we have some historical data stored somewhere that we have to process. There is one more data source I could use in this case: the Facebook Marketing API. My wife created some ads on Facebook to promote her books, and Facebook exposes an API through which we can get a lot of insights about those ads, like the number of views or the number of clicks. I could grab this data and correlate it with sales to see if those ads made any difference.
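
Once the ad metrics have been pulled down (the actual Facebook Marketing API calls are omitted here), correlating them with daily sales can be as simple as a join on date; the values below are made up purely for illustration:

```python
import pandas as pd

# Hypothetical, already-downloaded data: daily sales and daily ad metrics.
sales = pd.DataFrame({
    "date": pd.to_datetime(["2023-08-14", "2023-08-15", "2023-08-16"]),
    "daily_copies": [1, 10, 3],
})
ads = pd.DataFrame({
    "date": pd.to_datetime(["2023-08-14", "2023-08-15", "2023-08-16"]),
    "ad_clicks": [5, 40, 12],
})

# Join the two data sets on date and check whether clicks move together with sales.
combined = sales.merge(ads, on="date", how="left")
print(combined)
print("correlation:", combined["daily_copies"].corr(combined["ad_clicks"]))
```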

12:08

Now, once I processed all of this data, I was able to detect whether anything was sold, and if it was, I sent a nice email to my wife and to myself with a clear indication of what was sold and how much money she got. This mail was sent automatically by my solution, so my wife no longer had to log in and check manually. Finally, I prepared some Power BI reports that helped her analyze her sales in a visual way. I admit those reports are ugly; I am not a Power BI developer and I don't have front-end skills, but they do the job. Anyway, the first question my wife asked when I presented the solution to her was: can I export it to Excel? Again, it's a very common scenario that end users would like data to be exportable to Excel. But still, my wife was impressed by the solution. Mission accomplished.

13:32

Anyway, looking at this example, you can clearly see that there are a lot of challenges for data engineers, a lot of questions you have to ask yourself. For example: what data sources are there, how do you connect to them, how do you authenticate? What type of data source is it: an API, a file, a database? What data is stored in a given source, and what does it actually mean? Do I get just a subset of the data, say from a particular date range, or the whole timeline, and can I define that time range? Then, what transformations have to be applied to those data sets, how do you join data sets together, how do you detect changes between them? And a lot more. Don't worry, we'll cover all of this during the course.

14:51

Now, if we think about the exam and the skills required to pass it, the best source of information is the Microsoft site. If we type DP-203 into a browser, the first link is from Microsoft Learn, and that's the one we want: there we can see the DP-203 study guide. On that page you'll find very detailed information about the skills you should have: the audience profile and then detailed requirements, split into sections such as designing and implementing data storage (which covers things like a partition strategy and the data exploration layer), then data processing, ingesting and transforming data, batch processing, something about streaming, and so on. Make sure you review this list. However, what I don't like about this list is that it doesn't really correspond to the natural lifecycle of our data, so my course will be organized differently from this table, to make it easier for you to learn. Let me show you how it looks; let me just turn on my drawing machine and I will proceed. One question you might have is whether Microsoft Fabric, this shiny new offering from Microsoft, is included in the exam. The answer is no, not yet. For sure it will be added in the future, but it hasn't happened yet, and when it happens I will just update this course or add some separate videos about Microsoft Fabric.

17:14

Alright, let's jump into the whiteboard and let me draw something I call the basic BI flow. This flow is quite common in many BI solutions; I know that every BI solution is different, but there are some common steps that we usually have to implement. When we talk about BI solutions, we first have to split them into two areas: batch solutions and streaming ones. Right now I will focus on batch; I'll get back to streaming later on. So what is batch processing? Basically, it means that we are processing data in batches, let's say once a day, usually during the night. In this nightly processing we grab all the data that was generated during the day, process it, and generate the reports, so when employees come to work the next day they have the data prepared. Depending on our requirements we might process the data once a day, twice, or maybe more often; for example, if you have employees in different time zones, you might process the data in a batch way twice a day. Streaming, on the other hand, is completely different, because there we have a constant flow of events and we have to process them as they are delivered. For example, we might have heartbeat sensors in hospitals or IoT devices in factories that measure air temperature. But that's a different type of scenario, and we'll get back to streaming later on.
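
As a toy contrast between the two models (not tied to any particular Azure service), a batch job handles a whole day of records in one go, while a streaming handler processes each event the moment it arrives:

```python
from datetime import date
from typing import Iterable


def run_nightly_batch(records_for_day: Iterable[dict], day: date) -> None:
    """Batch: once a day (e.g. overnight) grab everything generated that day."""
    rows = list(records_for_day)
    print(f"{day}: processing {len(rows)} records in one batch and refreshing reports")


def handle_event(event: dict) -> None:
    """Streaming: process each event immediately as it is delivered."""
    print(f"processing event in real time: {event}")


# Usage sketch
run_nightly_batch([{"sale": 1}, {"sale": 2}], date(2023, 8, 16))
for event in ({"temperature": 21.5}, {"temperature": 22.1}):
    handle_event(event)
```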

19:25

So, for batch solutions, let me draw the basic BI flow I was talking about. When we start processing our data, when we think about data solutions, we start with some data sources. Quite often these are just files, like CSV files, text files, Excel files and so on. We might have databases, like SQL Server, Oracle or MySQL, which might be located on premises in some data center or in the cloud, in Google, Amazon or Azure. We might have APIs to connect to, like the Facebook API or the Google Ads API. Let me call all of this simply data sources; this is the input to our BI solution. Then, on the other side of the solution, we have reports, because that's what our end users want to see. We might have line charts that show, say, sales; key performance indicators that show in a visual way whether everything is fine (for example a green arrow pointing up that says we are fine); good old-fashioned tables that just show a lot of numbers, like detailed sales; and pie charts, which are yet another way to display data visually. All of this is simply reports, and then we have users who want to view those reports. Now, basically, it is our task to fill the gap between the data sources and the reports. If we think about the common steps in data processing, it starts with data ingestion, with extracting the data; let me call it ingest, or we can call it extract. We just want to grab the data from the source and store it somewhere, and if we want to store the data somewhere, it means we need a data storage layer that is flexible enough to hold different types of data: files, data from databases, from APIs and so on. Then, as I showed with the real-life example, the data we get from the source very often requires some transformations, so that is another stage we usually have to implement: we have to transform our data. The transformed data has to be saved again somewhere; it might be the same storage layer we used previously, or it might be something completely different, it depends. Then we might need to model our data and serve it to our reports.
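
A very small, local-only sketch of those stages (ingest, store, transform, model and serve), with made-up column names and plain CSV files standing in for a real storage layer:

```python
import pandas as pd

# Ingest / extract: in a real solution this would pull from files, databases or APIs;
# here a small in-memory extract stands in for the source.
raw = pd.DataFrame({
    "date": ["2023-08-15", "2023-08-15", "2023-08-16"],
    "title": ["Book One", "Book Two", "Book One"],
    "copies": [1, 2, 3],
})
raw.to_csv("raw_sales.csv", index=False)            # land the raw data in a storage layer

# Transform: fix types and aggregate the landed data.
transformed = (
    raw.assign(date=pd.to_datetime(raw["date"]))
       .groupby(["date", "title"], as_index=False)["copies"]
       .sum()
)
transformed.to_csv("daily_sales.csv", index=False)  # save the transformed data set

# Model / serve: reshape into a report-friendly structure for the data analyst.
model = transformed.pivot_table(index="date", columns="title", values="copies", fill_value=0)
print(model)
```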

23:48

Now, which parts of this whole process are a data engineer's responsibility? Basically, everything between the data sources and modeling and serving is the responsibility of a data engineer. The remaining part, which mainly consists of creating reports, is the task of a data analyst, and Microsoft has a separate certification and a separate exam for data analysts; here we are covering the data engineering path. In real life there is quite often some overlap between a data engineer and a data analyst: a data engineer might be involved in creating reports, and a data analyst might be involved in transforming data. But anyway, in this course we will focus on the early stages, like ingesting the data, transforming it and so on. And when it comes to the course structure, it will follow this natural lifecycle of data, starting from ingesting it from the source, through transforming it, and so on; that's how the course will go. One more thing about the exam: if you would like to get more information or more lessons about it, there is a great way to learn at your own pace, the self-paced learning paths. I highly recommend at least taking a look at them, especially in areas where you feel you need some improvement. Alright, that's basically it. It was a lot of fun creating this video and I'm really excited to start working on the series; hopefully it will be a good one. See you next time, when we'll talk about a cool service that we can use to store the data we want to ingest from data sources. That's it for today, take care and see you soon.

Related Tags

Azure Data Engineering, Data Processing, Certification Prep, Free Course, Data Transformation, Batch Processing, Streaming Data, Microsoft Certification, Data Ingestion, Power BI, GitHub Resources