Data Governance Tutorial

The Career Force
1 Sept 2020, 26:31

Summary

TLDR: This tutorial delves into data governance, highlighting its critical role in organizations amid rapid data growth and stricter regulation. It distinguishes data governance from data management, emphasizing governance's focus on overarching policies and processes, while management deals with the practical execution. The video outlines the necessity of data governance for ensuring data quality, security, and compliance, and discusses the pivotal roles of data owners, stewards, and champions. It advises starting with a clear scope, documenting data sources, and maintaining data integrity for effective governance. The tutorial also stresses the importance of regular reviews to adapt to the evolving data landscape.

Takeaways

  • Data governance is increasingly important due to the exponential growth of data and new regulations around data management.
  • It encompasses the rules, processes, and accountability concerning data, aiming for its routine use, harmonization of sources, and controlled access.
  • Data governance involves defining data ownership and ensuring data is managed and updated correctly by those responsible.
  • The difference between data governance and data management is that governance sets the structure and rules, while management implements these rules in day-to-day operations.
  • Good data governance ensures quality data is accessible to the right people efficiently and avoids data redundancy or unauthorized access.
  • Key roles in data governance include data owners, data stewards, subject matter experts, and data champions, each with specific responsibilities and areas of expertise.
  • The process of implementing data governance begins with identifying who's involved, defining the scope of data to be governed, and documenting available data sources.
  • Data mapping is crucial for understanding how different data sources relate and combine to form a complete picture of the data.
  • Metadata provides essential information about data, such as format and content, aiding in the understanding and use of data sets.
  • Data integrity is a critical aspect of data quality, focusing on maintaining the accuracy, validity, and consistency of data throughout its lifecycle.
  • ♻️ Data governance is not a one-time task but requires periodic review and updating to adapt to changes in data volume, types, and usage patterns.

Q & A

  • What is data governance?

    -Data governance refers to the rules, processes, and accountability around data. It involves ensuring that data is used routinely, sources are harmonized, access is granted to those who need it, and ownership and management of data are clearly defined.

  • Why is data governance important for businesses?

    -Data governance is crucial as it helps maintain the quality and security of data, ensures compliance with regulations, and allows data to be used effectively within an organization. It prevents issues like multiple databases with the same information or unauthorized access to systems.

  • How does data governance differ from data management?

    -Data governance outlines the overall structure, rules, processes, and accountability for data use, while data management is the hands-on implementation of these governance rules. Data management involves the day-to-day tasks of ensuring the governance policies are followed.

  • What are the roles involved in data governance?

    -Roles in data governance include data owners or sponsors, who have decision-making power and are accountable for data accuracy, data stewards who oversee the data on a day-to-day basis, subject matter experts who understand the data content, and data champions who promote good data practices.

  • What is the role of a data owner in data governance?

    -A data owner has ultimate decision-making authority and accountability for the data they oversee. They ensure the data is correct, up-to-date, and that those working under them comply with data governance rules.

  • Why should an organization start with a specific scope when implementing data governance?

    -Focusing on a specific scope when starting with data governance helps prioritize areas that are most critical or have regulatory compliance requirements. This approach increases the likelihood of successful implementation and avoids the strain of trying to control every aspect of the data at once.

  • How does data mapping play a role in data governance?

    -Data mapping helps understand how information in one data source relates to another, creating a more complete picture. It is essential for combining data from various sources and ensuring that the data is used accurately and consistently across the organization.

  • What is metadata and why is it important in data governance?

    -Metadata is information about the data, such as format, content, and field descriptions. It provides a guide to understanding what data fields contain and how they should be interpreted, which is vital for maintaining data quality and consistency.

  • Can you explain the concept of data integrity in the context of data governance?

    -Data integrity refers to the stability, accuracy, validity, and consistency of data throughout its lifecycle. In data governance, maintaining data integrity ensures that data remains reliable and trustworthy as it is accessed, moved, and used within systems.

  • Why is it necessary to periodically review data governance policies?

    -Data governance policies need periodic review because data and its usage are constantly evolving. Regular checks help ensure that policies remain relevant and effective, adapting to changes in data volume, types, and user needs over time.

Outlines

00:00

Introduction to Data Governance

Jen introduces the concept of data governance, emphasizing its growing importance due to the exponential growth of data and increased regulatory oversight. She outlines the goals of data governance, which include ensuring data is used routinely, harmonized, and accessible only to authorized individuals. Data governance also involves data ownership and ensuring data is managed and updated correctly. Successful governance considers the who, what, when, where, how, and why of data, controlling security, and ensuring compliance. The tutorial aims to differentiate data governance from data management, with the former focusing on the overarching structure and rules, and the latter on the practical implementation of these rules.

05:01

Roles and Responsibilities in Data Governance

This section delves into the roles involved in data governance, particularly highlighting the data owner or sponsor who has decision-making power and is accountable for data accuracy. It discusses how larger organizations may have multiple data owners overseeing different data types, such as manufacturing or sales data. The paragraph also introduces other roles like data stewards, subject matter experts, and data champions, who work closely with the data and are crucial for effective data governance. The importance of a data governance committee in larger organizations is also mentioned, which is responsible for making decisions, resolving conflicts, and standardizing data usage across the organization.

10:02

Getting Started with Data Governance

The paragraph discusses the practical steps for organizations to begin implementing data governance. It suggests starting with a focus on who is involved and what data needs to be governed. It advises against trying to control all data from the start, recommending instead to focus on top priorities, such as areas tied to regulatory compliance. The paragraph also stresses the importance of documenting available data sources, understanding how data is currently used, and involving those who are knowledgeable about the data in the governance process to avoid future complications.

15:02

Data Mapping and Metadata

This section explains the importance of data mapping, which shows how information from different sources relates and combines to form a complete data set. It uses sales data as an example, illustrating how order information, customer data, and inventory data can be mapped together based on common identifiers. The paragraph also introduces metadata, which provides information about the data, such as format and content, and is crucial for understanding and maintaining data quality.

20:02

Data Quality and Integrity

The focus here is on data integrity as a subtopic of data quality, which concerns the stability and reliability of data throughout its lifecycle. It discusses the importance of maintaining data accuracy, validity, and consistency to prevent data from becoming corrupted or inaccurate as it's used and moved within systems. The paragraph also touches on data scraping as a method to create structure where it may be lacking and to ensure that data used for critical decisions, like warranty claims, is accurate and relevant.

25:04

Continuous Data Governance

The final paragraph emphasizes that data governance is not a one-time task but requires ongoing attention and periodic review. It stresses the need for policies and guidelines to evolve with changing data and business practices. The tutorial concludes with a call to action for viewers to apply the principles discussed, and it invites feedback and sharing of the tutorial with others who might benefit from the information.

Keywords

Data Governance

Data governance refers to the rules, processes, and accountability structures surrounding the management of data within an organization. It is crucial for ensuring that data is used correctly, securely, and in compliance with relevant regulations. In the video, data governance is described as encompassing not just the rules but also the effective use of data to benefit the organization. It involves determining who has access to data, ensuring data quality, and maintaining data integrity throughout its lifecycle.

Data Management

Data management is the practical implementation of the rules and processes outlined by data governance. It involves the day-to-day tasks of ensuring that data is organized, stored, and accessed according to the governance policies. The video distinguishes data management from governance by highlighting that while governance sets the structure and rules, management is about executing those rules and handling the operational aspects of data handling.

Data Ownership

Data ownership is the concept of having designated individuals or roles within an organization who are responsible for the accuracy, accessibility, and maintenance of specific data sets. The video explains that data owners or sponsors are typically higher-level individuals within an organization who have the authority to make decisions regarding their data and ensure compliance with data governance policies.

Data Stewards

Data stewards are individuals who work closely with data on a regular basis and have a deep understanding of the data's content and usage. They are responsible for the ongoing maintenance and quality of the data they steward. In the video, it is emphasized that involving data stewards in data governance projects is vital because of their practical knowledge and insights into how data is used within the organization.

Data Quality

Data quality pertains to the accuracy, consistency, and reliability of data. The video script mentions that maintaining data quality is a critical aspect of data governance, as it ensures that the data used by the organization is trustworthy and useful. Good data governance practices help in preventing data from becoming outdated or corrupted, thus preserving its integrity.

Data Integrity

Data integrity refers to the stability and consistency of data over its lifecycle. It ensures that data remains accurate and valid as it is used, moved, or processed within an organization. The video discusses the importance of data integrity in avoiding the 'telephone game' effect, where information becomes distorted as it is passed along. Maintaining data integrity is a continuous process that involves checks and balances to ensure data accuracy.
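One of the "checks and balances" mentioned here could be sketched as a fingerprint comparison between a source table and a copy made elsewhere. This is only an illustration, not a method described in the video: the function names, the order records, and the choice of SHA-256 over a canonical JSON form are all assumptions.

```python
import hashlib
import json


def record_fingerprint(record: dict) -> str:
    """Return a SHA-256 hash of a record that does not depend on key order."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_integrity_issues(source: dict, copy: dict) -> dict:
    """Compare two {record_id: record} tables and report what differs."""
    missing = sorted(k for k in source if k not in copy)
    unexpected = sorted(k for k in copy if k not in source)
    changed = sorted(
        k for k in source
        if k in copy and record_fingerprint(source[k]) != record_fingerprint(copy[k])
    )
    return {"missing": missing, "unexpected": unexpected, "changed": changed}


# Hypothetical order records before and after being moved to another system.
source_orders = {
    "A100": {"customer": "C1", "amount": 250.0},
    "A101": {"customer": "C2", "amount": 99.5},
    "A102": {"customer": "C1", "amount": 40.0},
}
copied_orders = {
    "A100": {"amount": 250.0, "customer": "C1"},  # same data, different key order
    "A101": {"customer": "C2", "amount": 995.0},  # amount distorted along the way
}

issues = find_integrity_issues(source_orders, copied_orders)
print(issues)
```

A check like this only shows that a copy has drifted from its source. Deciding which version is correct remains a question for the data owner.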

Data Sources

Data sources are the origins or locations from which data is collected or generated. The video script discusses the importance of understanding data sources in the context of data governance, as it helps in determining the ownership, accessibility, and usage of data. It also touches upon the challenge of managing multiple sources that may provide the same information, requiring decisions on which source is the primary or authoritative one.
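A source inventory along these lines can be sketched in a few lines of Python. The `DataSource` fields, the example systems (`crm`, `billing`, `newsletter_tool`), and the owner titles are hypothetical; the sketch only shows how overlapping sources without a single agreed authoritative source might be flagged for a decision.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    name: str
    contents: set            # kinds of information the source holds
    owner: str               # who is accountable for it
    update_frequency: str
    authoritative_for: set   # information this source is the primary record for


def overlapping_information(sources):
    """Map each kind of information to the sources providing it, keeping only overlaps."""
    providers = {}
    for src in sources:
        for item in src.contents:
            providers.setdefault(item, []).append(src.name)
    return {item: sorted(names) for item, names in providers.items() if len(names) > 1}


def overlaps_without_single_authority(sources):
    """List overlapping information that does not have exactly one authoritative source."""
    flagged = []
    for item in overlapping_information(sources):
        claims = [s.name for s in sources if item in s.authoritative_for]
        if len(claims) != 1:
            flagged.append(item)
    return sorted(flagged)


# Hypothetical inventory of three systems.
inventory = [
    DataSource("crm", {"customer_address", "customer_email"},
               "Sales data owner", "daily", {"customer_email"}),
    DataSource("billing", {"customer_address", "invoice_total"},
               "Finance data owner", "hourly", {"invoice_total"}),
    DataSource("newsletter_tool", {"customer_email"},
               "Marketing data owner", "weekly", set()),
]

overlaps = overlapping_information(inventory)
needs_decision = overlaps_without_single_authority(inventory)
print(overlaps)
print(needs_decision)
```

In this made-up inventory, the customer address lives in two systems and neither is marked as primary, so it is the item that needs a governance decision.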

Data Mapping

Data mapping is the process of understanding how different data sets relate to each other and can be combined to provide a more comprehensive view of the information. In the video, data mapping is used as an example of how sales data, customer data, and inventory data can be linked together to form a complete picture of a transaction. This process is crucial for effective data integration and analysis.
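The sales example can be sketched as a join on shared identifiers. The field names (`customer_id`, `product_id`, `unit_price`) and the sample records are assumptions made for illustration; the video does not specify a schema.

```python
# Hypothetical customer and inventory tables, keyed by their identifiers.
customers = {
    "C1": {"name": "Acme Corp", "segment": "commercial"},
    "C2": {"name": "J. Rivera", "segment": "residential"},
}
inventory = {
    "P10": {"description": "Widget", "unit_price": 4.0},
    "P11": {"description": "Gasket", "unit_price": 1.5},
}
orders = [
    {"order_id": "O1", "customer_id": "C1", "product_id": "P10", "quantity": 5},
    {"order_id": "O2", "customer_id": "C2", "product_id": "P11", "quantity": 2},
    {"order_id": "O3", "customer_id": "C9", "product_id": "P10", "quantity": 1},  # unknown customer
]


def map_orders(orders, customers, inventory):
    """Combine each order with its customer and product via the common identifiers."""
    combined, unmapped = [], []
    for order in orders:
        customer = customers.get(order["customer_id"])
        product = inventory.get(order["product_id"])
        if customer is None or product is None:
            # The mapping breaks when an identifier has no match in the other source.
            unmapped.append(order["order_id"])
            continue
        combined.append({
            "order_id": order["order_id"],
            "customer_name": customer["name"],
            "product": product["description"],
            "order_total": order["quantity"] * product["unit_price"],
        })
    return combined, unmapped


combined, unmapped = map_orders(orders, customers, inventory)
print(combined)
print(unmapped)
```

Orders that cannot be mapped are kept in a separate list rather than dropped silently, since unmatched identifiers are often the first sign that two sources disagree.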

Metadata

Metadata is data that provides information about other data, such as its structure, format, and content. The video script describes metadata as a descriptor of data fields, offering a guide to what information is contained within a data set. It is essential for understanding the context and usage of data and is often used in creating data dictionaries that help users navigate and interpret data accurately.
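A data dictionary of this kind could look roughly like the sketch below, where each field carries a type, a description, and a required flag. The field names and the `check_record` helper are hypothetical and only illustrate how metadata can be used to interpret and check a record.

```python
# Hypothetical data dictionary: metadata describing each field of an order record.
DATA_DICTIONARY = {
    "order_id": {"type": str, "description": "Unique order identifier", "required": True},
    "order_date": {"type": str, "description": "Order date as YYYY-MM-DD", "required": True},
    "quantity": {"type": int, "description": "Number of units ordered", "required": True},
    "notes": {"type": str, "description": "Free-text comments", "required": False},
}


def check_record(record, dictionary):
    """Return a list of ways the record departs from the data dictionary."""
    problems = []
    for field, meta in dictionary.items():
        if field not in record:
            if meta["required"]:
                problems.append(f"missing required field: {field}")
            continue
        if not isinstance(record[field], meta["type"]):
            problems.append(
                f"{field}: expected {meta['type'].__name__}, "
                f"got {type(record[field]).__name__}"
            )
    for field in record:
        if field not in dictionary:
            problems.append(f"undocumented field: {field}")
    return problems


good = check_record({"order_id": "O1", "order_date": "2020-09-01", "quantity": 3}, DATA_DICTIONARY)
bad = check_record({"order_id": "O2", "quantity": "3", "color": "red"}, DATA_DICTIONARY)
print(good)
print(bad)
```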

Data Scraping

Data scraping is the process of extracting data from various sources, often when there is no direct relationship or mapping between the data sets. The video gives an example of data scraping in the context of warranty claims, where mileage data might need to be pulled from a text box and matched with warranty coverage data to determine eligibility. This technique is used to create structure and relationships where they do not naturally exist.
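The warranty example might be sketched with a regular expression that pulls a mileage figure out of a free-text note. The pattern, the note wording, and the coverage limits below are assumptions; real claim notes would likely need a more tolerant pattern and manual review of anything that does not match.

```python
import re
from typing import Optional

# Matches figures such as "36,412 miles" or "58210 mi" (an assumed note format).
MILEAGE_PATTERN = re.compile(r"(\d{1,3}(?:,\d{3})+|\d+)\s*(?:miles|mi)\b", re.IGNORECASE)


def extract_mileage(note: str) -> Optional[int]:
    """Return the first mileage figure found in a free-text note, or None."""
    match = MILEAGE_PATTERN.search(note)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


def within_warranty(note: str, coverage_miles: int) -> Optional[bool]:
    """Compare scraped mileage with the coverage limit; None means manual review."""
    mileage = extract_mileage(note)
    if mileage is None:
        return None
    return mileage <= coverage_miles


claim_a = within_warranty("Customer reports grinding noise at 36,412 miles.", 36000)
claim_b = within_warranty("Odometer reads 58210 mi, brake wear noted", 60000)
claim_c = within_warranty("No odometer reading recorded", 36000)
print(claim_a, claim_b, claim_c)
```

Returning `None` when no mileage is found keeps unreadable notes out of the automatic decision, which matters when the result feeds something as consequential as a warranty claim.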

Highlights

Data governance is crucial for every business due to the exponential growth of data and increased regulations.

Data governance encompasses the rules, processes, and accountability surrounding data usage.

The goal of data governance includes harmonizing data sources, controlling access, and ensuring data ownership and management.

Data governance is about more than just rules; it's also concerned with making data useful to the organization.

Data management is the implementation of data governance rules, focusing on the day-to-day work of data.

Good data governance ensures quality data is accessible to the right people in an efficient manner.

Data governance involves multiple roles, including data owners, stewards, subject matter experts, and champions.

Data owners have the ultimate decision-making power and accountability for data within an organization.

Data stewards and subject matter experts play a crucial role in understanding and guiding data usage.

A data governance committee is responsible for making decisions and resolving conflicts regarding data usage.

When starting with data governance, it's important to define the scope and prioritize areas like regulatory compliance.

Documenting data sources, understanding data usage, and mapping data are essential steps in data governance.

Data mapping shows how information from different sources relates and combines to form a complete data set.

Metadata provides information about the data, such as format and content, aiding in understanding and usage.

Data scraping is used to capture and relate information when direct mapping between data sources is not possible.

Data integrity is a key aspect of data quality, ensuring data remains accurate and consistent throughout its lifecycle.

Data governance policies should be regularly reviewed and updated to keep up with the evolving data landscape.

Transcripts

00:00

In the 15 years I've been working in analytics, I've seen a growing focus on data governance. In this data governance tutorial, I'll go over what data governance is, why it's important for pretty much every single business or organization, and what sets apart good data governance from poor or even non-existent data governance at other companies. Hi, I'm Jen. I help people learn about analytics skills and careers. Check this video description for additional resources.

[Music]

00:32

As the amount of available data has grown exponentially over the last decade, and more regulation has been put in place around data and information management, many organizations have started to think more about data governance.

00:48

So what exactly is this data governance? Data governance is the rules, processes, and accountability around data. There are multiple goals of data governance. You want the organization to use data in a routine way, for sources to be harmonized, for the people that have access to it to be people that need to have access to the data, and for people that shouldn't have access to not have access. It also means ownership of the data: who's responsible for it being right, who's responsible for it being managed and updated correctly. Successful data governance considers the who, what, when, where, how, and why of the data that it's governing, while controlling the security of the data and ensuring compliance, among many other things. Data governance should also be concerned with how this data can be made useful to the organization. How can we do more than just have a giant storage location for information?

01:48

You may have also heard about data management. What's the difference between data management and data governance? The main difference is data governance outlines the overall structure that should exist. It has the rules, it has the processes, the accountability. It's more about what should happen, how should things happen. And data management is more about implementing all of those rules. It's the hands-on, everyday work to ensure that the governance that's been put in place is being followed. So it's the IT teams executing on it; it's the day-to-day management of information and access requirements and whatnot that people may need when it comes to the data. If you want to know more about data management, I'll link to a video on my second channel for Avant Analytics, my consulting company.

02:40

Now let's talk about why data governance matters. I mentioned that data governance isn't just about the rules; it's also about the use of the data and making it useful. Really good data governance implementation means that quality data is accessible to the right people, and only the right people, in an efficient way throughout the organization. It means not having multiple databases that have the same information, or access to systems for certain people that shouldn't have them. It's making sure that these are in place so that there's a consistent understanding of who has access, why they have access, and what they're doing with that information that exists.

03:22

Let's talk about getting started with data governance. What does that actually look like practically for an organization that's implementing it or focusing more on it?

03:34

When it comes to data governance, one of the first things that you want to think about is who's involved. Typically there are multiple roles for a very structured, larger organization that is implementing data governance or has been working on it for a while. Sometimes in smaller organizations, or ones that are new to data governance, you might see these roles overlap. Let's talk about what each of those roles are first, though.

03:56

The first role is the data owner, or this data sponsor. These are people that have ultimate decision-making ability about the data and have ultimate accountability for that data being correct and up-to-date. Typically these people are going to be higher up within the organization and have the ability to order or ensure that those working beneath them are complying with what is outlined in the roles that are defined as part of data governance. There are typically many different data owners and sponsors when it comes to data governance, often overseeing specific types of data. So in an organization you might have a manufacturing data owner, data sponsor, who's responsible for maintaining everything related to a product that's being manufactured. You may also have a sales data owner, or even maybe a segment of sales: you might have a commercial and a residential data owner, depending on the different applications that your company is working with. They're responsible for owning, providing, and following whatever guidelines are outlined for their set of data.

05:10

You can still have multiple data owners even for companies or organizations that aren't dealing with physical products. For instance, in a local government organization you may have one person that's responsible for voter registration data, and one that's responsible for real estate tax data, and one that's responsible for home ownership data for the locale. So regardless of what scale, what type of product or service that your organization has or is providing, you can still have multiple different data owners.

05:44

If you're in a super small company, if you've got a dozen people, maybe there is just one person. But even then, sometimes you have multiple data owners if you have people responsible for different parts of the business.

05:55

In addition to data owners, in larger organizations you'll also have data stewards, subject matter experts, and data champions. These are people that are working more regularly with the data, that really understand the content of the data. They should ideally be consulted in any data governance project because they understand the more on-the-ground work that's being done with this data. They understand the different uses that people have for it, why certain people may need access that an executive may not see an immediate answer for. Any time that these people are left out of the data governance process, there usually end up being a lot of headaches and hoops to jump through in the practical application of using that data. There's still room to question here whether, just because someone had access in the past, they still need to have access. But incorporating, involving these people that have more knowledge can really help improve that process and make sure that you're not backpedaling and having to redo a lot of work later on, after you roll out these new rules.

07:04

Larger organizations also typically will have a data governance committee. This group is ultimately responsible for all the decisions that are made. If there's conflicts between different groups, they can help resolve them. If there's decisions that need to be made, or implementations that maybe need made to standardize how the data is used or stored or accessed across the organization, this committee can act as a central resource to make sure that the data of different types isn't implemented in a lot of different ways across each different area. So maybe instead of having one set of sales database with one access point to that, and then a separate application that deals with client information or production information, the data governance committee can look at it and say: how can we integrate these better? How can we have one location? How can we make it similar regardless of the type of data you're looking for? That can usually lead to a more streamlined process overall and make it much simpler to implement changes. Rather than dealing with maybe a dozen different types of systems to access data, you consolidate to much fewer, which can pull all of the resources into one location and make it easier to teach people how to access the data that they need to access.

08:27

Even if it's contained within one system, this doesn't mean automatically everybody has access. The nice thing with the actual implementation is you can still have different roles that allow access to different pieces of data, but having that centralized location can make it a lot easier for individuals or groups within the organization that need to use data from multiple different sources to complete their work.

08:50

In addition to establishing who's involved, one of the very first steps that you should take concerning data governance is to think about the scope of the data that you want to govern. It's really tempting to say we want to control it all, but the reality is, unless you're a very, very, very small company, it's usually not practical to try to control everything from the start. Instead, think about what your top priorities are. So an easy solution for this is, if you have areas of data management that tie to government or regulatory compliance, this is a great place to start with your data governance, because it's not just about your company or your organization; it's about whether you are meeting the requirements of the law. So focus on that area, and then as you have pieces in place you can expand further and further. But anytime that you try to take everything within the scope of work that you're doing, you are much more likely to fail. It's much more likely to take a lot longer to make the same type of progress because you're trying to take care of everything at once instead of one piece at a time.

10:01

An easy comparison is: think if you tripped and fell down the stairs, if you cut yourself and were bleeding profusely, and you had a broken arm, and you hit your head. You, ideally, yeah, you would fix everything at once. But the company that is going through this with their data, you fix the thing that's going to hurt you the most. So if you fell down the stairs and you're bleeding, you stop the bleeding. That is the most immediate, pressing concern. That doesn't mean you ignore the broken bone or the potential head injury, but you take them in order of what is the most serious that I deal with first. The same is true anytime you're working with data: what's the most immediate need? What's going to have the most immediate consequence, negative consequence, if I don't do something about it? And then once that's taken care of, you can move on to the next thing.

10:58

If you're not dealing with compliance issues or something that's otherwise urgent, you still need to set some sort of scope. In this case, you can just pick an area that may have a lot of advantage to working with, or just pick an area. Sometimes people get too hung up on making sure they pick the right area that they don't just take action. So if you're not sure what to do, pick something. Say that you want to work on client data as the first step of governance, and then you can move on to the next step, or pick manufacturing data. Whatever you do, don't let it stop you from doing something. This can also sometimes inform who is involved in the data governance process up front. If you're just getting started and you have people that you know are eager and want to be involved, that can be a guideline for what you pick to work on. Or if you pick what to work on, that could inform who should be involved in that process.

play11:52

now that you have the who and the what

play11:53

it's time to move into more detail

play11:56

document what data you have available

play11:58

these are your data sources

play12:00

what information is in this data where

play12:03

does it come from

play12:04

do you have multiple sources that are

play12:06

providing the same information

play12:08

who owns the data who's an expert in it

play12:11

how often is it updated who checks to

play12:13

make sure it's updated correctly

play12:15

who accesses it and what do they use it

play12:17

for when they do access it

play12:19

answering these types of questions can

play12:20

really help you make a more informed

play12:22

decision on what rules

play12:24

processes and accountability that you

play12:26

put in place regarding

play12:28

a specific type of data before you jump

play12:30

right into making rules it's important

play12:32

to understand

play12:33

how people are currently using the

play12:35

information

play12:36

and why they're using it in that way

play12:39

otherwise

play12:40

again you end up with a poor

play12:42

implementation

play12:44

you end up making more work for people

play12:46

that

play12:47

still have to get their job done but now

play12:49

they have someone who doesn't have any

play12:51

idea what they're doing

play12:52

making decisions about what they can and

play12:54

can't have access to and how they're

play12:55

going to access it

play12:57

this doesn't mean you're not going to

play12:59

bother people by the decisions you make

play13:01

for data governance

play13:02

there are going to be people that are

play13:04

unhappy with the decisions you've made

play13:06

but they tend to be a lot more receptive

play13:09

to change and the organization is

play13:10

typically a lot more receptive overall

play13:13

when you've at least taken the time to

play13:15

listen to them

play13:16

account for their concerns factor that

play13:19

into the decision

play13:20

you're making and at least make an

play13:22

informed decision

play13:23

even if you know that makes things more

play13:25

challenging for some individuals or some

play13:27

teams or departments it's easy for people

play13:30

to think about how

play13:31

they think the data should be used it's

play13:33

a completely different story to know how

play13:35

it

play13:36

actually is being used it's rare that

play13:39

there

play13:39

aren't some surprises along the way of

play13:41

how people are using

play13:43

information sometimes because they can't

play13:45

access what they really need and so

play13:47

they're substituting

play13:49

and making adjustments to existing

play13:51

information

play13:52

to be able to do the work that they need

play13:54

to get done you may also find that as

play13:56

you start exploring the data that's

play13:58

available that there are multiple

play14:00

sources for the same

play14:01

data in this case you have a decision to

play14:04

make

play14:04

do you still retain the information from

play14:06

multiple sources

play14:08

which is your primary authoritative

play14:11

source if there's a conflict

play14:13

for instance a simple example of this is

play14:15

in the automotive industry

play14:17

if someone files a warranty claim for

play14:20

their vehicle

play14:21

there are multiple ways that the company

play14:24

can get information about that

play14:26

they can get mileage information based

play14:28

on what's manually submitted on the

play14:30

warranty claim how much the dealer or

play14:32

the customer reports

play14:34

in terms of mileage there was on the

play14:35

vehicle at the time that the repair was

play14:37

scheduled

play14:38

however with newer vehicles that have a

play14:40

lot of remote technology

play14:42

they can also read this information off

play14:44

the control units on the vehicle

play14:46

so if there's a conflict there if

play14:48

there's a difference between the mileage

play14:50

that the dealer says and what the

play14:51

vehicle says

play14:53

unless there's a known issue with the

play14:55

vehicle where the mileage would be

play14:57

reported wrong

play14:58

typically you're going to want to

play14:59

prioritize what the vehicle says what

play15:01

the control units say

play15:03

automatically because there's usually

play15:05

less room for error there
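
The mileage example can be written down as a simple priority rule. This is a minimal sketch under assumptions: the source names `control_unit` and `dealer_claim`, the `known_faulty` parameter, and the sample readings are all hypothetical.

```python
from __future__ import annotations

from typing import Optional

# Highest-priority source first. This ordering mirrors the example in the talk:
# the vehicle's own control units are preferred over manually reported mileage.
SOURCE_PRIORITY = ["control_unit", "dealer_claim"]


def resolve_mileage(readings: dict[str, Optional[int]],
                    known_faulty: frozenset[str] = frozenset()) -> tuple[str, int]:
    """Return (source, mileage) from the highest-priority usable source."""
    for source in SOURCE_PRIORITY:
        value = readings.get(source)
        # Skip sources with no reading or with a known reporting issue.
        if value is not None and source not in known_faulty:
            return source, value
    raise ValueError("no usable mileage reading")


readings = {"control_unit": 41250, "dealer_claim": 35000}
print(resolve_mileage(readings))
print(resolve_mileage(readings, known_faulty=frozenset({"control_unit"})))
```

Keeping the priority in one list means the rule is set once, ahead of time, instead of being decided each time a conflict shows up.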

play15:07

you can run into this with all sorts of

play15:09

data where you'll have these conflicts

play15:11

even if you don't see immediate

play15:12

conflicts it's still good to set a

play15:14

priority of what is your main source

play15:17

what is going to be the authority when

play15:20

it comes to

play15:21

the accuracy of that data as you look

play15:23

into the available data

play15:25

in most areas you're going to find that

play15:27

the data being used

play15:28

doesn't just come from one singular

play15:31

location it's usually made up of a

play15:33

variety of different sources

play15:35

for instance let's take sales data sales

play15:38

data might sound like it's one

play15:40

complete isolated thing however most

play15:43

sales data consists of client

play15:46

data information about who made the

play15:48

purchase

play15:49

it consists of actual sales data like

play15:52

what was the sales date what was the

play15:54

sale amount what was the exact order

play15:57

that was placed and it often consists

play16:00

of some sort of inventory or production

play16:02

information

play16:04

this isn't quite as typical for

play16:06

something off the shelf

play16:08

where client information isn't reported

play16:10

but if you're working for an

play16:11

organization that provides a service or

play16:15

provides any sort of customized product

play16:18

or even products that offer multiple

play16:21

variants

play16:22

this production information probably has

play16:24

information on

play16:26

if somebody orders a shirt for instance

play16:28

what color did they order what size did

play16:30

they order

play16:31

was there inventory to fulfill it so all

play16:34

of these different pieces

play16:35

are in themselves individual data sets

play16:38

that are brought together to form the

play16:40

one complete

play16:42

set of data the one source of

play16:45

information that we think of as the sales

play16:47

data to combine all of these we get into

play16:50

data mapping data mapping tells us how

play16:53

the data the information in one of our

play16:55

sources

play16:56

relates or maps to data in another

play16:59

source

play17:00

to combine to give a more complete

play17:02

picture so in the example of that sales

play17:05

data

play17:06

we would have the order information

play17:09

linked to customer information

play17:12

probably based on a customer name or a

play17:15

customer id

play17:17

so if you look in the customer file you

play17:20

have an id

play17:21

or you have a name there that is

play17:23

identical

play17:24

and unique for an individual for a

play17:28

company that

play17:29

is doing the purchase and then in the

play17:31

order you have that same unique

play17:33

identifier that same unique name that

play17:35

same unique number

play17:37

so that when you match up you look for

play17:40

the same thing in one

play17:41

and two and that's how you tell the

play17:43

systems to combine this information how

play17:46

to map this information

play17:47

same thing with the order and maybe

play17:50

inventory data where

play17:51

you have the part number that was

play17:54

ordered the service that was ordered

play17:56

and then in your inventory or your

play17:58

service information

play18:00

you have probably more detail so you

play18:03

have

play18:03

part number one in your order you have

play18:06

part number one

play18:07

in your inventory database and in your

play18:09

inventory you talk more about the

play18:11

details of that

play18:12

and so you combine those then we have

play18:14

cases where the mapping isn't direct

play18:17

so to map customer to inventory there's

play18:20

no direct mapping there's no

play18:21

direct single relationship they only

play18:24

relate or map

play18:26

together because of that order

play18:28

information
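
The mappings described above can be sketched with plain dictionaries. The table contents, the key names `customer_id` and `part_number`, and the helper functions are made up for illustration. The sketch matches on an id because ids are generally safer to match on than names, which can be misspelled or shared.

```python
from __future__ import annotations

# Hypothetical customer file, keyed by a unique customer id.
customers = {
    "C001": {"name": "Acme Ltd"},
    "C002": {"name": "Jordan Lee"},
}

# Hypothetical inventory data, keyed by part number, holding the extra detail.
inventory = {
    "P-1": {"description": "shirt", "color": "blue", "size": "M"},
    "P-2": {"description": "shirt", "color": "red", "size": "L"},
}

# Each order carries the same identifiers, which is what makes mapping possible.
orders = [
    {"order_id": 1, "customer_id": "C001", "part_number": "P-1"},
    {"order_id": 2, "customer_id": "C002", "part_number": "P-2"},
    {"order_id": 3, "customer_id": "C001", "part_number": "P-2"},
]


def build_sales_view(orders, customers, inventory):
    """Direct mappings: order -> customer and order -> inventory."""
    rows = []
    for order in orders:
        customer = customers[order["customer_id"]]
        part = inventory[order["part_number"]]
        rows.append({
            "order_id": order["order_id"],
            "customer": customer["name"],
            "item": part["description"],
            "color": part["color"],
            "size": part["size"],
        })
    return rows


def parts_for_customer(customer_id, orders):
    """Indirect mapping: customer -> inventory only exists through the orders."""
    return sorted({o["part_number"] for o in orders if o["customer_id"] == customer_id})


for row in build_sales_view(orders, customers, inventory):
    print(row)
print(parts_for_customer("C001", orders))
```

Notice that `parts_for_customer` never touches the customer file or the inventory directly. The order records are the only thing connecting the two.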

play18:29

that's a fairly simple step to get there

play18:33

sometimes it can be more complicated

play18:35

sometimes there can be two

play18:36

three or more steps in between to

play18:39

connect these different pieces together

play18:42

make them relevant make them relate to

play18:44

each other another piece of information

play18:46

that you'll typically have or should

play18:48

have about your data is

play18:49

metadata think of this as information

play18:52

about the data

play18:53

what type of format should it take by

play18:56

default

play18:57

what type of information is contained

play18:59

within it so for instance let's talk

play19:01

about order information

play19:02

metadata would describe every field that

play19:05

exists in that set of data

play19:07

so if we have our date our metadata

play19:11

would tell us

play19:12

what format that it's in let's say it's

play19:14

a

play19:15

month day year format then

play19:19

what does it contain a short description

play19:21

so

play19:22

date of order from customer or date

play19:25

order received

play19:26

it gives almost a dictionary of sorts

play19:29

and

play19:29

sometimes you'll see it called a data

play19:31

dictionary which describes

play19:33

the information that's contained within

play19:35

that data set metadata and a data

play19:38

dictionary

play19:38

aren't always exactly the same but in

play19:41

general they're giving you more

play19:42

general information about what's

play19:44

contained there so that everyone can

play19:47

understand what content should exist

play19:50

there

play19:50

and does exist within those types of

play19:53

data fields
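
A data dictionary can also be kept in a form that a program can check records against. This is a rough sketch, assuming a made-up `FieldSpec` structure and two example order fields; real metadata tools will look different.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class FieldSpec:
    """Metadata for one field: its name, a short description, and a date format if any."""
    name: str
    description: str
    date_format: Optional[str] = None  # strptime pattern, only set for date fields


# Hypothetical data dictionary for the order information.
ORDER_DICTIONARY = [
    FieldSpec("order_date", "date order received from customer", "%m/%d/%Y"),
    FieldSpec("sale_amount", "total amount of the sale"),
]


def check_record(record: dict[str, str]) -> list[str]:
    """Compare one record against the dictionary and list what does not match."""
    problems = []
    for spec in ORDER_DICTIONARY:
        value = record.get(spec.name)
        if value is None:
            problems.append(f"{spec.name}: missing")
        elif spec.date_format:
            try:
                datetime.strptime(value, spec.date_format)
            except ValueError:
                problems.append(f"{spec.name}: expected format {spec.date_format}")
    return problems


print(check_record({"order_date": "09/01/2020", "sale_amount": "19.99"}))
print(check_record({"order_date": "2020-09-01", "sale_amount": "19.99"}))
```

The same list serves both purposes mentioned in the talk: people can read the descriptions, and a check can confirm that the content that should exist actually does.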

play19:54

ideally different tables of data or

play19:56

different data sets will have

play19:58

clear mapping of how they relate even if

play20:00

they have to go through one or

play20:02

two other tables in the intermediate to

play20:05

connect

play20:06

one to the other however that's not

play20:09

always

play20:09

reality and when that's not the reality

play20:13

sometimes we need to use data scraping

play20:15

to be able to

play20:16

capture the right information for

play20:19

instance

play20:19

maybe we need to know the mileage of

play20:23

that vehicle

play20:24

when there was a warranty claim but what

play20:26

if the warranty database doesn't have

play20:29

mileage in that case maybe we ask that

play20:32

someone

play20:33

puts that information in a text box when

play20:35

they submit the claim

play20:37

data scraping is going and finding that

play20:40

and automatically pulling it out

play20:42

to see how it relates so how might it

play20:44

relate on a warranty claim

play20:46

well most vehicles have a fixed mileage

play20:49

or

play20:49

age limitation on the warranty so if

play20:52

this process has to be done through data

play20:54

scraping

play20:56

we'll scrape out what the mileage is and

play20:59

then

play20:59

map that to the warranty coverage to see

play21:03

is that mileage within the limits is the

play21:05

age within the limits

play21:07

does it qualify to be covered or does it

play21:10

not meet the criteria is it too old does

play21:12

it have too many miles those sorts of

play21:14

things
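
Here is a small sketch of that warranty example: pull a mileage figure out of a free-text box, then compare it to coverage limits. The text pattern, the sample note, and the limits of 60,000 miles and 5 years are assumptions for illustration only, not real warranty terms.

```python
from __future__ import annotations

import re
from typing import Optional

# Matches numbers such as "42310" or "42,310" followed by "miles" or "mi".
MILEAGE_PATTERN = re.compile(r"(\d{1,3}(?:,\d{3})+|\d+)\s*(?:miles|mi)\b", re.IGNORECASE)


def extract_mileage(note: str) -> Optional[int]:
    """Return the first mileage found in a free-text note, or None if there is none."""
    match = MILEAGE_PATTERN.search(note)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


def within_warranty(mileage: int, age_years: float,
                    max_miles: int = 60_000, max_years: float = 5) -> bool:
    """Check both limits; the default limits are hypothetical."""
    return mileage <= max_miles and age_years <= max_years


note = "Customer reports noise, odometer at 42,310 miles at drop-off"
mileage = extract_mileage(note)
print(mileage)
if mileage is not None:
    print(within_warranty(mileage, age_years=3))
```

Free text is unpredictable, so a claim where nothing is found (the `None` case) would still need a person to look at it.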

play21:15

so that's a simple example of data

play21:17

scraping sometimes it can get more

play21:18

complicated

play21:19

in general think about it as a way to

play21:21

try to create structure where

play21:23

not much structure exists i mentioned

play21:26

data quality

play21:27

which has a lot of different subtopics

play21:29

that could

play21:30

easily be hours on their own i'm not

play21:32

going to get into all of those today

play21:34

but there is one area that i do want to

play21:36

talk a bit about data integrity

play21:39

is a subtopic within the data quality

play21:41

area

play21:42

and data integrity is not just

play21:46

what's the overall quality of the data

play21:49

but it's how stable is our data how

play21:52

routine

play21:53

is it can we always trust it how is it

play21:56

updated how do we know that it's not

play21:58

corrupted think of data integrity as how

play22:00

well the accuracy

play22:02

validity and consistency of data is

play22:04

maintained across

play22:05

its life cycle that is from the moment

play22:08

that we first

play22:09

collect that data does it remain the

play22:12

same

play22:13

does it remain consistent does it remain

play22:16

true does it continue to be

play22:18

accurate

play22:19

as we move it around within our systems

play22:22

as we put it into different tools as

play22:25

different people start to use it

play22:28

do we maintain that integrity of

play22:30

information

play22:32

so we don't essentially end up with the

play22:33

telephone game that you might have

play22:35

played in school where you whisper

play22:37

something in some person's ear at the

play22:39

start and by the time you've gone

play22:40

through

play22:41

15 different people the message out the

play22:43

other end is very different than the

play22:45

message at the start

play22:46

the same thing can happen with data for

play22:48

a variety of reasons

play22:50

it could be system problems that create

play22:53

this

play22:53

challenge it can also be multiple people

play22:56

being involved that don't understand the

play22:58

context they don't understand

play23:00

where why how the data was collected

play23:03

and they make assumptions a lot of times

play23:06

there are assumptions

play23:07

at every step and by the time you get to

play23:09

the end

play23:10

then it's not really representative of

play23:13

what you started with

play23:14

this doesn't always have to be the case

play23:16

and as long as you're aware of it

play23:19

it's something that you can put more

play23:21

things in place to check you could have

play23:23

someone that checks

play23:24

the source data and the end result data

play23:27

to see

play23:28

do they match do they properly convey

play23:31

the right

play23:32

information are they still accurate are

play23:34

they still valid

play23:36

really checking to make sure that you

play23:38

don't end up with a completely different

play23:39

picture than what you started with
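
The source versus end result check does not have to be done by hand. Below is a minimal sketch that compares two sets of records by key and reports what went missing or changed along the way. The record layout, the `claim_id` key, and the function names are assumptions made for the example.

```python
from __future__ import annotations

import hashlib


def fingerprint(rows: list[dict[str, object]], key: str) -> dict[object, str]:
    """Map each record's key to a hash of its full contents."""
    result = {}
    for row in rows:
        # Sort the fields so the same record always hashes the same way.
        canonical = repr(sorted(row.items()))
        result[row[key]] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return result


def reconcile(source: list[dict[str, object]],
              target: list[dict[str, object]],
              key: str) -> dict[str, list]:
    """Report records that were lost, added, or altered between source and target."""
    src = fingerprint(source, key)
    dst = fingerprint(target, key)
    return {
        "missing_in_target": sorted(src.keys() - dst.keys()),
        "unexpected_in_target": sorted(dst.keys() - src.keys()),
        "changed": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
    }


# Hypothetical data: claim 2 was altered and claim 3 was dropped on the way.
source_rows = [
    {"claim_id": 1, "mileage": 42310},
    {"claim_id": 2, "mileage": 18000},
    {"claim_id": 3, "mileage": 5000},
]
target_rows = [
    {"claim_id": 1, "mileage": 42310},
    {"claim_id": 2, "mileage": 81000},
]

print(reconcile(source_rows, target_rows, key="claim_id"))
```

A check like this only proves the two ends match each other. Whether the source was right to begin with is a separate question that still needs someone who understands the data.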

play23:41

maintaining data integrity and data

play23:43

quality throughout the life cycle

play23:45

isn't just a one-off thing you don't do

play23:47

it and then it's done

play23:48

it's also about having checks in place

play23:50

to make sure everything continually

play23:52

functions as expected

play23:54

if you have data automatically pulled

play23:56

into a system every day

play23:58

what checks do you have in place to make

play24:00

sure it all happened accurately how do

play24:02

you catch when mistakes were made

play24:04

so that somebody doesn't need to stumble

play24:06

into a problem with it

play24:08

and raise an alert how do you automate

play24:10

some of that to help

play24:11

ensure that that integrity that quality

play24:14

is maintained

play24:15

all along on an ongoing basis ultimately

play24:19

all of this work of data governance

play24:21

should lead to sets of rules

play24:23

processes and policies that are applied

play24:26

across the business to make sure that

play24:28

you have

play24:28

good data being used in a good way

play24:30

throughout the organization

play24:32

it's the right people accessing the

play24:34

right data at the right time with the

play24:36

right amount of accountability

play24:37

this work should inform business

play24:39

policies as well as data management

play24:42

as with data integrity and data quality

play24:44

data governance in general

play24:46

is not a do it once and forget about it

play24:49

forever sort of thing

play24:50

it constantly needs to be rechecked as

play24:54

fast as information is growing it's

play24:56

exponentially growing every year

play24:58

you may be getting more information

play25:01

tomorrow than you were getting yesterday

play25:04

and you may have different people doing

play25:06

different things with it than they were

play25:07

in the past so it's important to keep up

play25:10

with that

play25:10

if you set up data governance policies

play25:13

now even if they're perfect

play25:15

chances are very high that two years

play25:18

from now they're not going to be perfect

play25:20

something is going to have changed so be

play25:22

aware of that that doesn't mean making

play25:24

changes every day every week every month

play25:27

but it does mean that you have some

play25:29

periodic schedule that you come back and

play25:31

review

play25:31

and check and make sure that your

play25:34

policies your rules your guidelines are

play25:36

really keeping up with the information

play25:38

that's there

play25:39

rather than being really reactionary

play25:41

it's already a little reactionary to

play25:43

only follow up once a year

play25:46

or whatever frequency it is but at least

play25:49

then you only have a small thing you

play25:51

need to react to instead of

play25:53

five years from now two years from now

play25:56

realizing that

play25:57

none of the rules that you put in place

play25:59

are being followed because they're no

play26:01

longer relevant they no longer apply to

play26:03

the information that's available

play26:04

and how people really need to use that

play26:07

data

play26:08

to effectively run the business to

play26:11

effectively do their jobs
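
A periodic review is easier to keep up with when something flags what is due. This sketch assumes each policy records when it was last reviewed and how often it should be; the policy names, dates, and the one-year default are invented for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Policy:
    """A governance policy with its review schedule (fields are illustrative)."""
    name: str
    last_reviewed: date
    review_every_days: int = 365  # assumed default: review once a year


def overdue(policies: list[Policy], today: date) -> list[str]:
    """Return the names of policies whose review interval has passed."""
    return [
        p.name for p in policies
        if today - p.last_reviewed > timedelta(days=p.review_every_days)
    ]


policies = [
    Policy("client data access", date(2020, 9, 1)),
    Policy("warranty mileage source priority", date(2022, 3, 1)),
]

print(overdue(policies, today=date(2022, 9, 1)))
```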

play26:12

i hope you enjoyed this data governance

play26:14

tutorial if you did enjoy it please

play26:16

consider

play26:17

giving it a thumbs up and sharing it

play26:18

with someone that you think may benefit

play26:20

from it

play26:21

thank you so much for watching

Related Tags
Data Governance, Information Management, Data Ownership, Data Stewards, Compliance, Data Quality, Data Integrity, Analytics Skills, Data Management, Business Policies