A Beginners Guide To The Data Analysis Process

CareerFoundry
30 Sept 2021 · 10:20

Summary

TL;DR: This video script offers a comprehensive guide to the data analysis process, outlining five key stages: defining the question, collecting data, cleaning data, analyzing, and sharing results. It emphasizes the importance of understanding business objectives, using various data types, and employing tools like Databox and Tableau for analysis and visualization. The script highlights the crucial role of data cleaning and the analyst's responsibility to communicate findings clearly to influence business decisions.

Takeaways

  • 🔍 The first step in the data analysis process is defining the objective, which involves formulating a hypothesis and determining how to test it.
  • 🤔 Understanding the business and its goals is crucial for a data analyst to frame the problem correctly and identify the right data to solve the business problem.
  • 📝 Data can be categorized into first, second, and third-party data, each with different sources and levels of relevance and reliability.
  • 🛠 Tools like Databox, Dasheroo, Grafana, Freeboard, and Dashbuilder are useful for creating dashboards to visualize data at the beginning and end of the analysis process.
  • 📈 After defining the objective, a strategy for collecting and aggregating the appropriate data is necessary, which includes determining the types of data needed.
  • 🧼 Data cleaning is a critical step that involves removing errors, duplicates, outliers, and irrelevant observations to ensure high-quality data for analysis.
  • 🕵️‍♂️ According to the video, a good data analyst spends about 70 to 90% of their time cleaning data, which underlines how much accurate analysis depends on this step.
  • 📊 Various data analysis techniques exist, including univariate, bivariate, time series, and regression analysis, each serving different analytical goals.
  • 📚 Descriptive, diagnostic, predictive, and prescriptive analyses are the four categories of data analysis, each providing different types of insights into the data.
  • 🗣️ Effective communication of findings is essential, using reports, dashboards, and interactive visualizations to present data insights clearly and unambiguously.
  • 🛠 Tools like Google Charts, Tableau, Datawrapper, and Infogram facilitate the sharing of data insights without requiring coding skills, while Python libraries like Plotly, Seaborn, and Matplotlib cater to those with programming knowledge.

Q & A

  • What are the five key stages of the data analysis process mentioned in the script?

    -The five key stages of the data analysis process are: 1) Defining the question, 2) Collecting the data, 3) Cleaning the data, 4) Analyzing the data, and 5) Sharing the results.

  • What is the importance of defining the objective in the data analysis process?

    -Defining the objective is crucial as it sets the direction for the entire analysis. It involves formulating a hypothesis and determining how to test it, which helps in framing the problem correctly and identifying the right data to solve the business problem at hand.

  • Can you provide an example of how a data analyst might reframe a business problem?

    -An example given in the script is when senior management asks, 'Why are we losing customers?' A data analyst might reframe this to 'Which factors are negatively impacting the customer experience?' or 'How can we boost customer retention while minimizing costs?'

  • What are the three categories of data sources mentioned in the script?

    -The three categories of data sources are first party, second party, and third-party data.

  • What is first party data and how is it typically collected?

    -First party data is data directly collected by the company from its customers. It often comes in a clear and structured form, such as transactional tracking data or information from a customer relationship management (CRM) system.

  • How does second party data differ from first party data?

    -Second party data is the first party data of other organizations. It is usually structured and can be obtained directly from the company or from a private marketplace, and while it is less relevant than first party data, it tends to be reliable.

  • What is third-party data and where can it be sourced from?

    -Third-party data is collected and aggregated from numerous sources by a third party. It often contains a lot of unstructured data or big data and can be sourced from industry reports, market research, open data repositories, government portals, or firms like Gartner.

  • Why is data cleaning considered a crucial step in the data analysis process?

    -Data cleaning is crucial because it ensures that the data is of high quality and free from errors, duplicates, or outliers. This step is important for accurate analysis and can prevent incorrect conclusions, which might lead to poor business decisions.

  • What percentage of time does a good data analyst typically spend on data cleaning?

    -A good data analyst typically spends about 70 to 90 percent of their time on data cleaning.

  • Can you explain the four categories of data analysis mentioned in the script?

    -The four categories of data analysis are: 1) Descriptive analysis, which identifies what has already happened, 2) Diagnostic analysis, which focuses on understanding why something has happened, 3) Predictive analysis, which identifies future trends by analyzing historical data, and 4) Prescriptive analysis, which allows for making recommendations for the future.

  • Why is it important for data analysts to present their findings clearly and unambiguously?

    -Clear and unambiguous presentation of findings is important because it influences the direction of the business. Decision makers rely on these insights for making strategic decisions, and honest communication ensures that conclusions are scientifically sound and based on facts.

Outlines

00:00

📝 Defining the Data Analysis Objective

In this introductory segment, Will outlines the initial step of the data analysis process, which is defining the question or problem statement. He emphasizes the importance of understanding the business's goals to frame the problem correctly. Using the example of a fictional company, 'Top Notch Learning,' he illustrates how to identify the core issue affecting customer retention. Will also stresses the role of business acumen, introduces KPIs, and suggests tools like Databox, Dasheroo, Grafana, Freeboard, and Dashbuilder for creating dashboards to track business metrics throughout the analysis process.

05:05

🔍 Collecting and Classifying Data Sources

Will proceeds to explain the second step, which involves collecting and aggregating appropriate data. He categorizes data into three types: first-party data collected directly by the company, second-party data which is the first-party data of other organizations, and third-party data collected from various sources. Will discusses the benefits and sources of each data type, such as CRM systems for first-party data and industry reports for third-party data. He also mentions data management platforms such as Salesforce DMP, SAS, Xplenty, Pimcore, and D:Swarm for carrying out a data collection strategy.

10:06

🧼 Cleaning and Preparing Data for Analysis

The third step, as described by Will, is data cleaning, which is essential for ensuring high-quality data analysis. Key tasks include removing errors, duplicates, and outliers, as well as structuring data for easier manipulation. He highlights that data analysts often spend a significant portion of their time on this step. Will recommends tools like OpenRefine for basic cleaning and Python libraries such as pandas for more complex data sets. He also mentions enterprise tools like Data Ladder for data matching.
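
The cleaning tasks listed in this step can be sketched in a few lines of Python. This is only an illustration, not the workflow shown in the video: the records, the column names, the median-absolute-deviation outlier rule, and the cut-off of 5 are all assumptions made for the example, and it uses the standard library rather than pandas so that it runs without extra installs.

```python
from statistics import median

# Hypothetical raw records, invented for illustration only.
raw_rows = [
    {"customer": "Acme ", "region": "north", "order_value": "120.0"},
    {"customer": "Acme", "region": "North", "order_value": "120.0"},  # duplicate once tidied
    {"customer": "Birch", "region": "south", "order_value": "95.5"},
    {"customer": "Cedar", "region": "South", "order_value": ""},      # missing value
    {"customer": "Dune", "region": "north", "order_value": "110.0"},
    {"customer": "Elm", "region": "south", "order_value": "105.0"},
    {"customer": "Fir", "region": "north", "order_value": "9999.0"},  # likely entry error
]


def tidy(row):
    """General housekeeping: trim whitespace, normalise case, parse numbers."""
    value = row["order_value"].strip()
    return {
        "customer": row["customer"].strip(),
        "region": row["region"].strip().lower(),
        "order_value": float(value) if value else None,
    }


def drop_duplicates(rows):
    """Keep the first occurrence of each identical record."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def remove_outliers(rows, cutoff=5.0):
    """Drop rows whose value lies more than `cutoff` MADs from the median.

    The rule and the cut-off are assumptions chosen for this small sample.
    """
    values = [r["order_value"] for r in rows if r["order_value"] is not None]
    centre = median(values)
    mad = median([abs(v - centre) for v in values])
    if mad == 0:
        return rows  # no spread to measure against, so keep everything
    return [
        r for r in rows
        if r["order_value"] is None or abs(r["order_value"] - centre) <= cutoff * mad
    ]


def fill_gaps(rows):
    """Fill missing values with the median of the remaining values."""
    fill = median([r["order_value"] for r in rows if r["order_value"] is not None])
    return [dict(r, order_value=fill) if r["order_value"] is None else r for r in rows]


clean_rows = fill_gaps(remove_outliers(drop_duplicates([tidy(r) for r in raw_rows])))
```

The order matters in this sketch: tidying first lets the duplicate check see "Acme " and "Acme" as the same record, and removing the outlier before filling gaps keeps the erroneous value from distorting the fill.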

📊 Analyzing Data and Drawing Insights

In this part, Will delves into the actual analysis of the cleaned data, discussing various techniques such as univariate, bivariate, time series, and regression analysis. He categorizes data analysis into descriptive, diagnostic, predictive, and prescriptive analysis, each serving different purposes in understanding past events, diagnosing problems, forecasting trends, and making future recommendations. The choice of analysis technique depends on the insights one hopes to gain.
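
To make the named techniques more concrete, here is a minimal sketch of a univariate summary, a bivariate correlation, and a simple time-series smoothing step. The two series and their names are hypothetical, the 3-month window is an arbitrary choice, and the functions are hand-written stand-ins for what a library such as pandas would normally provide.

```python
from math import sqrt
from statistics import mean, median, stdev

# Hypothetical monthly figures, invented for illustration only.
support_tickets = [12, 15, 11, 20, 24, 22]
repeat_orders = [40, 36, 41, 30, 25, 28]


def describe(values):
    """Univariate analysis: summarise a single variable."""
    return {"mean": mean(values), "median": median(values), "stdev": stdev(values)}


def pearson(xs, ys):
    """Bivariate analysis: Pearson correlation between two variables."""
    mx, my = mean(xs), mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return covariance / spread


def moving_average(values, window=3):
    """Time series analysis: smooth a series with a rolling mean."""
    return [mean(values[i:i + window]) for i in range(len(values) - window + 1)]


summary = describe(support_tickets)
correlation = pearson(support_tickets, repeat_orders)
smoothed = moving_average(support_tickets)
```

In this invented data the correlation is strongly negative, which would describe what happened (descriptive) and hint at where to look next (diagnostic), but it would not by itself show that support problems cause customers to leave.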

📈 Sharing Results and Influencing Decisions

The final step discussed by Will is sharing the results of the data analysis. This involves not only presenting the raw data but also interpreting the findings in a clear and unambiguous manner. Will stresses the importance of honest communication and covering all evidence to ensure scientific soundness in conclusions. He suggests using reports, dashboards, and interactive visualizations to support findings. Tools like Google Charts, Tableau, Datawrapper, and Infogram are recommended for those without coding skills, while Python libraries like Plotly, Seaborn, and Matplotlib cater to those with programming knowledge.

🎓 Further Learning Opportunities in Data Analytics

In the concluding part of the script, Will offers a resource for further learning in data analytics, recommending a short course by CareerFoundry that viewers can sign up for free through a link provided in the description. He also invites viewers to watch another video he made on data analytics, suggesting a continued learning path for those interested in the subject.

Keywords

💡Data Analysis Process

The data analysis process is a systematic method of examining and interpreting complex data to extract useful information, draw conclusions, and support decision-making. In the video, this concept is the central theme, with a focus on the five key stages that Will discusses, including defining the question, collecting data, cleaning data, analyzing data, and sharing results.

💡Problem Statement

A problem statement in data analysis is a clear and concise description of the issue that needs to be addressed. It is the starting point for any analysis, guiding the direction of the investigation. In the script, Will emphasizes the importance of defining the problem statement to ensure that the analysis is focused on the right business problem, such as understanding why a company like 'Top Notch Learning' might be losing customers.

💡Hypothesis

A hypothesis is an educated guess or proposal suggesting a possible explanation or prediction, which can be tested through data analysis. In the context of the video, Will mentions that defining an objective in data analysis involves coming up with a hypothesis and figuring out how to test it, which is crucial for framing the problem correctly.

💡Business Metrics

Business metrics are quantitative measures that capture the critical aspects of an organization's performance. They are used to set targets, evaluate outcomes, and make informed decisions. The script mentions that understanding and considering business metrics and KPIs (Key Performance Indicators) are essential for defining objectives in the data analysis process.
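
As an illustration of a KPI relevant to the Top Notch Learning example, the sketch below computes a customer retention rate. The formula is one common definition and is not taken from the video, and the function name and the figures are hypothetical.

```python
def retention_rate(start_customers, end_customers, new_customers):
    """Share of the starting customers still present at the end of a period.

    Uses one common definition: (end - new) / start.
    """
    if start_customers <= 0:
        raise ValueError("start_customers must be positive")
    return (end_customers - new_customers) / start_customers


# Hypothetical quarter: 200 customers at the start, 230 at the end,
# of which 90 were newly acquired during the quarter.
quarterly_retention = retention_rate(200, 230, 90)
```

With these invented numbers the customer base grows while only 70% of the original customers are retained, which mirrors the situation in the example: strong acquisition can hide weak repeat business.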

💡First Party Data

First party data refers to information collected directly by a company from its customers through transactions, interactions, or other direct contact. In the video, Will explains that first party data is crucial for data analysis as it provides a clear and structured way to understand customer behavior and preferences.

💡Second Party Data

Second party data is data collected by one organization and then shared with another organization. It is typically structured and can be used to enrich the primary data source. The script discusses how second party data can be beneficial in data analysis by providing additional perspectives or information that might not be available from first party sources alone.

💡Third Party Data

Third party data is collected and aggregated by a source that is not directly involved in the transaction or interaction between the other two parties. It often includes unstructured data and can be used for industry reports or market research. Will uses the example of Gartner to illustrate how third party data can be valuable in the data analysis process.

💡Data Cleaning

Data cleaning, also known as data scrubbing, is the process of detecting, correcting, or removing errors, duplicates, or irrelevant observations from a dataset to improve data quality. The video script highlights the importance of this step, stating that a good data analyst may spend 70 to 90% of their time on data cleaning to ensure the accuracy and reliability of the analysis.

💡Descriptive Analysis

Descriptive analysis is a type of data analysis that provides a summary of the data to describe what has happened. It is often the first step in the analysis process. In the video, Will explains that descriptive analysis helps to identify past events or trends without delving into the reasons behind them.

💡Predictive Analysis

Predictive analysis is a forward-looking type of data analysis that uses historical data to forecast future trends or behaviors. It is commonly used by businesses to anticipate future growth or to make strategic decisions. The script mentions predictive analysis as a way to identify future trends, which is a key part of the data analysis process.
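
A very small example of forecasting from historical data is fitting a straight-line trend and extending it one period ahead. This is a sketch under strong assumptions: the revenue figures are invented, the trend is assumed to be linear, and ordinary least squares is just one of many possible techniques; the video does not prescribe a specific method.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical monthly revenue (in thousands), invented for illustration only.
months = [1, 2, 3, 4, 5, 6]
revenue = [100, 108, 115, 124, 131, 140]

slope, intercept = fit_line(months, revenue)
forecast_month_7 = slope * 7 + intercept
```

A forecast like this only extrapolates the past pattern, so it is worth presenting it alongside its assumptions when sharing results.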

💡Prescriptive Analysis

Prescriptive analysis goes beyond predicting future outcomes; it provides recommendations or actions based on the analysis. It is the most complex form of analysis as it incorporates insights from descriptive, diagnostic, and predictive analyses. In the video, Will describes prescriptive analysis as the final step in the analytics process, where the data analyst makes recommendations for the future based on the insights gained.

💡Data Visualization

Data visualization is the graphical representation of information and data. It helps in making the data analysis results more understandable and engaging for stakeholders. The video script discusses the importance of using data visualization tools like Google Charts, Tableau, and others to present findings in a clear and digestible manner.

Highlights

Introduction to the five key stages of the data analysis process.

Defining the question as the first step in data analysis, emphasizing the importance of a clear problem statement.

Understanding the business and its goals to frame the problem correctly.

The role of a data analyst in identifying the core problem and formulating a hypothesis.

The example of Top Notch Learning to illustrate defining the right problem statement.

Importance of identifying the right data sources to solve the business problem.

Explanation of first, second, and third-party data in the context of data collection.

Tools for data collection, including DMPs and open-source software.

The crucial step of data cleaning to ensure high-quality data for analysis.

Key data cleaning tasks such as removing errors, duplicates, and outliers.

Tools for data cleaning, including open-source options and enterprise solutions.

Different types of data analysis techniques, including univariate, bivariate, time series, and regression analysis.

The four categories of data analysis: descriptive, diagnostic, predictive, and prescriptive.

The importance of sharing results effectively with stakeholders using reports, dashboards, and visualizations.

Tools for interpreting and sharing findings, catering to different experience levels.

The impact of data analysis on business decisions and the importance of honest communication.

CareerFoundry's data analytics short course mentioned for further learning.

Transcripts

play00:00

Hi, my name is Will and I'm going to give

play00:02

you a step-by-step guide to the  data analysis process, let's go!

play00:08

in this video we're going to go through the five  key stages of the data analysis process we're  

play00:12

going to give you an overview and an introduction  to each of these stages as well as looking at some  

play00:16

of the tools you'll use to undertake these stages  so let's dive into step one defining the question  

play00:21

the first step in your data analysis process  or any data analysis process is to define your  

play00:26

objective in data analytics terms this is called  the problem statement defining your objective  

play00:32

means coming up with a hypothesis and figuring  out how exactly to test it you can start by asking  

play00:37

what business problem am i trying to solve now i  know this might sound straightforward but it can  

play00:42

actually be trickier than it seems for instance  your organization's senior management might pose a  

play00:47

question such as why are we losing customers it's  possible though that this doesn't get to the core  

play00:52

of the problem a data analyst job is to understand  the business and the business's goals as a data  

play00:58

analyst you need to understand this in enough depth that you can frame the problem the right

play01:02

way to give you a practical example let's say you  work for a fictional company for example we'll  

play01:07

call it top notch learning this fictional company  top notch creates custom training software for its  

play01:13

clients in this example top notch is excellent  at securing new clients but unfortunately top  

play01:19

notch has much lower repeat business as such as a  data analyst your question might not be why are we  

play01:25

losing customers but which factors are negatively  impacting the customer experience or even better  

play01:31

yet how can we boost customer retention whilst  minimizing costs now you've identified the problem  

play01:36

you need to find which data is going to help  you solve this issue this is where your business  

play01:41

acumen comes in again for instance perhaps you've noticed that the sales pipeline for new

play01:46

customers is very slick but the production team  is extremely inefficient knowing this you could  

play01:51

hypothesize the sales process actually wins a lot  of new clients but the customer experience well  

play01:57

it's kind of lacking could this be the reason that  customers aren't coming back what sources of data  

play02:02

will help you answer this question as a data  analyst considering all these things will help  

play02:06

you define the question and help you solve the  problem at hand there's also a number of tools  

play02:11

that can help you define your objective defining  your objective is mostly about soft skills so  

play02:16

business knowledge and lateral thinking but you'll  also need to consider business metrics and key  

play02:20

performance indicators and these are called KPIs  monthly reports can help you track problem points  

play02:26

in the business there are lots of tools out there  on the market that can analyze this business data  

play02:30

tools like Databox and Dasheroo there's also free open source software like Grafana Freeboard and

play02:36

Dashbuilder these are fantastic for producing  simple dashboards both at the beginning and at  

play02:41

the end of the data analysis process so that was  step one defining the objective onto step two step  

play02:47

two collecting the data once you've established  your objective you'll need to create a strategy  

play02:51

for collecting and aggregating the appropriate  data a key part of this is determining which  

play02:56

data you need this might be quantitative data or numeric data e.g. sales figures or monthly reports

play03:03

or qualitative descriptive data such as customer  reviews all of this data fits into one of three  

play03:09

categories first party second party and third  party data let's explore each one briefly now  

play03:15

what is first party data first party data is data  that you or your company has directly collected  

play03:21

from customers it might for example come in the  form of transactional tracking data or information  

play03:27

from your customer relationship management system your CRM system whatever its source

play03:32

first party data is usually collected in a clear  and structured way other sources of first party  

play03:37

data might include customer satisfaction surveys  focus groups interviews or direct observation  

play03:43

let's talk about second party data to enrich your  analysis you might want to secure a secondary data  

play03:47

source second party data is simply the first  party data of other organizations this might  

play03:52

be available directly from the company or from a private marketplace the main benefit of second

play03:56

party data is that it's usually structured  and although it's less relevant than first  

play04:00

party data it tends to be reliable examples of  second party data include website app or social  

play04:06

media activity like online purchase history or  shipping data so lastly what is third-party data  

play04:12

third-party data is data that has been collected  and aggregated from numerous sources from a third  

play04:18

party often but not always third-party data  contains a lot of unstructured data or big data  

play04:24

many organizations collect this big data to create  industry reports or to conduct market research the  

play04:30

research and advisory firm Gartner is a good real  world example of an organization that collects big  

play04:35

data and then sells it on to other companies open  data repositories and government portals are also  

play04:40

sources of third-party data let's take a moment to look at some of the tools that you

play04:43

can use to collect data once you've devised this data strategy i.e. you've identified what data you

play04:48

need and how best to go about collecting it there  are many tools that you can use to help you one  

play04:53

thing you'll need regardless of industry or area  of expertise is a data management platform or DMP  

play04:59

a DMP is a piece of software which allows you to  identify and aggregate data from numerous sources  

play05:05

before manipulating them segmenting them and so on there are many DMPs available some

play05:09

well-known enterprise DMPs include Salesforce DMP, SAS and the data integration platform Xplenty if

play05:16

you want to play around you can also try some open source platforms like Pimcore or D:Swarm on to step

play05:22

three cleaning the data once you've collected your  data the next step is to get it ready for analysis  

play05:29

this means cleaning or scrubbing it and this is  crucial to make sure that you're working with  

play05:33

high quality data key data cleaning tasks include  removing major errors duplicates or outliers all  

play05:40

of which are problems when you aggregate data  from numerous sources removing unwanted data  

play05:45

points so extracting irrelevant observations that  have no bearing on your intended analysis bringing  

play05:50

structure to your data or general housekeeping  so for example fixing typos or layout issues  

play05:55

which will help you map or manipulate your data more easily and finally filling in major

play06:00

gaps as you're tidying up you might notice that  important data is missing once you've identified  

play06:04

these gaps you can go about filling them a good data analyst will spend about 70 to 90 percent of their

play06:10

time cleaning data this might sound excessive but focusing on the wrong data or analyzing

play06:15

erroneous data will severely impact your results it might even send you back to square one so whatever

play06:20

you do don't rush this step let's have a look  at some of the tools that you can use to clean  

play06:24

your data cleaning data manually especially large  data sets can be incredibly daunting but luckily  

play06:30

there are many tools available to streamline this process open source tools such as OpenRefine are

play06:36

excellent for basic data cleaning as well as high  level exploration however free tools offer limited  

play06:42

functionality for very large data sets now i know  this sounds like a data zoo but python libraries  

play06:47

such as pandas and some R packages are better suited to heavy data scrubbing you will of course

play06:53

need to be savvy with languages alternatively  enterprise tools are also available for example  

play06:58

Data Ladder which is one of the highest rated data matching tools in the industry there are many more

play07:03

why don't you see which data cleaning tools you  can find online share your free tools in the  

play07:07

comments below so that was step three cleaning  the data on to step four analyzing that data  

play07:13

finally once you've cleaned your data now comes  the fun bit analyzing it the type of data analysis  

play07:18

you conduct largely depends on what your goal is  but there are many techniques available univariate  

play07:23

or bivariate analysis time series analysis and  regression analysis are just a few you might  

play07:28

have heard of more important than the different  types though is how you apply them this depends  

play07:33

on what types of insights you're hoping to gain  broadly speaking all types of data analysis fit  

play07:38

into the four following categories descriptive  analysis which is analysis which identifies what  

play07:42

has already happened this is a common first step  that companies do before proceeding with deeper  

play07:47

explorations diagnostic analysis where the focus  is on understanding why something has happened  

play07:52

it is literally the diagnosis of a problem just  as a doctor uses the symptoms to diagnose the  

play07:56

patient's disease predictive analysis which is  where you identify future trends by the analysis  

play08:02

of historical data predictive analysis is commonly  used by businesses to forecast future growth and  

play08:07

lastly prescriptive analysis which allows  you to make recommendations for the future  

play08:11

this is the final step in the analytics part  of the process but it's also the most complex  

play08:16

this is because it incorporates aspects of all  the other analyses that we've described today  

play08:20

step 5 sharing your results you've finished  carrying out your analyses you have your insights  

play08:25

the final step of a data analysis process is to  share these insights with the wider world or at  

play08:31

least with your organization's stakeholders this  is actually more complex than just sharing the raw  

play08:35

results of your work it involves interpreting the  outcomes and presenting them in a manner which is  

play08:40

digestible to everybody that's in the room since  you'll also present your work to decision makers  

play08:44

it's very important that the insights that you share are 100 percent clear and also unambiguous for

play08:50

this reason data analysts usually use reports  dashboards and interactive visualizations to  

play08:55

support their findings how you interpret and  present results will often influence the direction  

play08:59

of the business depending on what you share  your organization might decide to restructure to  

play09:04

launch a new product or close an entire division  that's why it's very important to present all the  

play09:09

evidence that you gathered and not to cherry-pick  data ensuring that you cover everything in a clear  

play09:14

and concise way will prove that your conclusions  are scientifically sound and based on facts on the  

play09:19

flip side it's important to highlight any gaps  in the data or to flag any insights that might  

play09:24

be open to interpretation remember that honest  communication is an important part of the process  

play09:29

it will help the business but it will also help  you to excel at your job there's a ton of tools  

play09:34

for interpreting and sharing your findings these  tools are suited to different experience levels  

play09:39

but popular tools that require no coding skills  include Google Charts, Tableau, Datawrapper and  

play09:45

Infogram if you're familiar with Python and R there are also many data visualization

play09:50

libraries and packages available for instance check out the Python libraries Plotly, Seaborn

play09:55

and Matplotlib whichever data visualization  tools you use make sure that you polish up your  

play10:00

presentation skills too visualization is great  but communication is key so hopefully now you  

play10:06

have a better idea of the data analysis process  CareerFoundry have an awesome data analytics short  

play10:11

course and you can sign up for free by the link in  the description thanks for joining us today i hope  

play10:15

this video has been helpful here's another video  i made about data analytics which is just for you

Related Tags
Data Analysis, Problem Solving, Business Goals, Customer Retention, Data Collection, Data Cleaning, Data Tools, Insight Sharing, Predictive Analysis, Data Visualization