What is Data Science?

IBM Technology
13 Jun 2022 · 07:50

Summary

TL;DR: The video explains data science as the intersection of computer science, mathematics, and business expertise. It outlines four data science methods (descriptive, diagnostic, predictive, and prescriptive analytics), each answering a different business question at increasing complexity and value. It then walks through the data science lifecycle: business understanding, data mining, data cleaning, exploration, and visualization. Finally, it describes the roles of business analysts, data engineers, and data scientists, and stresses that these roles must collaborate to turn data into actionable business insights.

Takeaways

  • 📊 Data science involves extracting knowledge and insights from noisy data and turning them into actionable steps for businesses.
  • 🔄 Data science is at the intersection of computer science, mathematics, and business expertise, requiring collaboration across all three disciplines.
  • 🔍 Descriptive analytics answers 'what happened,' diagnostic analytics answers 'why it happened,' predictive analytics answers 'what will happen,' and prescriptive analytics answers 'what should be done.'
  • 🏢 The data science lifecycle begins with business understanding to ensure the right questions are being asked.
  • 📥 Data mining is the process of gathering relevant data from various sources for analysis.
  • 🧹 Data cleaning is essential to remove errors, duplicates, and missing values to prepare the data for analysis.
  • 🔬 Data exploration applies analytical tools to answer the business questions; the higher-value predictive and prescriptive questions call for advanced techniques such as machine learning.
  • 📊 Visualization is critical in presenting insights from data analysis in a way that businesses can understand and act on.
  • 🤝 Roles in the data science lifecycle include business analysts, data engineers, and data scientists, all of whom collaborate to cover different stages of the process.
  • 💡 There's often overlap in roles, with business analysts, data engineers, and data scientists sharing tasks such as data exploration, machine learning, and visualization.

Q & A

  • What is the textbook definition of data science?

    -Data science is the field of study that involves extracting knowledge and insights from noisy data, and then turning those insights into actions that a business or organization can take.

  • What are the three disciplines that intersect to form data science?

    -Data science is the intersection of computer science, mathematics, and business expertise.

  • What is the first type of data science method mentioned in the script, and what does it involve?

    -The first type of data science method mentioned is descriptive analytics, which is about understanding what is happening in the business and involves accurate data collection.

  • What is the difference between diagnostic and descriptive analytics?

    -Diagnostic analytics focuses on why something happened, such as why sales went up or down, while descriptive analytics is about what is happening, like whether sales increased or decreased.

  • How does predictive analytics differ from descriptive and diagnostic analytics?

    -Predictive analytics is about what is likely to happen next, using historical patterns to predict future outcomes, whereas descriptive analytics focuses on current happenings and diagnostic analytics on the root causes of past events.

  • What is prescriptive analytics and what kind of question does it answer?

    -Prescriptive analytics is about recommending the best actions to achieve a particular outcome, such as what actions to take to improve sales by 10%.

  • What is the first step in the data science lifecycle?

    -The first step in the data science lifecycle is business understanding, which is critical to ensure that the right questions are asked before proceeding with data science initiatives.

  • Why is collaboration across different roles in a data science project important?

    -Collaboration is important because different roles such as business analysts, data engineers, and data scientists each contribute unique expertise and there is often overlap in their responsibilities, requiring them to work together effectively.

  • What role do data engineers play in the data science lifecycle?

    -Data engineers help find, clean, and prepare data for analysis, playing a crucial role in the data mining and data cleaning stages of the data science lifecycle.

  • How does visualization fit into the data science process?

    -Visualization is the step where insights and outcomes from the analysis are presented in a way that is understandable and useful for business decision-making.

  • What is the role of a business analyst in a data science project?

    -A business analyst is involved in formulating questions, contributing domain expertise, and helping to visualize insights in a way that is useful for the business.

Outlines

00:00

📊 Introduction to Data Science and Its Disciplines

This segment introduces data science as a field that extracts knowledge and insights from noisy data to guide business actions, and places it at the intersection of computer science, mathematics, and business expertise. It then outlines four types of data science methods (descriptive, diagnostic, predictive, and prescriptive analytics), each answering questions of increasing complexity and value. Descriptive analytics focuses on current business conditions, diagnostic analytics seeks to understand why events occurred, predictive analytics anticipates future outcomes, and prescriptive analytics recommends actions for a desired outcome. The segment closes by introducing the data science lifecycle, which starts with business understanding and continues with data mining and data cleaning.

05:00

🔍 Data Science Lifecycle and Roles

The second segment continues the data science lifecycle with exploration, where analytical tools are used to answer business questions, and the progression to advanced analytics using machine learning. It highlights the need for visualization to communicate insights effectively. The segment also outlines the roles within an organization that contribute to the process: business analysts who frame questions and visualize insights, data engineers who find and clean the data, and data scientists who specialize in exploration and advanced techniques. It emphasizes that these roles overlap and depend on each other throughout the lifecycle.

Keywords

💡Data Science

Data science is defined in the script as the field of study that involves extracting knowledge and insights from noisy data, and then turning those insights into actionable steps for businesses or organizations. It is central to the video's theme as it sets the stage for discussing various methodologies and roles within the data science lifecycle. The script emphasizes the intersection of computer science, mathematics, and business expertise as the core disciplines that constitute data science.

💡Predictive Analytics

Predictive analytics is a method within data science that focuses on forecasting future outcomes based on historical data patterns. In the script, it is positioned as a more complex and valuable type of data science question, asking 'what is likely to happen next?'. An example given is predicting sales performance for the next quarter, which illustrates its application in business forecasting.
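The idea of using historical patterns to predict next quarter's sales can be illustrated with a very small sketch. The video does not name a specific technique; the least-squares trend line below is only one simple choice, and the sales figures and function names are hypothetical.

```python
def fit_line(ys):
    """Least-squares line through the points (0, ys[0]), (1, ys[1]), ...

    Requires at least two data points.
    """
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast_next(ys):
    """Extend the fitted trend one step past the last observation."""
    slope, intercept = fit_line(ys)
    return intercept + slope * len(ys)


# Hypothetical quarterly sales figures (assumption, not from the video).
quarterly_sales = [100.0, 110.0, 120.0, 130.0]
next_quarter = forecast_next(quarterly_sales)
print(f"Forecast for next quarter: {next_quarter:.1f}")
```

A straight-line trend ignores seasonality and external factors, so in practice it would serve only as a baseline against which richer models are compared.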

💡Machine Learning

Machine learning is mentioned as an advanced analytical tool that leverages massive amounts of computing power and high-quality data to make predictions and prescribe actions for the future. It is a key component in the script's discussion of how data science is applied to complex questions, particularly in predictive and prescriptive analytics. The script implies that machine learning is a tool used by data scientists to analyze data and generate insights.

💡Descriptive Analytics

Descriptive analytics is described in the script as the first level of data science, which is about understanding what is happening in a business. It involves accurate data collection to answer questions like 'did sales go up or down?'. This concept is foundational to the video's narrative as it represents the starting point in the data science process, providing a basis for more complex analyses.
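As a minimal sketch of the "did sales go up or down?" question, the snippet below compares two periods. The function names and the figures are assumptions made for illustration; the video does not prescribe any particular calculation.

```python
def sales_direction(previous, current):
    """Describe what happened between two periods: 'up', 'down', or 'flat'."""
    change = current - previous
    if change > 0:
        return "up"
    if change < 0:
        return "down"
    return "flat"


def percent_change(previous, current):
    """Relative change from the previous period, in percent."""
    return (current - previous) / previous * 100.0


# Hypothetical totals for two consecutive quarters.
last_quarter, this_quarter = 200.0, 230.0
print(sales_direction(last_quarter, this_quarter))
print(f"{percent_change(last_quarter, this_quarter):.1f}%")
```

The answer is only as reliable as the totals fed into it, which is why the script ties descriptive analytics to accurate data collection.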

💡Diagnostic Analytics

Diagnostic analytics is portrayed in the script as the next level after descriptive analytics, focusing on understanding the 'why' behind events, such as sales fluctuations. It involves drilling down to the root cause of a problem, which is essential for businesses to identify areas for improvement or success.
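One simple way to picture "drilling down" is to split the overall change by segment and see which segment moved the most. This is only an illustrative sketch under assumed data: the regions, figures, and function names are hypothetical, and a real root-cause analysis would usually need more than one breakdown.

```python
def change_by_segment(previous, current):
    """Change in sales per segment between two periods."""
    segments = set(previous) | set(current)
    return {s: current.get(s, 0.0) - previous.get(s, 0.0) for s in segments}


def largest_driver(previous, current):
    """Segment with the largest absolute change, a first hint at the cause."""
    changes = change_by_segment(previous, current)
    return max(changes, key=lambda s: abs(changes[s]))


# Hypothetical sales by region for two consecutive quarters.
last_quarter = {"north": 100.0, "south": 80.0, "west": 60.0}
this_quarter = {"north": 102.0, "south": 50.0, "west": 61.0}

print(change_by_segment(last_quarter, this_quarter))
print(largest_driver(last_quarter, this_quarter))
```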

💡Prescriptive Analytics

Prescriptive analytics is the highest level of data science methods discussed in the script, aiming to answer 'what should be done next?'. It is about recommending the best actions to achieve a particular outcome, such as increasing sales by a certain percentage. This concept is integral to the video's message about turning data into actionable business strategies.
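The question "what do I need to do to improve sales by 10%?" can be sketched as choosing among candidate actions. The example below assumes each action already has an estimated uplift (in practice this would come from a predictive model) and picks the cheapest one that meets the target. The actions, numbers, and selection rule are all hypothetical.

```python
def recommend(actions, target_uplift_pct):
    """Return the cheapest action whose estimated uplift meets the target.

    Returns None when no candidate action reaches the target.
    """
    feasible = [a for a in actions if a["uplift_pct"] >= target_uplift_pct]
    if not feasible:
        return None
    return min(feasible, key=lambda a: a["cost"])


# Hypothetical candidate actions with assumed uplift estimates and costs.
candidate_actions = [
    {"name": "discount", "uplift_pct": 12.0, "cost": 50_000},
    {"name": "email campaign", "uplift_pct": 6.0, "cost": 5_000},
    {"name": "new sales reps", "uplift_pct": 11.0, "cost": 80_000},
]

best = recommend(candidate_actions, target_uplift_pct=10.0)
print(best["name"] if best else "no action reaches the target")
```

Minimizing cost is just one possible objective; an organization might instead weigh risk, time to effect, or a combination of factors.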

💡Data Mining

Data mining is referred to in the script as the process of procuring the necessary data for analysis from the data landscape. It is a critical step in the data science lifecycle, ensuring that the right data is collected before analysis can begin. The script highlights the importance of data mining in preparing for the subsequent stages of data cleaning and analysis.

💡Data Cleaning

Data cleaning is described as a necessary step in the data science process where data is prepared and cleaned before analysis. The script mentions issues like missing values and duplicates that need to be addressed, emphasizing the importance of clean data for accurate and reliable insights.
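The two issues the script names, rows with missing values and duplicate rows, can be handled with a short sketch like the one below. It assumes records are plain dictionaries with hashable values and that dropping incomplete rows is acceptable; filling in missing values is a common alternative that is not shown. The field names and records are hypothetical.

```python
def clean(rows):
    """Drop rows with missing values, then drop exact duplicates.

    The first occurrence of each row is kept and input order is preserved.
    """
    seen = set()
    cleaned = []
    for row in rows:
        # Treat None and empty strings as missing values.
        if any(value is None or value == "" for value in row.values()):
            continue
        # A sorted tuple of items identifies a row regardless of key order.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


# Hypothetical raw records with one missing value and one duplicate.
raw = [
    {"order_id": 1, "amount": 20.0},
    {"order_id": 2, "amount": None},
    {"order_id": 1, "amount": 20.0},
    {"order_id": 3, "amount": 35.5},
]
print(clean(raw))
```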

💡Exploration

Exploration in the context of the script refers to the phase of the data science lifecycle where analytical tools are used to start answering questions about the data. It is a phase that can involve both simple and advanced analytics, depending on the complexity of the questions being asked and the insights sought.

💡Visualization

Visualization is mentioned as a crucial step in the data science lifecycle where insights and outcomes of the analysis are represented in a visual format. The script suggests that visualization is important for making data understandable and actionable for business analysts and other stakeholders within an organization.

💡Business Understanding

Business understanding is emphasized in the script as the starting point of any data science initiative. It involves ensuring that the right questions are asked before proceeding with data analysis. The concept is integral to the video's theme as it highlights the importance of aligning data science efforts with business goals and objectives.

Highlights

Data science is defined as the field of study that extracts knowledge and insights from noisy data to inform business actions.

Data science is an intersection of computer science, mathematics, and business expertise.

Data science initiatives require collaboration across computer science, mathematics, and business disciplines.

Descriptive analytics focuses on understanding what is happening in the business through accurate data collection.

Diagnostic analytics investigates the root cause of events, such as why sales went up or down.

Predictive analytics uses historical data patterns to forecast future outcomes, like next quarter's sales performance.

Prescriptive analytics recommends actions to achieve specific outcomes, such as increasing sales by a certain percentage.

The data science lifecycle begins with business understanding to ensure the right questions are asked.

Data mining is the process of procuring the necessary data for analysis.

Data cleaning involves preparing and cleaning data to remove issues like missing values or duplicates.

Data exploration uses analytical tools to answer business questions and may involve advanced techniques like machine learning.

Visualization is crucial for translating insights and analysis outcomes into understandable formats for business use.

Business analysts play a role in formulating questions, understanding the business, and visualizing insights.

Data engineers assist in finding, cleaning, and exploring data as part of the data science process.

Data scientists specialize in advanced exploration and machine learning techniques, contributing to the data science lifecycle.

Collaboration between business analysts, data engineers, and data scientists is essential for a successful data science initiative.

The data science lifecycle transforms noisy data into actionable knowledge and insights for business decisions.

Transcripts

00:00

Let's talk about data science and some of the other related terms you may have heard,

00:03

such as predictive analytics, machine learning, advanced analytics and others.

00:08

So let's start with the textbook definition of data science.

00:11

So data science is the field of study that involves extracting knowledge and insights from

00:26

noisy data, and then turning those insights into actions that our business or organization

00:39

can take. Okay. So let's dig into it a little bit more and discuss what are the different

00:45

areas that are covered by data science. So really data science is the intersection

00:50

between three different disciplines. We start with computer science, but then we also cover

01:02

the area of mathematics, and then what I think is the most important: business

01:14

expertise. So the intersection of these three disciplines is data science, and true data

01:22

science initiatives involve collaboration across all these three different areas.

01:29

Okay. So now let's touch on the different types of data science that you can do.

01:33

Now, what we need to understand here is that we have different data science

01:37

methods for different questions that we might ask in an organization. And these questions can

01:43

vary by complexity and the value that we get out of them. So let's chart them here

01:51

by complexity and value. Okay. So the first one that we have here is descriptive analytics.

02:08

So this is really about what is happening in my business, right? And it involves having

02:13

accurate data collection to make sure that we know what's happening. So a good question

02:18

we could ask here is, well, did sales go up or down? The next level is diagnostic analytics,

02:31

and this is more about why did something happen? So why did sales go up or down?

02:36

And it involves drilling down to the root cause of our problem. Now, the next one that we have is

02:47

predictive analytics. So this is about what is likely to happen next. Right. So what will

02:55

our sales performance be next quarter? And it involves using historical patterns in our

03:00

data to predict outcomes in the future. And then finally we have prescriptive analytics.

03:14

So this is about what do I need to do next? What is the recommended best action for a particular

03:20

outcome? So a question we could ask here is what do I need to do to improve sales by 10%?

03:25

Right. Okay. So now we can talk about how data science is done and who actually does it. So let's

03:34

look at the data science lifecycle. And the first thing that we always must start with is business

03:45

understanding.

03:49

So this is really critical to make sure that we're asking the right question before we go down a

03:55

lengthy data science initiative. And this is where you can see that having the business expertise and

04:01

the domain expertise can be incredibly critical to make sure that we're asking the right questions.

04:07

Okay. So once we've defined that, we can move on to data mining.

04:16

So this is the process of actually going out into our data landscape and procuring the data

04:22

that we need for our analysis. So once we've done that, we can move on to data cleaning.

04:34

So the reality of the marketplace is that when we find data, it's probably not in the best format

04:42

that we need it in. And it probably has some issues with it. Right. It might have rows

04:48

that have missing values. It might have duplicates in it. So there is some preparation and cleaning

04:53

that we have to do before it's ready for our analysis. So once we've done that cleansing,

05:00

we can move on to exploration.

05:08

Okay. So this is the part of the process that allows us to use different analytical tools that

05:14

can start helping us answer some of the types of questions that I mentioned here earlier.

05:20

And if we actually want to get into some of these higher value questions like predictive and

05:24

prescriptive, then we must start using advanced analytical tools such as machine learning tools

05:32

that leverage massive amounts of computing power and massive amounts of high quality data

05:38

to make predictions and prescribe actions for the future. Now once we've done our exploration

05:45

and perhaps our advanced analytics, what do we do next? Well, we need to visualize

05:54

our insights and outcomes of our analysis. Okay. Now I want to quickly touch on who does what

06:02

in this lifecycle. So in an organization, you may have roles like a business analyst,

06:10

you might have data engineers. And then you might have data scientists.

06:18

So business analysts are obviously involved in formulating the questions. They have the

06:24

domain expertise. They can help with the business understanding, but they're also involved with

06:30

visualizing our insights in a way that's useful for the business. Right. And then we have folks

06:36

like data engineering folks. So these are the people that can help us find the data,

06:42

clean the data, and then also help with some of the exploration. Next, we move on to our

06:49

data scientists. So these are the people that will really help us with the exploration. They'll help

06:54

us with the advanced machine learning techniques. And they'll also assist in the visualization.

07:01

So you can see there's some overlap between the roles. And that's why it's critical

07:07

to have collaboration across these roles. And what you also start seeing nowadays in the marketplace

07:15

is that sometimes business analysts have to do some machine learning. They have to help out

07:19

with exploration. Data scientists sometimes need to go and find the data on their own.

07:24

So there's a lot of overlap, and these different roles must collaborate with each other.

07:30

Okay. So I hope you can see now how the data science lifecycle can help us take noisy data,

07:36

turn it into knowledge and insights, and then turn it into meaningful action for our business.

07:41

Thank you. If you have questions, please drop us a line below.

07:45

And if you want to see more videos like this in the future, please like and subscribe.

Related Tags
Data Science, Predictive Analytics, Machine Learning, Business Insights, Data Mining, Data Cleaning, Data Visualization, Analytical Tools, Business Analytics, Data Engineering