Introduction to Data Science - Fundamental Concepts
Summary
TLDR: Siobhan Kadar introduces the 2017 Data Science Bootcamp, outlining the interdisciplinary nature of data science and its importance in extracting insights from structured and unstructured data. She discusses the skills required to be a data scientist, including computer science, statistics, and domain expertise, and emphasizes the growing demand for such professionals. The presentation covers the role of data scientists in making sense of data, the significance of data analysis in various industries, and concludes with recommended books for further learning.
Takeaways
- 📊 Data Science is an interdisciplinary field focused on using scientific methods, processes, and systems to extract knowledge or insights from structured or unstructured data.
- 🌟 The demand for data scientists is high across industries due to the vast amounts of data available and the need to extract value from it.
- 👨‍💻 Hal Varian, Chief Economist at Google, emphasizes the importance of the ability to understand, process, extract value from, visualize, and communicate data.
- 🏆 Data science jobs topped a Glassdoor survey for best work-life balance, and the Harvard Business Review called data scientist 'the sexiest job of the 21st century'.
- 🧠 A data scientist must have a unique blend of skills, including more computer science knowledge than a statistician and more statistics than a computer scientist.
- 📈 The role of a data scientist involves cleaning, processing, and analyzing data and drawing inferences to make sense of complex datasets.
- 💡 Data scientists are equipped with knowledge in statistics, machine learning, linear algebra, programming, mathematics, data visualization, and domain expertise.
- 🌐 The availability of inexpensive computing power and cloud services like AWS, Google Cloud Platform, and Microsoft Azure has facilitated data analysis on a large scale.
- 🔍 Data scientists use advanced machine learning and programming to uncover deeper insights and make future predictions, going beyond the capabilities of traditional data analysts.
- 📚 The speaker recommends three books for further reading: 'Data Science for Business', 'The Art of Data Science', and 'The Elements of Statistical Learning'.
Q & A
What is the definition of data science according to the speaker?
-Data science is an interdisciplinary field about scientific methods, processes, and systems to extract knowledge or insights from data in various forms, either structured or unstructured.
Why is data science considered a hot field and important for the future?
-Data science is considered a hot field due to the availability of huge amounts of data and the need for professionals who can understand, process, extract value from, visualize, and communicate data across industries. Hal Varian, Chief Economist of Google, emphasized the importance of these skills for the next decades.
What does the speaker suggest as the key skills for a data scientist?
-A data scientist should know more computer science than a statistician and more statistics than a computer scientist. They must be proficient in statistics, machine learning, linear algebra, programming, mathematics, data visualization, and possess domain expertise.
Why is the demand for data scientists increasing?
-The demand for data scientists is increasing because of the massive amounts of data being collected across industries, the decreasing cost of computing power, and the need for advanced analytics to make future predictions and informed business decisions.
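The talk ties cheaper computing to Moore's law, which the speaker states as the number of transistors on a chip doubling every two years. The arithmetic behind that claim is a plain exponential. This is a minimal sketch assuming an idealized, exact two-year doubling period (real hardware only approximates it); the function names are hypothetical.

```python
def moore_factor(years: float, doubling_period: float = 2.0) -> float:
    """Growth factor after `years`, assuming exactly one doubling per period."""
    return 2 ** (years / doubling_period)


def projected_transistors(start_count: int, years: float) -> float:
    """Project a transistor count forward under the idealized doubling rule."""
    return start_count * moore_factor(years)


if __name__ == "__main__":
    for years in (2, 10, 20):
        # Prints growth factors of 2, 32, and 1,024 respectively.
        print(f"after {years:>2} years: x{moore_factor(years):,.0f}")
```

Ten years of doubling every two years is five doublings, a factor of 32; twenty years is ten doublings, a factor of 1,024.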
How does the speaker describe the role of a data scientist compared to a traditional data analyst?
-A data scientist uses advanced knowledge of machine learning, programming, and engineering to manipulate data and uncover deeper insights, making future predictions, while a traditional data analyst is bound by SQL queries and analytic packages to extract information from historical data.
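To make the contrast concrete, here is a minimal sketch built on invented data: a hypothetical table of six months of sales. The analyst-style step runs a SQL aggregate over historical rows using Python's built-in `sqlite3` module; the data-scientist-style step fits an ordinary least-squares line and extrapolates one month ahead. Real forecasting would use richer models and proper validation; all names and numbers here are illustrative.

```python
import sqlite3

# Hypothetical monthly sales (month, amount), invented for illustration only.
SALES = [(1, 100.0), (2, 110.0), (3, 120.0), (4, 130.0), (5, 140.0), (6, 150.0)]


def load_sales(rows):
    """Store the rows in an in-memory SQLite table, like a tiny data warehouse."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (month INTEGER, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def historical_report(conn):
    """Analyst-style: summarize what already happened with a SQL aggregate."""
    total, average = conn.execute(
        "SELECT SUM(amount), AVG(amount) FROM sales"
    ).fetchone()
    return total, average


def fit_line(points):
    """Ordinary least squares for amount = slope * month + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def forecast(points, month):
    """Data-scientist-style: use the fitted model to predict a future month."""
    slope, intercept = fit_line(points)
    return slope * month + intercept


if __name__ == "__main__":
    conn = load_sales(SALES)
    print("past total and average:", historical_report(conn))
    print("forecast for month 7:", forecast(SALES, 7))
```

Run as a script, it reports a past total of 750 and an average of 125, then a month-7 forecast of 160. The toy data are perfectly linear, which is why the forecast is so clean.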
What are the steps in data analysis according to the speaker?
-The steps in data analysis are stating the right question, exploratory data analysis, building a model, interpreting the results, and communicating the findings.
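The speaker adds that each step cycles through setting expectations, collecting data, and matching expectations with data, repeating until they agree. Below is a toy sketch of that loop. It assumes the expectation is just a numeric range for a mean and that a mismatch is handled by recentering the range on the observation; both are simplifications, since in practice you might revise the question, the model, or how the data are collected. The function names and data are hypothetical.

```python
from statistics import mean
from typing import Callable, Optional


def epicycle(
    expected: tuple[float, float],
    collect: Callable[[], list[float]],
    max_rounds: int = 5,
) -> Optional[tuple[int, tuple[float, float], float]]:
    """Set an expectation, collect data, compare, and revise until they match.

    Returns (round number, final expected range, observed mean), or None.
    """
    low, high = expected
    for round_no in range(1, max_rounds + 1):
        observed = mean(collect())                 # collect the data
        if low <= observed <= high:                # match expectations with data
            return round_no, (low, high), observed
        half_width = (high - low) / 2              # revise: recenter on what we saw
        low, high = observed - half_width, observed + half_width
    return None


def sample_response_times() -> list[float]:
    # Hypothetical measurements, e.g. page response times in milliseconds.
    return [52.0, 55.0, 58.0, 61.0]


if __name__ == "__main__":
    print(epicycle((30.0, 40.0), sample_response_times))  # (2, (51.5, 61.5), 56.5)
```

The first expectation (30 to 40) misses the observed mean of 56.5, so the second round works with a revised range and the data match.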
What is the significance of Google Trends mentioned in the script?
-Google Trends is significant as it provides an indication of the growing interest in data science over time by showing search trends and the popularity of data science-related queries.
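The transcript notes that Google Trends plots interest on a 0 to 100 scale, with 100 as the maximum. Here is a simplified sketch of that kind of normalization, assuming you already have raw query counts and total search counts per period (hypothetical inputs). Google's help pages describe dividing each data point by total searches for its time and place before rescaling; sampling and other details of the real pipeline are not reproduced here.

```python
def trends_index(query_counts: list[int], total_counts: list[int]) -> list[int]:
    """Scale query popularity per period to 0-100, where 100 is the peak period.

    Assumes both lists cover the same periods and every total is non-zero.
    """
    if len(query_counts) != len(total_counts):
        raise ValueError("query_counts and total_counts must have the same length")
    # Step 1: relative popularity = share of all searches in that period.
    shares = [q / t for q, t in zip(query_counts, total_counts)]
    peak = max(shares)
    if peak == 0:
        return [0] * len(shares)
    # Step 2: rescale so the peak share maps to 100.
    return [round(100 * s / peak) for s in shares]


# Hypothetical yearly counts for a query such as "data science".
queries = [200, 300, 450, 600, 900]
totals = [10_000, 12_000, 15_000, 15_000, 18_000]

if __name__ == "__main__":
    print(trends_index(queries, totals))
```

The raw counts grow 4.5-fold, but total searches grow too, so the index climbs from 40 to 100 rather than from 22 to 100.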
What does the speaker mean by 'data-fication'?
-Data-fication refers to the process of turning previously invisible processes into data, such as quantifying preferences based on likes on Facebook or evaluating the significance of web pages using Google's PageRank algorithm.
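PageRank, one of the datafication examples, turns the link structure of the web into a numeric weight per page. Below is a minimal power-iteration sketch of the classic formulation. It assumes a small hypothetical link graph, the commonly cited damping factor of 0.85, and pages with no outgoing links spreading their rank evenly over all pages. It is not Google's production ranking, which combines many other signals.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a dict mapping page -> list of linked pages."""
    pages = sorted(set(links) | {p for targets in links.values() for p in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}            # start from a uniform guess
    for _ in range(max_iter):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank in equal shares along its out-links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for t in pages:
                    new[t] += damping * rank[page] / n
        if sum(abs(new[p] - rank[p]) for p in pages) < tol:
            return new
        rank = new
    return rank


# Hypothetical four-page web: C receives links from A, B, and D.
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}

if __name__ == "__main__":
    for page, score in sorted(pagerank(LINKS).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```

Page C, which collects the most inbound links, ends up with the highest weight, and the weights always sum to 1.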
What are the types of questions that can be asked during data analysis according to the script?
-The types of questions that can be asked during data analysis include descriptive, exploratory, predictive, causal, and mechanistic.
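The speaker's exploratory example hypothesizes that basketball players six feet (72 inches) or taller are more successful, and says such a hypothesis is then checked against data. One common way to check it is a permutation test. The sketch below runs a one-sided test on made-up heights and "success" scores; the data, the score, and the threshold are all hypothetical.

```python
import random
from statistics import mean

# Hypothetical players: (height in inches, made-up success score).
PLAYERS = [
    (68, 7.9), (69, 8.4), (70, 8.1), (71, 9.0), (72, 10.2),
    (73, 9.5), (74, 12.0), (76, 11.3), (78, 13.1), (80, 12.8),
]
SIX_FEET = 72  # inches


def permutation_test(players, threshold=SIX_FEET, n_resamples=10_000, seed=0):
    """One-sided test: do players at or above `threshold` score higher?"""
    tall = [s for h, s in players if h >= threshold]
    short = [s for h, s in players if h < threshold]
    observed = mean(tall) - mean(short)
    scores = tall + short
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(scores)                      # break any height/score link
        perm_tall, perm_short = scores[: len(tall)], scores[len(tall):]
        if mean(perm_tall) - mean(perm_short) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimated p-value away from exactly zero.
    p_value = (at_least_as_extreme + 1) / (n_resamples + 1)
    return observed, p_value


if __name__ == "__main__":
    diff, p = permutation_test(PLAYERS)
    print(f"observed difference {diff:.2f}, p-value about {p:.4f}")
```

With these invented numbers every taller player outscores every shorter one, so the p-value lands near 1/210 (about 0.005), the exact value for the most extreme of the 210 possible splits. A small p-value says the gap is unlikely under random labeling; it does not answer the causal question of whether height itself drives success.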
What are the three books recommended by the speaker for learning data science?
-The three books recommended are 'Data Science for Business' for a general audience, 'The Art of Data Science' for a more technical understanding, and 'The Elements of Statistical Learning' which is a technical book on statistical machine learning.
Outlines
📊 Introduction to Data Science
Siobhan Kedar introduces the 2017 Data Science Bootcamp, outlining the agenda, which includes an introduction to data science, the role of a data scientist, and steps in data analysis. She defines data science as an interdisciplinary field focused on extracting knowledge from structured or unstructured data using scientific methods. Highlighting the importance of data science, she quotes Hal Varian, Google's Chief Economist, on the value of being able to process, visualize, and communicate data. The demand for data scientists is underscored by industry and government needs; data science jobs topped a Glassdoor survey for best work-life balance, and the Harvard Business Review called data scientist 'the sexiest job of the 21st century'. The talk also touches on the practical side of data science, such as cleaning and processing data to draw meaningful inferences.
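The cleaning, processing, and inference steps described above can be sketched on a handful of messy records. Everything here is hypothetical: the record format (name, age, income), the rules for rejecting rows, and the age cutoff used to group people.

```python
from statistics import mean

# Hypothetical raw records (name, age, income), as they might arrive from a form.
RAW = [
    " Alice , 34 , 72000 ",
    "Bob,,58000",          # missing age
    "carol, 29 , 61000",
    "Dave, forty, 50000",  # age not numeric
    "Erin, 41, 88000",
    "Alice, 34, 72000",    # duplicate once whitespace is trimmed
]


def clean(lines: list[str]) -> list[tuple[str, int, int]]:
    """Trim whitespace, normalize names, and drop malformed or duplicate rows."""
    seen: set[tuple[str, int, int]] = set()
    rows: list[tuple[str, int, int]] = []
    for line in lines:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3:
            continue
        name, age, income = parts
        if not (age.isdigit() and income.isdigit()):
            continue
        record = (name.title(), int(age), int(income))
        if record not in seen:
            seen.add(record)
            rows.append(record)
    return rows


def mean_income_by_group(rows, cutoff: int = 35) -> dict[str, float]:
    """Process and analyze: bucket people by age, then average income per bucket."""
    groups: dict[str, list[int]] = {f"under_{cutoff}": [], f"{cutoff}_plus": []}
    for _, age, income in rows:
        key = f"under_{cutoff}" if age < cutoff else f"{cutoff}_plus"
        groups[key].append(income)
    return {key: mean(values) for key, values in groups.items() if values}


if __name__ == "__main__":
    rows = clean(RAW)
    print(rows)
    print(mean_income_by_group(rows))
```

Of the six raw lines only three survive cleaning, which also shows why any inference drawn from the result (here, a higher mean income in the older group) would need far more data before anyone acted on it.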
🧠 The Skills and Necessity of Data Scientists
This section delves into the skills required to be a data scientist, which include a blend of computer science, statistics, and domain expertise. It emphasizes the importance of understanding and manipulating data to uncover insights and make predictions. The speaker discusses the massive amounts of data now collected across industries and the affordability of computing power, facilitated by cloud services. The role of a data scientist is contrasted with that of a traditional data analyst, highlighting the data scientist's ability to use advanced techniques like machine learning for future predictions. The concept of datafication, turning previously invisible processes into data, is introduced with examples like Facebook Likes and Google's PageRank algorithm. The iterative process of data analysis is outlined, from stating the right question to communicating results, with an emphasis on asking the right questions and on the role of hypothesis testing.
📚 Conclusion and Recommended Readings
In the concluding part of the presentation, Siobhan Kedar summarizes the significance of data science as an interdisciplinary field with broad applications in various industries. She stresses the role of data scientists in making business decisions through pattern discovery and future predictions. The presentation ends with a recommendation of three books for further reading: 'Data Science for Business' for a general audience, 'The Art of Data Science' for a more technical perspective, and 'The Elements of Statistical Learning' for in-depth statistical machine learning knowledge. The speaker also thanks professors from Israel who helped prepare the slides.
Keywords
💡Data Science
💡Data Scientist
💡Structured and Unstructured Data
💡Machine Learning
💡Data Cleaning
💡Data Visualization
💡Domain Expertise
💡Cloud Services
💡Data Analysis
💡Hypothesis Testing
Highlights
Introduction to data science as an interdisciplinary field focused on extracting knowledge from data.
Definition of data science as using scientific methods to process data and gain insights.
The importance of data science in various industries due to the availability of large data sets.
Quote from Hal Varian emphasizing the value of data understanding and processing in the future.
Data science jobs topping a Glassdoor survey for work-life balance, and the Harvard Business Review calling data scientist the 'sexiest job of the 21st century'.
The demand for data scientists in new and emerging jobs across both government and industry sectors.
The process of making sense of data through cleaning, processing, analyzing, and drawing inferences.
Skills required to be a data scientist, including computer science, statistics, and domain expertise.
The necessity of data scientists knowing more computer science than statisticians and more statistics than computer scientists.
The role of data scientists in making future predictions using advanced statistics and complex data modeling.
The falling cost and growing availability of computing power, making data analysis more accessible.
The significance of Google Trends in indicating the growing interest in data science over the past 5 years.
The role of a data scientist versus a data analyst, with a focus on future predictions and deeper insights.
Datafication as the process of turning non-quantitative information into data for analysis.
Steps in data analysis, including stating the question, exploratory data analysis, building a model, interpreting the results, and communicating them.
The iterative nature of data analysis and the importance of asking the right questions.
Recommendation of three books for further understanding of data science: 'Data Science for Business', 'The Art of Data Science', and 'The Elements of Statistical Learning'.
Conclusion on the interdisciplinary nature of data science and its importance in various industries and business decisions.
Transcripts
good morning everybody I'm Siobhan Kadar
I welcome you for the 2017 data science
bootcamp here is the agenda first I will
give you an introduction to data science
and then I will talk about how to be a
data scientist and then why now and then
role of a data scientist and then steps
in data analysis and finally I will
conclude the presentation so in a simple
sentence data science is an
interdisciplinary field about scientific
methods processes and systems to extract
knowledge or insights from data in
various forms either structured or
unstructured this is a very simple
definition you can start with this and
with the availability of huge amounts of
data and software and technologies
almost every industry today needs a data
scientist as there are lots of
interesting use cases so this is the
picture of Hal Varian he is the chief
economist of Google and what he said is
this the ability to take data to be able to
understand it to process it to extract
value from it to visualize it to
communicate it that's going to be a
hugely important skill in the next
decades and as I mentioned data science
is a very hot field and this has been
noted in this article
data science jobs top Glassdoor survey
for best work-life balance and the
Harvard Business Review calls it the
sexiest job of the 21st century also both
government and industry have indicated
that there is a dire need for data
scientists for new and emerging jobs now
let's look at this picture you know if
you look at this picture it won't make
any sense right I mean it's just some
numbers or whatever it doesn't make any
sense so what a data scientist will do
is the data scientist will take this
data it will be cleaned it will be
processed
it will be analyzed and then they will draw
inferences that's the whole idea so the
art of making sense of data that's the
whole idea of the role of a data scientist
so what does it take to be a data scientist
so in a nutshell this picture gives you
a very good idea about what it takes to
be a data scientist collect and clean
data explore and find trends build
models and algorithms design experiments
communicate results and design data
products this is typically in a nutshell
what a data scientist will know now so
what do you need to know now before I go
to this slide there is a very
interesting comment about data scientist
I don't remember who said that but a
data scientist must know more computer
science than a statistician and more
statistics than a computer scientist so
is it very challenging yes definitely
it's really hard data scientists
must know all those things so that means
statistics machine learning linear
algebra programming mathematics
including discrete mathematics data
visualization and also domain
expertise so these are all the things
you know necessary to become a very good
data scientist and so why now we collect
more data than ever I mean all of you
agree with me that if you look at any
industry they're collecting huge amounts
of data whether it is any business a
large scale computer networks
pharmaceutical industry gene and
genomics life sciences social media
semiconductors you know you name it you
know sensor network smart cities
everyone every industry is collecting
massive amounts of data and the other
thing is the good news is it's
inexpensive and available computing
power so we have lots of computing power
if you look at Google or Amazon or
Microsoft they offer cloud services
right AWS and Google cloud platform
Microsoft Azure these are the typical
cloud service vendors who provide all
these services and it is also if you
look at Google Trends how many of you
are familiar
with Google Trends it's pretty
useful because Google Trends actually
let me see I think I can show you
this you know you can search on Google
Trends and if you look at
this slide you will see that the
horizontal axis shows the time
line and the vertical axis gives you
how much people are
searching for it so on a scale of 0 to 100
with 100 being the maximum you can see
like if you look at data science for
example you can see the past 12
months and we can also look at the past 5
years
so this gives you an indication of what
people are looking for in the areas of
data science and how much interest you
know over the period of time last 5
years the interest is growing right so
this is also a very good indicator of
interest in this area and then if you
look at the sequencing of the human
genome you know if you look at this this
data is from National Human Genome
Research Institute you will notice that
the cost is also decreasing and most of
you are familiar with Moore's law which
says that every two years the computing
power the number of transistors in a
dense integrated circuit right is
doubling every two years so that's why
the computing is becoming cheaper and
cheaper over the years and you don't
need to invest a huge amount in IT
infrastructure today you can rent any of
Google's or Amazon's you know data
centers the servers and now I'm going to
talk about the role of a data scientist
you know typically a data analyst or an
architect can extract information from
large sets of data yet they are bound by
the SQL queries and analytic packages
used to slice these data sets so
typically if you look at historical data
they will be stored in a data warehouse
and then you run some SQL queries you
extract the information you generate
reports and that's how a data
analyst worked you know but a data
scientist on the other hand uses
advanced knowledge of machine learning
programming and engineering and data
scientists can manipulate data at their
own will uncovering deeper insights so a
data scientist can actually make
some predictions based on the data while
your typical data analyst looks to the
past and what's happened a data
scientist must go beyond this and look
to the future okay so through
application of advanced statistics and
complex data modeling
they must uncover insights and make
future predictions so that's the role of
a data scientist and it's becoming
increasingly important for every
organization to make some you know
future predictions and even later on you
will see how they can be used to make
decisions now let's talk about
datafication taking a process that was
previously invisible and turning it into data
for example if you look at Facebook
Likes you know so we want to quantify
that and how do I quantify the likes you
know so you know we measure
preferences based on likes and if you
look at Google you know with every page you
associate a weight with that page like
Google's PageRank algorithm when you
evaluate the significance of webpages
based on links so now I will talk about
steps in data analysis the first thing
is stating the right question the second
one is exploratory data analysis the
third one is building a model then the
fourth step is interpreting and then
communicating the results so
for each of these steps we will go
through the phases like you know setting
expectations then we
will collect the data and finally match
expectations with data and this is an
iterative process you know when you do
data analysis you do go through all
these steps and in the first iteration
you may not get good results so you
keep on iterating and unless you do it
several times you won't be able to get
good results in data analysis so that's
why it is an
iterative process so first step is
stating the right question like what is
the population of California that's the
descriptive type then exploratory generate a
hypothesis from data for example you can
make a hypothesis that the height of a
basketball player is related to
their success you know you can make a
hypothesis that if he is six feet or above
then he would most likely be very
successful this kind of
hypothesis you can make and then you can
test the hypothesis based on
the data so we will discuss more about
hypothesis testing later today and
then predictive what does A predict about B
like if you have high levels of CO2 in a
particular region what is the effect
on temperature
or global warming these kinds of
questions you can ask and will changing
A also change B that is causal and
then how does A affect B that's
mechanistic now we will go through all
these examples later today when Suman
will be talking in detail about how to
run experiments with this kind of
hypothesis and finally I'm going to
conclude this presentation by saying
that data science is an
interdisciplinary subject that has great
applications in various industries and
businesses through application of
advanced statistics and complex data
modelling data scientists discover
patterns and make future predictions
data scientists are becoming
increasingly important in making
business decisions and finally data
science is an important field with lots
of career opportunities okay and finally
these are the three books that I think
will be very useful particularly the
first one data science for business I
find this very useful it's written for a
general audience so anybody can read
this book the second one is a little
more technical the art of data science
by Roger Peng and Elizabeth Matsui and the third
one is the elements of statistical
learning this is a very technical book
and it's hard to read but it is current
it is basically statistical machine
learning
things like that so these are the three
books that are very useful in this
context and professors
from Israel also helped me
in preparing these slides so I would like
to thank them