Intro to Data Science: Historical Context
Summary
TLDR: This lecture explores the concept of data science, emphasizing its long-standing roots in human history. It distinguishes between data-driven science and the emerging field of data science, which involves handling, cleaning, storing, visualizing, and modeling data. The talk uses the historical example of Tycho Brahe's meticulous planetary observations, crucial for Kepler's laws and Newton's theories, to illustrate data science's impact. It also highlights the importance of moving from descriptive models like Kepler's to generalizable theories like Newton's, a goal for modern data scientists and machine learning practitioners.
Takeaways
- 🔬 Data science is not a new concept; humans have been collecting and modeling data for centuries.
- 📚 The term 'data science' can mean different things to different people, often referring to data-intensive science, engineering, or data-driven inquiry.
- 🌌 Astronomy is highlighted as an example of a data-intensive science, where data collection and analysis have been pivotal in understanding planetary motion.
- 📈 Tycho Brahe's meticulous data collection on planetary and star movements was instrumental in Kepler's discovery of elliptical orbits.
- 🔍 Brahe's dedication to rigorous data collection and storage laid the groundwork for future scientific advancements.
- 🐃 Fun fact: Tycho Brahe was an intriguing character with a pet moose that enjoyed beer, reflecting his unique personality.
- 📚 Kepler's laws describe the elliptical motion of planets, while Newton's laws explain why planets move in these orbits, demonstrating the progression from description to cause.
- 🚀 Newton's generalized laws allowed for practical applications like the Apollo program, showing the importance of generalization in scientific theories.
- 🤖 Modern machine learning algorithms often describe the world as observed (like Kepler), but the goal should be to create models that generalize like Newton's did.
- 📈 The 'fourth paradigm' of data-intensive scientific discovery complements traditional methods like theory, experiments, and simulations, rather than replacing them.
- 📘 For those interested in the technical aspects of data science, the book 'Data-Driven Science and Engineering' and the associated website offer in-depth lectures and resources.
Q & A
What is the main focus of the lecture series on data science?
-The lecture series focuses on providing an introductory overview of data science, explaining what it is, how it can be used, and its various aspects.
Why is it emphasized that data science is not a new concept?
-It is emphasized because humans have been collecting and modeling data for centuries, and the concept of data science has evolved over time rather than being a completely new invention.
What are the different interpretations of the term 'data science' mentioned in the script?
-The different interpretations include data-intensive science, data-intensive engineering, and data-driven inquiry, all of which involve using data to drive scientific investigation and discovery.
What is an example of a data-intensive science field mentioned in the script?
-Astronomy is given as an example of a data-intensive science field, where the collection and analysis of data about celestial bodies have been crucial for scientific advancements.
Who is Tycho Brahe and why was he significant in the history of data science?
-Tycho Brahe was a Danish astronomer known for his meticulous data collection on the motion of planets and stars, which was instrumental in Kepler's discovery of planetary motion laws.
What inconsistency did Tycho Brahe notice between the models of his time and his observations?
-Tycho Brahe noticed that a predicted conjunction of planets did not match what he observed, which revealed inconsistencies in the planetary-motion models of his time and led him to collect rigorous, systematic data.
What is the significance of Kepler's laws of planetary motion in the context of data science?
-Kepler's laws describe the elliptical orbits of planets, which were derived from the data collected by Tycho Brahe. This demonstrates the power of data in shaping scientific understanding and theories.
How did Isaac Newton's work build upon the foundation laid by Tycho Brahe and Kepler?
-Newton explained why planets move in elliptical orbits by formulating the universal law of gravitation, which generalized the principles behind planetary motion and enabled further scientific and technological advancements.
What is the difference between Kepler's and Newton's approaches to modeling the world, as discussed in the script?
-Kepler built a model based on observed data describing how the solar system works, while Newton generalized these observations into a physical principle that could predict and explain a wider range of phenomena.
What is the 'fourth paradigm' referred to in the script, and how does it relate to data science?
-The 'fourth paradigm' refers to data-intensive scientific discovery, which complements traditional methods like theory, experimentation, and computation by leveraging massive amounts of data for scientific insights.
What resource is recommended for those interested in the mathematical aspects of data science?
-The book 'Data-Driven Science and Engineering', co-authored by the speaker and Nathan Kutz, is recommended, along with their website databookuw.com, which contains lectures and videos on various topics.
Outlines
📚 Introduction to Data Science and Its Historical Roots
This paragraph introduces the concept of data science, emphasizing that it is not a new field but rather an evolution of human practices that date back centuries. The speaker clarifies that data science can mean different things to different people, such as data-intensive science, data-intensive engineering, or data-driven inquiry. The paragraph uses the example of astronomy to illustrate data science in action, highlighting Tycho Brahe's meticulous data collection on planetary motion, which was instrumental in Kepler's discovery of elliptical orbits. The historical narrative also touches on the significance of data in scientific discovery and the transition from observational data to the formulation of universal laws, as demonstrated by Newton's work on gravitational forces.
🚀 From Observational Models to Generalized Theories in Data Science
The second paragraph delves into the distinction between descriptive models like Kepler's elliptical orbits and the generalized theories that enable practical applications, such as Newton's laws of motion. It discusses the importance of moving from describing data to generalizing beyond it, which was essential for advancements like the Apollo moon landings. The speaker also references 'The Fourth Paradigm: Data-Intensive Scientific Discovery,' a book that outlines the progression of scientific methods, from theoretical analysis to data-driven inquiry. The paragraph concludes with a recommendation for those interested in the technical aspects of data science to explore a book co-authored by the speaker and Nathan Kutz, which covers the mathematical foundations of data science algorithms, and mentions a website where lectures on various topics are available.
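The Kepler-versus-Newton distinction can be illustrated with a small numerical sketch. This example is not from the lecture; it is a minimal illustration under stated assumptions. The planetary values are rounded textbook figures, the physical constants are standard approximations, and the names `fit_power_law` and `newton_period_days` are hypothetical helpers introduced here. The first step fits a purely descriptive power law to the planets, in the spirit of Kepler. The second step uses Newtonian gravity to predict the Moon's orbital period around the Earth, a system the fitted model never saw.

```python
import math

# Approximate semi-major axis (AU) and orbital period (years) for six planets.
# Rounded textbook values, included only for illustration.
PLANETS = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
    "Saturn": (9.537, 29.457),
}

G = 6.674e-11            # gravitational constant, m^3 kg^-1 s^-2
EARTH_MASS = 5.972e24    # kg
MOON_ORBIT_M = 3.844e8   # mean Earth-Moon distance, m
AU_M = 1.496e11          # one astronomical unit, m


def fit_power_law(pairs):
    """Least-squares fit of log(T) = p * log(a) + c; returns (p, exp(c))."""
    xs = [math.log(a) for a, _ in pairs]
    ys = [math.log(t) for _, t in pairs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    p = sxy / sxx
    c = mean_y - p * mean_x
    return p, math.exp(c)


def newton_period_days(a_m, central_mass_kg):
    """Orbital period from Newtonian gravity, neglecting the satellite's mass."""
    seconds = 2 * math.pi * math.sqrt(a_m ** 3 / (G * central_mass_kg))
    return seconds / 86400


# Descriptive, Kepler-style model: fitted only to planets orbiting the Sun.
exponent, prefactor = fit_power_law(list(PLANETS.values()))

# Extrapolate the planet fit to the Moon's orbit (convert metres to AU, years to days).
kepler_style_days = prefactor * (MOON_ORBIT_M / AU_M) ** exponent * 365.25

# Generalizing, Newton-style model: the same law applied to a different central mass.
newton_days = newton_period_days(MOON_ORBIT_M, EARTH_MASS)

print(f"fitted exponent: {exponent:.3f}")
print(f"Moon period, planet fit extrapolated: {kepler_style_days:.3f} days")
print(f"Moon period, Newtonian gravity: {newton_days:.1f} days")
```

The fitted exponent comes out close to 1.5, which is Kepler's third law recovered from data. Applied to the Moon, however, the planet fit gives a period of well under a day, while the Newtonian formula gives roughly 27 days, close to the observed sidereal month. The descriptive model is accurate inside the system it was fitted on; the physical law carries over to a new one.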
Keywords
💡Data Science
💡Data-Intensive Science
💡Data-Driven Inquiry
💡Astronomy
💡Tycho Brahe
💡Johannes Kepler
💡Isaac Newton
💡Data Collection
💡Modeling
💡Generalization
💡The Fourth Paradigm
Highlights
Data science is not a new concept; humans have been practicing it for centuries.
Data science can mean different things to different people, such as data-intensive science, data-intensive engineering, or data-driven inquiry.
Astronomy is an excellent example of a data-intensive science, with historical roots in the work of Tycho Brahe, Johannes Kepler, and Isaac Newton.
Tycho Brahe's meticulous data collection on planetary motion was critical for Kepler's discovery of elliptical orbits.
Brahe's dedication to rigorous data collection led to a systematic format that was crucial for scientific advancement.
Kepler's laws of planetary motion were a result of analyzing Brahe's data, illustrating the importance of data in scientific discovery.
Isaac Newton's work built upon Kepler's model, providing a generalized physical principle that explained why planets move in ellipses.
Newton's famous quote emphasizes the significance of data in supporting scientific hypotheses and theories.
The difference between Kepler's descriptive model and Newton's generalized theory is a key concept for modern data scientists and machine learning practitioners.
Data science as a field is about handling data through collection, cleaning, storage, visualization, and modeling.
The fourth paradigm of data-intensive scientific discovery complements traditional methods like theory, experimentation, and computation.
Data science does not replace existing scientific methods but rather integrates and enhances them with data-driven insights.
The book 'The Fourth Paradigm: Data-Intensive Scientific Discovery' discusses the progression and impact of data-driven science.
Astronomy's historical context provides a clear example of how data science has been integral to scientific progress.
The story of Tycho Brahe's dedication to data collection and its impact on Kepler and Newton's work highlights the value of rigorous data in science.
The transition from Kepler's descriptive model to Newton's generalized theory represents a goal for modern machine learning algorithms to achieve broader applicability.
The book 'Data-Driven Science and Engineering' by the speaker and Nathan Kutz provides a deeper dive into the mathematical foundations of data science.
The speaker's website offers lectures and resources for those interested in the technical aspects of data science, machine learning, and their mathematical underpinnings.
Transcripts
welcome back so we're talking about data
science this is an intro overview
lecture series on kind of what is data
science how can you use it what are the
aspects and one thing I think is just
really important to emphasize is that
data science is not new we've been doing
data science as humans for hundreds
thousands of years collecting data
modeling the world through that data and
I think data science as a terminology
means different things to different
people so there's what I like to think
of as data intensive science data
intensive engineering or data-driven
inquiry and that's science that you do
based on data ok like if I want to solve
so astronomy is a great example of
something that is data intensive science
I think of the phrase data science this
is an emerging scientific discipline
which is motivated by data intensive
science but it's really the science of
how do you handle data collect clean
store visualize and model with data so
it's a little confusing you have data
driven science and that motivates this
whole new field of science and
engineering called data science and I'm
going to use them interchangeably but
that I just want to kind of deconflict
those two terms early on and astronomy
is a great example I want to walk you
through just this very interesting
history example that I loved about kind
of Tycho Brahe and Kepler and Newton to
give some idea of what data science
looks like in a historical context so
this is Tycho Brahe great Danish
astronomer who collected the rich data
set of the motion of planets and stars
that was critical in Kepler's discovery
of his his ellipses and planetary motion
so to some extent Tycho Brahe was
noticed inconsistencies between the
models of the time kind of the the old
law
of how the planets would move and he
noticed inconsistencies with what he
observed so you know he there was this
predicted conjunction of planets and it
didn't agree with the models to his
satisfaction and so he realized this I
think was as a teenager that he needed
to collect rigorous clean data to store
it in a systematic format and to to make
a science out of the data collection of
planets and stars and he dedicated his
life to this he had an island between
Copenhagen and Sweden I don't know if
you can see it here but this is his
science island of Hven where he
collected all of this rich data and he
guarded this data so this was his life's
work and he knew how much value and it
turns out Kepler didn't even really have
full access to the data until Tycho Brahe
passed away and so so both of them knew
the value of the data and kind of moving
moving the theory of planetary motion
forward and this was a critical piece in
Kepler's famous law of the elliptic
planets elliptic motion of planets fun
fact about Tycho very interesting
character I encourage you to read more
about him he lost the tip of his nose in
a duel when he was a young man arguing
about who was a better mathematician on
his Science Island he had a pet moose
which was apparently very fond of beer
and would entertain his guests by
drinking a tremendous amount of beer so
Tycho Brahe is a really interesting guy
you can only imagine what his
personality would be like he had to you
know he made his life's work of very
very very careful observations which
changed the world forever through
through those who came after and I think
this also laid the foundation so this
this data intensive inquiry laid the
foundation for what Newton would go on
to do so
Kepler described these elliptic motion
of the planets and Newton explained why
the planets move in these ellipses and
actually I think a great
quote by Isaac Newton when he
was explaining one of his theories he
said that it was because of a
preponderance of the evidence and that's
another way of saying the data supported
his hypothesis or his theory and
something else I think is really
fascinating that we should think about
as data scientists and modelers and
machine learning people today and this
is something I talk a lot about with my
colleague Nathan Kutz is this idea of
the difference between Kepler and Newton
so Kepler built a model of how things
work the way they work on these
elliptical planets this is kind of I
think of an attractor of how how the
world and how the the solar system works
in these elliptical orbits that theory
was useful but it wouldn't have allowed
us to to develop the Apollo program and
and put people on the moon okay and so
what Newton did was somehow a
generalization he distilled the abstract
physical principle that gave rise to
elliptic orbits but in a way that you
could tell you what would happen if you
left your elliptical orbit so what would
happen if you left or pushed on the
system out of the way that it always
behaves and we've always observed it and
his theory truly generalized F equals MA
generalized in a way that allowed us to
land people on the moon which is which
is really a huge achievement and so we
talk about this a lot a lot of machine
learning algorithms today most of them I
would say do what Kepler did they
describe the world as we observe it as
the data describes it and it takes this
epiphany this great leap to get a model
that truly generalizes like what Newton
did and so we should be aspiring to make
our algorithms go
you know from Kepler to Newton and that
that's a worthwhile goal it's also very
very challenging okay so data science
has been around for a long time there's
a really interesting modern book called
the fourth paradigm data-intensive
scientific discovery which basically
shows or describes this progression from
kind of theory and analytic
mathematics to experiments collecting data
from you know running experiments to
test hypotheses to simulations and
numerics and computations kind of the the
digital you know silicon age and now
this fourth paradigm of data-driven
inquiry and scientific discovery really
interesting and you know how this
complements this doesn't this doesn't
displace theory or numerics or
experiments it complements these
generate massive amounts of data and we
need a science that ties these together
okay just like simulations didn't
displace experiments they complement
each other okay so that's just a very
high-level overview I will point out for
those of you who are kind of more
interested in the nuts and bolts of
machine learning and modeling and kind
of the linear algebra and optimization
underlying these data science algorithms
I'll recommend a book that my colleague
Nathan Kutz and I just wrote data-driven
science and engineering in Cambridge and
we have a website databookuw.com
where we filmed up all of our lectures
for all of the chapters and sections so
for example you can go to our website
and find you know different topics
you're interested in and see our YouTube
videos so if you're interested hopefully
that's a resource to kind of get into
the more nitty gritty mathematical
aspects okay thank you