How I Would Learn Data Science in 2022

Recall by Dataiku
9 Feb 2022, 17:34

Summary

TLDR: The video is a practical guide to learning data science in 2022, built around a breadth-first, project-based approach. It covers coding, statistics, data visualization, exploratory data analysis (EDA), machine learning, data scraping, APIs, databases, and deployment. The speaker recommends starting with Python because of its simplicity and rich data science libraries, advises learning SQL for working with databases, and highlights the importance of domain knowledge and communication skills for a data scientist. It suggests interactive platforms such as freeCodeCamp and resources such as Kaggle for practical learning, and notes that as repetitive tasks become automated, understanding algorithms and how to apply them in specific contexts matters more.

Takeaways

  • **Practical Guide Focus**: The video emphasizes a practical approach to learning data science, focusing on effective learning methods and persistence.
  • **Interdisciplinary Nature**: Data science involves coding, math, statistics, and business acumen, necessitating a breadth-first learning approach.
  • **Project-Based Learning**: A project-based learning approach is recommended for its effectiveness in encoding information deeply and retaining knowledge.
  • **Python for Coding**: Python is suggested as the starting language for coding due to its simplicity, great documentation, and data science libraries.
  • **Statistics Fundamentals**: Basic statistical knowledge is crucial, including mean, median, mode, standard deviation, and distributions.
  • **Data Visualization**: Learning a visualization library like seaborn is important for graphically representing data insights.
  • **Exploratory Data Analysis (EDA)**: EDA is introduced as a method to explore and familiarize oneself with data sets, looking for trends and patterns.
  • **Learning Timeline**: The video provides a suggested timeline for learning each topic, emphasizing the importance of starting with the basics and progressing to projects.
  • **Machine Learning Algorithms**: Understanding common machine learning algorithms is key, with an intuitive grasp being more important initially than deep mathematical understanding.
  • **Data Scraping and APIs**: As one progresses, learning to scrape data and work with APIs becomes essential for obtaining and manipulating data sets.
  • **Domain Knowledge**: With automation on the rise, domain knowledge and the ability to communicate the impact of data science work become increasingly important.

Q & A

  • What is the main focus of the video regarding learning data science?

    -The main focus of the video is to provide a practical guide on how to effectively learn data science, emphasizing a breadth-first approach centered around project-based learning.

  • Why is project-based learning recommended for learning data science?

    -Project-based learning is recommended because it allows learners to apply theoretical knowledge in practice, which helps in deeper encoding of information into the brain and better retention of knowledge.

  • What is the recommended first step in learning data science according to the video?

    -The recommended first step is learning coding, specifically starting with Python, as it is a general-purpose language with great libraries for data science.

  • Why is a breadth-first approach preferred over a depth-first approach when learning data science?

    -A breadth-first approach is preferred because it helps learners avoid getting overwhelmed by the depth of each subject, allows them to start implementing what they learn sooner, and keeps the learning process engaging.

  • What are some of the key topics to cover when learning data science?

    -Key topics include programming, statistics, data visualization, exploratory data analysis (EDA), machine learning, data scraping, APIs, databases, and deployment, as well as specific niches like NLP and computer vision.

  • What is the significance of understanding the theory behind machine learning algorithms?

    -Understanding the theory behind machine learning algorithms is important for applying them effectively to specific use cases and ensuring they function properly in a given context.

  • Why is domain knowledge considered crucial for a data scientist?

    -Domain knowledge is crucial because it helps a data scientist understand the business context, communicate the value of their work, and ensure that their analyses and models provide real impact and are used by the organization.

  • What is the recommended timeline for learning the basics of coding in the context of data science?

    -The recommended timeline for learning the basics of coding is one to two weeks at four hours per day.

  • How does the video suggest approaching the learning of statistics for data science?

    -The video suggests brushing up on statistics with a focus on high school to first-year university stats, such as mean, median, mode, standard deviation, distributions, central limit theorem, and confidence intervals.

  • What is the role of accountability in the learning process as discussed in the video?

    -Accountability is built into the learning process to maximize the chances of not giving up, especially for those who may not have the strongest willpower and tend to give up easily.

  • How does the video suggest one should engage with existing projects to enhance their learning?

    -The video suggests taking someone else's project and working through it, understanding each line of code and the rationale behind it, rather than just copying code, to gain a practical understanding of how to approach a project.

Outlines

00:00

Mastering Data Science in 2022: A Practical Guide

The video provides a practical guide to learning data science in 2022, emphasizing effective learning strategies over the sheer volume of information. It covers the interdisciplinary nature of data science, involving coding, math, statistics, and business acumen. The presenter outlines a step-by-step approach, starting with a breadth-first, project-based learning method. This approach encourages learners to understand the minimum required theory before diving into practical projects. The video also discusses the importance of not giving up and the role of accountability in learning.

05:01

Coding and Project-Based Learning: Starting Strong

The presenter suggests starting with coding, finding it more motivating to see immediate results. Python is recommended as the language of choice due to its simplicity and rich data science libraries. The basics of coding, including variables, functions, loops, and conditionals, are covered, along with the importance of learning data science modules like pandas and numpy. The video also touches on statistics, visualization, and the first project milestone, which involves exploratory data analysis (EDA). Timelines for learning these topics are provided, with an emphasis on the breadth-first approach and the integration of theory and practice.

10:02

Diving Deeper: Statistics, Visualization, and Machine Learning

The video moves on to more advanced topics, beginning with statistics, where a foundational understanding is crucial. It then transitions into data visualization, recommending the Seaborn library for its intuitive interface and aesthetic appeal. The presenter highlights the importance of EDA for understanding data trends and patterns. Following this, the focus shifts to machine learning, introducing common algorithms and the importance of understanding their workings intuitively. The video also discusses the process of learning through other people's projects on platforms like Kaggle and emphasizes the practical aspects of machine learning, such as data preprocessing and model optimization.

15:03

Scraping, APIs, and Databases: Expanding Data Science Skills

The presenter covers data scraping and APIs, which are essential for sourcing data when pre-built datasets are unavailable. Beautiful Soup is recommended for web scraping, and the importance of learning SQL for database manipulation is emphasized. The video outlines the types of databases, including relational, NoSQL, and cloud databases. It also provides a timeline for learning these skills and suggests practical exercises like importing datasets into a personal database. The presenter encourages the use of interactive learning platforms and emphasizes the importance of project-based learning.

Deployment, Niche Topics, and the Future of Data Science

The video discusses deployment, which involves putting machine learning models into a live environment, and explores niche areas such as natural language processing (NLP) and computer vision. The presenter provides resources for learning SQL, machine learning algorithms, and databases. They also stress the importance of understanding the automated aspects of data science and the growing significance of domain knowledge. The video concludes with the importance of communication and presenting findings in a business context, as well as the role of data scientists in ensuring their work provides value to the company.

Keywords

Data Science

Data Science is an interdisciplinary field that utilizes scientific methods, processes, and algorithms to extract knowledge and insights from structured and unstructured data. In the video, the speaker discusses how to effectively learn data science, emphasizing the importance of understanding both the theoretical and practical aspects of the field.

Project-Based Learning

Project-Based Learning (PBL) is an educational approach where students gain knowledge and skills by working for an extended period of time to investigate and respond to an authentic, engaging, and complex question, problem, or challenge. The speaker advocates for a breadth-first approach centered around PBL, which facilitates deeper learning and understanding by applying concepts in real-world scenarios.

Python

Python is a high-level, general-purpose programming language that is widely used in data science due to its simplicity and the availability of robust libraries for data manipulation and analysis. The video recommends starting with Python for coding in data science, highlighting its ease of understanding and extensive documentation.
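The coding basics the video lists (variables, functions, loops, and if statements) can be sketched in a few lines. This is only an illustration: the ages and the `age_group` helper are made up for the example and do not come from the video.

```python
# Variables: names bound to values (made-up sample data).
passenger_ages = [22, 38, 26, 35, 54, 2]
adult_age = 18


# Function: reusable logic that uses an if statement.
def age_group(age):
    """Return 'child' or 'adult' for a given age."""
    if age < adult_age:
        return "child"
    return "adult"


# Loop: apply the function to each value and count the results.
counts = {"child": 0, "adult": 0}
for age in passenger_ages:
    counts[age_group(age)] += 1

print(counts)
```

Small scripts like this give the immediate feedback the speaker finds motivating when starting with code.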

Pandas and NumPy

Pandas and NumPy are two fundamental Python modules for data science. Pandas is used for data manipulation and analysis, while NumPy is a library that supports large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. The script mentions that understanding these modules is crucial for handling datasets in data science projects.

Statistics

Statistics is a branch of mathematics dealing with data collection, analysis, interpretation, presentation, and organization. In the context of the video, the speaker suggests brushing up on basic statistical concepts such as mean, median, mode, standard deviation, and distributions, which are essential for understanding the nature of datasets in data science.
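As a rough sketch of "implementing the stats" once you can code, Python's built-in `statistics` module covers the summary measures named above. The `scores` list is invented sample data, and the population standard deviation is used only to keep the arithmetic easy to check by hand.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample

mean = statistics.mean(scores)      # arithmetic average
median = statistics.median(scores)  # middle of the sorted values
mode = statistics.mode(scores)      # most frequent value
spread = statistics.pstdev(scores)  # population standard deviation

print(mean, median, mode, spread)
```

In a real project the same measures would more likely be computed on a pandas column, but the ideas are identical.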

Data Visualization

Data Visualization is the graphical representation of information and data. It enables viewers to understand the significance of data by summarizing and consolidating the data in a visual format. The video mentions Seaborn as a preferred visualization module, emphasizing the importance of visual representations for interpreting data insights.

Exploratory Data Analysis (EDA)

Exploratory Data Analysis is an approach to analyzing data sets to summarize their main characteristics, often using visual methods. EDA provides an initial understanding of the data by discovering patterns, spotting anomalies, and testing hypotheses. The speaker discusses EDA as a project that integrates coding, statistics, and visualization skills to explore and familiarize oneself with the dataset.
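A first EDA pass often amounts to grouping rows and comparing rates. The sketch below assumes a tiny, made-up table loosely modeled on the Titanic example mentioned in the video; the rows are not real records, and `survival_rate_by` is a hypothetical helper. It uses only the standard library, whereas pandas would be the usual tool.

```python
import csv
import io
from collections import defaultdict

# Made-up rows in the spirit of the Titanic data set (not real records).
raw = """sex,pclass,survived
female,1,1
female,3,1
female,2,0
male,1,1
male,3,0
male,3,0
male,2,0
female,1,1
"""

rows = list(csv.DictReader(io.StringIO(raw)))


def survival_rate_by(column):
    """Share of survivors within each value of the given column."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for row in rows:
        totals[row[column]] += 1
        survivors[row[column]] += int(row["survived"])
    return {key: survivors[key] / totals[key] for key in totals}


by_sex = survival_rate_by("sex")
by_class = survival_rate_by("pclass")
print(by_sex)
print(by_class)
```

Comparing the two dictionaries is the kind of pattern-spotting that EDA is meant to surface before any modeling starts.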

Machine Learning

Machine Learning is a subset of artificial intelligence that provides systems the ability to automatically learn and improve from experience without being explicitly programmed. The video outlines the importance of understanding common machine learning algorithms and their applications in predicting outcomes based on data, which is a significant part of data science.
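To make the intuition concrete, here is a minimal sketch of one-feature linear regression: fit a straight line, then use that line to predict. It assumes ordinary least squares, which minimizes the squared vertical distances between the points and the line. The data and the `fit_line` and `predict` names are invented for illustration; in practice a library such as scikit-learn would normally do this.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training data that follows y = 2x + 1 exactly.
hours = [1, 2, 3, 4, 5]
score = [3, 5, 7, 9, 11]

slope, intercept = fit_line(hours, score)


def predict(x):
    """The 'model' is just the fitted line."""
    return slope * x + intercept


print(slope, intercept, predict(6))
```

The fitted slope and intercept are the whole model here, which is why the speaker describes the model as "the line you drew".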

Data Scraping and APIs

Data scraping is the process of extracting data from websites, while APIs (application programming interfaces) are interfaces exposed by other people's software that a program can call to request data or other functionality. In the context of the video, the speaker discusses the necessity of learning how to scrape data from websites or use APIs to gather data for analysis, which is a critical skill when pre-built datasets are not available.
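Both ideas can be sketched without touching the network. The video recommends Beautiful Soup for scraping; this example deliberately swaps in Python's standard-library `html.parser` so it runs with no extra installs. The page markup, the JSON fields, and the `HeadingCollector` class are all hypothetical stand-ins.

```python
import json
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect the text inside every <h2> tag."""

    def __init__(self):
        super().__init__()
        self.in_heading = False
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_heading = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_heading = False

    def handle_data(self, data):
        if self.in_heading:
            self.headings.append(data.strip())


# Stand-in for a downloaded page (hypothetical content).
page = "<html><body><h2>Python</h2><p>intro</p><h2>SQL</h2></body></html>"
collector = HeadingCollector()
collector.feed(page)
print(collector.headings)

# Stand-in for the JSON body an API might return (hypothetical fields).
response_body = '{"results": [{"name": "Ada", "age": 36}, {"name": "Alan", "age": 41}]}'
payload = json.loads(response_body)
names = [item["name"] for item in payload["results"]]
print(names)
```

With a real site or API, the string literals would be replaced by the downloaded response, and the API's documentation would dictate the actual field names.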

Databases

Databases are structured sets of data stored electronically. The video emphasizes the importance of understanding different types of databases, such as relational and NoSQL databases, and learning SQL, a domain-specific language used in programming and managing relational databases. SQL is particularly important for data professionals as it is a common requirement in job interviews.
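A quick way to practice SQL against a relational database is Python's built-in `sqlite3` module with an in-memory database. The table, its columns, and the rows below are made up for illustration; the query shows a typical `GROUP BY` aggregation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE passengers (name TEXT, pclass INTEGER, survived INTEGER)"
)
conn.executemany(
    "INSERT INTO passengers VALUES (?, ?, ?)",
    [("A", 1, 1), ("B", 1, 0), ("C", 3, 0), ("D", 3, 0), ("E", 2, 1)],
)

# Count passengers and survivors per class.
query = """
SELECT pclass, COUNT(*) AS total, SUM(survived) AS survivors
FROM passengers
GROUP BY pclass
ORDER BY pclass
"""
result = conn.execute(query).fetchall()
print(result)
conn.close()
```

Importing a small dataset into a personal database like this is the kind of exercise the video suggests for getting comfortable with SQL.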

Deployment

Deployment in the context of data science refers to the process of putting a developed machine learning model into a live environment where it can be used to make predictions or decisions. The speaker touches on the importance of this step for moving from experimental models to operational models that can be integrated into other software and used in real-world applications.
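At its simplest, deployment means saving a trained model somewhere and loading it from another program that answers prediction requests. The sketch below assumes a toy model that is just a line with a slope and an intercept, stored as JSON in a temporary directory. The parameter values, file name, and helper functions are hypothetical, and real deployments usually add a web service around this step.

```python
import json
import os
import tempfile

# Hypothetical parameters from a model trained elsewhere (y = slope * x + intercept).
trained = {"slope": 2.0, "intercept": 1.0}

# "Export" step: persist the model so another program can load it.
model_path = os.path.join(tempfile.mkdtemp(), "model.json")
with open(model_path, "w") as f:
    json.dump(trained, f)


# "Serving" step: load the saved model and answer prediction requests.
def load_model(path):
    with open(path) as f:
        return json.load(f)


def predict(model, x):
    return model["slope"] * x + model["intercept"]


model = load_model(model_path)
print(predict(model, 4))
```

Separating the saved artifact from the code that serves it is what lets a model move from an experiment into other software.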

Domain Knowledge

Domain Knowledge refers to the specific knowledge and expertise in a particular field or industry. The video stresses that as data science tasks become more automated, domain knowledge becomes increasingly important for data scientists to understand the context and business reasons behind their analyses, ensuring their work provides real value to the organization.

Highlights

The video provides a practical guide on learning data science effectively, emphasizing not just what to learn but how to learn it.

Data science is described as an interdisciplinary field involving coding, math, statistics, and business acumen.

A breadth-first approach centered around project-based learning is recommended for learning data science.

The importance of understanding the difference between theory and practice in technical subjects is highlighted.

Python is suggested as the starting language for coding due to its simplicity and rich data science libraries.

Pandas and NumPy are identified as key modules for data manipulation in data science.

Statistics knowledge is crucial for understanding data sets, with a focus on concepts from high school to first-year university stats.

Visualization is emphasized as an important aspect of data science, with Seaborn recommended as an intuitive module.

Exploratory Data Analysis (EDA) is introduced as a foundational project type for beginners to familiarize with data sets.

A timeline is provided for learning the basics: about one to two weeks each for coding and statistics at four hours per day, and roughly an hour or two to a day for visualization.

The video encourages learning from other data scientists and building upon existing projects to enhance understanding.

Machine learning is discussed, with an emphasis on understanding common algorithms and their applications.

Data scraping and APIs are highlighted as essential skills when moving beyond pre-built data sets.

SQL is identified as a vital language to learn for database manipulation and is often a job requirement for data roles.

Deployment and niche areas like NLP and computer vision are considered advanced topics in data science.

Recommended resources for learning include interactive platforms, online courses, and practical project-based learning.

As data science tasks become automated, the importance of domain knowledge and effective communication of findings increases.

The presenter shares personal strategies for staying accountable and motivated during the learning process.

The landscape of data science is evolving, with a growing emphasis on domain expertise and the ability to apply AI tools effectively.

Transcripts

00:00

Welcome back to another Recall by Dataiku video.

00:05

In this video, I'm going to walk you through how I would learn data science in 2022. You've probably already seen a couple other videos on this topic before, but what I'm going to be focusing on here is a very practical guide, because from my experience the hardest part about learning data science isn't figuring out what to learn but rather how to learn effectively and, kind of like, how to not give up, essentially. Because data science is hard. It's an interdisciplinary field that involves coding, math and stats, and business/product sense.

00:31

First, I'm going to outline the topics to cover and my step-by-step approach, kind of like a framework for how to learn. Then I'll go through the approximate timeline for each topic, some recommended resources, and finally ending with where I see data science heading and how you should adjust your learning plan to suit it. So do stick to the end of the video, because data science is a rapidly changing field and I think it's important to understand the landscape if you're genuinely interested in getting into the field.

00:56

Throughout the video, I'll also be pointing out where I very intentionally built in accountability, which is basically how to maximize your chances to not give up. Because, at least for me, I don't have the strongest willpower and I tend to give up easily, so if you can kind of relate to this, maybe these tips and checkpoints will also help you as well.

01:13

Okay, topics to cover. There's programming, stats, data visualization, exploratory data analysis or EDA, machine learning, data scraping, APIs, databases, deployment, and specific niches like NLP and computer vision.

01:29

Don't worry, I'm just listing these here now, but we're going to go through each of these later in the video and talk about why each topic is relevant and why I recommend learning them in this specific order.

01:38

But first, I want to share with you what is called meta-learning, or how to learn. The general approach I recommend is what is called a breadth-first approach centered around project-based learning.

01:48

Basically, what I mean by this is, say we take the topics I listed before, right? A breadth-first approach means that you should cover just enough, or the minimum amount of, theory for each topic before doing a project surrounding it. Then you can learn more about the topics and do a more complex project, and you do this over and over, steadily learning more about each topic and expanding your skills.

02:08

This is called a breadth-first approach to learning, as opposed to a depth-first approach, where you would attempt to learn every single thing about a topic and then move on to the next topic, and then again try to learn every single thing about that, and after you learn each topic thoroughly, then you would try to do the project.

02:22

The reason why I recommend this breadth-first approach centered around project-based learning is for three major reasons. The first reason is that technical subjects like coding, math and stats, etc. are really different in theory and in practice. If you've tried to learn how to code before, you may have experienced something like: you do a course on coding, right, and you're like, "Okay, makes sense, I know how to code now," and then when you actually sit down to code something yourself, you're kind of like, "Uh, where do I even start?" The reason for this is because implementation is really a separate beast, and the whole point of learning to code and data science is that you can actually implement and do cool projects, right? So you do want to know how to implement.

03:02

The second reason is that if you try to deeply learn each subject in turn, you will be there learning until the end of days. Each subject of coding, stats, and machine learning is huge, and you can really go down the rabbit hole and find yourself super overwhelmed, not knowing what is actually relevant and important, and at some point you're probably going to give up before you've even started to use these things that you learned. Trust me, I know this from experience.

03:25

And finally, another plug for project-based learning: studies have actually shown that project-based learning is the best form of learning, because by doing things and figuring things out yourself, you're actually more deeply encoding that information into your brain and more likely to retain the information, as opposed to just, like, kind of passively absorbing information if you're just watching someone else code, for example.

03:47

So yes, breadth-first approach centered around project-based learning. Hopefully I have convinced you.

03:52

All right, let's now go through the topics. In my opinion, you should start with coding first, and the reason why I recommend coding first is because it's a lot more motivating, at least for me, to be able to see the results of things that I do, as opposed to starting with more theoretical topics like math and stats, which of course are extremely important and you'll certainly get to them later, but I find these topics more abstract and less engaging, a.k.a. easier to get bored and give up.

04:18

For choice of language, I would recommend starting with Python. The reason why I recommend starting with Python is because it's a general-purpose language that is super simple to understand, has great documentation, and also has great libraries for data science, including machine learning.

04:31

So, what to learn for coding? You should know the basics, including how to declare variables, functions, loops, and if statements. Then you should get familiar with two specific data science modules, pandas and NumPy. Pandas is built on top of NumPy and is, like, the data science module, where you can manipulate your data sets and feed them into other, more specialized libraries for data visualization and machine learning, for example.

04:53

After you learn the basics of coding, next I recommend learning or brushing up on your stats. And I'm not talking about, like, crazy stuff here. We're talking about high school to first-year university stats: mean, median, mode, standard deviation, distributions, central limit theorem, confidence intervals, things like that. This comes in really handy when you're understanding the nature of your data set. Now, what's really cool is that because you know how to code now, you can actually implement the stats on your data sets, which again I think is a lot more fun because you can see the things that you do.

05:20

Next up is visualizations. There's a lot of different visualization modules out there, but honestly, if you learn one of them, the rest are kind of just variations with different functionalities. I personally like seaborn because it's really intuitive to use and the graphs are automatically really pretty as well.

05:36

At this point, you should know the basics of coding, stats, and visualizations, and you're ready now for your first project, which is some exploratory data analysis, or EDA. EDA is just a fancy way of saying exploring your data set and familiarizing yourself with it by seeing if there's any trends, patterns, correlations between variables, etc. With a basis in coding, stats, and visualizations, you're now well equipped to do EDA by taking a data set, playing around with it a bit, and doing some stats, like finding the mean and distribution of variables, and making some visualizations.

06:06

Okay, let's talk about timelines to get to this point. In terms of timeline, I would say coding should take you about one or two weeks at four hours per day. Stats should take you, again, one to two weeks, maybe a little less, maybe a little longer, depending on how much stats you remember. And visualization should take you only about one to two hours to a day to get the hang of.

06:25

Now, you might be thinking, "This is probably a lot longer than I thought," and that's okay, because remember: breadth-first approach centered around project-based learning. You don't have to know everything, just enough of the basics that you can start doing a project, which will help you learn even faster.

06:39

So what exactly should you do for your first project? Well, let me let you in on a secret: so much of data science is learning from other data scientists and working on top of what others have built. I find that the best projects to start with when you're new in the field is to take someone else's project and work through it. For example, you can start with the famous Titanic data set on Kaggle and pick one of the highly rated notebooks. Then, if you're feeling daring, you can add something onto it and take it a step further.

07:04

Word of warning here is, of course, don't just go and copy code, right? Like, that clearly will not help you learn. But if you understand what each line of code is doing and the rationale behind it, you'll gain an understanding of how to approach a project. Then next time, when you're doing another project, you will know how to approach it.

07:20

Honestly, even now, when I want to learn something that I'm not super familiar with, I find the fastest way to learn is to start by doing a project that someone else has done and then applying it to my own project later.

07:31

So by now, after working through a Kaggle notebook or two, you'll probably notice that for many Kaggle notebooks, after some initial exploration of the data, they start jumping into machine learning. For example, some exploratory data analysis may show that the likelihood of survival when you're male is far lower than if you're female, and also your class has to do with survival. Then the question becomes: can you predict survival? And the answer is yes, with machine learning.

07:55

So now it's time to learn about machine learning. There's around 10 to 15 common machine learning algorithms, and there's a lot of ways of classifying them. One example is dividing them into supervised learning, unsupervised learning, and reinforcement learning. I recommend intuitively understanding how the algorithms work without worrying too much about the exact math behind it.

08:14

For example, linear regression is the simplest machine learning model, and intuitively how it works is that it tries to draw a straight line that minimizes the distance between each data point and that line. And the model is the line you drew, which can predict, for example, the probability of survival on the Titanic given an age.

08:33

The good news is that most machine learning algorithms are actually quite intuitive and not super difficult to understand. To learn the basics of the common machine learning algorithms, I would say it should take you about, like, three to four weeks, again assuming four hours per day.

play08:47

definitely feel free to go deeper into

play08:49

the math if you are interested however

play08:51

depending on your math proficiency you

play08:53

may need to refresh your calculus and go

play08:56

deeper into statistics okay cool now you

play08:58

can continue working through the

play09:00

notebook of someone else's project and

play09:02

trying out the different machine

play09:03

learning algorithms it's also super

play09:05

useful here to understand the notebook

play09:07

author's reason for the data

play09:09

pre-processing that's being done the

play09:10

reason why certain machine learning

play09:12

algorithms are chosen and their pros and

play09:14

cons as well as how to optimize the

play09:16

models these are super practical things

play09:18

that are extremely important to doing

play09:19

machine learning so be sure to really

play09:21

understand the reasoning behind choices

play09:24

that are being made now we'll cover

play09:25

things up to machine learning and next

play09:27

up is data scraping slash apis this

play09:30

comes into play when you graduate out of

play09:32

using pre-built data sets especially if

play09:34

you want to do your own project it's

play09:36

actually really rare that you'll find

play09:37

kind of like just a nice data set laid

play09:40

out for you already the more likely

play09:42

situation you find yourself in is having

play09:43

to scrape the data yourself from

play09:45

websites or using apis which stand for

play09:48

application programming interface for

play09:50

scraping data a module i would recommend

play09:52

checking out is beautiful soup very

play09:54

useful and quite cute and whimsical too

play09:56

if i do say so myself it shouldn't take

play09:58

you more than a couple days to a week to

play10:00

have a good grasp for apis or

play10:02

application programming interfaces

play10:04

they are software built by other people

play10:06

that you can use to get access to data

play10:08

amongst other functions but what is

play10:10

relevant here is that you can get data

play10:12

using apis to learn how to use an api it

play10:14

may take you some time to understand how

play10:16

to use it because it involves

play10:18

understanding how to use other people's

play10:20

software and this really has to do with

play10:22

how well documented the api is reading

play10:24

documentation is in itself a skill in

play10:27

both understanding how to read

play10:29

documentation as well as like developing

play10:32

the patience to read documentation again

play10:34

remember the approach that i guess i've

play10:36

already beaten into you at this point

play10:37

breadth-first approach project-based

play10:39

learning learn the minimum and do the

play10:41

project next up databases for databases

play10:45

what to learn here is understanding the

play10:46

different types of databases like

play10:48

relational databases nosql databases

play10:51

cloud databases etc a language that you

play10:54

may especially want to pick up here is

play10:56

sql it's a much easier language to learn

play10:58

compared to python and shouldn't take

play10:59

you more than a week or two to learn it

play11:01

well pro tip here is if you're

play11:02

interested in getting a job as a data

play11:04

scientist data analyst or data engineer

play11:07

almost all companies will ask you sql

play11:09

questions as part of the interview

play11:11

process in my opinion the minimum here

play11:13

to learn is relational databases and the

play11:15

language behind them which is sql

play11:17

especially if you're primarily learning

play11:18

data science to get a job timeline here

play11:21

is two weeks for the basics for database

play11:23

projects i recommend downloading some

play11:25

data sets like from kaggle for example

play11:27

and then importing that data into your

play11:29

own database this teaches you how to

play11:31

create a database create tables inside

play11:33

the database and manipulate the data
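
The database project described here can be sketched in a few lines of Python. This is only a minimal illustration, assuming SQLite (via the standard-library `sqlite3` module) as the database and a tiny made-up CSV standing in for a downloaded Kaggle data set; the table name `pets`, its columns, and both helper functions are hypothetical and not from the video.

```python
import csv
import io
import sqlite3

# Hypothetical sample standing in for a CSV file downloaded from Kaggle.
SAMPLE_CSV = """name,species,weight_kg
rex,dog,30.5
tom,cat,4.2
ace,dog,22.0
"""


def load_csv_into_db(conn, csv_text):
    """Create the table if needed, insert every CSV row, return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pets "
        "(name TEXT, species TEXT, weight_kg REAL)"
    )
    rows = [
        (r["name"], r["species"], float(r["weight_kg"]))
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    # Parameterized inserts keep the data separate from the SQL text.
    conn.executemany("INSERT INTO pets VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def average_weight_by_species(conn):
    """Manipulate the data with plain SQL: one average per species."""
    cur = conn.execute(
        "SELECT species, AVG(weight_kg) FROM pets "
        "GROUP BY species ORDER BY species"
    )
    return cur.fetchall()


# An in-memory database keeps the sketch self-contained; a file path
# such as "project.db" would persist the data instead.
conn = sqlite3.connect(":memory:")
n = load_csv_into_db(conn, SAMPLE_CSV)
result = average_weight_by_species(conn)
print(n, result)
```

With a real data set, the same three steps apply: create the database, create a table that matches the CSV columns, then practice `SELECT`, `GROUP BY`, and similar queries against it.
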

play11:36

okay we're almost done so for the next

play11:38

two topics deployment and specific

play11:40

niches i consider these more advanced

play11:43

topics deployment comes into play when

play11:44

you want to take the machine learning

play11:46

model you developed and put it into a live

play11:48

environment instead of just having it in

play11:50

a notebook that you have you can deploy

play11:51

the model across different code

play11:52

environments and also integrate them

play11:54

into other software then if you're

play11:56

interested in a specific field of data

play11:57

science you can also explore niches like

play12:00

natural language processing or nlp which

play12:02

has to do with developing algorithms

play12:04

that understand human languages known as

play12:06

natural languages it's really a very

play12:08

cool interdisciplinary field there's

play12:10

also niches like computer vision that

play12:12

has applications in self-driving cars

play12:14

for example it's kind of hard for me to

play12:16

give you a timeline on these niches

play12:18

because theoretically you can easily do

play12:20

a project in nlp for example with the

play12:22

skills you learned so far by employing

play12:24

modules that other people have developed

play12:26

which abstract away a lot of the

play12:27

underlying concepts and this will take

play12:30

you like a few hours to a few days to

play12:32

learn but if you're interested in these

play12:33

niche topics i would also assume you

play12:35

would want to understand more of the

play12:37

theory behind it and i mean there are

play12:38

people who have phds in the field so in

play12:40

terms of timeline really depends on how

play12:42

far you want to go now let's talk about

play12:44

some recommended resources i personally

play12:46

prefer interactive interfaces to learn

play12:48

coding like free code camp for example

play12:50

because you can see what it is that you

play12:51

were coding for basic statistics and

play12:53

theory and math behind machine learning

play12:55

algorithms the top resources i would

play12:57

recommend are statquest by josh starmer

play12:59

and dataiku's own guides both of which

play13:02

are free for projects to follow i

play13:04

already mentioned it before but kaggle

play13:06

is great because there's notebooks where

play13:07

you can see how people approach projects

play13:09

from different perspectives a great free

play13:11

resource to learn sql is mode which is

play13:13

what i personally used to learn sql from

play13:16

scratch and pass my own data science

play13:17

interview to learn more about databases

play13:19

in general there's also great moocs

play13:21

available finally for deployment and

play13:23

more niche topics i would personally go

play13:25

with highly rated courses from moocs and

play13:27

again rely heavily on working on my own

play13:29

projects because at this point you

play13:31

should already be quite proficient in

play13:33

the basics so it's more about building

play13:35

on top of them and doing specific

play13:37

projects that interest you honestly

play13:39

there are so many amazing free and low

play13:41

cost options for learning data science

play13:43

out there and i just listed a few that

play13:45

i personally used and liked my

play13:48

preference is to choose resources that

play13:50

are interactive and already have

play13:52

projects built into them i get it though

play13:53

if you prefer learning from online video

play13:55

courses or books for example and that's

play13:57

totally fine my only recommendation is

play13:59

that you should also intentionally work

play14:00

through projects so you can learn to

play14:02

implement and in summary if you want

play14:04

like the most simplistic guide possible

play14:06

for how to choose a good resource and if

play14:08

you're willing to spend a little money

play14:10

you just absolutely cannot go wrong with

play14:12

choosing a highly rated course on the

play14:14

topic on a mooc platform there are many

play14:17

many courses to cover each of these

play14:19

topics that we discussed now finally

play14:21

let's talk a little bit about how the

play14:22

landscape of data science is progressing

play14:25

as the data science field develops and

play14:27

becomes more mature a lot of repetitive

play14:29

tasks in data science like data cleaning

play14:32

pre-processing exploratory data analysis

play14:35

machine learning and even deployment are

play14:36

becoming automated in fact dataiku does

play14:39

just this dataiku is a platform for

play14:42

everyday ai that systemizes the use of

play14:44

data for business results by using

play14:47

dataiku you're able to create share and

play14:49

reuse applications that leverage data

play14:51

and machine learning to extend and

play14:53

automate decision making dataiku also

play14:55

allows you to scale ai safely and

play14:57

effectively and deliver advanced

play14:58

analytics using the latest techniques at

play15:01

big data scale dataiku is really

play15:03

powerful and you should check out more

play15:05

about the platform if you're interested

play15:06

link in the description below but wait

play15:08

a second

play15:09

if you've been paying attention

play15:11

you are probably thinking right now why

play15:13

should i learn all the things we just

play15:15

talked about earlier if it's becoming

play15:17

automated there's actually still very

play15:18

good reason to do so first it's still

play15:20

important for you to understand how

play15:22

things work so you can understand how to

play15:23

apply analyses and algorithms to

play15:25

specific use cases and learn how to best

play15:27

leverage these tools available because

play15:30

after all they still are tools even if

play15:32

they're automating things and we need to

play15:34

make sure that they're doing what

play15:35

they're supposed to be doing another

play15:36

implication of so much of the data

play15:39

science and machine learning pipelines

play15:40

being automated is that it's become more

play15:43

and more important for a data scientist

play15:45

to have domain knowledge which by the

play15:47

way is the third pillar of data science

play15:50

that we haven't really discussed until

play15:51

now domain knowledge or business product

play15:54

sense this is just as important as the

play15:56

coding and the stats coding and stats

play15:59

and the machine learning algorithms and

play16:00

all the other technical stuff is only as

play16:02

valuable as how much value it can

play16:04

provide to the company so even if you

play16:06

make the fanciest and best algorithm

play16:08

ever honestly nobody actually would care

play16:11

if it doesn't provide value to the

play16:13

company so it's very much the data

play16:15

scientist's job to understand the

play16:17

business reason for doing an analysis

play16:19

or building a model to make sure that

play16:21

what they're doing also has real impact

play16:23

in the organization it is also a data

play16:25

scientist's job to communicate the value

play16:27

of what they're doing and make sure what

play16:28

they do is actually going to be used for

play16:30

those of you who are not in industry you

play16:32

might think that this is kind of weird

play16:34

right it's like of course it has so much

play16:36

value and impact but believe me in

play16:38

practice it's actually really crucial

play16:40

and it's not just a given because if

play16:42

decision makers don't understand why

play16:44

your analysis or your model is useful

play16:46

then they don't want to use it right and

play16:49

even if you make the best model ever and

play16:50

it has a lot of impact your effort would

play16:53

be for nothing if it's not being used so

play16:55

to summarize this section since many

play16:57

repetitive data science tasks are being

play16:59

automated it's important to one

play17:02

understand how the algorithms work and

play17:04

make sure that they're functioning

play17:05

properly in your given context and two

play17:08

focus on gaining domain knowledge and

play17:10

learn how to communicate and present

play17:11

your findings and the impact of your

play17:13

work in the business context alright

play17:15

that's all i have for you today i've

play17:17

linked all the resources i've talked

play17:19

about in the description below

play17:21

do also share your thoughts on this

play17:22

guide on how to learn data science i

play17:25

will see you guys in the next video

Related Tags
Data Science, Learning Guide, Project-Based, Python, Statistics, Machine Learning, Data Visualization, Exploratory Data Analysis, APIs, Databases, Deployment, NLP, Computer Vision, Domain Knowledge, Business Impact, Automated AI, Dataiku, Data Cleaning, Pre-processing, Statistics Refresher, Pandas, NumPy, Seaborn, SQL, Kaggle, Interactive Learning, MOOCs, Free Resources, Data Science Trends