Machine Learning Tutorial Python - 8: Logistic Regression (Binary Classification)

codebasics
7 Sept 2018 · 19:19

Summary

TL;DR: This tutorial introduces logistic regression as a technique for solving classification problems, where the prediction is categorical rather than continuous as in linear regression. The video explains the concept of binary and multi-class classification, using the example of predicting customer insurance purchases based on age. It demonstrates how to visualize data with a scatter plot and how linear regression might be inappropriate for certain datasets. The presenter then introduces the sigmoid function, which logistic regression uses to model the probability of a certain class. The tutorial continues with a practical example using a dataset, showing how to perform a train-test split, train a logistic regression model, make predictions, and evaluate the model's accuracy. Finally, the video concludes with an exercise for viewers to apply logistic regression to an HR Analytics dataset to predict employee retention.

Takeaways

  • 📈 The tutorial aims to solve a simple classification problem using logistic regression, which is different from linear regression that predicts continuous values.
  • 🔍 Classification problems predict categorical outcomes, such as yes/no or choosing among multiple categories.
  • 📊 Binary classification involves predicting an outcome with only two categories, while multi-class classification deals with more than two categories.
  • 📉 The script demonstrates using a scatter plot to visualize data distribution, which helps in identifying patterns in the data before applying logistic regression.
  • 🀖 Logistic regression models use a sigmoid function to transform linear equation outputs into a probability range between 0 and 1.
  • 🧼 The sigmoid function has an S-shaped curve, mathematically represented as 1 / (1 + e^(-z)), where 'e' is Euler's number.
  • 📝 The tutorial covers how to implement logistic regression using the scikit-learn library in Python, abstracting the complex mathematics.
  • ⏭ The process includes data splitting into training and test sets, model training with the training set, and making predictions with the test set.
  • 💯 The accuracy of the logistic regression model is evaluated using the test set, with a score close to 1 indicating a high accuracy for the given dataset.
  • 🀓 The script suggests exploring Kaggle for various datasets to practice building logistic regression models and solving real-world problems.
  • 📚 The exercise at the end of the tutorial challenges learners to apply logistic regression to an HR Analytics dataset to predict employee retention.
  • 🔧 The exercise involves exploratory data analysis, plotting bar charts for salary and department impact, building a logistic regression model, making predictions, and measuring model accuracy.

Q & A

  • What is the main goal of the tutorial?

    -The main goal of the tutorial is to solve a simple classification problem using logistic regression.

  • What is the difference between linear regression and classification problems?

    -Linear regression is used to predict continuous values, such as home prices or stock prices, while classification problems predict categorical values, such as yes/no or selecting one category from multiple options.

  • What are the two types of classification problems mentioned in the script?

    -The two types of classification problems mentioned are binary classification, which involves predicting a simple yes or no outcome, and multi-class classification, which involves predicting one category from more than two available options.

  • How does logistic regression differ from linear regression in terms of the output it provides?

    -Logistic regression provides an output that is a probability ranging between 0 and 1, which can be used to classify the prediction into categories, whereas linear regression provides a continuous output that can be any number.

  • What is the sigmoid function and how is it used in logistic regression?

    -The sigmoid function is a mathematical function that takes any input and transforms it into a value between 0 and 1. It is used in logistic regression to convert the linear equation's output into a probability score that can be used for classification.
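
A minimal sketch of the sigmoid in Python, using NumPy (an assumption; the video itself does not code the function by hand):

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued input into the open interval (0, 1)."""
    return 1 / (1 + np.exp(-z))

# Large negative inputs approach 0, large positive inputs approach 1,
# and z = 0 lands exactly on 0.5 -- the usual classification threshold.
print(sigmoid(np.array([-10.0, -1.0, 0.0, 1.0, 10.0])))
```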

  • What is the purpose of splitting the dataset into a training set and a test set?

    -The purpose of splitting the dataset is to use the majority of the data (training set) to train the model and a smaller portion (test set) to evaluate its performance and ensure that it generalizes well to new, unseen data.
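
A hedged end-to-end sketch of the split/train/evaluate flow described here; the file name insurance_data.csv and the column names age and bought_insurance are assumptions and may differ from the video's CSV:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Assumed file and column names -- adjust them to the actual CSV used in the video.
df = pd.read_csv("insurance_data.csv")

X = df[["age"]]            # double brackets keep X two-dimensional, as scikit-learn expects
y = df.bought_insurance

# Keep 90% of the rows for training and hold out 10% for testing, as in the video.
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.9)

model = LogisticRegression()
model.fit(X_train, y_train)            # training happens here

print(model.predict(X_test))           # 0 = will not buy, 1 = will buy
print(model.score(X_test, y_test))     # accuracy on the held-out rows
```

On the tutorial's 27-row dataset this split leaves only about three rows for testing, which is why the reported accuracy comes out so close to 1.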

  • How does the logistic regression model make predictions?

    -The logistic regression model makes predictions by applying the sigmoid function to a linear equation derived from the training data. The output of the sigmoid function is then used to classify the prediction into one of the categories.

  • What is the significance of the score returned by the logistic regression model?

    -The score returned by the logistic regression model represents the accuracy of the model. It is a measure of how well the model's predictions match the actual outcomes in the test set.
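
For a classifier, scikit-learn's score method returns plain accuracy, i.e. the fraction of test samples predicted correctly. A small sketch (reusing the model, X_test, and y_test names from the sketch above, which are assumptions):

```python
from sklearn.metrics import accuracy_score

y_pred = model.predict(X_test)

# model.score(X_test, y_test) is the same number as accuracy computed by hand:
print((y_pred == y_test).mean())
print(accuracy_score(y_test, y_pred))
print(model.score(X_test, y_test))
```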

  • How can the logistic regression model predict the probability of an event occurring?

    -The logistic regression model can predict the probability of an event occurring by applying the sigmoid function to the linear equation's output. The resulting probability score indicates the likelihood of the event.
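
A short sketch of reading predict_proba output, again reusing the assumed model and X_test from the earlier sketch:

```python
import numpy as np

proba = model.predict_proba(X_test)

# Each row sums to 1; column order follows model.classes_ (here assumed [0, 1]),
# so column 0 is "will not buy" and column 1 is "will buy".
print(model.classes_)
print(np.round(proba, 2))

# For a binary problem, predict() effectively thresholds the class-1 probability at 0.5.
print((proba[:, 1] >= 0.5).astype(int))
```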

  • What is the purpose of exploratory data analysis in the context of the HR Analytics dataset?

    -The purpose of exploratory data analysis is to identify patterns and relationships within the data that can help understand factors affecting employee retention or attrition. This can inform the development of a logistic regression model to predict employee retention.

  • What are the steps involved in building a logistic regression model for the HR Analytics dataset?

    -The steps involved include: 1) Exploratory data analysis to identify key factors affecting employee retention, 2) Plotting bar charts to visualize the impact of factors like salary and department on retention, 3) Building a logistic regression model using the identified factors, 4) Making predictions with the model, and 5) Measuring the model's accuracy.
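
A hedged skeleton for the exercise. The file name HR_comma_sep.csv and the column names (satisfaction_level, average_montly_hours, salary, Department, left) follow the commonly distributed Kaggle HR Analytics file and are assumptions; adjust them to the CSV you actually download:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

hr = pd.read_csv("HR_comma_sep.csv")   # assumed file name

# Steps 1-2: exploratory bar charts of how salary and department relate to leaving.
pd.crosstab(hr.salary, hr.left).plot(kind="bar")
pd.crosstab(hr.Department, hr.left).plot(kind="bar")

# Step 3: keep the factors that looked useful and dummy-encode the categorical salary column.
features = pd.get_dummies(
    hr[["satisfaction_level", "average_montly_hours", "salary"]],
    columns=["salary"], drop_first=True,
)

X_train, X_test, y_train, y_test = train_test_split(features, hr.left, test_size=0.3)

# Steps 4-5: train, predict, and measure accuracy.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print(model.predict(X_test)[:10])
print(model.score(X_test, y_test))
```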

Outlines

00:00

📊 Introduction to Logistic Regression for Classification Problems

The video begins by contrasting logistic regression with linear regression. While linear regression is used for predicting continuous values, logistic regression is introduced as a method for solving classification problems, which involve predicting categorical outcomes. The tutorial aims to address binary classification, where the outcome is a simple yes or no, and multi-class classification, where there are more than two categories to predict. An example scenario is given where a data scientist is tasked with predicting whether a potential customer will buy life insurance based on their age. The importance of plotting data and observing patterns is emphasized before introducing logistic regression as the solution for such predictive tasks.

05:01

📈 Fitting a Logistic Regression Model to Insurance Data

The paragraph explains the process of using logistic regression to model the likelihood of a customer buying insurance based on their age. It discusses the limitations of linear regression in classification tasks and introduces the sigmoid function as a method to transform a linear equation's output into a probability value between 0 and 1. The sigmoid function is mathematically defined, and its S-shaped curve is described. The video then demonstrates how to implement logistic regression using a library like scikit-learn, abstracting away the complex mathematics. The process includes loading data, plotting a scatter plot to visualize the data distribution, and splitting the dataset into training and test sets. The logistic regression model is trained using the training set, and its accuracy is evaluated using the test set.
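
To make the "sigmoid on top of a linear equation" idea concrete, here is a small sketch that rebuilds the fitted model's probability by hand from its learned slope and intercept (the model and feature names continue the assumed insurance sketch above):

```python
import numpy as np

m = model.coef_[0][0]      # learned slope for the single age feature
b = model.intercept_[0]    # learned intercept

def probability_by_hand(age):
    z = m * age + b                  # the linear part, m*x + b
    return 1 / (1 + np.exp(-z))      # squashed through the sigmoid into (0, 1)

# Should closely match column 1 of model.predict_proba for the same ages.
for age in (18, 23, 40):
    print(age, round(probability_by_hand(age), 3))
```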

10:06

🀖 Training the Logistic Regression Model and Making Predictions

This section details the steps to train a logistic regression model using the training data and then make predictions on the test data. The model's predictions are binary, indicating whether the customer will buy insurance (1) or not (0). The accuracy of the model is assessed by comparing its predictions to the actual outcomes, and the model's score is close to 1, indicating near-perfect accuracy. However, the presenter notes that this high score is partly due to the small dataset size. The paragraph also covers how to predict the probability of an outcome using the model, which provides a more nuanced understanding of the prediction's certainty.

15:08

📚 Exercise: Applying Logistic Regression to HR Analytics

The final paragraph transitions into an exercise where viewers are encouraged to apply logistic regression to a real-world dataset focusing on employee retention rates. The task involves exploratory data analysis to identify factors affecting employee retention, plotting bar charts to visualize the impact of salary and department on retention, and building a logistic regression model to predict employee attrition. The exercise aims to help HR departments focus on specific areas to improve employee retention. The video concludes with a prompt for viewers to share their findings in the comments and to attempt the exercise independently before consulting the provided answers.

Keywords

💡Logistic Regression

Logistic regression is a statistical method used for binary classification problems, which involves predicting categorical outcomes. In the script, logistic regression is introduced as a suitable technique for predicting outcomes like buying insurance based on age, differing from linear regression which predicts continuous values. The tutorial details how logistic regression uses a sigmoid function to map predicted values into a binary format, making it ideal for scenarios where the outcomes are either 'yes' or 'no'.

💡Binary Classification

Binary classification is a type of classification problem where there are only two possible outcomes. The script refers to binary classification in the context of determining whether a customer will buy insurance or not, with outcomes being 'yes' or 'no'. This concept is foundational in understanding logistic regression, which the tutorial employs to handle such binary outcomes.

💡Sigmoid Function

The sigmoid function is a mathematical function that outputs a value between 0 and 1, which is crucial in logistic regression to convert linear equation outputs into probabilities. In the script, the sigmoid function is described in the logistic regression section where it's used to compress the output of a linear equation (Z), facilitating the model's ability to classify binary outcomes effectively.

💡Linear Regression

Linear regression is a predictive modeling technique for estimating a continuous dependent variable based on one or more independent variables. The script discusses linear regression in the context of predicting home prices and contrasts it with logistic regression, pointing out its inadequacy for categorical outcomes like those in binary classification problems.

💡Outliers

Outliers are data points that differ significantly from other observations. In the tutorial, outliers are mentioned in the context of fitting a logistic regression model, where they can adversely affect the accuracy of a linear regression line when predicting categorical outcomes, demonstrating the need for logistic regression in certain cases.

💡Model Training

Model training in machine learning involves developing a model based on provided data so that it can make accurate predictions. In the script, training a logistic regression model involves using historical data (age and insurance buying behavior) to fit the model, which is then used to predict whether new customers will buy insurance.

💡Test Size

Test size refers to the proportion of the data set aside for testing the machine learning model's accuracy after training. The script details using a split of 90% training data and 10% test data, demonstrating how to divide data to evaluate the effectiveness of a logistic regression model in predicting new instances.

💡Predictive Model

A predictive model is an algorithm that uses historical data to predict outcomes. The script frequently mentions building and using predictive models, specifically logistic regression, to estimate whether individuals of certain ages are likely to purchase insurance based on past data.

💡Data Set

A data set in the context of the video script is a collection of data used for training and testing a machine learning model. The script provides an example of a data set containing ages and insurance purchasing records, which is used to train and test the logistic regression model.

💡Accuracy

Accuracy in the context of machine learning models refers to the proportion of correct predictions made by the model. In the script, the accuracy of the logistic regression model is assessed after training and testing, with the script highlighting a high accuracy score due to the small size of the data set, which might not be indicative of performance with larger, more complex data sets.

Highlights

The tutorial aims to solve a simple classification problem using logistic regression, contrasting with linear regression which predicts continuous values.

Classification problems predict categorical outcomes, such as yes/no or choosing among multiple options.

Binary classification predicts a simple yes or no outcome, while multi-class classification involves more than two categories.

Logistic regression is introduced as a technique to solve classification problems, different from linear regression.

The tutorial provides a real-world example of predicting customer insurance purchase likelihood based on age.

A scatter plot is used to visualize data distribution and identify patterns before applying logistic regression.

Linear regression is shown to be inadequate for classification problems with non-linear data.

The sigmoid or logit function is explained as a mathematical tool that logistic regression uses to model probabilities.

The sigmoid function maps any input to a range between 0 and 1, creating an S-shaped curve.

Logistic regression combines a linear equation with a sigmoid function to predict the likelihood of a categorical outcome.

The tutorial demonstrates using the scikit-learn library to implement logistic regression without manually coding the mathematics.

Data is split into training and test sets using the train_test_split method for model evaluation.

The logistic regression model is trained using the training data and then used to make predictions on the test data.

The model's accuracy is assessed using the test data and the score method, which returns a value between 0 and 1.

The tutorial also covers predicting the probability of an outcome using the logistic regression model.

An exercise is provided to apply logistic regression to an HR Analytics dataset for predicting employee retention.

The exercise encourages exploratory data analysis to identify factors affecting employee retention.

Participants are guided to build a logistic regression model, make predictions, and measure the model's accuracy.

The tutorial concludes with a call to action for viewers to attempt the exercise and think critically about the solutions.

Transcripts

play00:00

the goal of this tutorial is to solve a

play00:02

simple classification problem using

play00:04

logistic regression if you followed my

play00:08

previous tutorial we have learnt a lot

play00:11

about linear regression especially the

play00:14

home prices example linear regression

play00:17

can be used to predict other things such

play00:20

as weather and stock prices and in all

play00:22

these examples the prediction value is

play00:26

continuous there are other type of

play00:29

problems such as predicting whether

play00:31

an email is spam or not whether the customer

play00:35

will buy the life insurance product or

play00:38

person is going to vote for which party

play00:41

all these problems if you think about it

play00:44

the prediction value is categorical

play00:47

because the thing that you are trying to

play00:50

predict is one of the available

play00:52

categories in the first two example it

play00:54

is simple yes or no answer in the third

play00:57

examples it is one of the available

play01:00

categories whereas in case of linear

play01:02

regression the home prices example we

play01:05

saw that the predicted value could be

play01:07

any number it is not one of the defined

play01:12

categories okay hence this second type

play01:15

of problems is called classification

play01:18

problem and logistic regression is a

play01:21

technique that is used to solve these

play01:24

classification problems now in the

play01:28

classification examples that we saw

play01:30

there are two types so the first example

play01:33

was predicting whether customer will buy

play01:36

insurance or not here the outcome is

play01:39

simple yes or no this is called binary

play01:42

classification on the other hand when

play01:46

you have more than two categories that

play01:49

example is called multi class

play01:51

classification

play01:53

let's say you are working as a data

play01:56

scientist in a life insurance company

play01:59

and your boss gives you a task of

play02:01

predicting how likely a potential

play02:04

customer is to buy your insurance

play02:06

product and what you are seeing here is

play02:10

the available data

play02:12

and based on the age the information you

play02:15

have is whether customer bought the

play02:17

insurance or not now here you can see

play02:20

some patterns such as young people don't

play02:24

buy the insurance too much you can see

play02:26

like there are persons with 20 to 25

play02:29

these kind of ages where zero means they

play02:32

didn't buy the insurance whereas as the

play02:35

persons age increases he's more likely

play02:38

to buy the insurance so you know the

play02:40

relationship and you want to build a

play02:43

machine learning model that can do a

play02:46

prediction based on the age of a

play02:49

potential customer so as a data

play02:51

scientist now this is the job you have

play02:54

been given now the first thing you would

play02:56

do when you have this data is you will

play02:58

plot a scatter plot which looks like

play03:00

this when you have worked on linear

play03:03

regression problems already the first

play03:06

temptation you have in your mind is you

play03:08

start using linear regression so when

play03:11

you draw our linear equation line using

play03:15

the linear regression it will look

play03:17

something like this now how did we come

play03:19

up with this line for that you can

play03:21

follow my previous linear regression

play03:23

tutorials if you think about it what I

play03:26

can do here is I can predict the value

play03:29

using a linear equation line and say

play03:31

that if my predictive value is more than

play03:34

0.5 so here this is 0.5 if it is more

play03:39

than 0.5 then I will say ok customer is

play03:41

likely to buy the insurance if it is

play03:44

less than that then he is not going to

play03:46

buy the insurance so anything on the

play03:50

right hand side is yes anything on the

play03:54

left hand side is no now of course we

play03:56

have these outliers but we don't care

play03:59

about them too much because for 90% of

play04:02

the cases of our linear regression will

play04:06

work ok now imagine you have a data

play04:11

point which is far on the right-hand

play04:13

side here so say a customer whose age is

play04:17

more than 80 years let's say he bought

play04:19

your insurance ok then your scatterplot

play04:22

will look like this and your linear

play04:25

equation

play04:26

might look like this in this case what

play04:29

will happen is when I draw a separation

play04:32

between the two sections using

play04:35

0.5 value here the problem arises with

play04:40

these data points actually the answer

play04:43

was yes here

play04:45

but my equation predicted them to be no

play04:49

so you can see that this is pretty bad

play04:53

when you use linear regression for a data

play04:57

set like this now here is the most

play05:00

interesting part imagine you can draw a

play05:04

line like this this is much better fit

play05:09

compared to the previous linear equation

play05:12

that we had okay and here when you draw

play05:17

a separation between the two using the 0.5 value

play05:20

you can clearly say that this model

play05:24

works much better than the previous one

play05:28

the question arises what is this line

play05:31

exactly and how do you come up with this

play05:34

right if you have learnt statistics you

play05:37

might have heard about sigmoid or logit

play05:40

function and that's what this is okay

play05:43

now

play05:44

the moment you hear this term sigmoid

play05:47

you might pause this video and start

play05:50

googling about sigmoid and it is fine

play05:53

you can read all the articles about

play05:55

sigmoid or logit function to get

play05:58

your understanding correct on

play06:00

mathematics behind it

play06:01

but if you don't want to do it I will

play06:03

give you a basic idea the sigmoid

play06:06

function's equation is 1 divided by 1

play06:09

plus e raised to minus Z where e is some

play06:14

mathematical constant called

play06:16

Euler's number the value is this now

play06:19

think about this equation for a moment

play06:21

what we are doing here is we are

play06:24

dividing 1 by a number which is

play06:28

slightly greater than 1 and when you

play06:31

have this situation the outcome will be

play06:34

less than 1 correct

play06:36

so all you are doing with this

play06:40

sigmoid function is coming up with a range

play06:43

which is between zero to one so if you

play06:47

feed set of numbers to the sigmoid

play06:50

functions all it will do is convert them

play06:53

to zero to one range and the equation

play06:57

that you will get it looks like s-shape

play07:00

right so if you plot a 2D chart

play07:03

it will look like the S-shaped function that

play07:07

we saw in the previous slide essentially

play07:12

what we are doing with logistic

play07:14

regression is we have a line like this

play07:17

which is linear equation and you know

play07:20

the equation for our linear line which

play07:24

is MX plus B all you're doing is you are

play07:29

feeding this line into a sigmoid

play07:31

function and when you do that you

play07:35

convert this line into this S shape ok

play07:39

so here you can see that my Z I replace

play07:42

with MX plus B so I applied sigmoid

play07:47

function on top of my linear equation

play07:50

and that's how I got my S-shaped line

play07:53

here all right now all of this math is

play07:56

just for your understanding as a next

play07:59

step we are going to write logistic

play08:01

regression using the scikit-learn library and

play08:05

these details are abstracted for you so

play08:08

don't worry about it you don't have to

play08:09

implement all of this mathematics you

play08:11

will just make a one simple call and it

play08:14

will work magically for you all right so

play08:17

let's get straight into writing the code

play08:19

here is the CSV file containing the

play08:22

insurance data you can see there are two

play08:25

columns age and whether that person

play08:27

bought the insurance or not and we are

play08:31

going to import this into our pandas

play08:33

data frame so I have loaded my Jupyter

play08:35

notebook by running the jupyter notebook

play08:39

command on my command line imported

play08:43

couple of important libraries and then I

play08:46

imported the same CSV file into my data

play08:50

frame which looks like this and now I'm

play08:52

going to plot

play08:53

a scatterplot just to see the data

play08:56

distribution and you can see that I get

play08:58

a plot like this here these are the

play09:01

customers who didn't buy the insurance

play09:02

these are the ones who bought the

play09:05

insurance and you can see that if the

play09:08

person is younger he's less likely to

play09:10

buy the insurance and as the person gets

play09:13

older he is more likely to buy the

play09:15

insurance the first thing now we are

play09:18

going to do is use train test split

play09:22

method to split our data set so if you

play09:25

look at our data we have 27 rows so we

play09:31

are going to split these rows into

play09:33

training set and test set again I have a

play09:39

separate tutorial for how to do train

play09:41

and test split so you can watch that it

play09:45

is basically from sklearn model

play09:48

selection you import train test split

play09:54

method here my X is DF age now I am

play10:06

doing two brackets because the

play10:09

first parameter is X which has to be

play10:12

multi-dimensional array so I'm

play10:15

just trying to derive a data frame here

play10:17

and bought insurance is y and I will say

play10:25

what is my test size if you want to see

play10:29

the arguments you can do shift tab and

play10:33

it will show you a help for this

play10:36

function so I used this a lot

play10:38

it is pretty useful so let's see so

play10:43

there is this test underscore size

play10:45

parameter so let's use test underscore

play10:48

size we are going to do or let's

play10:54

say train size right so training size

play10:57

is 0.9 so 90% of the example we are

play11:01

using for training and 10% we will use

play11:04

for actually testing our model

play11:07

now what do you get back as a result so

play11:11

these are the things you get back as it

play11:13

is all so I'm just going to copy from

play11:14

here and that's it hit ctrl enter to run

play11:21

it okay so here there's some warning

play11:25

maybe they are asking us to use test

play11:27

size doesn't matter okay

play11:31

let's look at our X test so X test is

play11:37

18 23 and 40 so these are the three

play11:41

values we are going to perform our test

play11:43

on when you look at our X train these

play11:50

are the data samples we will

play11:52

use to train our model all right so

play11:55

let's now import logistic regression so

play12:01

from sklearn linear model you can import

play12:06

logistic regression

play12:18

alright so we now have

play12:22

logistic regression class imported and

play12:24

we are going to create an object of this

play12:27

class we'll call it a model and that

play12:29

model now we'll do a training remember

play12:34

in sklearn whenever you are using this

play12:36

method fit you are actually doing your

play12:39

training for your model so X train

play12:45

and y train this is what you use for

play12:48

your training when you execute this this

play12:51

means your model is trained now and

play12:54

it is ready to make predictions so for

play12:59

these three values we are making a

play13:01

prediction so I will do model dot

play13:04

predict on X test so here what it is

play13:11

saying is 0 0 1 which means first two

play13:16

samples it is saying these two customers

play13:20

are not going to buy your insurance and

play13:23

you can see that it's kind of working

play13:25

because they have age of 18 and 23 year

play13:29

old and we saw that the

play13:32

younger age people do not buy the

play13:35

insurance whereas I think anything more

play13:37

than 27 or 28 they buy so here the age is 40

play13:41

so the answer was 1 okay if you want to

play13:48

look at the score score is nothing but

play13:53

it is showing the accuracy of your model

play13:56

right so what you're doing is you're

play14:02

giving X test and y test and here the

play14:05

score is 1 which means our model is

play14:07

perfect now this is happening because

play14:09

our data size is smaller we have only 27

play14:12

samples but if you have a wider set of

play14:15

samples then it will make mistakes in at

play14:18

least a few samples so your score will be

play14:21

less than 1 right because of the small

play14:25

size of our data set the score is pretty

play14:27

high here

play14:28

another method to try is you can see

play14:33

that if you hit tab it will show

play14:36

you all the possible functions that

play14:38

start with predict okay so here you can

play14:42

also predict a probability so when you

play14:46

predict a probability of X test it will

play14:49

show you a probability of your data

play14:51

sample being in one class versus the

play14:53

other the first class here is if

play14:57

customer will not buy the insurance so

play15:00

for the ages 18 and 23 you can see

play15:05

a 0.6 probability that they

play15:07

will not buy the insurance whereas for

play15:10

the person with age forty it is reverse

play15:13

there is a 0.6 probability that he

play15:16

will buy the insurance and a 0.39

play15:18

probability that he will

play15:19

not buy it 0.39 that is really

play15:22

a thirty nine percent chance that he

play15:25

will not buy the insurance if you want

play15:29

to do one off then you can just do model

play15:32

predict this six you will buy the

play15:36

insurance that's why you had one and if

play15:38

you had something like twenty five he

play15:40

will not buy the insurance that's why

play15:41

you get zero so this model that we built

play15:44

is working pretty well with

play15:47

logistic regression that's all I had and

play15:49

now is the time for exercise so if you

play15:53

know about the Kaggle website this is the

play15:55

website that hosts different coding

play15:58

competitions and it has one of the more

play16:01

important feature which is the data sets

play16:03

so if you go to this data set section

play16:05

you can download various data sets based

play16:09

on the type based on the file type or

play16:13

you can even search for data set so if

play16:15

you want to do some Titanic data

play16:20

analysis you can search for that

play16:24

basically you can just explore these

play16:26

data sets for exercises from this I have

play16:31

downloaded this HR Analytics data set

play16:34

where there is an analysis on the

play16:38

employee retention rate or

play16:40

employee attrition rate if I open that

play16:43

CSV file here it looks like this where

play16:47

based on the satisfaction level the

play16:49

number of projects or average monthly

play16:51

hours that person has worked on you are

play16:55

trying to establish the correlation

play16:57

between those factors and whether person

play16:59

would leave the firm or whether he would

play17:01

continue with the firm these kind of

play17:05

analytics are very important for HR

play17:07

department because they want to retain

play17:10

the employees and if you can build a

play17:12

machine learning model for HR department

play17:14

then they can focus on specific areas so

play17:18

that employees don't leave the firm

play17:21

so that's what you're going to do you

play17:23

are a data scientist you're going to

play17:25

work for your HR department and give

play17:28

them a couple of things so I have

play17:30

mentioned all of those things in the

play17:32

Jupyter notebook which I have available

play17:35

in the video description below so if you

play17:38

open that notebook you will see all the

play17:40

code that we just went through in this

play17:43

tutorial and at the end you will find

play17:45

this exercise section ok so there is a

play17:49

link here you download the data set if

play17:52

you don't want to download it at the same

play17:54

level as this notebook there is an

play17:57

exercise folder so download the CSV from

play18:00

that and you're going to give answers to

play18:03

these five questions ok first one is out

play18:07

of all these parameters that we have you

play18:10

want to find out which factors affect

play18:14

the employee retention by doing some

play18:17

exploratory data analysis you will also

play18:20

plot bar charts showing the impact of

play18:24

employee salaries on retention also

play18:27

plot the bar chart showing the impact of

play18:30

department on employee retention and

play18:33

then using the factors that you figured

play18:36

in step one you will build a logistic

play18:39

regression model and using the model you

play18:43

are going to do some prediction in the

play18:46

end you will measure the accuracy of the

play18:48

model let's do that exercise in the

play18:51

comments below let me

play18:52

know your answers and if you want to

play18:55

verify the answers then I have a

play18:57

separate notebook at the same level in

play19:01

exercise folder which has all the

play19:03

answers but don't look at the answers

play19:05

directly okay a good student is someone

play19:09

who tries to find the solution on his

play19:12

own and then he looks at the answer all

play19:14

right that's all we had thank you very

play19:17

much for watching I'll see you next


Related Tags
Logistic Regression, Binary Classification, Machine Learning, Data Science, Insurance Prediction, Sigmoid Function, Scikit-learn, Python, Data Analysis, Predictive Modeling, Employee Retention, HR Analytics