Maths behind XGBoost | XGBoost algorithm explained with Data Step by Step

Unfold Data Science
4 Sept 2020 · 16:39

Summary

TLDR: In this data science tutorial, Aman explains the mathematical foundation of the XGBoost algorithm, an advanced machine learning technique. He begins with an overview of boosting algorithms, emphasizing their sequential learning approach compared to bagging. He then covers XGBoost's distinguishing features, such as regularization and auto-pruning, and the roles of the lambda, gamma, and eta parameters. Using a simple dataset, he illustrates how XGBoost models are trained iteratively to minimize prediction errors and handle outliers. The video promises a follow-up on implementing XGBoost in Python and exploring the impact of its parameters.

Takeaways

  • 🌟 XGBoost is a boosting algorithm that builds models sequentially, with each new model attempting to correct the errors of the previous ones.
  • 🔍 Boosting differs from bagging in that it is a sequential ensemble method, training models one after another, whereas bagging is parallel.
  • 📊 XGBoost extends the Gradient Boosting algorithm, which uses decision trees and focuses on reducing the residuals of predictions.
  • 📈 The script uses a simple dataset with age as the independent variable and IQ as the dependent variable to illustrate how XGBoost works.
  • 🔢 Lambda is a regularization parameter in XGBoost that helps control overfitting by adjusting the impact of residuals on the model.
  • 🌱 Gamma is a threshold parameter in XGBoost that controls auto-pruning of the trees, thus preventing overfitting.
  • 🚀 Eta, or the learning rate, determines how quickly the boosting models converge to the final prediction.
  • 🌳 XGBoost creates a base model first, often starting with a simple average, and then fits additional models on the residuals of the previous model's predictions.
  • 📉 The residuals are used to build subsequent trees in the boosting process, with each tree attempting to minimize the error of the previous model.
  • 🛠️ The script explains how parameters like Lambda, Gamma, and Eta influence the model's ability to handle outliers and prevent overfitting.

Q & A

  • What is XGBoost?

    -XGBoost stands for eXtreme Gradient Boosting. It is a machine learning algorithm that is an optimized distributed gradient boosting library. It is designed to be highly efficient, flexible, and portable. It implements machine learning algorithms under the Gradient Boosting framework.

  • What is the difference between a boosting algorithm and a bagging algorithm?

    -A boosting algorithm is a sequential ensemble technique where models are trained one after another, with each new model trying to correct the errors of the previous ones. In contrast, a bagging algorithm is a parallel ensemble technique where multiple models are trained independently and then combined.

  • How is XGBoost different from Gradient Boosting?

    -XGBoost is an extension of Gradient Boosting. It adds features like regularization, auto pruning, and a more flexible definition of the objective function, making it more effective in preventing overfitting and handling large scale data.

  • What is the role of lambda in XGBoost?

    -Lambda in XGBoost is a regularization parameter that controls the complexity of the model. Because it sits in the denominator of the similarity score, increasing lambda lowers similarity scores and gains, which leads to more aggressive pruning and helps control overfitting; it also shrinks leaf output values, damping the influence of extreme residuals.

  • What does gamma represent in the context of XGBoost?

    -Gamma in XGBoost is a threshold for minimum loss reduction required to make a further partition on a leaf node of the tree. It helps in controlling overfitting by preventing the model from learning noise from the training data.
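
    A rough sketch of this pruning check in Python (a toy illustration, not the xgboost library's internal code; the similarity scores 1.33, 100, and 72 and the gamma threshold of 130 come from the video's worked example):

```python
# Similarity scores from the video's example (lambda = 0):
# root node {-10, 4, 8} -> 1.33, left leaf {-10} -> 100, right leaf {4, 8} -> 72
gain = 100 + 72 - 1.33           # similarity after the split minus similarity before
gamma = 130                      # user-supplied pruning threshold

if gain > gamma:
    print("keep the split")      # gain of about 170.7 exceeds gamma, so the branch stays
else:
    print("prune this branch")   # a higher gamma would remove low-gain splits
```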

  • What is the significance of the eta parameter in XGBoost?

    -Eta, also known as the learning rate in XGBoost, controls how fast the model learns. It shrinks the feature weights after each boosting step to prevent overfitting. A smaller eta means a slower rate of convergence.
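
    A minimal sketch of the prediction update driven by eta, using the numbers from the video's example (illustrative only; the library applies this update tree by tree internally):

```python
# new prediction = previous prediction + eta * leaf output
base_prediction = 30      # model 0: (rounded) average of the IQ values
leaf_output = 6           # output of the branch this record falls into (lambda = 0)
eta = 0.3                 # typical default learning rate

new_prediction = base_prediction + eta * leaf_output
print(new_prediction)     # 31.8, so the residual shrinks from 4 to 34 - 31.8 = 2.2
```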

  • How does XGBoost handle outliers?

    -XGBoost handles outliers by using the lambda parameter. By increasing the value of lambda, the impact of outliers on the prediction is reduced, as it effectively neutralizes the effect of extreme data points.
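
    A small sketch of how increasing lambda shrinks a leaf's output value and thus the pull of extreme residuals (toy code; the residuals follow the video's example):

```python
# leaf output value = sum of residuals / (number of residuals + lambda)
residuals = [4, 8]                       # residuals landing in one leaf
for lam in (0, 1, 4):
    output = sum(residuals) / (len(residuals) + lam)
    print(lam, output)                   # 0 -> 6.0, 1 -> 4.0, 4 -> 2.0
```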

  • What is the base prediction model in the given example?

    -In the provided example, the base prediction model is simply the average of the target (IQ) values, which is used as the prediction for every student regardless of age.

  • How does the residual value change after the first model in XGBoost?

    -After the first model, the residual values change based on the difference between the actual values and the predictions made by the model. These new residual values are then used to train the next model in the sequence.

  • What is the concept of similarity score in XGBoost?

    -The similarity score in XGBoost is a measure used to determine the homogeneity of residuals at a node. It is calculated as the square of the sum of residuals at the node divided by the number of residuals plus lambda. It plays a crucial role in deciding whether to split a node or not.
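
    A minimal sketch of this calculation (a toy re-implementation for illustration, not code from the xgboost package; the residuals are the ones from the video's example):

```python
def similarity_score(residuals, lam=0.0):
    """(sum of residuals)^2 / (number of residuals + lambda)."""
    return sum(residuals) ** 2 / (len(residuals) + lam)

# Residuals left over after the base (average) model
print(similarity_score([-10, 4, 8]))           # root node: 2^2 / 3  = 1.33
print(similarity_score([-10]))                 # left leaf: 100 / 1  = 100
print(similarity_score([4, 8]))                # right leaf: 144 / 2 = 72
print(similarity_score([-10, 4, 8], lam=1))    # lambda = 1 lowers it to 1.0
```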

  • How does XGBoost reduce the residual error?

    -XGBoost reduces the residual error by sequentially fitting new models on the residuals of the previous model. Each new model aims to correct the errors of the previous ones, thus gradually reducing the overall residual error.
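
    The sequential error-correction loop described above can be sketched as a toy gradient-boosting routine. This is illustrative only: real XGBoost grows regularized trees using similarity scores, gain, and pruning rather than the plain scikit-learn regression trees used here, and the age/IQ values are stand-ins for the video's small dataset:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

age = np.array([[10], [20], [25]])   # independent feature
iq = np.array([20.0, 34.0, 38.0])    # target

eta, n_rounds = 0.3, 50
prediction = np.full_like(iq, iq.mean())   # model 0: predict the average IQ
trees = []

for _ in range(n_rounds):
    residuals = iq - prediction                        # errors of the current ensemble
    tree = DecisionTreeRegressor(max_depth=2).fit(age, residuals)
    prediction = prediction + eta * tree.predict(age)  # each tree corrects the previous errors
    trees.append(tree)

print(np.round(iq - prediction, 3))   # residuals shrink toward zero round after round
```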

Outlines

00:00

📊 Introduction to XGBoost Mathematics

Aman, a data scientist, introduces the concept of XGBoost's mathematical algorithm. He explains that XGBoost is a boosting algorithm, which is an ensemble technique of sequential learning. Unlike bagging, boosting trains models sequentially, where each model tries to correct the errors of the previous ones. Aman mentions that XGBoost is an extension of the Gradient Boosting algorithm and plans to explain it using sample student data. He highlights the special features of XGBoost, such as regularization, auto-pruning, and the learning rate (eta), which contribute to its effectiveness in reducing errors and improving predictions.

05:02

🌳 Understanding XGBoost Tree Construction

Aman demonstrates how XGBoost constructs trees by fitting models on residuals. He introduces the concept of similarity score, which includes a regularization parameter lambda, to control overfitting. Aman explains that the tree splitting criteria are based on gain, which is calculated by comparing the similarity score before and after a split. The gamma parameter is used to determine whether a split should occur based on the gain value. He also discusses how lambda affects the similarity score and, consequently, the tree's growth and pruning. Aman emphasizes the importance of lambda in controlling overfitting and managing the impact of outliers on predictions.
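
A hedged numerical sketch of how lambda feeds into the similarity scores, the gain, and ultimately the pruning decision (toy Python rather than library internals; the residuals and the gamma threshold of 130 follow the video's example):

```python
def similarity(residuals, lam):
    # (sum of residuals)^2 / (number of residuals + lambda)
    return sum(residuals) ** 2 / (len(residuals) + lam)

root, left, right = [-10, 4, 8], [-10], [4, 8]   # split on age > 10
for lam in (0, 1):
    gain = similarity(left, lam) + similarity(right, lam) - similarity(root, lam)
    print(lam, round(gain, 2), gain > 130)
    # lambda = 0: gain = 170.67 > 130, so the split is kept
    # lambda = 1: gain = 97.0  < 130, so the branch is pruned
```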

10:02

🔍 Deep Dive into Lambda's Role in XGBoost

Aman further explores the role of lambda as a regularization parameter in XGBoost. He explains that increasing lambda leads to a more aggressive pruning of the tree to control overfitting. Additionally, lambda is used in predictions to neutralize the effect of outliers. Aman illustrates how lambda affects the prediction outcome by adjusting the impact of residuals, thus generalizing the model's predictions. He also touches on the use of the learning rate (eta) in updating predictions, which controls how quickly the model converges to the correct values.

15:03

🔚 Wrapping Up and Looking Forward to Implementation

Aman concludes the theoretical discussion on XGBoost's mathematics and invites viewers to ask questions or comment if they need further clarification. He assures that he will respond to comments to help clarify any doubts. Aman also previews the next video, where he will implement XGBoost in Python, demonstrating how different parameters affect the algorithm's performance and speed.

Keywords

💡XGBoost

XGBoost stands for eXtreme Gradient Boosting. It is an advanced machine learning algorithm that is used for supervised learning tasks. As explained in the video, it is an extension of the Gradient Boosting algorithm. XGBoost is known for its ability to handle large-scale data and for its efficiency in terms of speed and performance. The video script mentions that XGBoost is more effective due to its implementation of regularization, auto-pruning, and other special features that enhance its predictive capabilities.

💡Boosting

Boosting is an ensemble technique in machine learning that involves training models sequentially. The video script explains that unlike bagging, which is a parallel ensemble method, boosting is sequential. In boosting, each model is trained on the errors of the previous model, with the aim of improving the overall prediction accuracy. XGBoost is a type of boosting algorithm that builds models one after another, each trying to correct the errors of the previous ones.

💡Ensemble Technique

An ensemble technique in machine learning refers to the process of combining multiple learning algorithms to solve a particular problem. The video script uses the example of XGBoost to illustrate how different models are combined to give a better result. Ensemble methods are often used to improve the accuracy and robustness of machine learning models by reducing overfitting and increasing generalization.

💡Regularization

Regularization is a technique used in machine learning to prevent overfitting by adding a penalty term to the loss function. In the context of the video, regularization is one of the special features of XGBoost that helps control overfitting by penalizing complex models. The script mentions a regularization parameter 'lambda' that is used to adjust the complexity of the model and balance the trade-off between bias and variance.

💡Auto Pruning

Auto pruning is a feature of XGBoost that helps in controlling overfitting by automatically deciding when to stop growing a tree. The video script explains that this is achieved through a parameter called 'gamma', which acts as a threshold for deciding whether a tree should be further split or not. If the gain from a split is less than the gamma value, the split is not made, effectively pruning the tree.

💡Lambda

In the context of the video, lambda is a regularization parameter used in XGBoost. It is used to control the complexity of the model and to penalize large values of weights in the model. The script explains that by increasing the lambda value, the impact of outliers can be reduced, and the model becomes more robust. It also helps in controlling overfitting by adjusting the similarity score used in the tree splitting criteria.

💡Gamma

Gamma is a parameter in XGBoost that controls the minimum gain required for a split to be made in the tree. The video script describes gamma as a threshold that determines whether a split is beneficial enough to be included in the model. A higher gamma value leads to more aggressive pruning, while a lower value allows more splits, potentially leading to a more complex model.

💡Eta

Eta, as mentioned in the video, is a learning rate parameter in XGBoost that controls how fast the model learns. It is used to scale the contribution of each tree, ensuring that the model does not overfit by making large updates to the prediction in a single step. The script explains that eta helps in controlling the convergence rate of the algorithm, with smaller values leading to a more cautious learning process.

💡Residuals

Residuals in the context of the video refer to the errors or the difference between the predicted values and the actual values. The script explains that in boosting algorithms, including XGBoost, the next model is trained on the residuals of the previous model. This process helps in reducing the errors step by step, as each model tries to correct the mistakes of the previous ones.

💡Similarity Score

Similarity score is a concept introduced in the video to explain how XGBoost evaluates the quality of splits in the tree. It is calculated as the square of the sum of residuals at a node divided by the number of residuals plus the lambda value. The script uses this concept to illustrate how XGBoost decides whether a split is beneficial by comparing the similarity scores before and after a potential split.

Highlights

XGBoost is a boosting algorithm and an extension of gradient boosting.

Boosting algorithms train models sequentially to improve results.

XGBoost introduces regularization, auto pruning, and a learning rate for enhanced performance.

The base model in XGBoost starts with the average of the target values.

Residuals are used to fit the next model in the boosting process.

Lambda is a regularization parameter that controls overfitting in XGBoost.

Gamma is a threshold for auto pruning and controlling the tree's growth.

Eta is the learning rate that scales each tree's output and controls how quickly the predictions converge toward the actual values.

The similarity score is calculated using residuals and lambda to measure the quality of splits.

Gain is the sum of the child nodes' similarity scores after a split minus the parent node's similarity score before it; it is compared against gamma to decide whether a split is kept.

Increasing lambda can help in reducing the impact of outliers on predictions.

A leaf's output value in XGBoost is the sum of its residuals divided by the number of residuals plus lambda.

XGBoost builds trees by iteratively fitting models on the residuals of the previous model.

The learning rate (eta) influences the step size in the direction of the gradient.

XGBoost's ensemble model aims to minimize the residuals and closely approximate the original observations.

A follow-up video will implement XGBoost in Python and explore how its parameters affect the results; a minimal preview sketch follows.
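
As a preview, a minimal sketch using the xgboost Python package's scikit-learn wrapper (hedged: the tiny age/IQ arrays are illustrative stand-ins for the video's data; reg_lambda, gamma, and learning_rate correspond to the lambda, gamma, and eta discussed above):

```python
import numpy as np
from xgboost import XGBRegressor

age = np.array([[10], [20], [25]])
iq = np.array([20.0, 34.0, 38.0])

model = XGBRegressor(
    n_estimators=10,     # number of sequential trees
    learning_rate=0.3,   # eta
    reg_lambda=1.0,      # lambda, the regularization parameter
    gamma=0.0,           # minimum gain required to keep a split
    max_depth=2,
)
model.fit(age, iq)
print(model.predict(np.array([[11]])))   # prediction for a new 11-year-old student
```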

Transcripts

play00:00

welcome to unfold data science friends

play00:01

this is aman here and i am a data

play00:03

scientist

play00:04

in this video we will understand the

play00:06

concept of xg boost mathematics

play00:09

so what is the mathematical algorithm or

play00:11

what is the mathematical

play00:12

approach behind the xgboost algorithm this

play00:15

is one topic which was requested by many

play00:17

of you

play00:18

so i just thought of taking a sample

play00:20

data

play00:21

and explaining you with the data let's

play00:23

start a discussion

play00:24

now xgboost is what kind of algorithm

play00:27

guys

play00:27

xgboost is a boosting algorithm what is

play00:31

a boosting algorithm by definition

play00:33

boosting algorithm by definition is

play00:35

nothing but an ensemble technique

play00:37

of sequential learning what i mean by

play00:40

sequential learning

play00:41

so the difference between bagging and

play00:42

boosting is bagging is a parallel

play00:44

ensemble and boosting is a sequential

play00:46

ensemble

play00:47

which means in boosting different model

play00:50

get trained

play00:51

one after another so first model will be

play00:53

getting trained

play00:54

then the second model then the third

play00:56

model and then many models will combine

play00:58

to give you a better result

play01:00

now xgboost is nothing but an extension

play01:04

of gradient boost we can say

play01:06

so how gradient boost works there is a

play01:08

link

play01:09

in the description and you can see on

play01:11

the card as well

play01:12

i have given you a detailed description

play01:14

of how gradient boost works

play01:16

if you have not watched that video i

play01:18

advise you to watch that video as well

play01:21

now xgboost is nothing but an extension

play01:23

of gradient boost

play01:24

let us try to understand how xgboost

play01:27

will work on this data

play01:28

so i have taken a very simple data here

play01:30

guys let us say this is students data

play01:33

here is age column which is my x feature

play01:36

when i say x feature independent feature

play01:38

dependent feature or target is iq so iq

play01:41

of a student

play01:42

based on the age okay just three data

play01:45

points i have taken for simplicity

play01:47

how xg boost is more effective or how

play01:50

xgboost is more

play01:52

you know gives you better result is

play01:54

based on certain things that it does

play01:57

i had explained you in my last video

play01:59

what are the special things about

play02:01

xgboost for example regularization for

play02:04

example

play02:04

auto pruning what are the things which

play02:06

make xgboost a special algorithm

play02:09

we will try to understand with this

play02:10

example so i have plotted these three

play02:13

points here okay

play02:15

now not only xgboost but

play02:18

how normal gradient boost will also work

play02:21

is

play02:21

it will try to create a base model first

play02:24

so what is my job here my job here is to

play02:26

predict

play02:27

iq based on the age so the first step is

play02:30

creating a base model what is your base

play02:32

model

play02:33

let us say the the very normal

play02:35

assumption what you can take

play02:37

you can take very normal assumption of

play02:39

prediction is

play02:40

let us take the prediction as the

play02:43

average number of these three numbers

play02:45

okay so what is the average of these

play02:47

three numbers

play02:48

so the average of these three numbers

play02:50

will be 30.6

play02:51

for simplicity i am putting it 30 here

play02:54

so let us say this is the average line

play02:57

this is my model 0 okay base model

play03:00

so my base model is saying that the

play03:03

prediction

play03:04

for these three cases all these cases

play03:07

are actually 30

play03:08

but by my base model will have some

play03:10

errors right

play03:11

what will be these errors obviously

play03:13

these distance is right

play03:15

so this is error for this point this is

play03:17

error for this point and this is error

play03:19

for this point right

play03:20

we have certain errors now these errors

play03:23

right

play03:24

e1 e2 and e3

play03:27

next model m1 will be fitted on these

play03:32

errors

play03:33

to minimize these errors okay

play03:36

next model will be fitted on these

play03:38

errors

play03:39

to minimize these errors this is very

play03:41

important to understand

play03:43

so when i talk of m one for m1

play03:47

input data will be my independent

play03:49

features

play03:50

plus my target feature will be these

play03:53

errors

play03:54

simple that is what exactly happens in

play03:58

gradient boost as well but how is xgboost

play04:01

different

play04:02

so in xgboost you have to understand

play04:04

three things mainly

play04:05

first is known as lambda lambda is

play04:08

nothing but a regularization parameter

play04:10

okay regularization parameter second

play04:14

thing you have to understand in the xg

play04:15

boost model is known as gamma

play04:17

so i am writing it like this so gamma is

play04:20

nothing but a threshold

play04:21

that that defines your auto pruning of

play04:25

the tree

play04:26

or that controls your overfitting how i

play04:28

will tell you that okay

play04:29

and the third thing you should remember

play04:31

is something known as eta

play04:33

so eta is nothing but how well or how

play04:36

fast

play04:36

you want to converge to the next value

play04:39

so there is learning rate and there is

play04:41

eta so you must not confuse with the

play04:43

learning rate and eta this tells you

play04:46

how fast you want to converge okay i

play04:48

will give you examples to make you

play04:50

understand what are these things

play04:52

let us see how the models will be fitted

play04:54

on these errors first

play04:55

so that was our model 0 after model 0

play04:58

which was an

play04:59

average model we have these residuals

play05:01

right

play05:02

now as i told you our job is to fit a

play05:04

model on these residuals using

play05:06

independent features let us try to fit a

play05:09

normal decision tree first

play05:11

okay so what i am trying to do here is i

play05:14

am trying to create an

play05:15

something called as an xg boost tree on

play05:18

residuals okay

play05:19

so let me write here residual values

play05:22

minus 10

play05:23

4 and 8 okay now there is a concept of

play05:27

something called as similarity score of

play05:30

residuals

play05:31

on this node similarity score

play05:34

so what is similarity score i am writing

play05:36

here similarity score is equal to

play05:40

similarity score is equal to sum of

play05:42

residuals

play05:43

okay sum of residuals square

play05:47

divided by number of residuals

play05:50

which means number of how many residuals

play05:52

are there

play05:53

plus lambda okay this lambda is nothing

play05:57

but the regularization parameter which i

play06:00

spoke about

play06:00

some time back okay we will try to

play06:03

understand what is the use of this

play06:05

so if we plug in the values here what

play06:08

will be the sum of residuals here

play06:10

8 plus 4 12 minus 10 how much

play06:13

2 so 2 square how much 4

play06:16

4 divided by number of residuals how

play06:19

much

play06:20

how many residuals we have 3 4 divided

play06:23

by let us put lambda is equal to 0

play06:25

so it will be nothing but 4 sum of

play06:28

residual square

play06:29

divided by 3 plus lambda we are starting

play06:33

as 0

play06:33

let us see i'll tell you how what will

play06:36

happen if you increase lambda

play06:37

so lambda 0 what will be this value

play06:40

something more than 1

play06:41

let us put this as 1.3 for example

play06:45

this is the similarity score of the

play06:47

residuals

play06:48

at this particular node now

play06:52

we will define a tree splitting criteria

play06:55

for example let us say i define a tree

play06:57

splitting criteria saying

play06:59

age greater than 10 just a simple

play07:02

splitting criteria

play07:04

so how many records will come this side

play07:06

one record

play07:07

how many records will go this side two

play07:08

records which is the one record which

play07:10

comes on the left

play07:11

residual wise i'm talking -10 which is

play07:15

the one record two records which goes

play07:16

right

play07:17

residual wise again these two residuals

play07:20

right four and eight

play07:21

four and eight similarly using this

play07:25

score

play07:26

similarity score will be calculated at

play07:28

these nodes as well

play07:30

so what will be the similarity score

play07:31

here sum of residual square

play07:34

divided by for this node i am talking

play07:36

about 100

play07:37

divided by 1 so it will be 100

play07:40

what will be similarity score here sum

play07:43

of residual square

play07:44

which means 12 144 divided by

play07:47

2 right so it will be nothing but 72

play07:50

right so this is the similarity score

play07:53

here and this is the similarity score

play07:54

here

play07:55

what we have to understand here is when

play07:58

the residuals are of opposite sign

play08:01

then the similarity score is lower the

play08:03

reason for that is residuals cancel each

play08:05

other

play08:06

when the

play08:08

residuals

play08:09

sign is same then it does not cancel

play08:13

each other and the

play08:14

similarity score value is higher as you

play08:17

can see similarity score value is higher

play08:18

here and here both

play08:20

now there is one term defined known as

play08:23

gain

play08:23

okay so how do you define gain is you

play08:26

say gain is equal to

play08:28

similarity score of this branch after

play08:31

split

play08:32

minus similarity score of the branch

play08:34

before split

play08:35

so in this case 100 plus 72 minus 1.3

play08:39

that becomes the gain so 172 minus

play08:43

1.3 whatever you get here

play08:46

that becomes your gain now comes the use

play08:49

of the parameter that i was talking

play08:51

about known as gamma

play08:52

so gamma is nothing but when you call an

play08:55

xg boost algorithm

play08:57

you supply a value gamma let us say your

play09:00

gamma in this case is

play09:01

130 okay just a simple number

play09:05

whenever your gamma value is

play09:08

less than the gain value then this split

play09:11

will happen otherwise the split will not

play09:13

happen

play09:14

and that is how your auto pruning

play09:16

happens

play09:17

so i was discussing about auto pruning

play09:19

in my last video

play09:20

so how long this tree will grow that

play09:23

will depend on

play09:24

how much of gain it is gaining by

play09:27

splitting and if that gain is

play09:31

within the range of the gain that we

play09:33

intending to go uh

play09:34

i mean more than the range that we are

play09:36

intending to get

play09:38

if this is not satisfied then this

play09:39

breakage will not happen

play09:41

okay this is about your gamma parameter

play09:44

so

play09:44

higher gamma if you give that means you

play09:47

want

play09:48

to prune the tree with more aggressive

play09:51

approach right

play09:52

lower gamma you give means you want to

play09:54

prune the tree with lesser aggressive

play09:56

approach

play09:57

now what is the use of this lambda here

play10:00

two things to understand we have

play10:02

proceeded with lambda is equal

play10:04

to 0 as a baseline what will happen

play10:07

if we keep lambda is equal to 1 if we

play10:09

keep lambda is equal to 1

play10:11

then similarity score here will be

play10:14

1 in this case so if i replace this 0 by

play10:17

one similarity score will be one right

play10:19

so here similarity score in place of one

play10:21

point three will be one right

play10:23

similarly at this places also similarity

play10:26

score will come down

play10:28

which means that to keep your tree from

play10:31

overfitting

play10:33

the overall similarity score will come

play10:34

down which means the gain will come down

play10:36

right

play10:37

if the gain comes down then will your

play10:40

tree will be pruned right

play10:42

here gain number is bigger hence the

play10:45

breakage is allowed

play10:46

if the gain number is in place of let us

play10:49

say 170

play10:50

if the gain number is just 70 and we are

play10:53

supplying gamma as 130

play10:54

then this split will not be allowed

play10:57

which means your tree is pruned

play11:00

i am repeating it again guys lambda is a

play11:03

very important parameter in xgboost

play11:04

known as

play11:05

regularization parameter what is the use

play11:08

of regularization parameter one uses

play11:10

if you increase your regularization

play11:12

parameter then you are taking a more

play11:14

aggressive approach to prune your tree

play11:16

or control the overfitting of the tree

play11:18

that is number one

play11:20

there is one more use of lambda what is

play11:22

that use

play11:23

so how the prediction will happen here

play11:25

is

play11:26

how the prediction will happen let us

play11:28

say this is metric built

play11:29

this is my tree right so how the

play11:31

prediction here happens is

play11:33

it simply tells you sum of all the

play11:36

residuals

play11:37

divided by number of residuals plus

play11:40

lambda

play11:41

again okay so i'm talking about

play11:44

if tomorrow a new data comes then how

play11:46

the prediction will happen let us say

play11:48

new kid comes

play11:50

whose age is 11 so what this model i am

play11:53

talking about model m1 only now

play11:55

what this model will predict for that

play11:57

guy it will predict

play11:59

sum of residuals to whichever branch it

play12:02

goes for example in this branch it comes

play12:04

right

play12:05

sum of residuals is how much 12. number

play12:08

of residuals is how much 2

play12:10

plus lambda now here also

play12:13

we can play with lambda for one moment

play12:16

let us put lambda is equal to 0

play12:17

so what will be the prediction in this

play12:19

branch 6

play12:21

next moment put lambda is equal to 1

play12:24

what will be the prediction in this

play12:25

branch

play12:26

6 uh 12 over 2 plus 1 which is 3 right 4

play12:29

if we increase lambda further then it

play12:31

will come further down

play12:32

what is the use of lambda in this case

play12:34

if you think carefully is

play12:37

the effect of outliers so another

play12:40

advantage of xg boost is

play12:42

it takes care of the outliers to some

play12:44

extent so if you increase the lambda

play12:46

size if you increase the lambda value

play12:48

the impact of

play12:50

outliers on the prediction will come

play12:52

down significantly

play12:54

these are the usage of your

play12:56

regularization parameter one

play12:58

is how aggressive you want to control

play13:00

your overfitting how aggressive you want

play13:01

to prune your tree

play13:03

similar kind of use for this gain gamma

play13:06

parameter as well right

play13:08

and what is the use of lambda other use

play13:11

you know

play13:11

how much you want to control the effect

play13:13

of an outlier in your data

play13:16

how much you want to generalize so if

play13:18

you keep lambda is equal to 0 then this

play13:20

is nothing but an average right

play13:22

lambda is equal to 0 is just an average

play13:24

number of the prediction

play13:25

if you increase lambda it's not just

play13:27

average it is trying to neutralize the

play13:29

effect of

play13:30

extreme variables or extreme data points

play13:33

this is about

play13:34

gamma lambda and the third parameter i

play13:37

am going to tell you what is that

play13:39

new prediction is equal to previous

play13:42

prediction which is nothing but my model

play13:44

0 prediction base prediction

play13:46

plus learning rate into output now let

play13:49

us take the example of the previous tree

play13:51

that we created

play13:52

i am considering this second data point

play13:54

okay the second data point

play13:56

is greater than 10 right so it will come

play13:58

to that side

play13:59

where the output was 6 right you can go

play14:02

back and see

play14:03

output was 6 what is the previous

play14:05

prediction for this guy i mean this

play14:07

record

play14:08

it is nothing but 30.6 so i am

play14:10

considering that 30

play14:11

for simplicity the average of these

play14:12

three nothing more 30

play14:15

plus learning rate i'll tell you what is

play14:17

this learning rate in a moment

play14:19

into output what was the output or

play14:22

output prediction for that branch

play14:24

for lambda is equal to zero it was six

play14:27

so six

play14:28

now this learning rate is nothing but

play14:30

what i told in the beginning a parameter

play14:32

called eta

play14:34

okay so eta typically in xgboost we take

play14:37

0.3

play14:38

you can tweak the value between 0.1 and 1 as

play14:40

well

play14:41

if i put 0.3 right it will be 6

play14:45

30 plus how much

play14:48

1.8 right 30 plus 1.8

play14:52

is equal to 31.8 that is the new

play14:55

prediction

play14:56

for this particular record when the new

play15:00

prediction

play15:00

comes then the residual value will

play15:03

automatically

play15:04

change what will be the new residual

play15:06

value now

play15:07

new residual value will be nothing but

play15:09

34

play15:10

minus 31.8 how much

play15:14

2.2 i think 32 yeah 2.2

play15:18

so new residual here will be 2.2

play15:22

right now previous residual goes out of

play15:25

the picture

play15:26

new residual comes and model 2

play15:29

is trained this is how one after another

play15:33

model 1 model 2 model 3 model 4 will get

play15:35

trained

play15:36

and in the last model what will happen

play15:38

is we will have

play15:40

reduced residual so if you see here

play15:42

previously the residual was 4

play15:44

now as it will reduce to 2.2 in next

play15:47

model it will reduce further

play15:48

and similarly this is how xg boost will

play15:50

try to reduce the residual

play15:52

and give you the final ensemble model

play15:56

which is very close to the original

play15:58

observations

play16:00

so this was all about the mathematics

play16:01

behind xgboost with a simple data

play16:04

little complex i am sure you might be

play16:06

having some doubt

play16:08

write me in comment i'll definitely

play16:09

respond to you and i try to explain you

play16:11

in a simple way still

play16:13

if you feel at some step you wanted to

play16:15

ask something just don't hesitate to

play16:17

write me

play16:17

i'll see you all in the next video with

play16:19

the implementation of this in python

play16:21

and i'll also show you how different

play16:23

parameters change and how

play16:24

this algorithm runs faster and many

play16:27

other things right

play16:28

i'll see you all in the next video till

play16:30

then all of you stay safe and take care

play16:39

you


Related Tags
XGBoost, Data Science, Machine Learning, Algorithm, Ensemble Learning, Boosting, Gradient Boost, Regularization, Auto Pruning, Lambda