Xgboost Regression In-Depth Intuition Explained- Machine Learning Algorithms 🔥🔥🔥🔥

Krish Naik
19 Oct 2020 · 19:29

Summary

TL;DR: In this YouTube tutorial, the host Krish Naik dives into the workings of the XGBoost Regressor, an ensemble machine learning technique. He explains how decision trees are constructed within XGBoost, detailing the process from creating a base model to calculating residuals and constructing sequential binary trees. He also covers the calculation of similarity weights and information gain, which are crucial for choosing splits in each tree. The video is educational, aiming to provide in-depth intuition into XGBoost's regression capabilities.

Takeaways

  • 🌟 XGBoost is an ensemble technique that uses boosting methods, specifically extreme gradient boosting.
  • 📊 The script explains how decision trees are constructed in XGBoost, starting with creating a base model that uses the average of the target variable.
  • 🔍 Residuals are calculated by subtracting the base model's output from the actual values, which are then used to train the decision trees.
  • 📈 The script discusses the calculation of similarity weights in XGBoost, which is a key component in determining how to split nodes in the trees.
  • 📉 Lambda is introduced as a hyperparameter that can adjust the similarity weight, thus influencing the complexity of the model.
  • 📝 The process of calculating information gain for different splits is detailed, which helps in deciding the best splits for the decision trees.
  • 🌳 The script walks through an example of constructing a decision tree using the 'experience' feature and comparing it with other potential splits.
  • 🔧 The concept of gain, which is the improvement in model performance from a split, is calculated and used to decide which features and splits to use.
  • 🔄 The script mentions the use of multiple decision trees in XGBoost, each contributing to the final prediction with a learning rate (alpha) applied.
  • 🛠 The role of hyperparameters like gamma in post-pruning to prevent overfitting is discussed, highlighting the importance of tuning these parameters for optimal model performance.

Q & A

  • What is the main topic of the video?

    -The main topic of the video is an in-depth discussion on the XGBoost Regressor, focusing on how decision trees are created in XGBoost and the mathematical formulas involved.

  • What is XGBoost and what does it stand for?

    -XGBoost stands for eXtreme Gradient Boosting. It is an ensemble machine learning algorithm that uses a boosting technique to build and combine multiple decision trees.

  • What is the role of the base model in XGBoost?

    -The base model in XGBoost is used to calculate the average output, which serves as the initial prediction before any decision trees are applied. It helps in computing residuals that are used to train the subsequent decision trees.

  • How does the script define 'residual' in the context of XGBoost?

    -In the context of XGBoost, 'residual' refers to the difference between the actual value and the predicted value by the base model. It represents the error that the model is trying to correct.

  • What is the significance of the lambda parameter in XGBoost?

    -The lambda parameter in XGBoost is a hyperparameter that controls the complexity of the model. It is used in the calculation of similarity weight, which in turn affects the decision of splitting nodes in the decision trees.

  • What is the purpose of calculating similarity weight in XGBoost?

    -The similarity weight in XGBoost measures how well the residuals in a node agree with one another. It feeds into the gain calculation, which is used to pick the split that best separates the residuals into homogeneous groups.

  • How does the script describe the process of creating a decision tree in XGBoost?

    -The script describes the process of creating a decision tree in XGBoost by first creating a base model, then calculating residuals, and using these residuals to determine the best splits for the decision tree nodes based on similarity weight and gain.

  • What is the role of the learning rate (alpha) in the XGBoost algorithm?

    -The learning rate (alpha) in the XGBoost algorithm determines the contribution of each decision tree to the final prediction. It is used to control the impact of each tree, allowing the model to update predictions incrementally.

  • What is the purpose of the gamma parameter mentioned in the script?

    -The gamma parameter in the script is used for post-pruning in the decision trees. If the information gain after a split is less than the gamma value, the split is pruned, which helps in preventing overfitting.

  • How does the script explain the process of calculating the output for a decision tree in XGBoost?

    -The script explains that the output for a decision tree in XGBoost is calculated by taking the average of the residuals that fall into a particular node, and then using this output to update the predictions for the corresponding records.

  • What is the final output of the XGBoost model as described in the script?

    -The final output of the XGBoost model is the sum of the base model output and the outputs from each decision tree, each multiplied by their respective learning rates (alphas).
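
As a compact illustration of that final formula, here is a minimal sketch in Python; the second tree's output and learning rate are made-up values, since the video only builds one tree explicitly.

```python
def xgboost_prediction(base_output, tree_outputs, learning_rates):
    """Final prediction = base model output + sum(alpha_i * output of tree i)."""
    return base_output + sum(a * t for a, t in zip(learning_rates, tree_outputs))

# One tree, as in the video's walkthrough: 51 + 0.5 * (-10) = 46
print(xgboost_prediction(51.0, [-10.0], [0.5]))

# Hypothetical second tree with its own output and learning rate
print(xgboost_prediction(51.0, [-10.0, -2.0], [0.5, 0.5]))   # 45.0
```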

Outlines

00:00

🌟 Introduction to XGBoost Regressor

The speaker, Krish Naik, introduces the topic of the video, which is an in-depth exploration of the XGBoost Regressor. He explains that the video will cover how decision trees are created within XGBoost and the mathematical formulas involved. He references his previous video on the XGBoost classifier and encourages viewers to watch it for context. The video then delves into a problem statement involving three columns: experience, gap, and salary. The aim is to demonstrate how decision trees are constructed in XGBoost, starting with a base model that predicts the average salary.

05:01

📊 Constructing Decision Trees in XGBoost

Krish Naik explains the process of creating sequential decision trees in XGBoost, beginning with the calculation of residuals against the average salary. He then discusses how the first decision tree is constructed using the features 'experience' and 'gap', along with the residuals. The video illustrates how to split the data at the root node based on the 'experience' feature, creating two branches: one for values less than or equal to 2 and another for values greater than 2. He also introduces the concept of similarity weight, calculated as the squared sum of the residuals divided by the number of residuals plus a hyperparameter called lambda. The goal is to find the split that maximizes the gain, which is the sum of the similarity weights of the left and right nodes minus the similarity weight of the root node.
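
To make the split selection concrete, here is a minimal sketch using the residuals the video works with (−11, −9, 1, 9, 11, after it rounds 41k up to 42k) and lambda = 1. The arithmetic here is exact, so some decimals differ from the figures quoted on the fly in the video, but the conclusion is the same: the split at experience ≤ 2.5 has the larger gain.

```python
def similarity_weight(residuals, lam=1.0):
    # Similarity weight in XGBoost regression:
    # (sum of residuals)^2 / (number of residuals + lambda)
    return sum(residuals) ** 2 / (len(residuals) + lam)

def gain(left, right, lam=1.0):
    # Gain of a split = similarity(left) + similarity(right) - similarity(root)
    return (similarity_weight(left, lam) + similarity_weight(right, lam)
            - similarity_weight(left + right, lam))

residuals = [-11.0, -9.0, 1.0, 9.0, 11.0]    # errors of the base model (mean salary = 51k)

# Candidate split 1: experience <= 2 (first record left, the rest right)
print(gain(residuals[:1], residuals[1:]))    # ~89.1

# Candidate split 2: experience <= 2.5 (first two records left, the rest right)
print(gain(residuals[:2], residuals[2:]))    # ~243.4 -> higher gain, so this split wins
```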

10:04

🔍 Calculating Information Gain and Decision Tree Output

The speaker continues by calculating the gain for different splits in the decision tree, comparing the results to determine the most effective split. He emphasizes the importance of choosing the split that yields the highest gain. Krish Naik then explains how to calculate the output of the decision tree nodes, using the average of the residuals in each node. The video demonstrates how the base model output is combined with the decision tree output, adjusted by a learning rate, to produce the updated predictions. The process is repeated for each record, and Krish Naik corrects a mistake he made in the calculation along the way.
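
A minimal sketch of that step, following the video's simplification that a leaf's output is the plain average of its residuals (library XGBoost additionally divides by the residual count plus lambda):

```python
def leaf_output(residuals):
    # Leaf value as used in the video: the average of the residuals in that leaf.
    return sum(residuals) / len(residuals)

base_prediction = 51.0
learning_rate = 0.5                        # the alpha used in the video

left_leaf = leaf_output([-11.0, -9.0])     # -10.0, the corrected value from the video

# Updated prediction for a record that falls into the left leaf
print(base_prediction + learning_rate * left_leaf)   # 46.0
```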

15:05

🛠️ Advanced XGBoost Techniques and Conclusion

In the final part of the video, Krish Naik discusses advanced techniques in XGBoost, such as the use of the gamma parameter for post-pruning to prevent overfitting. He explains that if the information gain of a split is less than the gamma value, the split is pruned. He summarizes the process of constructing an XGBoost model, which involves creating multiple decision trees and combining their outputs with the base model output, each multiplied by a learning rate. He concludes by encouraging viewers to subscribe to the channel for more content and thanks them for watching.
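
A sketch of the post-pruning rule as described, using the example values quoted in the video (gamma = 150.5, gain = 140): if the gain minus gamma is negative, the branch is pruned.

```python
def keep_split(split_gain, gamma):
    # Post-pruning rule: keep a split only if gain - gamma is not negative.
    return split_gain - gamma >= 0

gamma = 150.5
print(keep_split(140.0, gamma))    # False -> prune this branch
print(keep_split(243.4, gamma))    # True  -> keep this branch
```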

Keywords

💡XGBoost Regressor

XGBoost Regressor is an ensemble machine learning technique that uses boosting for regression tasks. It constructs sequential decision trees, each correcting the errors from the previous one, improving the overall model accuracy. In the script, the speaker explains how XGBoost Regressor is used to model relationships between features like experience and salary.
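
For readers who want to try this with the xgboost Python package, here is a hedged sketch of how the quantities discussed in the video map onto the library's scikit-learn-style parameters. The feature encoding of the toy data (experience in years, gap as 1/0) is an assumption for illustration, not something shown explicitly in the video.

```python
import numpy as np
from xgboost import XGBRegressor   # pip install xgboost

# Toy data in the spirit of the video: experience (years), gap (1 = yes, 0 = no) -> salary (k)
X = np.array([[2.0, 1], [2.5, 1], [3.0, 0], [4.0, 0], [4.5, 0]])
y = np.array([40.0, 42.0, 52.0, 60.0, 62.0])

model = XGBRegressor(
    n_estimators=10,     # number of sequential trees
    learning_rate=0.5,   # the "alpha" in the video
    reg_lambda=1.0,      # the lambda in the similarity-weight formula
    gamma=0.0,           # minimum gain a split must reach (post-pruning)
    max_depth=2,
)
model.fit(X, y)
print(model.predict(np.array([[3.0, 0]])))
```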

💡Boosting Technique

Boosting is a machine learning method where multiple weak learners (often decision trees) are combined to create a strong learner. Each model in the sequence focuses on correcting the errors of the previous ones. XGBoost is one such boosting technique, and the video emphasizes how it uses boosting to improve model performance.

💡Decision Tree

A decision tree is a flowchart-like structure used to model decisions based on feature values. In XGBoost, decision trees are sequentially created, with each tree learning from the residual errors of the previous tree. The script describes how decision trees are created in XGBoost Regressor to predict outcomes like salary based on experience and gap.

💡Residuals

Residuals represent the difference between the predicted and actual values in a regression model. In XGBoost, residuals from the base model are used to train subsequent decision trees. The video shows how residuals are calculated after the base model's prediction of the average salary and how they influence the training of the next tree.
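
A minimal sketch of that calculation with the salaries used in the video (40k, 41k, 52k, 60k, 62k; the video later rounds 41k up to 42k so the second residual becomes −9):

```python
salaries = [40.0, 41.0, 52.0, 60.0, 62.0]            # target values in thousands

base_prediction = sum(salaries) / len(salaries)      # 51.0 -> output of the base model
residuals = [y - base_prediction for y in salaries]  # errors the first tree is trained on

print(base_prediction)   # 51.0
print(residuals)         # [-11.0, -10.0, 1.0, 9.0, 11.0]
```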

💡Similarity Weight

Similarity weight is a measure used in XGBoost to determine the best splits in decision trees. It is calculated based on the residuals and helps to quantify how similar or different the data points in a split are. The speaker demonstrates how similarity weights are computed for different splits to decide the optimal node division.

💡Lambda

Lambda is a hyperparameter in XGBoost that controls the regularization term applied to the similarity weight calculation. It helps prevent overfitting by penalizing overly complex models. The script mentions that increasing the lambda value decreases the similarity weight, thus influencing the decision-making process in tree construction.
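
The regularizing effect is easy to see numerically; a small sketch using the single-residual left node from the video's first split:

```python
def similarity_weight(residuals, lam):
    # (sum of residuals)^2 / (number of residuals + lambda)
    return sum(residuals) ** 2 / (len(residuals) + lam)

node = [-11.0]                       # left node of the first split
for lam in (0.0, 1.0, 10.0):
    print(lam, similarity_weight(node, lam))
# lambda = 0  -> 121.0
# lambda = 1  -> 60.5
# lambda = 10 -> 11.0
```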

💡Information Gain

Information gain measures the improvement in accuracy or reduction in uncertainty after a split in a decision tree. It is calculated by comparing the similarity weights before and after the split. In the video, the speaker explains how the total gain from a split is computed, and the split with the highest gain is selected.

💡Hyperparameter Tuning

Hyperparameter tuning involves adjusting the settings of a machine learning algorithm to optimize performance. In XGBoost, parameters like lambda and gamma control model complexity and pruning. The speaker explains how tuning lambda and gamma values can affect the similarity weights and post-pruning of decision trees.

💡Learning Rate

The learning rate (alpha) in XGBoost controls the contribution of each new tree to the final model. A smaller learning rate makes the model learn more slowly but can improve performance by avoiding overfitting. In the video, the speaker uses a learning rate of 0.5 to adjust the output of the trees in the XGBoost Regressor.

💡Pruning

Pruning is the process of removing nodes from a decision tree to avoid overfitting and improve model generalization. In XGBoost, pruning is influenced by the gamma hyperparameter. The speaker discusses how gamma is used to decide whether a split should be pruned based on the information gain and its effect on the overall model.

Highlights

Introduction to XGBoost Regressor and its ensemble technique using boosting.

Explanation of how decision trees are created in XGBoost.

Discussion of the mathematical formulas involved in XGBoost.

Overview of the base model creation process in XGBoost.

Calculation of residuals based on the base model output.

Introduction to the concept of similarity weight in XGBoost Regressor.

Calculation of similarity weight for different splits in the decision tree.

Explanation of information gain in the context of XGBoost Regressor.

Demonstration of how to choose the best split point in a decision tree.

Construction of a binary tree with experience as a continuous feature.

Use of lambda as a hyperparameter to adjust similarity weight.

Calculation of the output for each node in the decision tree.

Description of how to pass a record through the decision tree to get the output.

Introduction to the concept of learning rate in the context of XGBoost.

Explanation of how residuals are recalculated after each iteration.

Discussion on the role of gamma as a hyperparameter for post-pruning in XGBoost.

Final output calculation using the XGBoost formula with multiple decision trees.

Encouragement for viewers to subscribe for more in-depth tutorials on XGBoost.

Transcripts

play00:00

hello all my name is krishna welcome to

play00:01

my youtube channel so guys today in this

play00:03

particular video we are going to discuss

play00:05

about xg boost regressor we'll try to

play00:07

understand how the decision tree

play00:09

is actually created in xgboost and apart

play00:11

from that we'll also try to see

play00:13

the maths formulas that are actually

play00:15

involved in xg boost now this will be an

play00:17

xg boost regression in-depth intuition

play00:19

my previous video i had already covered

play00:21

with respect to xgboost classifier

play00:23

if you have not seen that guys please go

play00:25

ahead and see that again the link of

play00:27

that particular video will be given in

play00:28

the description

play00:29

okay now to begin with what i'm going to

play00:31

do is that again if you don't know about

play00:33

xgboost it is an ensemble technique it

play00:35

uses boosting technique

play00:36

and there are many other algorithms in

play00:38

boosting technique like

play00:39

boost gradient boosting extreme gradient

play00:41

boosting i've covered all these three

play00:43

and again there are some more boosting

play00:46

techniques which i'm going to cover in

play00:47

the upcoming sessions

play00:49

so to begin with i'm going to actually

play00:51

take this simple problem statement

play00:53

this paper i have actually done all the

play00:54

computation and

play00:56

suppose if i miss out any calculation

play00:57

i'll just verify it from here okay

play01:00

so in this particular problem statement

play01:02

i have three features

play01:03

like experience gap and salary

play01:06

so this basically says that a person if

play01:08

he's having an experience of two years

play01:10

and suppose if he has gap

play01:12

gap is basically a categorical feature

play01:14

if he has a gap the salary will be

play01:15

somewhere around 40k

play01:17

okay it may be in dollars then 2.5 yes

play01:20

41k if he does not have gap usually

play01:23

based on experience the salary will

play01:24

become more

play01:25

okay so this is the complete data set i

play01:27

have okay

play01:29

if you remember if you have gone through

play01:30

my xg boost classifier

play01:32

you know there i've actually shown you

play01:34

how to construct the decision tree

play01:36

similarly i'll try to show you how you

play01:38

can construct a decision tree in xgboost

play01:39

regressor

play01:40

so let's begin now over here always

play01:42

remember

play01:43

as in xgboost what we do is that we

play01:46

create sequential decision trees

play01:48

so first of all we'll try to create a

play01:49

base model this base model

play01:51

will actually give you the output what

play01:53

kind of output it will give

play01:55

so based on this average salary suppose

play01:57

uh by default you know the first base

play01:59

model will take the average of all this

play02:01

salary

play02:01

40k 41k 52k 60k and 62k

play02:05

if i calculate the average of this the

play02:07

average salary will be somewhere on 51k

play02:09

okay then based on this 51k i will try

play02:12

to find out the residual

play02:14

okay this will be my residual one

play02:16

because i'm finding out for the first

play02:17

residual

play02:18

and if i whenever i talk about boosting

play02:20

techniques guys the decision tree gets

play02:22

trained

play02:23

based on this particular residual values

play02:25

so i'll try to subtract 40 k

play02:28

minus 51k so this will be nothing but 9k

play02:31

sorry minus 9k then if i sorry it should

play02:34

not be 9

play02:35

minus 9 but instead it should be minus

play02:37

11k

play02:38

right if i subtract 41 with 51 it should

play02:42

be minus 10

play02:43

right if i subtract 52 with 51 it should

play02:46

be 1.

play02:47

if i subtract 60 with 51 it should be 9

play02:50

and if i subtract 62 uh with 51

play02:54

then at that particular point of time it

play02:56

should be actually

play02:57

51 11 11 1162 right it should be 11

play03:01

okay now these are all my values with

play03:03

respect to residual ones

play03:04

let me consider guys one thing let me

play03:06

just make a small change over here

play03:08

instead of writing 41k

play03:11

i will try to write 42k okay

play03:14

just a small change this just to make my

play03:17

calculation easier

play03:18

but again it will not matter if you also

play03:20

write 41k

play03:21

now if i try to find out the difference

play03:23

this will be somewhere around

play03:25

9k just to make my calculation easier i

play03:27

have just written it as 40k

play03:29

now what we will do the xg boost

play03:32

regressor we have created the base model

play03:34

the base model output is 51k based on

play03:36

that we have got the residuals residual

play03:37

basically means error

play03:38

right now in the first decision tree

play03:41

that is basically getting created

play03:43

we are going to take this independent

play03:44

feature experience gap and salary

play03:46

sorry not salary experience gap and

play03:49

residual

play03:50

so first of all we will try to take the

play03:52

root node we will try to see in which

play03:54

road node we will try to divide

play03:56

okay so suppose if i take experience

play03:59

now in experience okay i know all my

play04:02

output values what are my output values

play04:04

it is minus 11k then it is minus 9

play04:08

then it is 1 then 9 and 11

play04:12

right i have this and remember

play04:14

experience is a continuous feature right

play04:16

it is not a category feature it is a

play04:18

continuous feature two two point five

play04:20

three four four point five

play04:21

so in a continuous feature how a

play04:23

decision tree is basically constructed

play04:24

suppose

play04:25

every time remember in extreme boost we

play04:26

create a binary trees only

play04:28

so here i will try to create a binary

play04:30

tree

play04:31

the first tree over here i'll say that

play04:33

the condition will be less than or equal

play04:35

to the first record that is 2

play04:37

okay then in my next record i will say

play04:39

this will be greater than

play04:40

2. okay so i have actually taken this it

play04:44

will be less than or equal to 2 it will

play04:45

be greater than 2

play04:46

that basically means the first record

play04:47

will go over here remaining all the

play04:49

records will go in this side

play04:51

right now if i try to get all the values

play04:54

over here

play04:55

right what will be the values if i have

play04:57

less than or equal to

play04:58

2 that is only this one residual so here

play05:01

i will be getting minus 11

play05:02

and remaining or i will be getting minus

play05:04

9 1

play05:06

9 comma 11. so all these values will be

play05:08

coming over here which is greater than 2

play05:10

pretty much simple pretty much easy okay

play05:12

this is one way

play05:14

you can also take like this first of all

play05:15

you can take first two records you can

play05:17

calculate the average

play05:18

then less than or equal to average all

play05:20

the records you put it on the left hand

play05:21

side

play05:22

otherwise you put it on the right hand

play05:23

side but here what i've done is that

play05:25

i've taken the first record i've written

play05:27

greater than less than or equal to two

play05:28

made it as one branch the other branch

play05:30

will be this now this is the first step

play05:33

now in the second step we basically

play05:35

calculate

play05:36

something called as similarity weight

play05:39

now if you remember in xd boost

play05:40

classifier

play05:41

the similarity weight was defined as

play05:43

summation of residual square

play05:45

in xgboost classifier again i'm telling

play05:47

you guys classifier

play05:49

summation of residual square summation

play05:52

of probability

play05:52

minus 1 minus probability and this

play05:54

probability used to be 0.5

play05:56

now in case of xgboost regressor this

play05:59

formula will change little bit

play06:01

okay here we'll basically say that

play06:04

number of residuals number of residuals

play06:09

plus a parameter hyper parameter which

play06:11

is called as lambda

play06:13

okay this lambda value will definitely

play06:16

if we increase the lambda value this

play06:17

will decrease the

play06:18

similarity weight in short okay so this

play06:21

can be treated as a hyper parameter

play06:23

so let me go ahead and let me start

play06:25

computing the similarity weight

play06:26

in xgboost regressor also we create we

play06:29

calculate the similarity weight

play06:30

now suppose for this we'll try to

play06:32

calculate the similarity weight

play06:33

because i need to basically understand

play06:37

if i am taking experience as my root

play06:39

node how do i have to split

play06:41

right now i've just taken the first

play06:42

record in the left hand side and

play06:44

remaining greater than 2 in the right

play06:45

hand side

play06:46

right whether this record is the best

play06:48

record to split

play06:49

right for that we'll calculate the

play06:51

similarity weight after that we'll

play06:52

calculate the information gain

play06:54

or we'll say all as gain okay so here

play06:57

we'll try to calculate the similarity

play07:01

weight okay so here some residual square

play07:05

i have only one residual so this will

play07:06

become 121

play07:08

minus 11 is nothing minus 11 square is

play07:10

nothing but 121

play07:12

number of residual is again 1 plus in

play07:15

this particular case let me consider

play07:16

that i am going to take lambda as 1

play07:18

so if i try to write 121 divided by 2

play07:21

this would be nothing but

play07:22

six zero and then uh five zero 0.5 so my

play07:26

similarity weight in this case is 65.5

play07:29

so i've calculated

play07:30

it i'll write it over here so my

play07:32

similarity weight

play07:36

is 65.5 remember my lambda is equal to

play07:39

1.

play07:40

now similarly i will try to calculate

play07:42

the similarity weight for

play07:44

this how will i calculate i'll take all

play07:46

the residual

play07:47

summation of this minus 1 9 plus 1

play07:50

plus 9 plus 11 whole square

play07:54

divided by how many numbers are there it

play07:56

is basically 4

play07:57

4 plus 1 right now if i try to compute

play08:00

this

play08:01

right and this 9 and this 9 will get

play08:02

cancelled 12 12 is nothing but 144

play08:05

divided by 5 right so if i try to do

play08:09

this

play08:10

ok how many records i will be getting

play08:12

five ones are five twos are ten

play08:14

then five eights are 40 88 28.5

play08:17

so i will get the sims this similarity

play08:19

weight as 28.5

play08:20

if there is little bit mistake don't

play08:22

worry okay just calculate the

play08:24

similarities weight will be 28.5

play08:26

if it is getting coming some other value

play08:28

you can let me know in the description

play08:29

so i'm going to just going to write 28.5

play08:33

so here i'm going to write 28.5

play08:36

now the third thing that i'm going to do

play08:38

is basically

play08:39

i also have to compute the similarity

play08:41

weight of the root

play08:42

right this root so here i will be taking

play08:45

again

play08:46

minus 11 minus 9 9 11 this will get

play08:49

cancelled

play08:50

only 1 will be remaining so this will be

play08:51

1 square divided by how many numbers i

play08:54

have 5

play08:55

5 plus 1 so this is nothing but 1 by 6

play08:58

1 by 6 is nothing but if i say that it

play09:01

is somewhere around

play09:02

0.1 um 0.16

play09:06

okay so i'm going to take this 0.16 so

play09:09

i will write over here my similarity

play09:11

weight

play09:13

is nothing but 0.16

play09:17

now if i really want to calculate the

play09:19

information gain or again i'll say

play09:21

then we have to take this left

play09:22

similarity weight

play09:24

add it with right similarity weight

play09:27

subtract with the root similarity weight

play09:30

okay so here if i try to calculate this

play09:33

is

play09:33

20 um suppose 5 5 1 13 14

play09:38

14 and 94 minus 0.16

play09:42

okay and here if i try to subtract 2094

play09:46

minus

play09:46

0.16 you know so this will be somewhere

play09:50

around four

play09:53

eight three ninety three point eight

play09:56

four

play09:57

ninety three point eight four so the

play09:59

total gain

play10:01

that i am getting from this split is

play10:03

basically 93.84

play10:05

so here i'm just going to note down this

play10:07

one and i'll say that the total gain

play10:09

that i'm going to get is

play10:11

93.84 okay so this is my

play10:14

gain now similarly what i'll do

play10:18

i have done this now i'll go with the

play10:20

next split in next plate i'll say that

play10:22

okay

play10:22

instead of making see guys with one

play10:24

split we got the gain 98.4 so let me

play10:27

just write it down somewhere here

play10:29

okay so for the first record split i

play10:32

got somewhere around 98.34

play10:35

okay so i hope i'm writing it right okay

play10:38

now

play10:39

in my second split what i'm going to do

play10:40

is that i'm going to just drop this

play10:42

and now i'll go to my second record

play10:46

i'll go to my second record now second

play10:48

record is nothing but 2.5 so here i'll

play10:49

say that

play10:50

less than or equal to 2.5 and greater

play10:53

than

play10:54

2.5 now when i do this less than or

play10:56

equal to 2.5 how many records are there

play10:58

one 2 so here i am going to write minus

play11:00

11 minus 9

play11:02

okay here again there will be 3 records

play11:05

1 9 11

play11:07

okay 1 9 11 now when we try to calculate

play11:11

the similarity weight again over here it

play11:12

will be minus 11 minus 9 whole square

play11:15

divided by 2 which is nothing but 400

play11:19

sorry 2 plus 1 right because alpha value

play11:21

is also there

play11:22

so 400 divided by 3 this will be

play11:24

somewhere around

play11:26

3 1 are three threes or three threes are

play11:28

0.33 so it will be nothing but 133.33 so

play11:31

this similarity weight over here

play11:33

will be 133.33

play11:36

okay then for this similarity weight

play11:39

again i will try to

play11:40

add up all these things 1 plus 9 is 10

play11:42

10 plus 11 is 21

play11:44

21 whole square will be 441 441 divided

play11:48

by 3 plus 1 which is nothing but

play11:50

4 okay so if i write 441 divided by 4

play11:55

it will be nothing but 1 1 0

play11:58

or 0.25 so 1 1 0.25 so i'm going to take

play12:02

this similarity weight as

play12:03

1 1 0.25

play12:07

okay now i'll try to i know my root

play12:09

similarity

play12:10

now if i really want to find out the

play12:12

gain it will be 133.33

play12:15

plus 110.25 minus

play12:18

0.16 okay so once i do this

play12:22

probably how much it will come so let me

play12:25

add it for you

play12:28

133.33 110.25

play12:32

okay so this will be 85.34 143.58

play12:36

and if i try to subtract with 1 6

play12:40

this will be nothing but 4 143.42

play12:44

so here the total gain that i will be

play12:45

getting is

play12:48

143.42 and obviously this gain is

play12:51

better than this gain so i'll definitely

play12:53

not do the split from here

play12:54

i'll do the split from here and

play12:56

similarly i'll try to compare with all

play12:57

the splits

play12:58

and suppose i find out that i have to do

play13:00

the split from this record

play13:01

i will take this and i will try to

play13:03

create it now once i decided okay this

play13:05

is the split that i have to use

play13:07

after this what i'll do is that i will

play13:09

go with the next category feature

play13:11

now suppose over here i go with gap as

play13:13

yes and no

play13:14

in gap if it is yes suppose this is yes

play13:18

right and this is no in yes because both

play13:21

these values are in yes so this becomes

play13:23

the root node so i will not go with this

play13:24

splitting

play13:25

right then next with this splitting i'll

play13:27

go with gap

play13:28

again i'll write yes or no so how many

play13:32

in this records which is greater than

play13:34

2.5 there are two nodes

play13:36

two nodes over here right two nodes and

play13:38

these are nothing but one and nine

play13:40

so i'll write it as one nine nine here i

play13:42

will write it as

play13:44

um remaining one is eleven then again

play13:46

i'll try to calculate the similarity

play13:48

score

play13:48

i'll try to calculate the gain okay now

play13:51

suppose if i go with this split right

play13:53

now

play13:53

okay and then i'll try to compare like

play13:55

whichever will be having the highest

play13:56

gain i'll be taking that

play13:57

now suppose this is the overall tree

play13:59

structure that has been created for the

play14:00

first tree

play14:01

how will i calculate the output now

play14:03

suppose in this particular path if it is

play14:05

going the output over here

play14:06

will be the average of this value so

play14:08

minus 11 minus 9 is nothing but 20

play14:11

20 divided by 2 will be nothing but 10

play14:13

so this output of this node

play14:14

will be actually 20. so here i'm going

play14:16

to write it as

play14:17

output as 20 okay

play14:21

in this particular output since this is

play14:22

a single node it will be 11

play14:24

in this it will be the 5 because this

play14:27

average is nothing but 5

play14:28

right when i have done this when i have

play14:31

done this now suppose

play14:32

i take this record and i try to

play14:35

calculate what will be the output now

play14:37

right i know that first

play14:40

after this base model then will

play14:42

concatenate basically will

play14:44

then pass through this tree only after

play14:46

the base model

play14:48

so as soon as i pass this record suppose

play14:50

i am passing this first record

play14:51

first of all it will go to the base

play14:52

model the base model will be giving me

play14:54

the value as 51

play14:56

because this is 51k then suppose i

play14:58

consider my learning rate parameter

play15:01

that is alpha is 0.5 okay now this 0.5

play15:05

multiplied by the output of this tree

play15:08

now suppose this is 2 2 basically means

play15:11

will come here

play15:12

right this path will be taken here and

play15:14

once we find out the output is 20

play15:16

so here i will write it as 20 right

play15:19

this is with respect to one decision

play15:21

tree like this i can create any number

play15:22

of decision trees like i can create

play15:24

with the help of gap as my root node

play15:26

with the help of experience on some

play15:28

other records

play15:28

right i can create multiple decision

play15:30

tree so this will be my

play15:32

alpha t one then again i can have alpha

play15:35

this can be alpha one t one this can be

play15:37

alpha two t

play15:38

2 alpha 3 t 3 like this i can have any

play15:41

number of trees

play15:42

now in this particular case i have since

play15:44

i have constructed only one decision

play15:45

tree

play15:46

i am just going to use this so what will

play15:48

be the value 51 plus

play15:50

0.5 into 2 is nothing but 10.

play15:54

sorry the output uh sorry this output

play15:58

will be

play15:58

i'm extremely sorry guys this output

play16:00

will be minus

play16:02

minus 20 by 2 minus 10 okay

play16:06

i made one mistake the average output

play16:09

for this will be minus 10

play16:11

okay minus 10 so here i will write it as

play16:13

0.5

play16:14

multiplied by minus 10 right

play16:18

when i subtract it it will be nothing

play16:19

but minus 0.5

play16:22

sorry minus minus 5 okay

play16:25

this will be minus 5 so here my total

play16:28

value will be 46 that basically means

play16:30

once i pass this particular output my

play16:32

real output now

play16:34

my real output now will be 46

play16:38

then again if i try to pass with respect

play16:39

to 42 k 42 to k will again pass in this

play16:42

path

play16:43

again this will also become 46 right

play16:45

plus 0.5

play16:47

minus multiplied by minus 10 similarly

play16:49

we have to pass all these particular

play16:50

records

play16:51

calculate our value suppose this value

play16:52

is somewhere around for this value let

play16:54

me compute okay what will be this value

play16:56

okay see three is basically passing

play16:59

through this road

play17:00

node right now negative is no no is

play17:03

basically getting passed over here

play17:04

the output is 5 so i will write 51

play17:08

k plus 0.5 which is my learning rate

play17:11

multiplied by 5

play17:13

okay and here i can write 51 plus

play17:16

0.25 right five five two sorry it should

play17:19

not be 0.25 it should be

play17:21

2.5 2.5 so here i will be having

play17:25

53.5 so this will be my output over here

play17:29

right and similarly i will be trying to

play17:31

write all these values i'm just going to

play17:32

rub this guys

play17:34

i'm just going to rub this okay

play17:37

similarly i'm going to find out like

play17:38

this so it'll be here will be 62

play17:40

and here probably will be 63. then again

play17:42

i'll try to calculate my residual 2

play17:44

some value will be again coming now in

play17:47

my next iteration i'll create one more

play17:49

tree

play17:49

where i'll be taking this two input

play17:51

features and this will be my dependent

play17:52

feature and again i'll be trying to

play17:53

create a decision tree

play17:55

this will be my second decision tree my

play17:56

first decision tree is basically getting

play17:58

created here

play17:59

i have this particular output like this

play18:01

over here and after that this will be my

play18:03

second decision tree after this i will

play18:05

try to add it again my formula will be

play18:07

like what

play18:08

my base model output right plus

play18:11

alpha 1 t 1 plus alpha 2

play18:15

t 2 like this plus alpha and

play18:18

t n and like this will be my complete

play18:21

xgboost output that it will be coming

play18:23

up you know

play18:24

and that is the how we have to say there

play18:27

is also one more parameter which is

play18:29

called as gamma

play18:30

gamma basically says that suppose if i

play18:32

say gamma is 140

play18:34

or let me just write it as okay gamma is

play18:37

somewhere around 150.5

play18:40

and suppose after this split the

play18:42

information gain i got it as 140.

play18:44

if i try to subtract 150 with 140 uh if

play18:47

i

play18:48

try to subtract this 140 with 150.5

play18:51

if it is negative value then we can

play18:53

post prune this

play18:54

we can prune basically we can cut that

play18:56

particular tree if it is a positive

play18:58

value

play18:59

it will be no we should not prune it and

play19:01

again this is the kind of hyper

play19:03

parameter

play19:04

that we can set for doing the post

play19:05

pruning technique only when it is trying

play19:07

to get over fitted

play19:08

but most of the scenarios some value of

play19:10

alpha or gamma will be actually set

play19:13

the default value you can again see in

play19:15

the scale who are there also

play19:16

in that particular library some default

play19:18

value will be getting set

play19:20

so i hope you understood how xgboost

play19:22

regressor works

play19:23

please do subscribe the channel if you

play19:24

have not already subscribe i'll see you

play19:25

in the next video

play19:26

have a great day ahead thank you bye bye
