Introduction to Deep Learning - Part 3

AfiaKenkyu
1 May 2020 · 14:21

Summary

TLDR: This video script discusses challenges in neural network learning, particularly the vanishing gradient problem in deep networks. It introduces solutions such as the ReLU activation function and one-hot encoding, explains softmax for multi-class classification and the cross-entropy loss, both crucial for training accurate models, and touches on overfitting, where models perform well on training data but poorly on unseen data, emphasizing the need for generalization.

Takeaways

  • 🧠 The video discusses challenges with backpropagation in deep neural networks, particularly the vanishing gradient problem caused by the network's depth (many hidden layers).
  • 📉 The vanishing gradient issue arises when the multiplication of small values (from the activation functions) leads to very small gradients, slowing down the learning process.
  • 🔄 To address this, the video suggests changing the activation function in the hidden layers to one that outputs values not constrained between 0 and 1, like the ReLU (Rectified Linear Unit).
  • 💡 The video explains that a plain binary encoding of class labels in the output layer is poorly suited to learning, and proposes one-hot encoding together with a softmax output to better represent multi-class classification problems.
  • 📊 It introduces the softmax function and how it calculates the probabilities of each class based on the input from the previous layer's neurons.
  • 📈 The video touches on the concept of cross-entropy as a loss function, which is inspired by information theory and is effective for classification tasks with two or more classes.
  • 🚫 The script warns against overfitting, where a model performs exceptionally well on training data but poorly on unseen data, which is a common issue in deep learning.
  • 🔍 To illustrate overfitting, the video uses the analogy of a model that can solve homework problems well but fails on exam questions it hasn't seen before.
  • 📉 The video suggests monitoring the loss function or error function over epochs to detect overfitting, where the training loss decreases but the testing loss increases or does not improve significantly.
  • 🔧 Lastly, the video hints at strategies to combat overfitting, which will be discussed in more detail in the next video.

Q & A

  • What is the main issue discussed in the video regarding backpropagation in deep neural networks?

    -The main issue discussed is the vanishing gradient problem, which occurs due to the multiplication of small gradients during the update process of the weights in a deep neural network with many hidden layers.

  • Why does the vanishing gradient problem slow down the learning process in neural networks?

    -The vanishing gradient problem slows down the learning process because the small gradient values result in tiny updates to the weights, leading to a slow convergence of the learning algorithm.

  • What is one of the suggested solutions to address the vanishing gradient problem mentioned in the video?

    -One suggested solution is to change the activation function used in the hidden layers from functions that saturate at 0 or 1, like the sigmoid function, to functions that output values that are not constrained to a small range, such as the ReLU (Rectified Linear Unit).

  • What is the significance of using one-hot encoding representation in the output layer of a neural network?

    -One-hot encoding representation is significant because it allows for a clear distinction between different classes in classification tasks, where only one neuron in the output layer is active for a given class, with the rest being zero.

  • How does the use of softmax activation function in the output layer help in classification tasks?

    -The softmax activation function helps in classification tasks by converting the output of the network into probabilities, allowing the model to select the class with the highest probability as the predicted class.

  • What is the purpose of using cross-entropy loss function in neural networks?

    -The cross-entropy loss function is used to measure the performance of a classification model whose output is a probability value between 0 and 1. It helps in penalizing the model when the predicted probabilities are incorrect, thus guiding the model to improve its predictions.

  • What is the concept of overfitting in the context of neural networks, and how does it relate to the script?

    -Overfitting occurs when a neural network model performs well on the training data but poorly on new, unseen data. In the context of the script, overfitting is discussed as a potential issue that arises when the model is too complex and fits the training data too closely, failing to generalize well to new data.

  • How can overfitting be identified from the loss function graph during training?

    -Overfitting can be identified when the loss function graph shows a significant difference between the training loss and the validation or testing loss, with the latter being higher, indicating that the model is not generalizing well to new data.

  • What is the role of the number of hidden layers in the complexity and performance of a neural network?

    -The number of hidden layers in a neural network affects its complexity and ability to model complex functions. More layers can increase the model's capacity to learn from data, but it can also lead to issues like vanishing gradients and overfitting.

  • What is the difference between underfitting and overfitting in neural networks?

    -Underfitting occurs when a model is too simple to capture the underlying pattern of the data, resulting in poor performance on both training and testing data. Overfitting, on the other hand, happens when a model is too complex and performs well on training data but poorly on testing data.

Outlines

00:00

🧠 Deep Learning Challenges: Gradient Vanishing in Neural Networks

The paragraph discusses the challenges of using backpropagation in deep neural networks, particularly the issue of gradient vanishing. The speaker explains that the problem arises due to the complex architecture of deep neural networks with many hidden layers. This results in a multiplication of small values, leading to tiny gradients that slow down the learning process. To illustrate, the paragraph uses the sigmoid activation function to show how the gradient update formula can lead to diminishing error gradients. The speaker suggests changing the activation function in the hidden layers to functions like ReLU (Rectified Linear Unit) to allow for a greater range of values and potentially faster learning.
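As a rough, self-contained illustration of this effect (a toy NumPy sketch, not code from the video), the snippet below multiplies one sigmoid derivative per layer and shows how quickly the backpropagated signal shrinks; the pre-activation values are made up for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # never larger than 0.25

# Hypothetical pre-activation values, one per hidden layer.
pre_activations = [0.5, -1.2, 0.8, 2.0, -0.3, 1.5]

# Backpropagation picks up one such factor per layer it passes through.
gradient_factor = 1.0
for depth, z in enumerate(pre_activations, start=1):
    gradient_factor *= sigmoid_derivative(z)
    print(f"after {depth} layer(s): gradient factor = {gradient_factor:.6f}")

# Because each factor is at most 0.25, the product collapses toward zero,
# which is the vanishing-gradient behaviour the video describes.
```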

05:00

🔱 Improving Neural Network Output with One-Hot Encoding and Softmax

In this section, the speaker addresses the representation of the output layer in neural networks. The paragraph explains that a plain binary label representation is poorly suited to learning, because some bit patterns correspond to no valid class. The speaker then introduces one-hot encoding as a method to represent multiple classes, where each class is represented by a unique vector with a single one and zeros elsewhere. Additionally, the softmax activation function is used in the output layer to turn the network's outputs into class probabilities, from which the class with the highest probability is chosen. The speaker also mentions cross-entropy as a loss function, which is inspired by information theory and scores the model based on the predicted probabilities.
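As a minimal sketch of the softmax step described above (toy NumPy code with made-up weights, not the video's own example), the snippet computes the class scores s_k from the previous layer's outputs, converts them to probabilities, and picks the most probable class.

```python
import numpy as np

def softmax(scores):
    shifted = scores - np.max(scores)   # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps)

hidden = np.array([0.2, 1.0, -0.5])     # outputs of the last hidden layer (assumed values)
weights = np.array([[ 0.4, -0.2,  0.1], # one row of weights per output class (assumed values)
                    [ 1.1,  0.3, -0.6],
                    [-0.7,  0.9,  0.2],
                    [ 0.0,  0.5,  0.8],
                    [ 0.6, -1.0,  0.3]])

scores = weights @ hidden               # s_k: weighted sum feeding each output neuron
probabilities = softmax(scores)         # y_k: each score relative to the sum over all classes

print(np.round(probabilities, 3), "sum =", round(float(probabilities.sum()), 3))
print("predicted class:", int(np.argmax(probabilities)))  # the largest probability wins
```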

10:00

📊 Analyzing Overfitting in Neural Networks

The final paragraph delves into the concept of overfitting in neural networks. Overfitting occurs when a model performs well on training data but poorly on new, unseen data. The speaker explains that this is often due to the model being too complex and fitting too closely to the training data, including its noise and outliers. The paragraph discusses how the number of linear classifiers or decision boundaries in a model can contribute to overfitting. The speaker also touches on the importance of generalization to new data and the use of loss functions and error rates to evaluate model performance. The discussion includes the visual representation of overfitting through graphs showing the difference between training and testing error rates.
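As a small sketch of that monitoring idea (the loss values below are invented purely for illustration), the snippet compares training and validation loss per epoch and flags the point where the validation curve stops improving while the training curve keeps falling.

```python
# Hypothetical per-epoch losses, invented to show how the two curves are read.
training_loss   = [0.90, 0.60, 0.40, 0.28, 0.20, 0.15, 0.11, 0.08]
validation_loss = [0.95, 0.70, 0.55, 0.50, 0.52, 0.58, 0.66, 0.75]

best_epoch = min(range(len(validation_loss)), key=validation_loss.__getitem__)

for epoch, (train, val) in enumerate(zip(training_loss, validation_loss)):
    note = "  <- validation stops improving here" if epoch == best_epoch else ""
    print(f"epoch {epoch}: train={train:.2f}  val={val:.2f}  gap={val - train:.2f}{note}")

# Training loss keeps decreasing while validation loss turns upward after the
# best epoch; that widening gap is the overfitting signature described above.
```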

Keywords

💡Backpropagation

Backpropagation is a supervised learning algorithm used to train artificial neural networks. It involves the calculation of the gradient of the loss function with respect to each weight by the chain rule, which then updates the weights to minimize the loss. In the video, backpropagation is discussed as a method to address the vanishing gradient problem in deep neural networks, which is a significant challenge when training networks with many layers.

💡Vanishing Gradient

The vanishing gradient problem refers to the phenomenon where gradients become increasingly small as they are propagated back through layers in a neural network, leading to very slow learning. This is a critical issue in the script, where the presenter explains that it occurs due to the multiplication of small gradients at each layer, which can significantly slow down the learning process in deep neural networks.

💡Activation Function

An activation function in neural networks determines the output of a neuron given an input or set of inputs by adding a non-linear element to the model. The choice of activation function can greatly affect the performance of the network. In the video, the presenter discusses changing the activation function in hidden layers to address the vanishing gradient problem, suggesting the use of functions like ReLU to allow for a greater range of output values.

💡ReLU (Rectified Linear Unit)

ReLU is a type of activation function that has become very popular due to its efficiency in training deep neural networks. It outputs the input directly if it is positive, otherwise, it outputs zero. The video script mentions ReLU as a solution to the vanishing gradient problem because it can help maintain larger gradients during backpropagation, thus speeding up learning.
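A minimal sketch of ReLU and its gradient (toy NumPy code, assuming the usual definition rather than anything specific to the video):

```python
import numpy as np

def relu(x):
    # Passes positive inputs through unchanged and outputs zero otherwise.
    return np.maximum(0.0, x)

def relu_derivative(x):
    # Local gradient: 1 for positive inputs, 0 otherwise (the value at exactly 0 is a convention).
    return (x > 0).astype(float)

z = np.array([-2.0, -0.5, 0.0, 0.5, 3.0])
print("relu(z)  =", relu(z))
print("relu'(z) =", relu_derivative(z))

# For positive activations the local derivative is exactly 1, so chaining many
# layers does not shrink the gradient the way sigmoid's <= 0.25 factor does.
```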

💡One-Hot Encoding

One-hot encoding is a representation technique used for categorical variables in which each category is represented by a binary vector. In the context of neural networks, it is often used for the output layer when dealing with classification tasks. The script explains that one-hot encoding is used to represent the target classes in a classification problem, which is crucial for the training process.
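A short sketch of one-hot targets for the five-class example used in the video (the labels below are made up; only the encoding scheme follows the script):

```python
import numpy as np

NUM_CLASSES = 5

def one_hot(label, num_classes=NUM_CLASSES):
    # One output neuron per class: a single 1 at the label's position, 0 elsewhere.
    vector = np.zeros(num_classes)
    vector[label] = 1.0
    return vector

labels = [0, 2, 4]  # class indices of three hypothetical training examples
targets = np.stack([one_hot(y) for y in labels])
print(targets)
# [[1. 0. 0. 0. 0.]
#  [0. 0. 1. 0. 0.]
#  [0. 0. 0. 0. 1.]]

# A packed 3-bit binary code for 5 classes would also admit patterns (e.g. 111)
# that correspond to no class at all, which is the ambiguity the video points out.
```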

💡Softmax Function

The softmax function is another activation function that is commonly used in the output layer of a neural network for multi-class classification problems. It converts a vector of real numbers into a probability distribution over predicted output classes. The video script discusses the use of the softmax function in conjunction with one-hot encoding to determine the class with the highest probability as the predicted class.

💡Cross-Entropy

Cross-entropy is a measure used in information theory and is also used as a loss function in classification problems. It quantifies the performance of a classification model whose output is a probability value between 0 and 1. In the video, cross-entropy is mentioned as a loss function that can be used to train neural networks, particularly in the context of classification tasks.
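A minimal sketch of the binary cross-entropy formula quoted in the video, with invented targets and predictions to show how confidently wrong outputs are penalised:

```python
import numpy as np

def binary_cross_entropy(targets, predictions, eps=1e-12):
    # L = -(1/N) * sum_k [ t_k * log(y_k) + (1 - t_k) * log(1 - y_k) ]
    p = np.clip(predictions, eps, 1.0 - eps)   # keep log() away from 0
    t = np.asarray(targets, dtype=float)
    return float(-np.mean(t * np.log(p) + (1.0 - t) * np.log(1.0 - p)))

targets    = np.array([1, 0, 1, 1])
good_preds = np.array([0.90, 0.10, 0.80, 0.95])  # confident and correct
bad_preds  = np.array([0.40, 0.60, 0.30, 0.50])  # hesitant or wrong

print("loss with good predictions:", round(binary_cross_entropy(targets, good_preds), 4))
print("loss with bad predictions: ", round(binary_cross_entropy(targets, bad_preds), 4))

# Unlike squared error, the log terms blow up as a prediction for the true class
# approaches 0, which is what pushes the model toward well-calibrated probabilities.
```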

💡Overfitting

Overfitting occurs when a model learns the detail and noise in the training data to the extent that it negatively impacts the model's performance on new data. This means the model has poor generalization to an independent dataset. The video script addresses overfitting as a common problem in neural networks, where the network performs well on training data but fails to make accurate predictions on unseen data.

💡Hidden Layer

A hidden layer is a layer of neurons in a neural network that is not an input layer or an output layer. The video script discusses the impact of the number of hidden layers on the complexity of the network and the potential for vanishing gradients, as more layers can lead to smaller gradient updates and thus slower learning.

💡Neural Network Architecture

Neural network architecture refers to the design and structure of the neural network, including the number of layers and the number of neurons in each layer. The script touches on different architectures, such as deep neural networks, and how they can be affected by issues like vanishing gradients and overfitting, which are critical considerations in designing effective neural network models.

Highlights

Introduction to the third part of a series on neural networks, focusing on learning algorithms suited to deep neural network architectures.

Review of previous discussions on single-layer perceptrons, multilayer perceptrons, and backpropagation in deep neural networks.

Discussion on the problem of vanishing gradients in deep neural networks due to the architecture's complexity.

Explanation of how the multiplication of small values in the hidden layers can lead to vanishing gradients and slow learning.

Proposal to change the activation function in hidden layers to address the issue of vanishing gradients.

Introduction of the ReLU (Rectified Linear Unit) activation function as a solution to the vanishing gradient problem.

Explanation of how ReLU allows for a more significant gradient and faster error reduction.

Discussion of the binary label representation in the output layer and its limitations.

Introduction of one-hot encoding as a better-suited representation for the output layer.

Explanation of how one-hot encoding works and its advantages for multi-class targets.

Introduction of the softmax function and its role in calculating the probabilities of different classes.

Discussion on the use of cross-entropy as a loss function in neural networks, inspired by information theory.

Explanation of how the cross-entropy loss function works and why it reflects class proportions better than a plain squared-error loss.

Discussion on the problem of overfitting in neural networks and its impact on model performance.

Explanation of the concept of overfitting, where a model performs well on training data but poorly on unseen data.

Mention of strategies to prevent overfitting, to be discussed in more detail in the next video.

Conclusion and preview of the next topic, which will examine overfitting in more detail.

Transcripts

play00:00

Hello, hello, assalamualaikum warahmatullahi wabarakatuh, it's me again. Okay, we are back on the same topic, but this is part three of the introduction to deep learning. In the previous videos we already reviewed the single-layer perceptron and the multilayer perceptron, then deep neural networks, then the problem with backpropagation for deep neural networks, and then deep learning. Now we discuss the algorithm, that is, what kind of learning algorithm is suitable for a neural network with a deep architecture, a deep neural network.

play00:57

You can see here on my slide: earlier we saw there is a problem if we use backpropagation for deep networks. The main problem is the vanishing gradient. Why does it happen? Because the architecture is very deep, which means the number of hidden layers is quite large. The weight update is therefore a product involving the predicted value, which is the output of the activation function, and the activation function maps to values between 0 and 1, or between -1 and 1, so the value is quite small; it is then multiplied by one minus the prediction. Say we use the sigmoid activation function: this is the weight update formula obtained from the derivative, the differential of the cost function, which is why it is called a gradient, the one that vanishes. We can keep applying this formula, that is actually fine, but the problem is that the propagated error becomes small, and that makes the learning slow.

play02:15

Why is the propagated error small? Because it is the result of a product: if this factor is less than 1, and this one is certainly also less than 1, and the prediction times one minus the prediction is also less than 1, then automatically the value keeps getting smaller. So from the output layer to the hidden layer it is already small, then smaller again from hidden layer to hidden layer, smaller and smaller until it reaches the input, and on the next pass it keeps shrinking. The propagated error becomes very small, and eventually, if the number of hidden layers is too large, the gradient vanishes. The gradient here means the propagated error: something like 0.0000..., a very tiny error signal, and that automatically makes the learning slow.

play03:08

So how do we correct backpropagation, what can be done to fix it? First, what was the problem? The problem was that the value produced by the activation function, the prediction, is normalised by the activation function, say to between 0 and 1, which is a fairly small number. How can that be handled? One way, the first step, is to change the activation function in the hidden layers so that the output is not normalised to 0 to 1 or -1 to 1; instead we leave it free, meaning the maximum value can be whatever it needs to be, it does not have to be 1. Because if the maximum is 1, then we are multiplying values that are greater than 0 and less than 1, and the more such multiplications there are, the smaller the value and the smaller the error becomes. So the first proposal is to use the rectified linear unit, often called ReLU. ReLU is an activation function that is 0 for negative inputs, and beyond that it follows the linear function, so if the input is large the output can also be large. The hope is that the error can then be propagated more quickly. That is the first fix, the activation function.

play04:57

Then the second fix is the classification side, not in the hidden layers but at the output layer: it uses what is called the one-hot encoding representation. This is what I explained before; prior to deep learning, most networks used a binary representation, and that is not very suitable. Why? You can see here: we have 5 classes, and with a binary representation they are encoded with three neurons, right? So in the learning process the target for class 1 is a particular pattern of zeros and ones over those three neurons, and likewise for the other classes. But something is implied there: what about the other bit patterns, for example y1 = 1, y2 = 1, y3 = 1? That would suggest more classes than the five we actually have. So this is not a proper way to represent the learning targets. Instead we use what is called one-hot encoding: for class 1, y1 is 1 and the others are 0; for class 2, y2 is 1 and the others are 0, and so on. These are the targets.

play06:34

Here we use the softmax unit. How does softmax work? It is really just a matter of finding the largest value. We have inputs X1, X2 and so on, then this is the output layer, and there are weights, so we can obtain the value y_k. The value s_k is the sum of products, the product between the weights and the inputs that come in here; from those values we then look for the largest probability: we look for the biggest value, and whichever is the biggest is the one selected. That is softmax: it all amounts to computing probabilities and taking the largest. So s_k is the result of multiplying all the neurons in the previous hidden layer by the corresponding weights, and y_k is the probability of this value relative to the whole, so it is divided by the sum over all of them. In essence, whichever one is largest is the one that gets picked.

play08:30

Then the last one is cross-entropy: the cost function uses cross-entropy. The cross-entropy cost function actually comes from information theory, if you have studied that. So here we have N data points, indexed from 1 to N, and we compute t_k log y_k + (1 - t_k) log(1 - y_k): the target for data point k times the log of the prediction for data point k, plus one minus the target times the log of one minus the prediction. This is inspired by information theory. The principle is this: suppose we have data with two classes, say the first class has 150 data points and the second has 50, out of a total of 200. If we use the usual error function, the sum of squared error or half of it, the concept is simply "prediction minus target", so it does not care about the effect of the amount of data in each class on the error, whereas the class with more data should really be more dominant. That is where the information-theoretic formulation comes in.

play10:55

Okay, now there is a problem for deep learning, or rather not a problem for deep learning as such but a problem for deep neural networks. Why? Because in a neural network whose architecture is very deep, the hypothesis, conceptually, contains many linear classifiers. For example, with many classifiers we can draw an octagon with eight lines, and we could even make ten. The more of them there are, the more accurate it is on the training data, but what about the testing data? That is overfitting: the model fits the training data too closely, but the result is not as good on the testing data, and we build a model precisely so it can be used on testing data. By analogy, it is like being able to solve all the homework problems that were handed out, but then not being able to answer the exam questions: that is not good. So overfitting means the model is good on the training data, but on the testing data its performance is not as strong, because it is fitted too tightly to the training data. That is the concept.

play12:27

So why does this happen? Because in a deep network the number of linear classifiers grows, and the decision boundary ends up fitted too tightly: it classifies the training data in ways we do not actually want. For example, if over here there happens to be a blue point in the training data and we fit the boundary very tightly around it with these lines, then a blue testing point in this region will not be detected as blue; it will end up classified as orange instead. That means the model is overfit.

play13:12

Computationally, overfitting can be seen from the graph of the loss function, or error function, against the epochs. You can see the difference: the red curve is training, and as the number of epochs grows it should indeed keep decreasing. The testing curve should behave the same way, and if it ends up as good as, or even better than, the training curve, that is excellent, that is the ideal model we hope for. But in most cases what happens is that the training error keeps shrinking while the testing error does not: sometimes it is only a bit worse than training, which is not too bad yet, and sometimes it simply jumps and the error is far higher. That is what is called overfitting.

play14:14

So next we will discuss overfitting. I will stop this video here; the next one will go straight into overfitting.

Related Tags
Neural Networks · Backpropagation · Deep Learning · Machine Learning · Algorithms · Artificial Intelligence · Data Science · Tech Education · Coding Tutorial · AI Architecture