Mengenal si GRU (Getting to Know GRU)
Summary
TLDR
In this video, the concept of GRU (Gated Recurrent Unit) is explained in detail. GRU is a simpler alternative to LSTM with fewer parameters, making it well suited to smaller datasets. The GRU architecture uses two key gates: the reset gate, which controls how much of the previous hidden state to forget, and the update gate, which decides the proportions of the previous hidden state and the candidate activation vector used for the next hidden state. GRU also has no output gate, which is one reason it has fewer parameters than LSTM.
Takeaways
- 😀 GRU (Gated Recurrent Unit) has fewer weights compared to LSTM, making it potentially more suitable for small datasets.
- 😀 A model with many parameters trained on a small dataset has a higher chance of overfitting, so GRU can be a better choice in that case.
- 😀 The diagram for GRU shows a single hidden state that carries over to the next time step, unlike LSTM, which also has a cell state.
- 😀 GRU has a reset gate that combines the previous hidden state and the current input with weights and bias (W_r and b_r) before applying a sigmoid function.
- 😀 The reset gate in GRU is similar to the forget gate in LSTM, determining how much of the previous hidden state to carry forward.
- 😀 The update gate in GRU, like the reset gate, combines the previous hidden state and the current input with weights and bias (W_z and b_z), followed by a sigmoid function.
- 😀 The update gate in GRU functions similarly to the input gate in LSTM.
- 😀 The candidate activation vector in GRU uses a filtered previous hidden state and the current input, with its own weights and bias (W_h and b_h), then applies the tanh function.
- 😀 The candidate activation vector in GRU is similar to the input layer in LSTM but differs in that it uses a filtered hidden state, while LSTM uses the raw previous hidden state.
- 😀 The new hidden state in GRU combines the previous hidden state (scaled by 1 minus the update gate) and the candidate activation vector (scaled by the update gate); the update gate sets the proportion of each (a minimal code sketch of this full update follows this list).
- 😀 GRU does not have an output gate, contributing to its fewer parameters compared to LSTM.
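A minimal NumPy sketch of one GRU step, following the gates listed above. The concatenated-input formulation, the shapes, and all names here are illustrative assumptions rather than code from the video:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h_prev, W_r, b_r, W_z, b_z, W_h, b_h):
    """One GRU time step (illustrative sketch).

    x      : current input, shape (input_dim,)
    h_prev : previous hidden state, shape (hidden_dim,)
    W_*    : weight matrices, shape (hidden_dim, hidden_dim + input_dim)
    b_*    : bias vectors, shape (hidden_dim,)
    """
    concat = np.concatenate([h_prev, x])        # [h_{t-1}, x_t]
    r = sigmoid(W_r @ concat + b_r)             # reset gate
    z = sigmoid(W_z @ concat + b_z)             # update gate
    filtered = np.concatenate([r * h_prev, x])  # reset gate filters h_{t-1}
    h_cand = np.tanh(W_h @ filtered + b_h)      # candidate activation vector
    return (1 - z) * h_prev + z * h_cand        # blend old and new information

# Tiny usage example with made-up dimensions
hidden_dim, input_dim = 4, 3
rng = np.random.default_rng(0)
W = lambda: rng.standard_normal((hidden_dim, hidden_dim + input_dim))
b = lambda: np.zeros(hidden_dim)
h = gru_step(rng.standard_normal(input_dim), np.zeros(hidden_dim),
             W(), b(), W(), b(), W(), b())
print(h.shape)  # (4,)
```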
Q & A
What is the key difference between GRU and LSTM in terms of parameters?
-The key difference is that GRU has fewer parameters compared to LSTM, which makes it more suitable for small datasets. Fewer parameters reduce the risk of overfitting in situations with limited data.
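As a rough illustration of that parameter difference, the sketch below counts gate parameters under a common formulation (one weight matrix over the concatenated [h, x] plus one bias per gate); the dimensions are made up, and a real library may count a few extra bias terms:

```python
def gate_params(input_dim, hidden_dim):
    # one gate: weight matrix over [h_{t-1}, x_t] plus a bias vector
    return hidden_dim * (hidden_dim + input_dim) + hidden_dim

input_dim, hidden_dim = 64, 128
gru_params = 3 * gate_params(input_dim, hidden_dim)   # reset, update, candidate
lstm_params = 4 * gate_params(input_dim, hidden_dim)  # forget, input, candidate, output
print(gru_params, lstm_params)  # 74112 98816 -> GRU uses 3/4 of the LSTM gate parameters
```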
Why is GRU considered more suitable for small datasets?
-GRU is more suitable for small datasets because its simpler structure and fewer parameters reduce the likelihood of overfitting, a common issue when using more complex models like LSTM with limited data.
How does the GRU handle the hidden state differently from LSTM?
-In GRU, only the hidden state is passed to the next time step, unlike LSTM, which also uses a cell state along with the hidden state to preserve long-term information.
What is the purpose of the reset gate in GRU?
-The reset gate in GRU controls how much of the previous hidden state should be considered when calculating the new hidden state. It is used to filter out irrelevant past information, similar to the forget gate in LSTM.
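In the notation already used here (weights W_r, bias b_r, previous hidden state h_{t-1}, current input x_t), the standard GRU formulation of the reset gate can be sketched as follows; the video's exact notation may differ slightly:

```latex
r_t = \sigma\big(W_r\,[h_{t-1},\, x_t] + b_r\big)
```

The filtered hidden state r_t ⊙ h_{t-1} (element-wise product) is what later enters the candidate activation vector.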
How does the update gate in GRU work?
-The update gate in GRU determines the proportion of the previous hidden state and the new candidate activation vector that should be combined to form the final hidden state. It acts similarly to the input gate in LSTM.
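Under the same assumptions, the update gate has the same form with its own weights and bias (W_z, b_z):

```latex
z_t = \sigma\big(W_z\,[h_{t-1},\, x_t] + b_z\big)
```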
What role does the candidate activation vector play in GRU?
-The candidate activation vector is created by combining the filtered previous hidden state with the current input, and it is processed through a tanh function. This vector helps in generating the new hidden state.
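Using the filtered hidden state from the reset gate, the candidate activation vector described here is commonly written as (⊙ denoting element-wise multiplication):

```latex
\tilde{h}_t = \tanh\big(W_h\,[\,r_t \odot h_{t-1},\; x_t\,] + b_h\big)
```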
How is the final output of GRU calculated?
-The final output of GRU is a combination of the previous hidden state (scaled by 1 minus the update gate) and the candidate activation vector (scaled by the update gate). This determines how much of the new information and old information should be included in the output.
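Putting the pieces together, the standard GRU update of the hidden state is:

```latex
h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
```

For example, for a single unit with h_{t-1} = 0.5, candidate \tilde{h}_t = -0.2 and z_t = 0.8, the new state is h_t = 0.2 · 0.5 + 0.8 · (-0.2) = -0.06; the candidate dominates because the update gate is close to 1.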
What is the missing component in GRU that is present in LSTM?
-The missing component in GRU compared to LSTM is the output gate. This makes GRU a simpler model with fewer parameters.
How does the GRU model compare to RNN in terms of structure?
-GRU, like a vanilla RNN, uses a recurrent structure across time steps. However, GRU introduces gates such as the reset and update gates, which improve performance and mitigate issues like vanishing gradients, a problem plain RNNs struggle with.
Why does the GRU have fewer parameters than LSTM?
-GRU has fewer parameters than LSTM because it lacks certain components, such as the cell state and output gate, which reduces the complexity of the model and makes it more efficient.