Deep Reinforcement Learning Tutorial for Python in 20 Minutes
Summary
TLDR This video walks through the fundamentals of reinforcement learning. Unlike supervised and unsupervised learning, reinforcement learning focuses on training in a live environment. The video shows how to create an environment with OpenAI Gym, build a deep learning model with TensorFlow and Keras, and train a reinforcement learning model through policy-based learning with Keras-RL2. Finally, it demonstrates saving the trained model and reloading it when needed so it can be deployed to production.
Takeaways
- 🤖 Reinforcement learning is a distinct learning approach, characterized by training in a live environment.
- 🧠 The mnemonic "AREA 51" captures the core concepts: A for Action, R for Reward, E for Environment, and A for Agent.
- 🚀 The video shows how to create an environment with OpenAI Gym, build a deep learning model with TensorFlow and Keras, and train a reinforcement learning model with Keras-RL.
- 🐍 Everything runs in Python inside a Jupyter Notebook, with TensorFlow, Keras, Keras-RL, and OpenAI Gym installed as dependencies.
- 🎮 OpenAI Gym ships with many pre-built environments; here the CartPole environment is used to test the model.
- 📈 In CartPole you score a point for every step the pole stays upright while you move the cart, aiming for the maximum of 200 points.
- 🔧 The deep learning model is built with Keras's Sequential API, combining a Flatten layer with Dense layers.
- 🤹♂️ A DQN (Deep Q-Network) agent is trained with Keras-RL using policy-based learning.
- 💾 The trained model's weights can be saved and reloaded later, which makes it possible to deploy the model to a production environment.
- 📊 The end result: reinforcement learning dramatically improves the CartPole score, reaching a near-perfect 200 points.
Q & A
What kind of learning approach is reinforcement learning?
-Reinforcement learning is an approach in which an agent learns to choose the best actions through interaction with an environment. It differs from both supervised and unsupervised learning.
What does 'A.R.E.A.' stand for in reinforcement learning?
-A.R.E.A. represents the four elements every reinforcement learning model needs: Action, Reward, Environment, and Agent.
What is OpenAI Gym?
-OpenAI Gym is a standardized toolkit of environments used for developing and testing reinforcement learning algorithms.
What is the CartPole environment?
-CartPole is a basic reinforcement learning test environment: you move a cart left and right to keep a pole from falling over. You earn one point for every step the pole stays up, with a maximum score of 200.
How do TensorFlow and Keras build the reinforcement learning model?
-TensorFlow and Keras are used to build a feed-forward neural network (a Sequential model) that decides which actions the reinforcement learning agent takes.
What is DQN (Deep Q-Network)?
-DQN is a value-based reinforcement learning algorithm that uses deep learning to learn the action-value function.
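In standard notation (the video doesn't write this out), the action-value function DQN learns is trained toward the Bellman target:

```latex
y = r + \gamma \max_{a'} Q(s', a'; \theta^{-})
```

where r is the immediate reward, γ the discount factor, s' the next state, and θ⁻ the weights of a slowly-updated target network.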
What is the Boltzmann Q policy?
-The Boltzmann Q policy is a type of reinforcement learning policy that computes action-selection probabilities from the Q-values. It is used to balance exploration and exploitation.
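A minimal sketch of the idea behind Boltzmann (softmax) action selection, not keras-rl's actual implementation; the temperature parameter tau is illustrative:

```python
import numpy as np

def boltzmann_action(q_values, tau=1.0):
    # softmax over Q-values: higher-valued actions are sampled more often,
    # but every action keeps nonzero probability, so the agent still explores
    prefs = np.exp((q_values - q_values.max()) / tau)  # subtract max for numerical stability
    probs = prefs / prefs.sum()
    return np.random.choice(len(q_values), p=probs)

print(boltzmann_action(np.array([1.0, 2.5])))  # usually, but not always, picks action 1
```

Lower tau makes the choice greedier; higher tau makes it closer to uniform random.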
What is Sequential Memory, and why does the DQN agent need it?
-Sequential Memory is the replay memory in which the DQN agent stores past experiences so it can learn from them when selecting the best actions. It is especially important in reinforcement learning tasks where longer-term context matters.
How do you save the model's weights and reuse them later?
-The DQN model's weights can be saved with the save_weights method and reloaded with load_weights. This lets you reuse a trained model, for example to deploy it to a production environment.
What is the Jupyter Notebook used in this video?
-Jupyter Notebook is a web-based interactive development environment for running Python code, analysis, and visualization. It makes data science and machine learning tasks more efficient to carry out.
Outlines
😀 Overview of Reinforcement Learning
In this section, Nicholas introduces the basics of reinforcement learning. Unlike supervised and unsupervised learning, reinforcement learning trains in a live environment. He suggests remembering its four key concepts as "AREA 51": Action, Reward, Environment, and Agent. The video covers creating an environment with OpenAI Gym, building a deep learning model with TensorFlow and Keras, and training a reinforcement learning model with Keras-RL.
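Getting set up for the steps this outline describes comes down to a single notebook cell. The TensorFlow pin follows the version named in the video; the other packages are left unpinned here, which is an assumption about the exact cell:

```python
# Jupyter notebook cell; the leading "!" shells out to pip
!pip install tensorflow==2.3.0 gym keras keras-rl2
```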
🔧 Building an Environment with OpenAI Gym
This section explains how to build a reinforcement learning environment with OpenAI Gym. Gym offers a variety of pre-built environments; here the CartPole environment is chosen. In CartPole, the goal is to move a cart so the pole on top doesn't fall over. The reward is one point per step, with 200 points as the maximum. Taking random steps visualizes the environment and shows how it behaves before any reinforcement learning model is trained.
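A minimal sketch of the random-step loop described here, assuming the classic Gym API (env.step returns four values) that keras-rl2 targets; the episode count and variable names are illustrative:

```python
import random
import gym

env = gym.make('CartPole-v0')             # pole-balancing environment, capped at 200 steps
states = env.observation_space.shape[0]   # 4 state values: cart position/velocity, pole angle/velocity
actions = env.action_space.n              # 2 actions: 0 = left, 1 = right

for episode in range(1, 11):
    env.reset()
    done = False
    score = 0
    while not done:
        env.render()                            # draw the cart and pole
        action = random.choice([0, 1])          # take a random left/right step
        _, reward, done, _ = env.step(action)   # +1 reward for each surviving step
        score += reward
    print(f'Episode {episode}: score {score}')
env.close()
```

With purely random actions, scores typically stay far below the 200-point cap, which is the baseline the trained model is compared against.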
🤖 Building a Deep Learning Model with Keras
This section walks through creating a deep learning model with Keras's Sequential API. The model takes the four state values as input and outputs two actions (move left or move right). It consists of a Flatten layer followed by two 24-unit Dense layers with relu activation functions and a final Dense output layer. This model is what gets trained by reinforcement learning.
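A sketch of the model described above, continuing from the environment snippet; the (1, states) input shape reflects keras-rl's default window_length of 1 and is an assumption about the video's exact cell:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten

def build_model(states, actions):
    model = Sequential()
    model.add(Flatten(input_shape=(1, states)))     # keras-rl feeds (window_length, states)
    model.add(Dense(24, activation='relu'))
    model.add(Dense(24, activation='relu'))
    model.add(Dense(actions, activation='linear'))  # one Q-value per action
    return model

model = build_model(states, actions)
model.summary()
```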
🛠 Training a Reinforcement Learning Model with Keras-RL
This section shows how to set up a DQN (Deep Q-Network) agent with Keras-RL and train the reinforcement learning model. The DQN agent is trained using a policy-based approach, applying the Boltzmann Q policy. The training call specifies the environment, the number of steps, whether to visualize, and the logging verbosity. By the end of training, the model is reaching scores close to 200 points.
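A sketch of the agent setup, continuing from the model snippet above. The memory limit, warmup steps, target-model update rate, and step count are plausible keras-rl2 values, not figures confirmed by the transcript:

```python
from rl.agents.dqn import DQNAgent
from rl.policy import BoltzmannQPolicy
from rl.memory import SequentialMemory
from tensorflow.keras.optimizers import Adam

def build_agent(model, actions):
    policy = BoltzmannQPolicy()                              # softmax exploration over Q-values
    memory = SequentialMemory(limit=50000, window_length=1)  # experience replay buffer
    dqn = DQNAgent(model=model, memory=memory, policy=policy,
                   nb_actions=actions, nb_steps_warmup=10,   # assumed hyperparameters
                   target_model_update=1e-2)
    return dqn

dqn = build_agent(model, actions)
dqn.compile(Adam(lr=1e-3), metrics=['mae'])               # mean absolute error, as in the video
dqn.fit(env, nb_steps=50000, visualize=False, verbose=1)  # verbose=1 gives light logging
```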
🏆 Testing and Deploying the Model
Finally, the video tests the trained reinforcement learning model's performance and explains how to save the model's weights for later reuse. In testing, the DQN model scores close to 200 points in virtually every episode. The weights are saved as .h5f files and can be reloaded later and tested in the same environment. This workflow is useful when deploying the model to a production environment.
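The test, save, and reload steps sketched with the methods the video names (test, save_weights, load_weights), reusing build_model and build_agent from above; the weights filename is a hypothetical placeholder, since the video's exact path isn't shown here:

```python
import numpy as np
import gym
from tensorflow.keras.optimizers import Adam

# test the trained agent over 100 episodes
scores = dqn.test(env, nb_episodes=100, visualize=False)
print(np.mean(scores.history['episode_reward']))     # should be close to 200

# save the weights (produces .h5f checkpoint files on disk)
dqn.save_weights('dqn_weights.h5f', overwrite=True)  # hypothetical filename

# later, in a fresh session: rebuild everything, then reload the weights
env = gym.make('CartPole-v0')
model = build_model(env.observation_space.shape[0], env.action_space.n)
dqn = build_agent(model, env.action_space.n)
dqn.compile(Adam(lr=1e-3), metrics=['mae'])
dqn.load_weights('dqn_weights.h5f')
dqn.test(env, nb_episodes=5, visualize=True)         # should perform as before saving
```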
📚 Recap and Links to Resources
At the end of the video, the topics covered are recapped: installing the dependencies, creating a random environment with OpenAI Gym, training a deep learning model with Keras and Keras-RL, and reloading the agent from memory. Viewers are encouraged to like, subscribe, and hit the bell icon if the video helped, and support is offered in the comments for any questions. Links to the course materials, the GitHub repository, and documentation are provided in the description below.
Keywords
💡Reinforcement Learning
💡Deep Learning
💡CartPole
💡OpenAI Gym
💡Keras-RL
💡Policy-Based Learning
💡DQN (Deep Q-Network)
💡Environment
💡Reward
💡Action
Highlights
Explains how reinforcement learning differs from traditional supervised and unsupervised learning.
The core concepts of reinforcement learning can be remembered as 'AREA 51', standing for Action, Reward, Environment, and Agent.
Creates an environment with OpenAI Gym to start building the reinforcement learning model.
Builds a deep learning model with TensorFlow and Keras, then hands it to Keras-RL for reinforcement learning training.
Shows how to work in Python and Jupyter Notebook to build and train the reinforcement learning model.
Demonstrates installing the dependencies: TensorFlow, Keras, Keras-RL, and OpenAI Gym.
Uses the CartPole environment from OpenAI Gym to test the reinforcement learning model.
The goal in CartPole is to balance the pole by moving the cart left and right, earning one point per step up to a maximum of 200.
Shows how to drive the CartPole environment with random actions in Python and observe how it performs.
Explains how to build the deep learning model using Keras's Sequential API and Dense layers.
Shows how to train the deep learning model with Keras-RL using a DQN (Deep Q-Network) agent.
Introduces policy-based reinforcement learning with the Boltzmann Q policy.
Shows how the SequentialMemory class maintains the DQN agent's memory.
Shows how to compile and train the DQN model and watch the training run through visualization.
Shows the trained DQN model reaching scores close to 200 in the CartPole environment.
Explains how to save and reload the DQN model's weights for deployment to production.
Shows how to rebuild the DQN agent, reload the weights, and test it in the CartPole environment.
The video closes with links to the course materials and GitHub repository to help viewers kick-start their own reinforcement learning models.
Transcripts
before reinforcement learning
after reinforcement learning
what's happening guys my name is
nicholas and in this video we're going
to be going through a bit of a crash
course on reinforcement learning
now if you've ever worked with deep
learning or machine learning before you
know the two key forms are supervised
and unsupervised learning
now reinforcement learning is a little
bit different to that because you tend
to train
in a live environment now there's a
really easy way to remember the core
concepts in reinforcement learning
all you need to remember is area 51. now
you're probably thinking what the hell
does area 51 have to do with
reinforcement learning
well the area in area 51 stands for
action reward environment and agent
these are the four key things you need
in
any reinforcement learning model now in
this video we're going to be covering
all of those key concepts let's take a
deeper look as to what we're going to be
going through so in this video we're
going to cover
everything you need to get started with
reinforcement learning we're going to
start out by creating an environment
using open ai gym
we're then going to build a deep
learning model using tensorflow and
keras
this same model will then pass to
kerasrl in order to train our
reinforcement learning model
using policy-based learning now in terms
of how we're going to be doing it we're
going
to be largely working within python and
specifically
we're going to be working inside of a
jupyter notebook we'll start out by
building our environment using open ai
gym
we'll then build our deep learning model
again using tensorflow and keras
and then once we've built that model
we're then going to train it using
kerasrl
we'll then be able to take that same
model save it down into memory and
reload it for when we want to deploy it
into production
ready to get to it let's do it so
there's a couple of key things that we
need to do in order to build
our deep reinforcement learning model so
specifically we need to first up
install our dependencies then what we're
going to do is build an environment with
open ai gym with just a couple of lines
of code
so this is going to allow us to see the
environment that we're actually using
reinforcement learning in later on
then we're going to build a deep
learning model with keras so we're
specifically going to be using the
sequential api there
and then what we're going to do is train
that keras model using keras
reinforcement learning
and last but not least we're going to
delete it all and reload that agent from
memory so this is going to allow you to
deploy it into production if you want to
later on
so first up let's install our
dependencies so what we're going to need
here
is tensorflow keras kerasrl as well as
open ai
gym
so what we've done is we've installed
our four key dependencies so we've used
pip
install and specifically we've installed
tensorflow 2.3.0
we've installed open ai gym so that's
just gym
we've installed keras and we've also
installed keras rl2
so those are all our dependencies now
done and installed
now what we can actually go and do is
set up a random environment with open ai
gym
now open ai gym comes with a bunch of
pre-built
environments that you can use to test
out reinforcement learning on
so if we actually head on over to
gym.openai.com
you can see there's a bunch of random
environments so
here we've got some algorithms we've got
atari games so if you wanted to build
atari
or video game style reinforcement
learning engines you could
we're going to be working on these
classic control ones and specifically
we're going to be using cartpole and so
the whole idea behind cartpole is that you
want to basically
move this cart down the bottom here in
order to balance the pole
up there so the whole idea is that for
each step you take you get a point with
a maximum of
200 points so ideally what we're going
to see when we start off is with our
random steps we're not going to get
anywhere near 200 but
once we use deep learning and
reinforcement learning we ideally should
get a much closer to actually hitting
our final result
now we've got two movements we can
either go left or right so
what we're going to see is when we
create our environment we're going to
have two actions available either left
or right
if you work in different reinforcement
learning environments you might have a
different number of actions that you can
take so for example you might go up or
down left or right
if you're working with other things so
now what we're going to do is
set up this environment so you can work
with it within python so if we go back
to our jupyter notebook
let's start setting that up so the first
thing that we need to do is import our
dependencies so
in order to do that we're going to
import openai gym and we're also going
to import the random library so we can
take a bunch of random steps
so those are our two key dependencies
imported so
and this is specifically for our open ai
gym so we've imported gym
and we've also imported random now what
we can go and do is actually set up that
environment
so that's our environment set up so what
we went and did there is we used
the open ai gym library and specifically
we used
the make method to build our cartpole
environment so remember that was the
cartpole environment that we saw here
we then extracted the states that we've
got so this is available through env
which is our environment that we just
set up
observation space dot shape so we're
taking a look at all the different
states that we've got available
within our environment and we've also
extracted the action so if you take a
look we're getting that from our action
space
and we can see that we're going to have
a specific number of actions so if we
take a look at our states
we've basically got four states
available and if we take a look at our
actions
we've got two actions so basically those
are left or right moving
our cartpole left or right now what we can
actually go and do is actually
visualize what it looks like when we're
taking random steps within our cartpole
environment
so ideally what we'll see is that our
cartpole's just sort of moving randomly
because we're taking random steps in
order to get a specific score so
remember with each step that we take
where our cartpole hasn't fully fallen
over
we're going to get one point with a
maximum of 200 points so
let's build our random environment
all right so we've written a bit of code
there now what we're actually going to
do is
start by breaking this down from here so
the first thing that we're going to do
is render our environment so this is
going to allow us to see our cart in
action when it's moving left and right
then what we're doing is we're taking a
random step so we're either
going left or right so zero or one
basically represents one of those steps
so we're just taking a random choice
to see how that impacts our environment
then what we're doing is we're actually
applying
that action to our environment and we're
getting a bunch of stuff as a result of
that
so we're getting our state we're getting
our reward we're getting whether or not
we've completed the game so whether or
not we've failed or whether or not we've
passed
and we're also getting a bunch of
information then
based on our step we're going to get a
reward so remember if we take a step in
the correct direction and we haven't
failed we get one point
this basically allows us to accumulate
our entire reward
now if we fail or if we get to the end
of the game then
done is going to be set to true so what
we're doing is we're continuously taking
steps until we're complete
so we reset the entire environment up
here and then we're also printing out
our final reward so ideally what we'll
get is
the episode number as well as our score
so
let's go on ahead and run that and see
our episodes live and in action actually
it looks like we've got a bug there
episode
all right so you can see our carts
moving and it's moving randomly
and you can see that our pole is sort of
flailing about now what we're actually
logging out is the score each time so it
looks like
we're surpassing a specific threshold
and we're failing so we're only getting
up to a maximum of about
38 so that's our maximum score now
ideally what we want to be able to get
is
all the way up to 200 and this is where
reinforcement learning comes in
so basically our deep learning model is
going to learn the best action to take
in that specific environment in order to
maximize our score
now this all starts with a deep learning
model so let's go ahead and start
creating a deep learning model
now in order to do that we first up need
to import some dependencies and these
are largely going to be our tensorflow
keras dependencies
so let's go ahead and import those
so we've imported our dependencies so
we've specifically first up imported
numpy so this is going to allow us to
work with numpy arrays
then we've imported the sequential api
so this is going to allow us build a
sequential model with keras
then we've also imported two different
types of layers so specifically we've
imported
our dense node as well as our flatten
node and last but not least we've
imported the adam optimizer so that's
going to be the optimizer that we use
to train our deep learning model now
what we can go and do is actually go and
build that model so we're going to build
this
wrapped inside of a function so we can
reproduce this model whenever we need to
so that's our build model function
defined so what we've basically gone and
done is created
a new function called build model and to
that we're going to pass two arguments
so specifically our states
so these were the states that we
extracted from our environment up here
and we're also going to pass through our
actions so these are going to be the two
different actions that we've got in our
cartpole environment
in order to build our deep learning
model we're first instantiating our
sequential model then we're passing
through the
flatten node and specifically to that
we're going to be passing through
a flat node which contains our different
states so remember our four different
states that we had
then we're adding two dense nodes to
start building out our deep learning
model with a relu activation function
and last but not least our last dense
node has our actions so this is
basically going to mean
that we pass through our states at the
top and we
pass through our actions down the bottom
so ideally what we should be able to do
is
train our model based on the states
coming through to determine the best
actions
to maximize our reward or our score that
we can see here
so let's go ahead and create an instance
of that model just by using that build
model function
and we can also visualize what the model
looks like using the model.summary
function
so you can see here that we're passing
through our four different states
we've got 24 dense nodes 24 dense nodes
so these are going to be our fully
connected layers within our neural
network
and then last but not least we're going
to be passing out our two different
actions that we want to take within our
environment now what we can go and do is
take this deep learning model and
actually train it using keras rl
so first up we need to import our keras
rl dependencies so let's go ahead and do
that
so those are our dependencies imported
so we've imported
three key things here so we've imported
our deep
q network agent so basically there's
a bunch of different agents within
the keras rl environment so you can see
we've got a dqn agent a naf agent
ddpg sarsa cem so all of these are
different agents that you can use to
train
your reinforcement learning model we're
going to be using dqn for this
particular video but
try testing out some of the others and
see how you go now what we
also have is a specific policy so within
reinforcement learning you've got
different styles
so you've got value-based reinforcement
learning and you've also got
policy-based reinforcement learning so
in this case we're going to be using
policy-based reinforcement learning and
the specific policy that we're going to
be using
is the boltzmann q policy which you can
see here
now the last thing that we've gone and
imported is sequential memory so for
our dqn agent we're going to need to
maintain some memory
and the sequential memory class is what
allows us to do that
so now what we can go and do is set up
our agent and again we're going to wrap
this inside of a function so we can
reproduce it when we want to reload it
from memory so let's go
ahead and build that function
so that's our function defined now what
we've basically done is we've named our
function build
agent and to that we pass through our
model so this is
our deep learning model that we
specified up here and we're also passing
through the different actions that we
can take so those were the two different
actions
left or right that we had available
within our environment
then we set up our policy we set up our
memory
and we set up our dqn agent and to that
dqn agent we actually pass through our
deep learning model
and memory our policy as well as a
number of
other keyword arguments so then what we
do is we return that dqn
agent so let's go on ahead and actually
use this dqn agent to actually now
go and train our reinforcement learning
model so first up we want to start out
by instantiating our dqn model
then we're going to compile it and then
we're going to go ahead and fit
all right and there you go so you can
see that our dqn model is now starting
to train
so what we actually did is we
instantiated our or we used our build
agent function to set up a new dqn model
and that was that up here
and we passed through our model as well
as our actions
we then compiled it and we passed
through our optimizer so this was that
adam optimizer that we imported right at
the start
and we also passed through the metrics
that we want to track so in this case
it's mean
absolute error then we use the fit
function to kick off the training
and to that we pass through our entire
environment the number of steps we want
to take
whether or not we want to visualize it
so we'll take a look at that in a second
and we also specified verbose as one so
we don't want full logging we want a
little bit of logging
now what we can do is just let that go
ahead and train to take a couple of
minutes and then
we should have a fully built
reinforcement learning model
five minutes later sweet so that's our
reinforcement
learning model now done dusted and
trained so all
up it took about 256 seconds to go and
train and you can see
in our fourth interval that we're
accumulating a reward of about 200
now what we can go and do is actually
print out and see what our total scores
were so remember when we started out up
here so just taking random steps we were
getting about a maximum score of about
51
but that's not all that great
considering that the total maximum score
for the game
is about 200. so let's go and test this
out and see what this
actually looks like or how it's actually
performing so we can do that using
the dqn.test method so let's try that
out
all right so that's looking better
already so you can see in virtually
every single episode we're getting a
score of about 200
and our mean is 200. so what we did
there in order to test that out
is we accessed our dqn model and we use
the test method
to that we pass through our actual
environment the number of games that we
want to run so in this case
they're called episodes so we ran 100
games
and whether or not we want to visualize
it then what we did is we
outputted our mean result now if we
wanted to actually visualize what the
difference is we can do that as well
and you can see our model is performing
way better so you can see it's actually
able
to balance the pole a whole lot better
than what it was before when it was just
randomly sort of
flailing about we can test that out
again so this time rather than doing
five episodes say we wanted to
um 15 for example so you can see that
our model again
it's performing way better than what it
was initially so
it's actually able to right itself
and sort of balance it and make sure
that that pole stays straight
brings a tear to my eye so good
sweet so that's all done now what
happens if we actually wanted to go and
save this model away
and use it later on say for example we
wanted to go and deploy it into
production
well what we can actually do is we can
actually save the weights from our dqn
model and then reload them later on and
to try to test them out
so we can do that using the save weights
method
from our dqn model so let's go ahead and
save our weights
then what we'll do is we'll blast away
all of the stuff that we just created
and we'll rebuild it by reloading our
weights
so we've now gone and saved our weight
so if we actually take a look in our
folder you can see that we've gone
and generated two different h5f files so
these basically allow us
to save our reinforcement learning model
weights
now if we wanted to go and rebuild our
agent first up let's start by deleting
our model deleting our environment and
deleting our dqn agent
and then what we can do is rebuild it
using all the functions that we had and
reload those weights to test it out so
if we go and do that
so you can see if we go and try to use
our dqn.test method
there's nothing there because we've then
gone and deleted it but what we can do
is we can go and rebuild that
environment and test it out so let's go
and do that
perfect so we've now gone and
reinstantiated all of our models so we
first up we built our environment
we extracted our actions and our states
just like we did before
then we used our build model and our
build agent
functions to go and rebuild our deep
learning model and
reinstantiate our dqn agent and then
last but not least we compiled it
now what we can do is actually reload
our weights into our model and then test
it out again so in order to do that we
can use the dqn
dot load weights method so before up
here we use save weights now we can
load our weights in order to re-test
this out
and the file that we're going to pass to
our load weights method is
the one that we exported out here so we
can copy that in and paste that here
and now we've gone and reloaded our
weights we can actually go
and test out our environment again so
ideally what we should get is similar
results so
again you can see it's performing well
it's performing just as well as what it
did
before we deleted our weights and now we
went and reloaded them
and that about wraps up this video so we
covered a bunch of stuff so specifically
we went and installed our dependencies
we then created a random environment
using open ai gym and we got about a
maximum score of about 51
we then built a deep learning model
using keras and then use keras rl to
train that
using policy-based reinforcement
learning and then
last but not least we went and reloaded
that agent from memory so that allows
you to work with this
inside of a production environment if
you want to go and deploy
it and that about wraps it up thanks so
much for tuning in guys hopefully you
found this video useful if you did be
sure to give it a thumbs up hit
subscribe and tick that bell so you get
notified of when i release future videos
if you do have any questions or need any
help be sure to drop a mention in the
comments below
and i'll get right back to you and all
the course materials
including the github repository as well
as links to documentation are available
in the description below
so you can get a kickstart and get up
and running with your reinforcement
learning model
thanks again for tuning in peace