Beginner Machine Learning tutorial (TryHackMe Advent of Cyber Day 14)
Summary
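TLDR: This video explains the basics of AI and machine learning and how they apply to cybersecurity. It draws viewers in through the Day 14 storyline: the CTO sabotages the toy-making process so that defective toys are produced, and the team tries to solve the problem with machine learning. It covers three types of machine learning algorithms (genetic algorithms, particle swarm, and neural networks) and two training methods (supervised and unsupervised learning). Finally, it discusses why datasets matter, walks through training and validating a neural network, and describes in detail how AI and machine learning are used in cybersecurity.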
Takeaways
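- 🤖 AI and machine learning are among the hottest topics right now and are attracting a great deal of attention in the market.
- 📚 This tutorial is part of TryHackMe's Advent of Cyber, a completely free, fun, and educational challenge that beginners can take part in.
- 🧐 It is important to understand the difference between AI and machine learning: machine learning is a process for teaching a machine to mimic human intelligence.
- 🔍 Solving a problem effectively requires a suitable machine learning algorithm. Three main types are introduced: genetic algorithms, particle swarm, and neural networks.
- 🧠 Neural networks mimic the human brain, in which many neurons receive inputs and produce outputs.
- 📈 Two ways of training a neural network are described: supervised learning and unsupervised learning.
- 🔢 Inputs are normalized and multiplied by weights, which lets the neural network determine which inputs matter more.
- 📉 An activation function keeps outputs within a fixed range, so they can be compared with one another instead of being arbitrary numbers.
- ➡️ The feedforward loop and backpropagation are the processes used to train a neural network so that it makes correct decisions.
- 📝 A dataset is the information fed to a neural network; giving it just enough information avoids overtraining.
- 🔑 In cybersecurity, AI and machine learning help detect unusual behavior and support user behavior analytics, providing fast and accurate predictions.
- 🏆 Finally, solving the challenge with sufficiently accurate predictions earns you the flag.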
Q & A
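What is the difference between AI and machine learning?
-AI is the umbrella term for artificial intelligence, and machine learning is one process within it. Machine learning is the process of getting a machine to make human-like decisions, using datasets to teach it to decide accurately.
What is a genetic algorithm?
-A genetic algorithm mimics the process of natural selection and evolution. The idea, based on survival of the fittest, is simple, but it can become complicated when implemented in real situations.
What is a neural network?
-A neural network is a machine learning technique that mimics how nerve cells in the human brain work. It imitates cells and neurons that receive many inputs and produce outputs.
What is supervised learning?
-Supervised learning is a training method in which the neural network is given the information you want it to learn. A labeled dataset is used to teach the machine to make correct decisions.
What is a hidden layer?
-The hidden layer is the middle layer of a neural network, where the mathematical computation and interpretation of the data take place. More complex calculations require more layers.
What is an activation function?
-An activation function keeps a neural network's output within a fixed range (for example, 0 to 1). This makes it possible to compare different toys with one another.
What is the feedforward loop?
-The feedforward loop is one of the techniques used to train a neural network: the inputs are normalized, fed into the input layer, and the network produces an answer.
What is backpropagation?
-Backpropagation is the process of telling the neural network whether its answer was correct. This is how the network learns to make better decisions.
What is a dataset?
-A dataset is the information fed to a neural network and is used to train a machine learning model. Without an appropriate amount of data, the neural network cannot make good decisions on new problems.
What is overtraining?
-Overtraining occurs when a neural network is given too much information, so that it memorizes the answers instead of understanding the underlying logic. Validation is used to judge whether the network has learned enough.
Explain how AI and machine learning help in cybersecurity.
-AI and machine learning help in many areas of cybersecurity, such as detecting unusual behavior, detecting malware, and user behavior analytics. Because machine learning can learn normal network traffic and user behavior patterns and then spot anomalies, it is very useful for detecting and responding to cyber attacks.
What is the purpose of the Python script used in this tutorial?
-The Python script builds a neural network from the training and testing datasets, runs training and validation, and finally makes predictions on the testing dataset and writes out the results.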
Outlines
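🤖 The basics of AI and machine learning and their application to cybersecurity
This section explains the basics of AI and machine learning and how they are used in cybersecurity. The tutorial is part of TryHackMe's Advent of Cyber, a free competition that lets beginners learn in a fun way. Through a story in which the CTO sabotages the toy pipeline and control elves are placed in it to counter the problem, the video sets out to solve the problem with AI and machine learning. It also covers the difference between AI and machine learning and three types of machine learning algorithms (genetic algorithms, particle swarm, and neural networks).
🧠 Training neural networks and the importance of datasets
The second section covers how to train a neural network and how to handle datasets. A neural network consists of three layers (input layer, hidden layer, and output layer), and the computations performed in each layer and their meaning are explained. It also covers the feedforward loop and backpropagation as training techniques, and how validation and testing data are used when working with datasets.
🔍 Applying machine learning with a Python script to solve the challenge
The third section explains how to build a machine learning model with a concrete Python script and solve the problem using the training and testing datasets. It walks through the code for normalizing the data, making an 80/20 split, and running training and validation. It also shows how to save the predictions to a file and upload that file to a URL to check their accuracy.
🛡️ Applying AI and machine learning to cybersecurity
The final section discusses how AI and machine learning help in cybersecurity, explaining how the fast detection that AI provides can be used for detecting unusual behavior and for user behavior analytics. It also walks through the answers to the challenge's final questions on AI and machine learning and shows how to submit the final flag.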
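The anomaly-detection and user-behavior-analytics idea from the final section can be illustrated without a neural network: learn a baseline of normal behavior, then flag events that fall far outside it. The sketch below mirrors the video's example of a usual 9 a.m. login from Australia versus a sudden 3 a.m. login from France. It is a minimal sketch under stated assumptions. A simple statistical baseline stands in for a learned model, the login history, the 3-standard-deviation threshold, and the helper names build_baseline and is_unusual are hypothetical, and wrap-around at midnight is ignored. Real UBA products use far richer features.

```python
from statistics import mean, pstdev

# Hypothetical login history for one user as (hour of day, country code) pairs.
history = [(9, "AU"), (9, "AU"), (8, "AU"), (9, "AU"), (10, "AU"),
           (9, "AU"), (9, "AU"), (8, "AU"), (10, "AU"), (9, "AU")]

def build_baseline(events):
    """Learn what 'normal' looks like: the typical login hour and the countries seen so far."""
    hours = [hour for hour, _ in events]
    return {
        "mean_hour": mean(hours),
        "std_hour": pstdev(hours),
        "countries": {country for _, country in events},
    }

def is_unusual(event, baseline, z_threshold=3.0):
    """Flag a login from a never-seen country, or at an hour far from the usual one."""
    hour, country = event
    if country not in baseline["countries"]:
        return True
    std = baseline["std_hour"] or 1.0  # avoid dividing by zero if every login had the same hour
    return abs(hour - baseline["mean_hour"]) / std > z_threshold

baseline = build_baseline(history)
print(is_unusual((9, "AU"), baseline))  # False: the usual time and place
print(is_unusual((3, "FR"), baseline))  # True: new country (and an odd hour)
```

Checking the country before the hour is a design choice: a login from a never-seen location is treated as suspicious on its own, while the hour test catches odd timing from a familiar place.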
Keywords
💡Artificial intelligence (AI)
💡Machine learning
💡Genetic algorithm
💡Particle swarm
💡Neural network
💡Supervised learning
💡Unsupervised learning
💡Hidden layer
💡Feedforward loop
💡Backpropagation
💡Dataset
Highlights
AI is one of the hottest topics in the market at the moment.
This tutorial is part of Advent of Cyber by TryHackMe, a free competition for beginners.
The basics of AI and machine learning will be covered, along with their application in cybersecurity.
The difference between AI and machine learning will be explained.
Machine learning is a process to teach a machine to mimic human intelligence.
Three types of machine learning algorithms will be explored: genetic algorithm, particle swarm, and neural network.
Neural networks are the most popular and mimic how neurons work in the human brain.
Two methods of training neural networks will be discussed: supervised learning and unsupervised learning.
Supervised learning provides the neural network with labeled data to learn from.
Unsupervised learning allows the neural network to find interesting patterns in unlabeled data.
The neural network consists of an input layer, hidden layer, and output layer.
The hidden layer is where the mathematical computations and decision-making occur.
The feedforward loop and backpropagation are key processes in training a neural network.
The importance of proper data preparation and avoiding overtraining will be discussed.
The example will use a supervised learning method to classify toys as defective or not.
The process of building and training a neural network model will be demonstrated using Python code.
AI and machine learning have practical applications in cybersecurity for detecting unusual behavior and user activity.
The tutorial provides a step-by-step guide to using AI and machine learning in solving a cybersecurity challenge.
The presenter shares his experience and insights on the evolution and effectiveness of AI in cybersecurity over the years.
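The highlights above mention unsupervised learning, which the challenge itself does not use. As a stand-in, the sketch below uses k-means clustering (a classic unsupervised method, not a neural network) to show what finding structure in unlabeled data means. Given toy heights with no "good" or "defective" labels, it discovers two groups on its own. The heights, the choice of two clusters, and the starting centroids are illustrative assumptions.

```python
def kmeans_1d(values, centroids, iterations=10):
    """1-D k-means: assign each value to its nearest centroid, then move each centroid to its cluster's mean."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # A centroid with no assigned values stays where it was.
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

# Unlabeled toy heights in cm (illustrative values): nothing says which toys are good or defective.
heights = [10.0, 10.2, 9.8, 10.1, 14.0, 14.4, 13.9]
centroids, clusters = kmeans_1d(heights, centroids=[heights[0], heights[-1]])
print(clusters)   # [[10.0, 10.2, 9.8, 10.1], [14.0, 14.4, 13.9]]
print(centroids)  # about 10.025 and 14.1
```

The algorithm only reports that there are two groups; deciding that the taller group looks defective would still need a human or labeled data, which is where supervised learning comes in.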
Transcripts
AI is one of the hottest topics in the market at the moment, and if you ever wanted to learn about AI, then this free tutorial is for you. This is part of the Advent of Cyber by TryHackMe, which is a completely free competition for beginners to participate in fun and educational challenges. I will go through the basics of AI and machine learning, but I will also explain to you how AI can be used in cyber security. And if you're new to the channel, I'm UnixGuy, and I help people land their first cyber security job even if they don't have any degree or technical IT background. I have many videos where I show you exactly how to land your first cyber security job, like this video, for example. But wait a minute, it's almost Christmas time. Let's get into it.
So in day 14 of the challenge, the CTO has made our toy pipeline go wrong (that's not unusual of the CTO, by the way). By infecting elves at key positions in the toy-making process, he has poisoned the pipeline and caused the elves to make defective toys. McSkidy has started to combat the problem by placing control elves in the pipeline. These elves take measurements of the toys to try and narrow down the exact location of problematic elves in the pipeline by comparing the measurements of defective and perfect toys. However, this is an incredibly tedious and lengthy process, so he's looking to use machine learning. So in this story, the CTO has messed up the process of making toys, and what we want to do is use the power of AI and machine learning to try and detect which toys are in good condition and which toys are defective.
But first things first, we need to learn the difference between AI and machine learning. You see, so many people use the word AI by mistake, because what they are really referring to is what we call machine learning. Machine learning is essentially a process where we try and teach a machine to mimic human intelligence. For example, I want my machine to be able to make decisions accurately, but the problem we face is that sometimes we end up using if statements. For example, if we're producing toys and we want all the toys to be red, but one of the toys is blue, then our machine can detect that the toy is blue, because that's an easy process: we can just have an if statement that says if the toy is not red, then the toy is defective. But unfortunately, in the real world the problems that we face are a little bit more complicated. For example, in our toys problem here, it's not just the color; sometimes it's the height, sometimes it's the weight or the dimensions. So to effectively solve this problem, we need to use proper machine learning. The process of analyzing the input and trying to find the problem, this process and this flow of decisions, is what we call an algorithm.
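The hand-written rule described above can be sketched as follows. This is an illustrative sketch, not code from the challenge: the toy records, the field names color and height_cm, and the 10.5 cm limit are hypothetical. The color-only rule catches the blue toy but misses the over-tall red one, and every extra measurement needs another manually chosen threshold. That tuning is exactly the work machine learning aims to learn from data instead.

```python
# Hypothetical toy records; field names and the height limit are illustrative assumptions.
toys = [
    {"color": "red", "height_cm": 10.0},
    {"color": "blue", "height_cm": 10.1},
    {"color": "red", "height_cm": 14.2},
]

def is_defective_by_color(toy):
    """The single hand-written rule: any toy that is not red is defective."""
    return toy["color"] != "red"

def is_defective_by_rules(toy, max_height_cm=10.5):
    """Each extra measurement (height, weight, dimensions) needs another hand-tuned threshold."""
    if toy["color"] != "red":
        return True
    if toy["height_cm"] > max_height_cm:
        return True
    return False

print([is_defective_by_color(t) for t in toys])  # [False, True, False]
print([is_defective_by_rules(t) for t in toys])  # [False, True, True]
```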
There are so many different algorithms for machine learning, which go beyond this video, but here we will explore three types of machine learning algorithms. The first one is called the genetic algorithm. This structure essentially mimics the process of natural selection and evolution. It's essentially an algorithm based on the theory of survival of the fittest: a fairly simplistic idea, but it can be complicated when we try to implement it in real life. The second type is called particle swarm. This one aims to mimic the process of how birds flock together in a group at a specific point. And the third one is called the neural network. The neural network is by far the most popular one. In fact, this is the one that I learned a long time ago in university, and if I'm honest with you, I forgot most of the stuff that I learned because I didn't practice or apply it in the real world. But today, with the AI boom, I find myself needing to go back to those concepts and learn them again, which is a lot of fun for me. So basically, neural networks mimic the process of how neurons work in our brain. Neural networks try to replicate our human brain: we have so many cells and neurons in our brain, and they receive various inputs and produce outputs.
So this is the basic idea behind neural networks, and this is the one that we will use in our example today. And I promise you, as we go through the example, it will become a lot easier to learn and understand. So the neural network, much like the human brain, needs to learn before it can make decisions, and there are many ways to teach neural networks how to make decisions. In fact, so many PhD programs are based on a specific way of teaching neural networks and on measuring how efficient that way is. In this example, we will be learning two methods of training neural networks: the first one is called supervised learning, and the second one is called unsupervised learning.
Supervised learning is where we provide the neural network with the information that we want it to learn. So ideally, we need to create what we refer to as data sets. These data sets are essentially information that we feed to the neural network so that it can make decisions based on this data. For example, if I want to teach the neural network about toys, my data set should include information about what a good toy looks like, so that when the neural network sees a toy that doesn't look like the good toy, it can tell you whether it's defective or not. The second type is called unsupervised learning, and this one is a little bit more complex. We basically let the neural network try and find interesting things. So in our toys example, instead of specifically telling the neural network that this is a good toy, we feed it a large amount of information about toys in general, and then we let the neural network try to make correlations and decisions based on that data. As I said, this is a complex topic, and in the challenge there are links that you can follow where you can read more about this. Later on, in the example that we will go through, we will use the supervised learning method.
So as you can see in this example, our machine learning model here consists of three layers: we have the input layer, then a hidden layer, and then the output layer. At the moment, we have a neural network that doesn't have any information, so the first step is to give it some information. That information is fed to the neural network at the input layer. In this example, you can see that the input layer has the height, width, length, color scheme, maker ID, and checker ID. And at the end of that model, we have something called the output layer. This will simply tell us whether the toy is defective or not.
But then in the middle, we have the hidden layer. This is where the magic happens: this is where the math and the computation happen, and this is where the neural network tries to make sense of the data. The hidden layer itself can be composed of many layers; in fact, the more complicated the calculations are, the more layers we need. Now, this example doesn't go deep into what a node is, but if I want to simplify it for you, a node can be a server, it can be a laptop, it can even be an application instance or a thread. Think of a node as something that does computations. So the more computational nodes we have, the more servers we have, the more computing power we have, and the more mathematical computation we can perform. So if we have a complex neural network that's doing true artificial intelligence, we really need a lot of computational power and a lot of mathematical calculations as well.
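Within the network itself, a node (often called a neuron) is a small unit of computation: it takes a weighted sum of its inputs, adds a bias, and passes the result through an activation function. The servers, laptops, and threads mentioned above are what run many such computations. A forward pass through the three layers described above can be sketched as below. All weights and biases are arbitrary, untrained values chosen for illustration, the helper name layer is hypothetical, and the sigmoid is one common activation choice that keeps outputs between 0 and 1.

```python
import math

def sigmoid(z):
    """Activation function: squashes any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One layer of nodes: each node takes a weighted sum of all inputs plus its bias, then applies the activation."""
    return [sigmoid(sum(x * w for x, w in zip(inputs, node_weights)) + b)
            for node_weights, b in zip(weights, biases)]

# Input layer: three already-normalized measurements (e.g. height, width, length).
toy = [0.2, -0.4, 1.0]

# Hidden layer: 2 nodes, each with one weight per input. Arbitrary, untrained values.
hidden_weights = [[0.5, -0.3, 0.8], [-0.6, 0.1, 0.4]]
hidden_biases = [0.0, 0.1]

# Output layer: 1 node reading the two hidden outputs, read as "probability the toy is defective".
output_weights = [[1.2, -0.7]]
output_biases = [-0.2]

hidden = layer(toy, hidden_weights, hidden_biases)
(p_defective,) = layer(hidden, output_weights, output_biases)
print(hidden, p_defective)
```

Because these weights are untrained, the output is meaningless as a prediction; training is what turns them into useful values.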
Now, in this example, we will zoom in a little bit on the hidden layer to see how the calculations are performed. But don't worry, it's not complex math; it's very, very basic and simplified, just enough for you to understand how neural networks actually work. We don't just take the inputs as they are; in fact, we multiply the inputs by a weight. Now, this is important logic. We need to understand the relationship between the height and the weight of the toy. This will help the neural network understand which inputs contribute more to the output than others. So, for example, one toy could be a lot taller than other toys, but once we factor in the weight, we can see how much that height is actually contributing. Similarly, we don't just take the output value as it is; we take that output value and put it into what we refer to as an activation function. The purpose of the activation function is that we don't want the outputs to be just random numbers; we want these outputs to be within one range so we can compare the toys to one another. So in this example, we want the output to be a decimal number between 0 and 1, or it could be between -1 and 1.
Now that we have some idea of how the neural network operates, the next step would be to train the neural network. This means we need information to feed into the neural network so it can start to make decisions for us. The first method here is called the feedforward loop. This is the simplest form of training. Essentially, the way it works is this: the first step is that we normalize the input. Normalizing the input is what we talked about in the previous step, where we multiply it by a weight to help the neural network decide which inputs are more important. Then we feed the inputs to our nodes in the input layer, as you can see in the diagram, and then we simply get the answer. So the more data we give to the neural network, the better the decision will be, because it will know what a defective toy looks like. But that's only half of the equation. We have been feeding input to the neural network; now we need to do what we refer to as backpropagation. This is where we simply tell the neural network whether the answer was correct or not, and that's how the neural network can learn. So we will give the neural network a certain input, and then the neural network will come back and tell you, "Well, I think this toy is good." Then you will look at the answer and tell the neural network whether it was correct or not, and that's how the neural network can be trained to make better decisions.
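The feedforward and backpropagation loop described above can be sketched with a network of a single sigmoid node, which keeps the math small. This is a minimal sketch under stated assumptions, not the challenge's classifier. The two-feature toy data is made up and assumed to be normalized already, so the weights are what training adjusts. The cross-entropy loss, learning rate, and epoch count are illustrative choices. For a one-node network, backpropagation reduces to the gradient-descent update shown; deeper networks apply the same chain rule layer by layer.

```python
import math

def sigmoid(z):
    """Activation: squashes any number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def feed_forward(x, weights, bias):
    """Forward pass of a one-node network: weighted sum plus bias, then the activation."""
    return sigmoid(sum(xi * wi for xi, wi in zip(x, weights)) + bias)

def log_loss(X, y, weights, bias):
    """Average cross-entropy between predictions and true labels (lower is better)."""
    eps = 1e-12  # keeps log() away from exactly 0
    total = 0.0
    for x, target in zip(X, y):
        p = min(max(feed_forward(x, weights, bias), eps), 1 - eps)
        total -= target * math.log(p) + (1 - target) * math.log(1 - p)
    return total / len(X)

def train(X, y, learning_rate=0.5, epochs=1000):
    """Repeat: feed an example forward, compare with its label, push the error back into the weights."""
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            p = feed_forward(x, weights, bias)  # 1. feedforward: the network's current answer
            error = p - target                  # 2. feedback: how wrong was it?
            # 3. backpropagation (for one node, this is the gradient of the log loss)
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias

# Hypothetical normalized measurements [height, weight]; label 1 = defective, 0 = perfect.
X = [[-1.0, -0.5], [-0.8, -1.0], [-1.2, -0.7], [1.0, 0.8], [0.9, 1.2], [1.3, 0.6]]
y = [0, 0, 0, 1, 1, 1]

weights, bias = train(X, y)
predictions = [1 if feed_forward(x, weights, bias) >= 0.5 else 0 for x in X]
print("loss before:", log_loss(X, y, [0.0, 0.0], 0.0))
print("loss after: ", log_loss(X, y, weights, bias))
print("predictions:", predictions)
```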
Now, the last topic that we need to talk about before we go into solving the challenge is the information that we feed to the neural network, which we refer to as data sets. The type of data that we give to the neural network is a huge topic on its own, and it's not straightforward. Here is how TryHackMe tries to explain it: if you were at school and your teacher explained to you that 1 + 1 = 2 and 2 + 2 = 4, but then in the exam you get the question 3 + 3, we know that the answer is 6, but you can only know that if you understood the basic principle of addition. If you just memorized 1 + 1 = 2 and 2 + 2 = 4, then you won't be able to calculate 3 + 3; you have to understand the logic behind the calculation. The same thing applies to machine learning. We can give our neural network so much information that it ends up memorizing all of it. The problem is that when the neural network faces a new problem that it hasn't seen before, it can't make a decision. So we need to train our neural network in such a way that it understands the underlying logic and isn't just cramming and memorizing things.
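The 1 + 1, 2 + 2, 3 + 3 analogy can be made concrete with a toy comparison: a lookup table that memorizes the two training examples, versus a model with two weights (the names w1 and w2 are hypothetical) fitted by gradient descent. This is an illustrative sketch that uses a linear model rather than a neural network. The memorizer has no answer for 3 + 3, while the fitted weights converge to about 1 each and so generalize to the unseen question.

```python
# "Memorizing": a lookup table of exactly the examples seen in class.
training = {(1, 1): 2, (2, 2): 4}

def memorizer(a, b):
    """Answers only questions it has seen before; returns None for anything new."""
    return training.get((a, b))

def fit_weights(examples, learning_rate=0.05, epochs=200):
    """Learn w1, w2 so that w1*a + w2*b matches the answers (gradient descent on mean squared error)."""
    w1 = w2 = 0.0
    n = len(examples)
    for _ in range(epochs):
        g1 = g2 = 0.0
        for (a, b), target in examples.items():
            error = w1 * a + w2 * b - target
            g1 += 2 * error * a / n
            g2 += 2 * error * b / n
        w1 -= learning_rate * g1
        w2 -= learning_rate * g2
    return w1, w2

w1, w2 = fit_weights(training)
print(memorizer(3, 3))            # None: 3 + 3 was never in the training data
print(round(w1 * 3 + w2 * 3, 3))  # 6.0: the learned weights generalize
```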
The way we fix this memorization problem is that we need to know how much information is enough; otherwise we will end up with what we refer to as overtraining. This is when we give the neural network way too much information, so that it ends up memorizing the answers as opposed to understanding the logic. To achieve that balance, we do something we refer to as validation: we essentially validate whether the neural network has learned enough or not. To perform validation, we have to split the data set into three data sets. The first set is called the training data; this is the information that we feed to the neural network. But then we also need validation data, which is what we will use to validate whether the neural network understood our training or not. So after each training round, we need to send the validation data through our network to determine the performance. If the performance starts to decline, then we know that we're overtraining our neural network.
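The validation rule described above (stop once validation performance starts to decline) is often implemented as early stopping. A minimal sketch, assuming the validation loss has already been recorded after each training round. The loss values, the helper name best_stopping_epoch, and the patience setting (how many non-improving rounds to tolerate before stopping) are hypothetical.

```python
def best_stopping_epoch(validation_losses, patience=2):
    """Return the round with the lowest validation loss, stopping once the loss
    has failed to improve for `patience` consecutive rounds."""
    best_epoch, best_loss = 0, float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation keeps getting worse: likely overtraining
    return best_epoch

# Hypothetical validation losses recorded after each training round.
losses = [0.69, 0.52, 0.41, 0.35, 0.33, 0.36, 0.40, 0.47]
print(best_stopping_epoch(losses))  # 4
```

A patience above 1 tolerates small noisy bumps in the validation loss instead of stopping at the first one.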
And finally, the third set is the testing data. This data set is simply used to calculate the final performance of the network, and the network shouldn't see this data until we're completely done with the training process.
So with all of this background information, we can now proceed to solve our challenge, and I promise you the hard part is done. This challenge is straightforward, and I'll show you exactly how to solve it. We essentially have three files: one is a Python script called detector, and then we have our training data set and our testing data set. We need a training data set, a validation data set, and a testing data set, but in this particular example the training and validation data sets are both in the same file, so we only have two data files, for simplicity. Now, what we need to do is add some lines of code to our Python script. But don't be afraid: you don't actually need to know Python programming or anything, because these lines of code are given to you. All you need to do is start the machine in the top right corner, which will split your screen in half, and then we can copy the lines of code into the detector script. So as you can see, if we walk through the code, the first thing we do is import the Python libraries that are needed for our neural network. Here we're using two libraries: one is called pandas, and the other one is called scikit-learn, for building our neural network.
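Splitting one dataset into the training, validation, and testing sets described above can be sketched as below. This is a hedged illustration, not the challenge's code: the 60/20/20 proportions, the seed, and the helper name three_way_split are assumptions. In the challenge itself the testing data already sits in its own file, so only the training/validation split is needed there.

```python
import random

def three_way_split(rows, train_frac=0.6, validation_frac=0.2, seed=0):
    """Shuffle the rows, then cut them into training, validation and testing sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the same split comes out every run
    n_train = int(len(shuffled) * train_frac)
    n_validation = int(len(shuffled) * validation_frac)
    train = shuffled[:n_train]                             # fed to the network
    validation = shuffled[n_train:n_train + n_validation]  # checked after each training round
    test = shuffled[n_train + n_validation:]               # held back until training is finished
    return train, validation, test

rows = list(range(100))  # stand-in for 100 labeled toy records
train, validation, test = three_way_split(rows)
print(len(train), len(validation), len(test))  # 60 20 20
```

Shuffling before cutting matters when the source file is ordered (for example, all defective toys at the end); otherwise one set could end up with only one kind of toy.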
And the next thing is we need to load the data set; in our case, we have two Excel files. Once we get the input data, remember, we normalize it. In our previous example we had to multiply it by the weight, but in this example we also need to make all the inputs numerical, so even the color needs to be converted to a number, so that we get a simple decimal output answer. And finally, we need to load the data: everything will be stored in the variable we call X, and test_X stores the testing data for us. The next step is where you will need to copy some lines of code. For our data set, remember, we have one file for both the training and the validation, so we will split the data; in this example, we will use an 80/20 split. So what I will do is simply copy this line into the script. Here I use the Vim editor because, as I was recording, it was a little bit slow.
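The preparation steps described above (making every input numerical, including the color, and then normalizing) can be sketched with the standard library as below. This is not the challenge's code, which, per the video, relies on pandas and scikit-learn (scikit-learn's LabelEncoder and StandardScaler classes cover the same two steps). The column layout, the values, and the helper names are hypothetical. The scaling used here is standardization (subtract the column mean, divide by the standard deviation), one common form of normalization. The scaling statistics come from the training rows only and are reused for the test rows, so the test data stays unseen during training. A color that never appears in the training rows would raise a KeyError in this sketch.

```python
from statistics import mean, pstdev

# Hypothetical raw rows: (height_cm, weight_g, color). Not the challenge's real column layout.
train_rows = [(10.0, 200.0, "Red"), (10.4, 210.0, "Blue"), (9.6, 190.0, "Green")]
test_rows = [(10.2, 205.0, "Blue")]

def build_color_codes(rows):
    """Give each color name an integer code, in order of first appearance."""
    codes = {}
    for _, _, color in rows:
        codes.setdefault(color, len(codes))
    return codes

def encode(rows, codes):
    """Make every input numerical: the color becomes its integer code."""
    return [[height, weight, codes[color]] for height, weight, color in rows]

def fit_scaler(matrix):
    """Per-column mean and standard deviation, learned from the training rows only."""
    columns = list(zip(*matrix))
    return [mean(col) for col in columns], [pstdev(col) or 1.0 for col in columns]

def scale(matrix, means, stds):
    """Standardize each value: subtract the column mean, divide by the column standard deviation."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in matrix]

codes = build_color_codes(train_rows)
train_X = encode(train_rows, codes)
means, stds = fit_scaler(train_X)
train_X = scale(train_X, means, stds)
test_X = scale(encode(test_rows, codes), means, stds)  # reuse the training statistics
print(codes)  # {'Red': 0, 'Blue': 1, 'Green': 2}
print(test_X)
```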
Remember, you can just use the graphical interface: you can simply double-click on the script and copy and paste the line. You don't actually need to use the Vi editor if the graphical interface is easier for you. For the next step, we need to copy these lines to normalize the data, so simply copy and paste this line into the script. Then we need to start the training process; again, all I will do is copy this line into the script. Then we've got our classifier code and our validation prediction code as well, and finally we need to insert the testing prediction code. Once you're done, remember to save the file: if you've double-clicked on the file and edited it, just make sure to click save. Then we will simply run this Python script. We type the command python3, a space, and then detector.py (you can copy this line of code if that's easier for you), and then we can watch the magic happen. A few moments later, when it finishes, the predictions will be saved to a file. Then we simply need to upload the predictions to this URL, so open this URL within your testing machine and upload the file. Let's see if our predictions were accurate: if our accuracy is above 90%, then we will be awarded the flag. And here we go, winning, we got the flag. I'm not going to lie, it feels good.
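The 90% bar is checked by the challenge's upload page, whose exact scoring is not shown in the video. Locally, the same kind of check can be run on the validation split before uploading anything. A minimal sketch with hypothetical labels, hypothetical predictions, and a hypothetical accuracy helper:

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)

# Hypothetical validation results: 1 = defective, 0 = perfect.
validation_labels = [0, 1, 0, 0, 1, 1, 0, 1, 0, 0]
validation_predictions = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]

score = accuracy(validation_predictions, validation_labels)
print(f"Validation accuracy: {score:.0%}")  # Validation accuracy: 90%
print(score >= 0.9)  # True: would clear a 90% bar
```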
Now, when you run it, you may not always get 90% accuracy, and if that happens, simply run it again. The explanation for that is that within neural networks there is some randomness built in; remember, they try to mimic a real human brain, a real neural network. So this is an issue to look out for.
Now, when it comes to cyber security, AI and machine learning are not a new topic. In fact, even five years ago I was bombarded by vendors telling me how they had this sophisticated AI and machine learning built into their products, but when I went and tested them, I got extremely questionable results. It was simply bad.
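Returning to the advice to simply rerun the script when accuracy falls short: in practice, run-to-run variation in a neural network typically comes from random starting weights (and, in many scripts, a random training/validation split). The sketch below uses a hypothetical init_weights helper to show that seeding the random number generator makes a run reproducible, while a different seed gives a different starting point. The layer size and the uniform range of -1 to 1 are illustrative assumptions.

```python
import random

def init_weights(n_inputs, n_nodes, seed=None):
    """Random starting weights for one layer, one list of weights per node."""
    rng = random.Random(seed)  # seed=None gives different weights on every run
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)] for _ in range(n_nodes)]

a = init_weights(3, 2, seed=42)
b = init_weights(3, 2, seed=42)
c = init_weights(3, 2, seed=7)
print(a == b)  # True: same seed, same starting point, reproducible run
print(a == c)  # False: a different starting point may end in a different accuracy
```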
However, today these products have gotten a lot better. Some of the security tools that can make use of AI are the tools we use to detect unusual behavior, and this is important because neural networks can be really good at detecting things a lot faster than us humans. For example, if you have malware on your network, then the traffic will look different from the normal traffic. Your neural network can learn what normal day-to-day operation looks like, and whenever there is something unusual, that AI or machine learning can generate an alert for you. This is extremely useful for cyber security professionals who work in detecting and responding to cyber attacks. Another thing that is also popular in the industry is what we refer to as user behavior analytics. For example, if every day I log into my email at 9:00 a.m. from Australia, but all of a sudden, out of the blue, my username logs into my email at 3:00 a.m. from France, this is unusual behavior: machine learning has learned and observed my behavior over a very long period of time, and now it can detect if I'm doing something that's unusual. Now, notice these are simplistic examples, but as you've learned in this course, if you provide really good data sets to your machine learning, it can provide you with fairly accurate predictions.
Now, the final part of the challenge is to answer the questions. The first question is: what is the other term given for artificial intelligence? This is an easy one: machine learning. The second question is: what machine learning structure aims to mimic the process of natural selection? It's called the genetic algorithm; we talked about this at the beginning of the lesson. Then: what's the name of the learning style that makes use of labeled data to train a machine learning structure? If we scroll up, this is called supervised learning. And what's the name of the layer between the input and output layers of a neural network? This is the hidden layer; this is where the magic happens. And finally, what is the name of the process used to provide feedback to the neural network on how close its prediction was? It's called backpropagation; this is where we tell the network whether the prediction was correct or not. And finally, we need to submit the flag that we got, so all we need to do is copy and paste the flag here and submit.
Now, as I said, AI and machine learning are an extremely hot topic in cyber security, and if you're trying to land your first cyber security job, then I have a step-by-step roadmap to take you from absolute beginner all the way to becoming a cyber security professional in this video, and I'll see you there.