Network Traffic Anomaly Detection Using Machine Learning

kummari vijay
26 Apr 2024 22:59

Summary

TLDR: In this presentation, Magna Kri and teammates Vijay, Wami, and Graa discuss their project on network traffic anomaly detection using machine learning. The team introduces the need for adaptive security systems in response to cyber threats. They demonstrate the use of K-means clustering and other machine learning techniques to detect anomalies in network traffic data, focusing on packet sizes and IP addresses. The presentation covers code implementation, methodology, challenges faced, results, and future work, emphasizing the need for advanced, real-time monitoring to combat evolving cyber threats effectively.

Takeaways

  • 📡 The presentation focuses on network traffic anomaly detection using machine learning to enhance network security.
  • 📊 Key sections of the presentation include introduction, methodology, code demonstration, results, challenges, future work, and references.
  • 🔒 Cybersecurity challenges arise from evolving technologies and sophisticated cyber attacks targeting network vulnerabilities.
  • 💻 Machine learning is being utilized to identify subtle network traffic anomalies, offering adaptive threat detection.
  • 📈 K-means clustering is the primary algorithm used for detecting anomalies in network traffic, analyzing packet size, IP addresses, and other features.
  • 🔧 The code demonstration highlights Python-based anomaly detection, combining K-means clustering with mathematical tools such as eigenvalues, eigenvectors, and data normalization.
  • 📉 The team utilized clustering models, including K-means and DBSCAN, to categorize network traffic data and identify anomalies.
  • 📉 Results show the model effectively detecting anomalies with a purity of 95%, although challenges with data quality and scalability remain.
  • ⚙ Future work includes improving data quality, adopting more advanced techniques, enabling real-time monitoring, and explaining AI decision-making.
  • 🔍 The conclusion emphasizes the importance of machine learning in improving cybersecurity and detecting network anomalies, with continuous improvement necessary to keep up with evolving cyber threats.

Q & A

  • What is the main objective of the project discussed in the transcript?

    -The main objective of the project is to develop a comprehensive network traffic anomaly detection system using state-of-the-art machine learning techniques to detect potential cyber threats by identifying anomalies in network traffic patterns.

  • Why are traditional security measures like firewalls insufficient for modern network security?

    -Traditional security measures such as firewalls and intrusion detection systems rely on predefined rules and signatures, making them vulnerable to sophisticated attacks that use evasion tactics. Additionally, as networks grow in complexity, manually updating rules becomes more difficult.

  • How do machine learning techniques improve network anomaly detection?

    -Machine learning algorithms learn patterns inherent in network traffic data and can detect subtle deviations from normal behavior, allowing for more adaptive and dynamic threat detection. This enables identification of potential security breaches or performance anomalies.

  • What machine learning technique is used in the project for anomaly detection?

    -The project uses K-means clustering, an unsupervised machine learning technique that groups data points into clusters based on features such as packet size and IP address. Deviations from normal clusters are flagged as potential anomalies.

  • What role do eigenvalues and eigenvectors play in the project’s anomaly detection process?

    -Eigenvalues and eigenvectors are used to analyze the properties of the Laplacian matrix in the network, helping identify the underlying structure of the network traffic data. This allows the system to detect important features and patterns in the data.

  • What are some of the challenges mentioned in detecting anomalies in network traffic?

    -Challenges include dealing with messy data (e.g., missing values, outliers), handling large volumes of data in real-time, and ensuring the anomaly detection system can effectively keep up with evolving cyber threats.

  • What are the future improvements suggested for the project?

    -Future improvements include enhancing data quality, exploring advanced machine learning techniques, implementing real-time monitoring, and focusing on explainable AI to better understand and interpret the system's decisions.

  • What clustering evaluation metrics were used to assess the quality of the K-means clusters?

    -The evaluation metrics used include purity, recall, F1 score, and entropy. These metrics help assess the effectiveness of the clustering, with high purity indicating good clustering and low entropy reflecting less randomness.

  • How was the K-means algorithm implemented in the project’s code?

    -The K-means algorithm was implemented by initializing centroids, assigning data points to the nearest centroid based on Euclidean distance, recalculating the centroids, and repeating this process until convergence. The model was run for different values of K (e.g., 7 and 15) to find the optimal number of clusters.

  • How are anomalies detected using the clustering results?

    -Anomalies are detected by identifying data points that are far from the centroids of the clusters. These points are likely to represent abnormal network traffic and are flagged for further investigation to determine if they are malicious.
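The distance-based flagging described in the last answer can be sketched as follows. This is a minimal illustration rather than the team's code: the function name `flag_anomalies`, the fixed centroids, and the `threshold` value are assumptions made for the example, since the presentation does not say how the cutoff was chosen.

```python
import numpy as np


def flag_anomalies(data, centroids, threshold):
    """Return indices of points whose nearest centroid is farther than `threshold`."""
    # Pairwise Euclidean distances, shape (n_points, n_centroids).
    distances = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    nearest = distances.min(axis=1)
    return np.flatnonzero(nearest > threshold)


# Hypothetical centroids and traffic records (two numeric features each).
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
traffic = np.array([[0.1, -0.2], [4.8, 5.1], [2.5, 9.0], [0.3, 0.1]])

# threshold=1.0 is an illustrative cutoff, not a value from the project.
anomalies = flag_anomalies(traffic, centroids, threshold=1.0)
print(anomalies)
```

In practice the cutoff would probably be derived from the data, for example from a high percentile of the nearest-centroid distances on the training set.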

Outlines

00:00

🧑‍🏫 Introduction to Network Traffic Anomaly Detection

The speaker, Magna, introduces the team (Vijay, Wami, and Graa) and gives an overview of their project on network traffic anomaly detection using machine learning. The talk is organized into introduction, methodology, code demonstration, results, challenges, and future work. They explain how digital transformation has made secure networks essential while also opening new vulnerabilities to cyberattacks. Traditional security methods such as firewalls and intrusion detection systems are effective to some extent, but they rely on predefined rules and signatures and struggle to keep up with sophisticated attacks. The focus of the project is to develop a machine learning-based anomaly detection system that can dynamically detect deviations in network traffic, providing a more adaptive and efficient way to manage cyber threats.

05:01

💻 Code Overview for Anomaly Detection

The speaker, Graa, introduces the code developed for anomaly detection in network traffic, focusing on analyzing data like packet sizes and IP addresses. The method used is K-means clustering, a machine learning algorithm that groups data points into clusters based on shared features. Graa explains the process of importing necessary libraries, defining functions, and using matrices to represent the network’s structure. The code analyzes network traffic and flags deviations from normal behavior as potential anomalies, which may indicate malicious activity. The speaker outlines how mathematical concepts like eigenvalues and norms are used to identify key data features, normalize the data, and apply K-means clustering.
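The eigenvector-based steps described in this section (compute eigenvalues and eigenvectors of the Laplacian, keep the first k eigenvectors, normalize each row, then run K-means) can be sketched as below. This is a hedged illustration under stated assumptions: the function name `spectral_cluster` and the toy graph are hypothetical, and the sketch builds an unnormalized Laplacian L = D - A itself instead of taking the degree matrix as a separate argument the way the presented function reportedly does.

```python
import numpy as np
from sklearn.cluster import KMeans


def spectral_cluster(laplacian, k, seed=0):
    """Cluster graph nodes using the k smallest eigenvectors of a symmetric Laplacian."""
    # eigh handles symmetric matrices and returns eigenvalues in ascending order,
    # so the eigenvectors come back already sorted.
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian)
    embedding = eigenvectors[:, :k]  # first k eigenvectors as columns

    # Scale each row to unit length so every node carries equal weight in K-means.
    row_norms = np.linalg.norm(embedding, axis=1, keepdims=True)
    embedding = embedding / np.where(row_norms == 0, 1.0, row_norms)

    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(embedding)


# Toy graph (an assumption for the demo): two triangles joined by one edge.
adjacency = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    adjacency[i, j] = adjacency[j, i] = 1.0
degree = np.diag(adjacency.sum(axis=1))
laplacian = degree - adjacency

labels = spectral_cluster(laplacian, k=2)
print(labels)
```

The two triangles end up in different clusters because the second eigenvector changes sign across the single connecting edge.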

10:02

🧼 K-Means Clustering Algorithm for Network Traffic

Vijay takes over to explain the K-means clustering algorithm, a popular unsupervised learning method for grouping similar data points. The algorithm selects initial centroids randomly and assigns data points to the nearest cluster based on distance, iterating until convergence. The speaker demonstrates how the code imports libraries, downloads datasets, and converts categorical data into numerical values for processing. The goal is to create clusters of network traffic data based on 42 features, such as packet size and service type. The class defined for K-means clustering contains various methods for centroid initialization, cluster assignment, and model fitting, ensuring effective clustering of the data.
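The four K-means steps outlined above (random initialization, assignment, centroid update, repeat until convergence) can be written compactly with NumPy. The sketch below is an assumption-laden illustration, not the class shown in the video: the function name `kmeans`, the tolerance `tol`, and the sample points are all hypothetical.

```python
import numpy as np


def kmeans(data, k, max_iters=100, tol=1e-6, seed=0):
    """Plain K-means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as the initial centroids.
    centroids = data[rng.choice(len(data), size=k, replace=False)].astype(float)

    for _ in range(max_iters):
        # Step 2: assign each point to its nearest centroid (Euclidean distance).
        distances = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)

        # Step 3: move each centroid to the mean of its assigned points.
        new_centroids = centroids.copy()
        for c in range(k):
            members = data[labels == c]
            if len(members):  # keep the old centroid if a cluster is empty
                new_centroids[c] = members.mean(axis=0)

        # Step 4: stop once the centroids barely move.
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break

    # Final assignment against the final centroids.
    distances = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, distances.argmin(axis=1)


points = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                   [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
centroids, labels = kmeans(points, k=2)
print(labels)
```

Because the result depends on the random starting centroids, implementations often rerun the loop several times and keep the run with the lowest sum of squared errors, which matches the multiple restarts mentioned in the talk.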

15:03

📊 Clustering Performance Evaluation

This section covers the implementation and evaluation of the K-means clustering process. The speaker describes clustering the training data for a list of K values (7 and 15), iterating over each value in turn. The code trains on only a portion of the dataset and evaluates the model with metrics such as F1 score, entropy, and purity, which together indicate clustering quality. Comparing these metrics across cluster counts, the speaker shows that purity rises as the number of clusters increases, while the consistently higher training scores suggest possible overfitting. The project also uses a normalized cut method for spectral clustering.
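Purity and entropy, two of the metrics named in this section, can be computed from cluster assignments and ground-truth labels with the standard library alone. The following is a sketch under assumptions: the function name `purity_and_entropy` and the six sample records are invented for illustration, and entropy here means the size-weighted conditional entropy of the labels within each cluster, measured in bits.

```python
import math
from collections import Counter, defaultdict


def purity_and_entropy(cluster_ids, true_labels):
    """Return (purity, conditional entropy of labels given clusters, in bits)."""
    clusters = defaultdict(list)
    for cluster_id, label in zip(cluster_ids, true_labels):
        clusters[cluster_id].append(label)

    total = len(true_labels)
    majority_total = 0
    entropy = 0.0
    for members in clusters.values():
        counts = Counter(members)
        # Purity counts the members that share the cluster's majority label.
        majority_total += max(counts.values())
        # Entropy of the label mix inside this cluster, weighted by cluster size.
        cluster_entropy = -sum(
            (n / len(members)) * math.log2(n / len(members)) for n in counts.values()
        )
        entropy += (len(members) / total) * cluster_entropy

    return majority_total / total, entropy


purity, entropy = purity_and_entropy(
    [0, 0, 0, 1, 1, 1],
    ["normal", "normal", "attack", "attack", "attack", "attack"],
)
print(round(purity, 3), round(entropy, 3))
```

Adding clusters can only keep purity equal or raise it, up to a perfect score when every point sits in its own cluster, so a rising purity curve on its own does not show that a larger K is better.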

20:05

🚹 Anomaly Detection and Evaluation

The speaker discusses the process of detecting anomalies in the network traffic using the K-means clustering algorithm. By clustering the data and calculating distances between data points, anomalies are identified as data points that don't fit well into any cluster. The model detected 107 anomalies, and the speaker highlights the evaluation methods used to assess the clustering’s effectiveness, including purity, recall, F1 score, and conditional entropy. They also introduce the DBSCAN algorithm, which clusters data points based on density and is used to identify anomalies in a noisier dataset. Overall, the results show a successful detection of anomalies in the training data.
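For the density-based approach mentioned here, scikit-learn's `DBSCAN` marks points that belong to no dense region with the label -1, which gives a direct way to list candidate anomalies. The data below is synthetic, and the `eps` and `min_samples` values are illustrative assumptions, not the project's settings.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Two dense groups of synthetic points plus one isolated point.
group_a = rng.normal(loc=0.0, scale=0.05, size=(20, 2))
group_b = rng.normal(loc=3.0, scale=0.05, size=(20, 2))
outlier = np.array([[10.0, 10.0]])
points = np.vstack([group_a, group_b, outlier])

# eps is the neighborhood radius; min_samples is the density needed for a core point.
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(points)

# Noise points (label -1) are treated as candidate anomalies.
anomaly_indices = np.flatnonzero(labels == -1)
print(anomaly_indices)
```

Unlike K-means, this approach does not need the number of clusters in advance, but its results are sensitive to `eps`, so features usually need to be scaled first.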

⚙ Challenges and Future Work in Anomaly Detection

The speaker, Vami Sadya, outlines the challenges the team faced in building the anomaly detection system. Handling large amounts of messy data with missing values or outliers was a significant issue, as the detection system struggled to keep up. To improve the model, future work will focus on enhancing data quality, exploring advanced machine learning techniques, and implementing real-time monitoring for quicker detection of new threats. The speaker emphasizes the importance of explainable AI, which can provide clarity on why the system flags certain activities as anomalies. This will be crucial for improving the model's usability and effectiveness.

📈 Conclusion and Final Thoughts

In the conclusion, the speaker reflects on the project’s success in using machine learning techniques such as K-means, SVM, and neural networks for anomaly detection in network traffic. They highlight the importance of clean data and careful evaluation to achieve reliable results. Looking ahead, the team aims to further integrate machine learning into cybersecurity practices, continuously improving the system to match the evolving landscape of cyber threats. This project demonstrates the need for ongoing research to enhance machine learning capabilities in anomaly detection and cybersecurity.

Keywords

💡Network Traffic

Network traffic refers to the exchange of data packets across a computer network. In the context of the video, it is the subject of analysis for detecting anomalies that could indicate security threats. The script mentions analyzing 'packet sizes and IP addresses' to identify patterns that deviate from the norm, which are then flagged as anomalies.

💡Anomaly Detection

Anomaly detection is the identification of patterns in data that do not conform to expected behavior, which can indicate potential security breaches or system issues. The video discusses using machine learning to detect anomalies in network traffic, highlighting its importance in cybersecurity.

💡Machine Learning

Machine learning is a subset of artificial intelligence that enables systems to learn from and make decisions based on data. The script emphasizes the use of machine learning algorithms to learn patterns in network traffic data, which helps in dynamically detecting threats that traditional rule-based systems might miss.

💡K-Means Clustering

K-means clustering is a machine learning algorithm used for dividing a dataset into a specified number of clusters based on similarity. The script describes using K-means to group data points into clusters, with anomalies being data points that are far away from the majority of data points within the clusters.

💡Cyber Attacks

Cyber attacks are attempts to damage, disrupt, or gain unauthorized access to a computer system or network. The video script discusses the importance of robust network infrastructure to protect against the looming threat of cyber attacks, emphasizing the need for advanced security measures beyond traditional firewalls.

💡Data Integrity

Data integrity refers to the accuracy and consistency of data over its entire lifecycle. The script mentions that cyber attacks can compromise data integrity, which is a key concern that anomaly detection systems aim to protect against by identifying unusual activities that could lead to data corruption or loss.

💡Feature Selection

Feature selection is the process of choosing the most relevant features (or variables) from a dataset for use in model training. The script lists features such as 'duration, protocol type, service, flag, source bytes, destination bytes' from the network traffic data, which the machine learning model relies on to identify anomalies effectively.
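Some of the features listed above, such as protocol type, service, and flag, are categorical, and the talk describes a helper that converts such columns to numbers while storing the mapping for reuse across calls. A minimal pandas sketch of that idea follows. The function body, the integer-code scheme, and the sample frames are assumptions for illustration, not the team's implementation.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def convert_categorical_columns(frame, mappings=None):
    """Replace non-numeric columns with integer codes; returns (converted, mappings)."""
    mappings = {} if mappings is None else mappings
    converted = frame.copy()
    for column in converted.columns:
        if is_numeric_dtype(converted[column]):
            continue  # numeric columns are left untouched
        mapping = mappings.setdefault(column, {})
        for value in converted[column].unique():
            # Categories not seen before get the next free integer code.
            mapping.setdefault(value, len(mapping))
        converted[column] = converted[column].map(mapping)
    return converted, mappings


# Hypothetical sample rows; column names are illustrative.
train = pd.DataFrame({"protocol_type": ["tcp", "udp", "tcp"], "src_bytes": [181, 239, 235]})
test = pd.DataFrame({"protocol_type": ["udp", "icmp"], "src_bytes": [105, 30]})

train_num, mappings = convert_categorical_columns(train)
# Passing the stored mappings keeps the codes consistent between the two frames.
test_num, mappings = convert_categorical_columns(test, mappings)
print(test_num)
```

Integer codes impose an arbitrary order on the categories, which affects Euclidean distances in K-means. One-hot encoding is a common alternative when that matters.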

💡Cybersecurity

Cybersecurity is the practice of protecting systems, networks, and programs from digital attacks. The video's main theme revolves around enhancing cybersecurity through machine learning-based anomaly detection in network traffic, showcasing the ongoing need for innovative solutions to combat cyber threats.

💡Overfitting

Overfitting occurs when a model learns the detail and noise in the training data to an extent that it negatively impacts the model's performance on new data. The script suggests a potential overfitting issue where the model performs better on training data than on testing data, indicating a need for further refinement to generalize well.

💡Real-Time Monitoring

Real-time monitoring refers to the continuous observation of systems or networks to detect and respond to issues as they occur. The script mentions the future work of implementing real-time monitoring to adapt quickly to new threats, highlighting the importance of timely detection and response in cybersecurity.

💡Explainable AI

Explainable AI is a subfield of artificial intelligence that focuses on creating systems whose actions can be easily understood by humans. The script touches on the need for explainable AI (rendered as 'expansible AI' in the transcript) in anomaly detection systems, so that the reasons for flagging certain activities as anomalies are clear and justifiable.

Highlights

Introduction to network security challenges and the importance of robust and secure network infrastructure.

The threat of cyber attacks and network breaches alongside the benefits of interconnected systems.

Traditional security measures like firewalls and intrusion detection systems are limited by predefined rules.

The growing interest in machine learning techniques for network anomaly detection.

Machine learning algorithms can learn patterns in network traffic data for dynamic threat detection.

The project's aim to develop a comprehensive network traffic anomaly detection system using state-of-the-art machine learning techniques.

The code for anomaly detection is written in Python and is based on the assumption that normal network traffic follows certain patterns.

Use of K-means clustering to group data points into clusters based on network traffic properties.

The importance of data normalization to ensure equal weight of features in the K-means algorithm.

Identification of anomalies through data points assigned to clusters far away from the norm.

Explanation of the K-means clustering algorithm and its application in grouping similar data points.

The process of centroid initialization, cluster assignment, and centroid update in K-means.

Demonstration of the code that implements K-means clustering for anomaly detection in network traffic.

Challenges faced during the project, such as detecting unusual activity in messy data and the need for real-time monitoring.

Future work includes improving data quality, exploring advanced machine learning techniques, and developing explainable AI.

Conclusion on the effectiveness of machine learning techniques like K-means, SVM, and neural networks for detecting unusual activity.

The need for ongoing research and development to enhance machine learning-based anomaly detection systems in cybersecurity.

References from Google Scholar and comparison of different models used in the project for accuracy and F1 score.

Transcripts

play00:00

hello Professor this is Magna kri so

play00:02

today my teammates Vijay wami graa and I

play00:05

are going to talk on network traffic

play00:07

anomaly detection using machine learning

play00:10

so the next

play00:15

slide uh so the contents includes

play00:17

introduction methodology code

play00:18

demonstration results challenges and

play00:21

future work conclusion and reference

play00:23

next slide

play00:27

please uh introduction to network

play00:30

security challenge uh the digital

play00:31

Technologies rapidly has uh re

play00:34

revolutionized uh how we communicate

play00:36

conduct business and access information

play00:39

with this digital transformation robust

play00:41

and secure network infrastructure has

play00:44

become uh Paramount however the looming

play00:46

threat of uh cyber attacks and network

play00:49

uh breaches uh come along uh side the

play00:53

numerous benefits of the interconnected

play00:55

systems uh and the actors uh constantly

play00:59

seek to exploit

play01:02

uh vulnerabilities in network

play01:05

architectures compromising data uh

play01:08

Integrity privacy and uh system

play01:10

availability uh traditional security

play01:12

measures such as uh firewalls and uh

play01:15

intrusion uh detection systems uh have

play01:19

long been the Frontline defense against

play01:22

cyber threats while these tools are

play01:24

effective to some extent they often rely on

play01:27

predefined rules and signatures uh

play01:30

making them uh susceptible uh to

play01:32

evasion tactics employed by

play01:35

sophisticated attackers uh moreover as

play01:38

network infrastructures grow in

play01:40

complexity and scale manually crafting

play01:43

and updating rules be uh becomes

play01:46

increasing increasingly daunting uh in

play01:49

response to these challenges uh a

play01:51

growing interest has been in uh machine

play01:54

learning techniques for uh Network

play01:55

anomaly detection uh unlike rule based

play01:58

approaches machine learning algorithms

play02:00

can learn patterns and behaviors uh

play02:03

inherent uh in network traffic data

play02:05

enabling more uh adaptive uh and dynamic

play02:08

threat detection uh by analyzing vast

play02:11

amount of uh data uh machine learning

play02:14

models can identify uh subtle uh

play02:17

deviations from normal Behavior Uh

play02:19

indicative of potential security

play02:22

breaches or performance uh anomal

play02:25

anomalies uh this product uh like this

play02:27

project aims to uh develop a

play02:30

comprehensive network traffic anomaly

play02:32

detection system using state of the art

play02:35

machine learning techniques uh this report

play02:38

outlines the methodology techniques and

play02:40

uh evaluation uh criteria for developing

play02:43

the anomaly detection system next will

play02:46

be continued uh by the Le will be

play02:49

continued by graa

play03:05

Graa are you

play03:07

there yes

play03:09

yes

play03:11

so code is written in the python for

play03:14

machine learning likely to detect

play03:16

anomalies in network traffic so this

play03:19

code is based on the idea that normal

play03:22

Network traffic will follow certain

play03:24

patterns so by analyzing data such as

play03:27

packet sizes and IP addresses the code

play03:30

can identify patterns that deviate from

play03:32

the norm so these deviations are then

play03:35

flagged as anomalies which could be a

play03:38

sign of malicious activity coming to the

play03:41

code snippet uh it uses a machine

play03:44

learning technique called K means

play03:46

clustering this involves grouping data

play03:48

points into a specific number of

play03:51

clusters uh that are defined by K so

play03:55

each data point is assigned to the

play03:56

cluster that is most likely resembles

play03:59

based on its features in this case the

play04:01

features are the properties of the

play04:03

network traffic such as packet size and

play04:06

IP

play04:07

address so the code first Imports the

play04:10

necessary libraries including NumPy for

play04:13

numerical operations and scikit-learn for

play04:17

machine learning then it defines a

play04:19

function to perform the anomaly

play04:21

detection so here this function takes

play04:23

three arguments uh one is the Laplacian matrix

play04:29

uh which the it represents the

play04:31

connections between the nodes in the

play04:32

network and the second one is the degree

play04:36

Matrix uh in which it represents the

play04:38

degree of each node in the network and

play04:41

finally the third one k which represents

play04:45

the number of clusters to use the K

play04:47

means algorithm so here the function

play04:50

first calculates the eigenvalues and

play04:52

eigenvectors of the Laplacian matrix eigenvalues

play04:56

and eigenvectors are mathematical

play04:59

Concepts used to analyze the properties

play05:01

of a matrix so here in this case they help

play05:04

to identify the underlying structure of

play05:06

the network traffic data and next um the

play05:10

function sorts the eigenvalues and

play05:13

eigenvectors uh and this is done to identify

play05:16

the most important features of the data

play05:19

now next it takes the first k eigenvectors

play05:23

and computes the norm of each row so the

play05:26

norm is a mathematical concept that

play05:28

represents the magnitude of a vector in this

play05:31

case it is used to measure the strength

play05:33

of the signal in each data

play05:36

point the function then normalizes the

play05:38

data uh this means uh it scales the data

play05:41

to a common range uh this is the

play05:44

important thing because it ensures the

play05:47

all the features have an equal weight in

play05:49

the K means algorithm finally the

play05:52

function uses the K means algorithm to

play05:54

cluster the data points so here the K-means

play05:57

algorithm partitions the data points

play05:59

into K

play06:00

K clusters and the data points are

play06:02

assigned to the cluster that they mostly

play06:05

uh resemble uh based on their features

play06:08

so the code can be used to identify

play06:10

anomalies in the network traffic so data

play06:12

points that are assigned to clusters

play06:14

that are far away from the data points

play06:16

are likely to be anomalies these

play06:18

anomalies can be investigated further to

play06:21

determine uh if they are malicious

play06:23

something like that and of course uh

play06:27

this is a simplified code uh but the

play06:30

actual code is more complex so that will

play06:32

be includes other steps um hopefully

play06:36

this gives a basic understanding how the

play06:38

code works yeah and next will be

play06:40

continued

play06:41

by our team

play06:47

mate yeah hello hi uh my name is

play06:52

Vijay uh let me just Che uh to the code

play06:56

demonstration uh before that uh in our

play06:59

project we have used the the algorithm

play07:03

that we have used is K-means clustering let

play07:06

me just go through the K-means algorithm in

play07:09

brief so K-means algorithm clustering is a

play07:12

popular unsupervised machine learning

play07:13

algorithm used for partitioning a data set into

play07:16

set of K clusters the goal of K means

play07:19

clustering is to group the similar data

play07:21

points together and discover underlying

play07:23

patterns or structures in the data let's

play07:26

see how the algorithm actually works so

play07:28

the algorithm starts by randomly

play07:30

selecting K data points from the data

play07:32

set as the initial cluster centroids

play07:35

these centroids acts as the initial

play07:37

representative for each cluster then

play07:39

each data point in the data set is

play07:41

assigned to the nearest centroid based on

play07:44

some distance metric this typically we

play07:46

can calculate this distance by Euclidean

play07:49

distance so the data points are grouped

play07:51

into K clusters based on which centroid

play07:53

they are closest to then after all the

play07:56

data points have been assigned to

play07:58

clusters the centroids are recalculated as the

play08:01

mean of all the data points assigned

play08:03

to each cluster this step updates the

play08:06

centroid positions then we will repeat

play08:09

the step two and step three repeatedly

play08:11

until the convergence criteria are met

play08:14

uh convergence occurs when the centroids no

play08:17

longer changes significantly between

play08:19

iterations or when a maximum number of

play08:21

iterations reach in the final the once

play08:24

the convergence is achieved the

play08:26

algorithms output final cluster

play08:29

assignments and centroids uh let me just take

play08:32

to the code

play08:33

demonstration

play08:39

um so this is our code uh where we

play08:43

import uh necessary libraries like NumPy

play08:47

and pandas Matplotlib sklearn metrics and

play08:51

SciPy spatial distance and we'll

play08:53

start by downloading the data sets that

play08:56

we have like we have used the three data

play08:59

sets here uh we have downloading these

play09:01

data sets from the KDD Cup

play09:06

so so here we are importing those data

play09:09

sets into uh data

play09:12

frames and in the data set we have 42 uh

play09:16

feature selection features like uh

play09:19

duration protocol type service flag source

play09:22

bytes destination bytes land wrong

play09:25

fragment urgent and so on we have until

play09:29

42 types of features in the data set so

play09:33

in the next step we have defined a

play09:36

method called convert categorical

play09:38

columns so this function converts

play09:40

categorical columns in a data frame into

play09:42

numerical representations ensuring that

play09:45

machine learning algorithms can process

play09:47

them effectively it also provides the

play09:49

option to store and reuse mappings

play09:51

between categorical values and the

play09:52

numerical values representation across

play09:55

the multiple function

play09:57

calls and

play10:00

here we are converting those columns I

play10:02

mean category columns into numerical

play10:05

columns and then here here comes the

play10:08

clustering using K

play10:10

means so here in this uh code we define

play10:14

a class K means which implements K means

play10:17

clustering algorithm in the in in the

play10:20

first initialization here in the

play10:23

initialization method this I mean this

play10:25

is the Constructor this initialize the

play10:27

centroid attributes To None which will

play10:29

hold the centroid values after fitting the

play10:32

model and the next one is centroid

play10:34

initialization this function initialize

play10:37

centroids method randomly selects K data

play10:40

points from the training data as the

play10:42

initial centroids it ensures that each

play10:46

centroid is unique to avoid

play10:48

duplication and the next one is

play10:52

U the method cluster assignment the

play10:55

compute cluster indicates this method

play10:57

calculate the distance between each data

play10:59

point and the centroids then assign each

play11:01

data point to the nearest

play11:03

centroid and the next method that we

play11:06

have is assigning uh assigned clusters

play11:09

so this method organize the data points

play11:11

into clusters based on the assigned

play11:14

centroids and the next one we have uh

play11:18

update centroids method so this update

play11:21

centroids method recalculates the

play11:24

centroids based on the mean of the data

play11:27

points within the each cluster

play11:30

and the next one is the K-means method this

play11:34

one uh iteratively perform the cluster

play11:37

using a loop assignment and the centroid

play11:40

update steps until it converges or

play11:43

reaching the maximum number of

play11:45

iterations and the next we have uh print

play11:49

cluster info so this method is basically

play11:53

prints the method display information

play11:55

about the size of clusters for a given

play11:57

value of K

play12:00

and the next we have a compute SSE so

play12:04

this compute SSE SSE means a sum of

play12:07

squared errors so this method calculates

play12:09

the sum of squared errors which is a

play12:11

measure of how spread out of the data

play12:14

points within the Clusters and the next

play12:17

we have the model fitting so this model

play12:19

fitting uh fits the K-means model to the

play12:22

training data and it performs multiple

play12:25

restarts to find the best best set of

play12:28

centroids that minimize the SSE and the next

play12:32

method that we have in the class is the

play12:33

predict method so this predict method

play12:36

assigns testing data points to the

play12:38

clusters based on the fitted

play12:40

centroids and the next get centroids

play12:43

method so this will return uh the

play12:46

centroids learned during the fitting

play12:49

process so overall in this uh this class

play12:53

and encapsulates the K means clustering

play12:55

algorithm that providing methods for

play12:58

fitting the model making predictions and

play13:00

accessing the learned centroids it also

play13:03

include functionality for evaluating the

play13:06

quantity of uh quality of clustering

play13:08

results using SSE and printing cluster

play13:12

information and uh in the next step we

play13:14

have a um clustering definition like the

play13:19

value for the k k so here we have taken

play13:23

the value uh value of K is 7 and 15 so

play13:28

these are the optimal values for uh

play13:31

K-means clustering so we have initialized

play13:35

the K value by 7 and

play13:37

15 and in The Next Step uh we are uh

play13:41

this code will Loop that iterates over

play13:43

each value of K in the K value list and

play13:47

this this Loop

play13:48

iterates um or each uh in in in the case

play13:53

in in in this case the list of contains

play13:55

that we have initialized here is 7 and

play13:57

15 so this Loop runs the K means

play14:00

clustering algorithm multiple times each

play14:02

time with a different value of K and it

play14:05

stores the resulting cluster models in

play14:07

the K-means dictionary that we have

play14:09

defined here so allowing to access the

play14:13

models later for the further further uh

play14:15

analysis and

play14:17

visualization and here we have we have

play14:19

using

play14:20

0.15% of data set that he used for the

play14:25

training and here in the uh training

play14:29

data and training uh uh this code will

play14:33

essentially split the data into a small

play14:35

subset for training and uh discards the

play14:39

rest of the data and the next we have a

play14:42

method called normalized cut. This

play14:44

function essentially performs

play14:46

spectral clustering using the normalized

play14:49

cut criterion, which aims to partition the

play14:52

data into K clusters based on the

play14:55

eigenvectors of the Laplacian matrix. Spectral

play14:58

clustering is a powerful technique for

play15:00

clustering data with complex structures

play15:02

or

play15:04

non-convex shapes. Here, this is the

play15:08

function call for this method, so

play15:12

when we call the function, this is the

play15:14

output that we get. This output

play15:16

summarizes the clustering process and

play15:19

provides insights into how the data

play15:21

points are grouped into clusters based

play15:23

on their similarities in the lower-

play15:26

dimensional space defined by the

play15:28

normalized eigenvectors that we have here.
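
The normalized cut step described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the presenters' function: it uses an RBF affinity with a hypothetical `gamma` parameter, the symmetric normalized Laplacian, and a small deterministic K-means on the row-normalized eigenvectors. The function names and the toy data are made up for illustration.

```python
import numpy as np


def normalized_cut_embedding(X, k, gamma=1.0):
    """Embed points using the k smallest eigenvectors of the normalized Laplacian."""
    # Pairwise squared distances and RBF affinity matrix W (assumed similarity measure).
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    W = np.exp(-gamma * sq)
    np.fill_diagonal(W, 0.0)
    # Symmetric normalized Laplacian: L = I - D^(-1/2) W D^(-1/2).
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; keep the k smallest eigenvectors.
    _, eigvecs = np.linalg.eigh(L_sym)
    U = eigvecs[:, :k]
    # Normalize each row to unit length (the "normalized eigenvectors").
    return U / np.linalg.norm(U, axis=1, keepdims=True)


def kmeans_rows(U, k, n_iter=50):
    """Tiny K-means on the embedded rows with farthest-point initialization."""
    centers = [U[0]]
    for _ in range(1, k):
        dists = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels


# Two well-separated toy blobs standing in for traffic features (assumption).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(5.0, 0.3, (20, 2))])

labels = kmeans_rows(normalized_cut_embedding(X, k=2), k=2)
print(labels)
```

Clustering the rows of the eigenvector matrix, rather than the raw features, is what lets this approach separate groups that are not compact blobs in the original space.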

play15:32

So, moving on to the next part,

play15:36

here we have the relationship

play15:41

between the metrics. We have some metrics

play15:44

called purity, recall, F1 score, and

play15:46

entropy, and here we are showing the

play15:50

relationship between the F1 score and

play15:54

the K value. Here in the first graph we

play15:57

are plotting the relationship between

play16:00

F1 score and K using training

play16:04

data and testing data. As you can see

play16:08

from the graph, we can say that

play16:11

in both the training and testing data

play16:13

there seems to be a positive correlation

play16:15

between purity and K. This means that as

play16:18

the number of clusters increases, the

play16:20

purity also increases. Comparing

play16:25

training and testing, the training

play16:26

data line consistently

play16:29

scores higher in purity than the testing

play16:31

data line, so this suggests that the model

play16:33

might be overfitting the training data.
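
To make the purity metric concrete, here is a small sketch of how it is commonly computed: each cluster is credited with its most frequent true label, and purity is the fraction of points that match their cluster's majority label. The function name and the toy labels are assumptions for illustration and are not taken from the presenters' code.

```python
from collections import Counter


def purity(cluster_labels, true_labels):
    """Fraction of points whose true label matches their cluster's majority label."""
    clusters = {}
    for c, t in zip(cluster_labels, true_labels):
        clusters.setdefault(c, []).append(t)
    majority_total = sum(Counter(members).most_common(1)[0][1]
                         for members in clusters.values())
    return majority_total / len(true_labels)


# Toy ground truth: six normal flows and two attack flows (assumption).
true = ["normal"] * 6 + ["attack"] * 2

one_cluster = [0, 0, 0, 0, 0, 0, 0, 0]     # K = 1
three_clusters = [0, 0, 0, 1, 1, 1, 2, 2]  # K = 3

print(purity(one_cluster, true))     # 0.75
print(purity(three_clusters, true))  # 1.0
```

The toy example also shows why purity tends to rise with K: smaller clusters are more likely to contain a single class, so a high purity at large K does not by itself mean the clustering is useful.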

play16:36

Next we have entropy and K, the

play16:39

relationship between entropy and K.

play16:43

So in this one, by

play16:48

looking at the graph we can say the

play16:50

entropy is decreasing. The entropy

play16:53

generally decreases as the number of

play16:56

clusters increases. This means that the

play16:58

data becomes more ordered or pure as we

play17:01

increase the number of clusters. This

play17:03

is because with more clusters the

play17:06

data points are grouped into smaller and

play17:08

more specific clusters, reducing

play17:11

randomness.
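
The entropy trend can be illustrated with a short sketch of conditional entropy, that is, the entropy of the true labels inside each cluster, weighted by cluster size. This assumes base-2 logarithms and a hypothetical function name; the talk does not say which exact formula the code uses.

```python
import math
from collections import Counter


def conditional_entropy(cluster_labels, true_labels):
    """Size-weighted average entropy (in bits) of the true labels within each cluster."""
    n = len(true_labels)
    clusters = {}
    for c, t in zip(cluster_labels, true_labels):
        clusters.setdefault(c, []).append(t)
    total = 0.0
    for members in clusters.values():
        counts = Counter(members)
        h = -sum((cnt / len(members)) * math.log2(cnt / len(members))
                 for cnt in counts.values())
        total += (len(members) / n) * h
    return total


# Toy ground truth: four normal flows and four attack flows (assumption).
true = ["normal"] * 4 + ["attack"] * 4

mixed = [0] * 8            # one cluster holding both classes
split = [0] * 4 + [1] * 4  # two clusters, one per class

print(conditional_entropy(mixed, true))  # 1.0
print(conditional_entropy(split, true))  # 0.0
```

A value near zero means each cluster is dominated by one class, which matches the "more ordered or pure" reading given in the talk.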

play17:15

Next we have

play17:17

K-means: this code performs K-means clustering

play17:20

on the training data, assigns data points

play17:23

to the clusters, and evaluates the

play17:25

clustering results using some evaluation

play17:29

methods. The next one we have is

play17:31

compute anomalies. This

play17:34

compute anomalies function takes three

play17:37

parameters, namely the clusters,

play17:39

the data,

play17:41

and the labels, and it

play17:44

returns the number of

play17:45

anomalies that we have detected.
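
One way such a function could look is sketched below. The talk only states the three parameters and the returned count, so the anomaly rule here is an assumption: a point is flagged when its known label differs from the majority label of its cluster. The argument layout and the toy rows are also hypothetical.

```python
from collections import Counter


def compute_anomalies(clusters, data, labels):
    """Count points whose label disagrees with their cluster's majority label.

    clusters: cluster id per row; data: feature rows; labels: known class per row.
    The disagreement rule is an assumption, not the presenters' stated method.
    """
    members = {}
    for idx, cluster_id in enumerate(clusters):
        members.setdefault(cluster_id, []).append(idx)

    anomalies = []
    for indices in members.values():
        majority = Counter(labels[i] for i in indices).most_common(1)[0][0]
        anomalies.extend(i for i in indices if labels[i] != majority)

    # Print the flagged rows, as the demo does with its detected anomalies.
    for i in sorted(anomalies):
        print("anomaly:", data[i], "label:", labels[i], "cluster:", clusters[i])
    return len(anomalies)


# Toy rows of (packet size, destination port); values are made up for illustration.
data = [(60, 80), (64, 80), (1500, 80), (1400, 22), (1450, 22), (70, 22)]
clusters = [0, 0, 0, 1, 1, 1]
labels = ["normal", "normal", "attack", "attack", "attack", "normal"]

n_anomalies = compute_anomalies(clusters, data, labels)
print(n_anomalies)  # 2
```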

play17:49

After training on the data, we

play17:52

have detected a, and

play17:56

uh, one 1,100 107 anomalies that we have

play18:00

detected here, and we are printing those

play18:03

anomalies here. Also, here in the

play18:08

normalized cut evaluation, we have,

play18:12

for the K value, the cluster sizes, and we

play18:15

have a purity of 0.95, which means

play18:19

95% purity, and the recall is 0.25,

play18:24

the F1 score is 0.32, and the

play18:27

conditional entropy is

play18:30

0.26. Next we have another

play18:35

print evaluation,

play18:37

the DBSCAN evaluation. DBSCAN

play18:42

stands for density-based

play18:45

spatial clustering of applications with

play18:47

noise. It's a popular clustering

play18:50

algorithm that doesn't require specifying the

play18:52

number of clusters beforehand. Instead, it

play18:54

groups together closely packed points

play18:56

based on two parameters: epsilon,

play18:59

which denotes the radius within which to search

play19:01

for neighboring points, and min samples,

play19:04

which specifies the number of points

play19:05

required to form a dense region. In the

play19:09

final results that we got here,

play19:12

in the training data we have detected

play19:16

1,17

play19:18

anomalies using this training data.
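
The role of the two DBSCAN parameters can be illustrated with a compact pure-Python sketch. This is not the presenters' code; in practice a library implementation such as scikit-learn's `DBSCAN` class (with its `eps` and `min_samples` parameters) would normally be used. The toy points and the parameter values are assumptions. Points that belong to no dense region are labeled as noise, which is how DBSCAN can flag anomalies.

```python
import math

NOISE = -1


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id, or NOISE if it is in no dense region."""
    labels = [None] * len(points)

    def neighbors(i):
        # All points within radius eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_samples:
            labels[i] = NOISE  # not a core point; may later become a border point
            continue
        labels[i] = cluster_id
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_samples:
                queue.extend(j_nbrs)  # j is a core point, keep expanding
        cluster_id += 1
    return labels


# Two dense toy groups plus one isolated point (made-up values).
blob = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0), (0.1, 0.1), (0.05, 0.05)]
points = blob + [(x + 5.0, y + 5.0) for x, y in blob] + [(10.0, 0.0)]

labels = dbscan(points, eps=0.3, min_samples=3)
print(labels)                # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, -1]
print(labels.count(NOISE))   # 1
```

A larger epsilon or a smaller min samples value makes the algorithm more permissive, so fewer points end up labeled as noise.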

play19:22

Yeah, this is the working

play19:25

demonstration of our project. Let me go

play19:28

back to the presentation

play19:35

slide. Yeah, so the next

play19:40

slide will be explained by

play19:47

Manu. Hi everyone, this is vami

play19:50

sadya. I'm going to explain the

play19:52

challenges and the future work. During

play19:55

this project we have faced

play19:58

challenges:

play19:59

detecting unusual

play20:02

activity in networks is tough because

play20:04

the data we have might be messy, with

play20:07

things like missing values or outliers.

play20:11

Plus, when we are dealing with a lot of

play20:13

data, our detection system might struggle

play20:16

to keep up. Solving these issues

play20:18

is crucial to making sure our anomaly

play20:21

detection systems can effectively

play20:22

protect digital systems from cyber

play20:25

threats. As for the future work, the

play20:28

future work we can develop in this project

play20:31

is data quality, advanced techniques,

play20:33

real-time monitoring, and

play20:37

explainable AI. Coming to data

play20:40

quality, we need to find better ways to

play20:42

clean up our data before we analyze it

play20:45

to improve the accuracy of our

play20:47

detection system. Next, coming to

play20:50

advanced techniques, we should explore

play20:52

more advanced methods in machine learning

play20:54

to better find anomalies in network data.

play20:57

Real-time monitoring: making our

play21:00

systems able to adapt quickly to new

play21:02

threats by monitoring network activity

play21:05

in real time. Explainable AI: it's

play21:09

important to be able to understand why

play21:12

our detection systems think something is

play21:15

wrong, so we need techniques that can

play21:18

explain the decisions clearly. Now,

play21:21

coming to the conclusion part, Vijay, can you

play21:24

please go to the next slide? Confirmation

play21:27

of efficiency:

play21:29

we found that using machine learning

play21:31

techniques like K-means, SVM, and neural

play21:34

networks is really effective for

play21:37

spotting unusual activity in networks.

play21:40

Making sure our data is clean and

play21:42

evaluating our detection systems

play21:45

carefully is crucial for reliable results.

play21:48

Looking forward, we need to keep

play21:50

improving our system to keep up with the

play21:53

changing world of cyber

play21:56

security. This means integrating

play21:58

machine learning more deeply

play22:00

into how we protect digital systems. This

play22:04

work emphasizes the ongoing need for

play22:06

research and development to enhance the

play22:09

capabilities of machine learning based

play22:11

anomaly detection systems in cyber

play22:13

security. So, next

play22:17

slide. Yeah, these are the references

play22:20

we have taken from Google Scholar

play22:23

sites, and we have taken the

play22:28

supervised learning, machine learning,

play22:30

and everything we have

play22:32

covered for this project, and we have

play22:35

compared with the other models too,

play22:39

and we have taken the

play22:41

accuracy, F1 score, everything. We have

play22:44

compared with every model in this

play22:47

project, as explained by Vijay.

play22:51

Thank you.
