Fake Profile Detection on Social Networking Websites using Machine Learning | Python IEEE Project
Summary
TLDR: The video presents a Python project focused on detecting fake profiles on social networking sites, specifically Instagram, using machine learning. It introduces the project's foundation, a 2023 IEEE conference paper, and describes the enhancements, including the use of Random Forest and Decision Tree classifiers. Random Forest reaches a test score of 93%, slightly ahead of Decision Tree at 92%. The dataset consists of 576 records with 12 features. The video walks through the project execution, from dataset upload to model training, prediction, and performance analysis, and demonstrates fake and real account detection.
Takeaways
- 🌐 The project focuses on detecting fake profiles on social networking websites, particularly Instagram, using machine learning techniques.
- 🔎 The system aims to identify fake accounts that may be used for fraudulent activities, cyberbullying, or other malicious purposes.
- 📈 The project uses two machine learning models: the Random Forest classifier and the Decision Tree classifier, with the former achieving higher accuracy.
- 📊 The dataset used for training and testing the models contains 576 records with 12 distinct features, such as profile picture, username, and number of followers.
- 🏁 The Random Forest classifier model achieved a training score of 100% and a test score of 93%, outperforming the Decision Tree classifier.
- 📝 The project is implemented in Python, using Flask for the web framework, and HTML, CSS, and JavaScript for the front end.
- 💻 The system architecture includes data preprocessing, feature selection, model application, and performance analysis.
- 📋 The project's user interface allows users to upload a dataset, preview it, and then train the models to predict whether an account is fake or real.
- 📈 The performance analysis section provides detailed metrics like recall, precision, F1 score, and confusion matrices for both classifiers.
- 📊 The project includes static charts for visualizing the accuracy comparison between the two models and the distribution of fake and real accounts in the dataset.
Q & A
What is the main focus of the project discussed in the video?
-The project focuses on detecting fake profiles on social networking websites, specifically Instagram, using machine learning algorithms such as Random Forest and Decision Tree classifiers.
Which machine learning algorithms are used in the proposed project?
-The proposed project uses two machine learning algorithms: Random Forest Classifier and Decision Tree Classifier.
How does the proposed project differ from the base paper?
-While the base paper uses the SGBoost algorithm and does not focus on a specific platform, the proposed project targets Instagram specifically and uses Random Forest and Decision Tree classifiers instead.
What are the accuracy scores achieved by the Random Forest and Decision Tree models in the project?
-The Random Forest model achieved a training score of 100% and a test score of 93%, while the Decision Tree model achieved a training score of 92% and a test score of 92%.
What kind of dataset is used in the project?
-The dataset used in the project contains 576 records with 12 distinct features, such as profile picture status, length of the username, number of posts, number of followers, and whether the account is labeled as fake or real.
What are some of the key features of the dataset used for training the models?
-Key features of the dataset include profile picture status, ratio of numbers in the username, length of the full name, description length, external URL status, account privacy status, number of posts, number of followers, and number of followings.
What are the main advantages of the proposed system compared to the existing system?
-According to the video, the main advantages are the specific focus on Instagram and the use of Random Forest and Decision Tree classifiers, which the presenter describes as more accurate at detecting fake accounts than the base system that uses SGBoost.
What software and tools are used to develop the project?
-The project is developed using Python 3.10.9, the Flask web framework, and front-end technologies like HTML, CSS, and JavaScript.
What is the purpose of the performance analysis section in the project?
-The performance analysis section compares the precision, recall, F1 score, and confusion matrix of both the Random Forest and Decision Tree models, highlighting their effectiveness in detecting fake accounts.
How does the project execute the detection process after setting up the environment?
-After setting up the environment, the user runs the source code, uploads the dataset, trains the models, and then inputs specific account details to predict whether an account is fake or real using the trained machine learning models.
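The account details typed into the prediction form have to become a numeric row before a trained model can score them. The sketch below shows one way this could look. The column names, their order, and the yes/no to 1/0 mapping are assumptions based on the feature names read out in the video, not the project's actual code. The sample values follow the last case entered in the video, as best the transcript can be read.

```python
# Assumed column order, based on the features named in the video.
# The real CSV headers and ordering may differ.
FEATURE_ORDER = [
    "profile_pic",
    "nums_per_length_username",
    "fullname_words",
    "nums_per_length_fullname",
    "name_equals_username",
    "description_length",
    "external_url",
    "private",
    "posts",
    "followers",
    "follows",
]

# Form fields answered with yes/no (assumed to be stored as 1/0).
YES_NO_FIELDS = {"profile_pic", "name_equals_username", "external_url", "private"}


def encode_account(form):
    """Convert form answers into a list of floats in FEATURE_ORDER."""
    row = []
    for name in FEATURE_ORDER:
        value = form[name]
        if name in YES_NO_FIELDS:
            row.append(1.0 if str(value).strip().lower() == "yes" else 0.0)
        else:
            row.append(float(value))
    return row


# Values from the last case in the video (predicted there as a fake account).
example_form = {
    "profile_pic": "yes",
    "nums_per_length_username": 0.07,
    "fullname_words": 1,
    "nums_per_length_fullname": 0,
    "name_equals_username": "no",
    "description_length": 0,
    "external_url": "no",
    "private": "yes",
    "posts": 0,
    "followers": 47,
    "follows": 98,
}

row = encode_account(example_form)
print(row)
```

A row built this way could then be passed, wrapped in a list, to a fitted classifier's `predict` method.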
Outlines
🔍 Introduction to Fake Profile Detection Using Machine Learning
The video introduces a Python project focused on detecting fake profiles on social networking websites using machine learning. The project is based on a 2023 conference paper that initially proposed using the SGBoost algorithm for classification. However, the video’s project aims to enhance the paper’s system by focusing specifically on Instagram, using Random Forest and Decision Tree classifiers. It explains that fake profiles are often used for malicious activities, and outlines the accuracy of both models, with Random Forest outperforming Decision Tree.
⚙️ System Requirements and Project Setup
This section covers the technical setup of the project, including system and software requirements. The project uses Python 3.10.9, Flask for web development, and HTML, CSS, and JavaScript for the front end. Viewers are instructed to ensure the required Python libraries are installed before running the code. The process of executing the project is also detailed, from navigating to the source code directory to running the app and accessing the project through a browser.
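Before running the app, it can help to confirm that the interpreter version and the libraries are in place. The snippet below is a minimal sketch of such a check using only the standard library. The module list is an assumption: the video names Flask but does not enumerate the other libraries, so `pandas` and `sklearn` are illustrative guesses.

```python
import sys
from importlib.util import find_spec

# Assumed module list; only Flask is named explicitly in the video.
REQUIRED_MODULES = ["flask", "pandas", "sklearn"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


def python_matches(major, minor):
    """Check whether the running interpreter is the given major.minor."""
    return sys.version_info[:2] == (major, minor)


# The video uses Python 3.10.9, so 3.10 is the version checked here.
print("Python 3.10:", python_matches(3, 10))
print("Missing modules:", missing_modules(REQUIRED_MODULES))
```

If the second line prints a non-empty list, those packages would need to be installed before starting the app.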
📊 Data Set, Model Training, and Fake Account Detection
This part explains the dataset used, which contains 576 records with 12 features such as profile picture, username, description, and followers. The dataset is used to train the models to detect fake accounts. Two models are tested: Random Forest and Decision Tree classifiers. Viewers are shown how to upload the dataset, train the models, and make predictions on whether a given Instagram account is fake or real based on several features. Both models predict similar results, with Random Forest being slightly more accurate.
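The training step described above can be sketched with scikit-learn. This is not the project's code. It uses synthetic data of the same shape (576 rows, 11 feature columns plus a label) because the real CSV is not available here, and the split ratio, `max_depth`, and `random_state` values are assumptions chosen for illustration. The scores it prints will not match the figures quoted in the video.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real dataset: 576 rows, 11 features, binary label.
X, y = make_classification(
    n_samples=576, n_features=11, n_informative=6, random_state=42
)

# Hold out part of the data for testing (the 80/20 split is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    # A depth limit is assumed here; the video does not state the tree settings.
    "Decision Tree": DecisionTreeClassifier(max_depth=4, random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = {
        "train": model.score(X_train, y_train),
        "test": model.score(X_test, y_test),
    }
    print(f"{name}: train={scores[name]['train']:.2f} test={scores[name]['test']:.2f}")
```

Comparing the train and test scores side by side is what reveals a gap like the one reported in the video, where Random Forest scores 100% on training data and 93% on test data.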
📈 Performance Analysis and Chart Comparisons
The performance of both models (Random Forest and Decision Tree) is analyzed, showing metrics such as recall, precision, and F1 scores. Confusion matrices are also presented to demonstrate the performance of each model in classifying accounts as real or fake. The section concludes with a comparison of the accuracy of both models, with Random Forest achieving 93% accuracy and Decision Tree 92%. Additionally, a chart compares the percentage of fake and real accounts in the dataset, which is composed of 60% fake and 40% real accounts.
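Precision, recall, and F1 score all follow directly from the confusion matrix, so the relationship can be shown in a few lines of plain Python. The sketch below assumes rows are true labels and columns are predicted labels. The counts are made up for illustration and are not the results shown in the video.

```python
def per_class_metrics(matrix):
    """Compute precision, recall and F1 per class.

    matrix[i][j] is the number of samples whose true label is i
    and whose predicted label is j.
    """
    n = len(matrix)
    report = {}
    for c in range(n):
        tp = matrix[c][c]
        predicted_c = sum(matrix[r][c] for r in range(n))  # column sum
        actual_c = sum(matrix[c])                          # row sum
        precision = tp / predicted_c if predicted_c else 0.0
        recall = tp / actual_c if actual_c else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[c] = {"precision": precision, "recall": recall, "f1": f1}
    return report


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Illustrative counts only: class 0 = real, class 1 = fake.
example_matrix = [[40, 6], [3, 67]]

for label, values in per_class_metrics(example_matrix).items():
    print(label, {k: round(v, 3) for k, v in values.items()})
print("accuracy", round(accuracy(example_matrix), 3))
```

Reporting the metrics per class matters here because the dataset is described as 60% fake and 40% real, so overall accuracy alone can hide weaker performance on the smaller class.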
Keywords
💡Fake Profile Detection
💡Machine Learning
💡Random Forest Classifier
💡Decision Tree Classifier
💡Instagram Fake Account Detection
💡SGBoost Algorithm
💡Dataset
💡Profile Features
💡Accuracy
💡Performance Analysis
Highlights
Introduction to the Python project on fake profile detection using machine learning.
Discussion on the rising usage of social networking sites and issues like fake accounts.
Base paper titled 'Fake Profile Detection on Social Networking Websites using Machine Learning' utilizing SGBoost algorithm.
The enhanced project focuses on Instagram fake account detection using machine learning.
Two models implemented: Random Forest classifier and Decision Tree classifier.
Random Forest classifier achieved a training score of 100% and a test score of 93%.
Decision Tree classifier achieved both training and test scores of 92%.
Dataset contains 576 records with 12 distinct features like profile pic, username length, and followers.
Comparison of the enhanced system with the base paper, highlighting improvements.
Step-by-step instructions for executing the Python project in a command prompt.
A static login page is used, without any database integration.
Prediction model tests for Instagram accounts with both Random Forest and Decision Tree classifiers.
Performance analysis showing precision, recall, and F1 scores for both models.
A static chart comparing accuracy scores of Random Forest (93%) and Decision Tree (92%).
Final visualization of the dataset showing a 60% fake and 40% real account ratio.
Transcripts
[Music]
hi in this video we are going to see a
python project which is entitled fake
profile detection on social networking
websites using machine learning which is
an IEEE 2023 conference paper before seeing
the execution of the project let me
brief you about this project so as we know
that the internet is growing everywhere
all over the world the same way the
usage of social networking is also
happens all over the world and all over
the people the people come to social
networking sites to spend their time for
various reasons but some of the fake
users will be creating some fake
accounts to sell
some products or cheat some people for
making money or they may threaten or
do cyber bullying so there are various
reasons for creating these kinds of fake
profiles so here in the base paper the
authors have proposed a system for
detecting fake profiles on
social networking websites using machine
learning and they have used the SGBoost
algorithm for making the
classification and here
we are not going to implement the same
as mentioned in the base paper so we are
going to enhance some features other
than the base paper so here the
drawback that is mentioned in the base
paper is that they have described
social networking websites but they
have not described any particular
website so they generally mention
social networking websites
only so now we are going to enhance the
system so now let us see about the other
enhancement so here you can see the IEEE
base paper title is fake profile
detection on social networking websites
using machine learning and our proposed
project title is Instagram fake account
detection using machine learning so we
are going to concentrate on the Instagram
social networking website only and here
you can see the IEEE base paper abstract
let us see about the proposed abstract
so here in the proposed system
we have used two different
models the
first model using random forest
classifier and the second model using
the decision tree classifier so with the
first model that is using random forest
classifier we have achieved the train
score of 100% and test score of
93% and with the second model of decision tree
classifier we have achieved train score
of 92% and test score of 92% so from
the two models we'll be proving that the
random forest classifier is performing
well on it so now coming
back to the abstract part so here you
can see the
abstract where we have mentioned the
Instagram fake account detection using
machine learning using python we have
done that and we have used two
distinct algorithms random forest
classifier and decision tree so here you can
see the accuracy of the models and
coming to the data set part so we have
used the data set which contains 576
data set records and there will be 12
distinct features on it so there are
various features on it so I'll show you
the data set part now so this is the
data set that we have used so as
mentioned you can scroll down and
you can see it contains around 576
data set records and here you can see
the 12 distinct features like
profile
pic, numbers by length of username, full
name words, numbers by length of full
name, name is equal to username,
description length, external URL, private,
number of posts, number of followers,
number of follows and fake that is
labeled as zero or one so this is
the data set that we are going to train
the system with coming back to the
abstract document so here as mentioned
so we'll be using Python and we'll be uh
finding out the account is fake or not
this is about the existing system so
we are considering the base paper as
existing so as mentioned in the base paper
they have used the SGBoost algorithm so we
have described the existing system
part here and coming to the next
part that is the disadvantages of the
existing system so we have listed the
disadvantages of the existing system
that is using SGBoost and coming to
the proposed system so we have mentioned
about the proposed system what we have
done in
it and the next part is the advantages of
the proposed system so these are the
advantages that we have listed about
our proposed system in the system
architecture you can find the input
data set and we are going to do the
preprocessing and feature selection
and we apply the random forest
classifier and decision tree classifier the
predicted result is whether it is a fake or real
account and we'll be showing the
performance analysis and the graph part
of it in the system requirements you can
see the hardware and software
requirements as mentioned we have
developed using python the version that
we have used is Python
3.10.9 and the web framework is flask and
the front end part we have done using
HTML CSS and JavaScript and this is the
reference of the project that is the base
paper of the
project
before execution make sure that you have
fulfilled the requirements that are
mentioned in the requirements file with the
exact version of Python and the
libraries installed in your system now
let us see the execution of the project
so just go to the source code location
copy the source code location now go to
the command
[Music]
prompt now go to the drive location
where you have pasted the code in my
case I have pasted my code in F drive
I'll go to the F drive now now type cd
and give space and paste the location
that we copied and press enter so
now we are in the source code location
now type python app.py and press enter
and kindly wait for a few
seconds so now you can see the URL just
copy this URL go to any browser
I'm going to Google Chrome now paste the
URL that we copied and press enter
so now you can see the home screen or
welcome screen of the project with the
project title Instagram fake account
detection using machine learning
so first click this login
menu so once if you click this login
menu it will be navigated to the login
page kindly note that this is a static
login page because we have not used any
database in the project so just enter
the default username and password as
admin and admin and then click the login
button now you can see the login success
message and click okay so now it will be
navigated to the upload part where you
need to upload the data set just select
this choose file now go to the source
code location
and select the upload.csv file and then
click this upload
button so once it is uploaded the page
will be navigated to the preview page
where you can preview the data set that
we have uploaded so as
shown earlier you can see the
features like ID profile pic numbers by
length username full name words so all the
12 features you can see here and if you
scroll down till the end you can find
the complete data set has been loaded
into the preview part with all the
575 data set records now you can just
click this click to train and test
button and kindly wait for a few
seconds so now you can see the training
finish message and click okay so now it
will be navigated to the important part
that is the prediction part so where you
need to enter the details and check
whether the prediction result with the
account is fake or not so here you can
see the two models that is the model with
random forest classifier and decision tree
classifier so you can check with
any one model by selecting the model and
I'll show you a few cases now so
now let me enter the profile pic as
yes ratio of numbers by length username
is
0.55 full name words as one ratio of
numbers by length full name is
0.44 and
name is equal to username no description length
is
zero external URL is
no account private is no total number of
post is 33 total number of followers is
166 and total follows is 59
56 and here you can select the model as
mentioned I I'll show you with the
random forest classifier and click this
predict
button so now you can see the prediction
result is that the Instagram
account is a fake account so for
the details that are entered it
is a fake account the model that we
have used is random forest classifier so
as we are not
using any database the values that we
have entered have been reset now so
I'll just enter the same values again
and I'll show you with the other model
also
[Music]
[Music]
quickly so now let me select the model
as decision tree classifier and click
the predict button so now you can see
the decision tree model has also
classified this account and predicted
the result as a fake account so
in this way you can check with both
the models
whether random forest or decision tree mostly
both will be showing the same result as
we got only a small difference
in the accuracy so let me
show you the other case now with
the profile picture
yes now ratio of numbers
zero full name words
two ratio of numbers name Z
zero user name is equal to usern name no
description length is
63
external URL is
[Music]
no account private is no total number of
posts
is
378 total number of followers is
34
670 total follows is
1,878 so now let us check with the model
random forest and click the predict
button and now you can see the model
predicted the Instagram account is real
so I have shown you a case of real now
for example I'll take an account
of ours so let me type JP infotech
Instagram let me take this
example so
here if you want to check again just
click this prediction menu again so now
it is reset now we are in the
prediction menu so now the profile
picture is available so I'll click
profile picture yes so ratio of numbers
I have not used any numbers in the
username so it is zero full name words
so full name words we have three words
as full name words so here also we don't
have a number so let me give zero
username name is equal to username so I
have the username and the name is same
so I'll give it as yes description
length so description length will be
around
150 characters maybe I'll just enter
approximate of 150 characters and uh
external URL yes I have given external
URL JP info.org so I'll just enter that
yes and account is private no this is
not a private account this is a public
account so I'll click yes total number
of post I have posted is one1 post so
I'll just enter1 and total followers is
275 followers so I just enter 275
followers and following zero I just
enter zero so now let us check for this
case and click the predict
button and now you can see the model
random Forest classified this Instagram
account is a real one so in this
scenario you can check with uh some
other cases also so now let me come to
the prediction menu again so now let me
enter the case with profile picture
yes ratio number
0.07 full name words as one there are
zero other one is no description length
are
zero external URL no private as yes
total number of posts is zero total
number of followers is 47 and total
follows is 98 and let me click the
predict so now you can see this account
is predicted as a fake account so in
this way you can check with the other
cases so this is not limited only to the
things that I have mentioned there are
as I mentioned you there are around uh
500 data set records you can check with
each and every one with both the models
and you can just find the results of it
so now to make the video shorter I move
to the next part that is the performance
analysis part so just click this
performance
analysis so now you can see the
performance analysis of both the models
that is for the random forest classifier and for
the decision tree classifier so in the
performance analysis of random forest
classifier you can see the recall
precision and F1 score so the recall
value precision value and F1 score for
both the cases zero and one have been
shown and here you can see the confusion
matrix of the random forest classifier
which contains the true and predicted label for
both the cases zero and one and coming to
the next model that is the decision tree
classifier performance analysis you can
see the recall precision and F1 score of this
decision tree classifier for both cases zero and
one and you can see the confusion matrix
for the decision tree classifier which contains
the true and predicted label for both
cases and finally there is the chart part so
just click this chart menu to navigate to
the chart but kindly note that this
chart is also a static chart because we
have not used any database in the project
so the first chart shows the comparison of
the accuracy score so here you can see
we have used two different models
random forest and decision tree classifier so random
forest has achieved the accuracy of 93%
and decision tree of about 92% so that has been
compared here and we have shown that
our random forest performs better
than the other model and coming to the next
chart which contains
the fake and real percentage that
is the data set which we have
trained with contains fake
accounts of 60% and real accounts of 40% of the
data set records so that is
depicted here manually so that is why I
said this is a static chart so the chart
part contains the comparison of
accuracy score and the data set
comparison and now let me
log out and this is all about the project
Instagram fake account detection using
machine learning using Python and thank
you for watching