Fake Profile Detection on Social Networking Websites using Machine Learning | Python IEEE Project

JP INFOTECH PROJECTS
27 Oct 2023 | 16:15

Summary

TL;DR: The video presents a Python project focused on detecting fake profiles on social networking sites, specifically Instagram, using machine learning. It introduces the project's foundation, based on a 2023 conference paper, and describes enhancements, including the use of Random Forest and Decision Tree classifiers. The project achieves high accuracy, with Random Forest showing better results. The dataset consists of 576 records with 12 features. The video walks through the project execution, from dataset upload to model training, prediction, and performance analysis, ultimately demonstrating fake and real account detection.

Takeaways

  • 🌐 The project focuses on detecting fake profiles on social networking websites, particularly Instagram, using machine learning techniques.
  • 🔎 The system aims to identify fake accounts that may be used for fraudulent activities, cyberbullying, or other malicious purposes.
  • 📈 The project uses two machine learning models: the Random Forest classifier and the Decision Tree classifier, with the former achieving higher accuracy.
  • 📊 The dataset used for training and testing the models contains 576 records with 12 distinct features, such as profile picture, username, and number of followers.
  • 🏁 The Random Forest classifier model achieved a training score of 100% and a test score of 93%, outperforming the Decision Tree classifier.
  • 📝 The project is implemented in Python, using Flask for the web framework, and HTML, CSS, and JavaScript for the front end.
  • 💻 The system architecture includes data preprocessing, feature selection, model application, and performance analysis.
  • 📋 The project's user interface allows users to upload a dataset, preview it, and then train the models to predict whether an account is fake or real.
  • 📈 The performance analysis section provides detailed metrics like recall, precision, F1 score, and confusion matrices for both classifiers.
  • 📊 The project includes static charts for visualizing the accuracy comparison between the two models and the distribution of fake and real accounts in the dataset.

Q & A

  • What is the main focus of the project discussed in the video?

    -The project focuses on detecting fake profiles on social networking websites, specifically Instagram, using machine learning algorithms such as Random Forest and Decision Tree classifiers.

  • Which machine learning algorithms are used in the proposed project?

    -The proposed project uses two machine learning algorithms: Random Forest Classifier and Decision Tree Classifier.

  • How does the proposed project differ from the base paper?

    -While the base paper uses the SGBoost algorithm and does not focus on a specific platform, the proposed project targets Instagram specifically and uses Random Forest and Decision Tree classifiers instead.

  • What are the accuracy scores achieved by the Random Forest and Decision Tree models in the project?

    -The Random Forest model achieved a training score of 100% and a test score of 93%, while the Decision Tree model achieved a training score of 92% and a test score of 92%.

  • What kind of dataset is used in the project?

    -The dataset used in the project contains 576 records with 12 distinct features, such as profile picture status, length of the username, number of posts, number of followers, and whether the account is labeled as fake or real.

  • What are some of the key features of the dataset used for training the models?

    -Key features of the dataset include profile picture status, ratio of numbers in the username, length of the full name, description length, external URL status, account privacy status, number of posts, number of followers, and number of followings.

  • What are the main advantages of the proposed system compared to the existing system?

    -The main advantages include focusing specifically on Instagram, using more effective machine learning algorithms (Random Forest and Decision Tree), and achieving higher accuracy in detecting fake accounts compared to the base system that uses SGBoost.

  • What software and tools are used to develop the project?

    -The project is developed using Python 3.10.9, the Flask web framework, and front-end technologies like HTML, CSS, and JavaScript.

  • What is the purpose of the performance analysis section in the project?

    -The performance analysis section compares the precision, recall, F1 score, and confusion matrix of both the Random Forest and Decision Tree models, highlighting their effectiveness in detecting fake accounts.

  • How does the project execute the detection process after setting up the environment?

    -After setting up the environment, the user runs the source code, uploads the dataset, trains the models, and then inputs specific account details to predict whether an account is fake or real using the trained machine learning models.

Outlines

00:00

🔍 Introduction to Fake Profile Detection Using Machine Learning

The video introduces a Python project focused on detecting fake profiles on social networking websites using machine learning. The project is based on a 2023 conference paper that initially proposed using the SGBoost algorithm for classification. However, the video’s project aims to enhance the paper’s system by focusing specifically on Instagram, using Random Forest and Decision Tree classifiers. It explains that fake profiles are often used for malicious activities, and outlines the accuracy of both models, with Random Forest outperforming Decision Tree.

05:01

⚙️ System Requirements and Project Setup

This section covers the technical setup of the project, including system and software requirements. The project uses Python 3.10.9, Flask for web development, and HTML, CSS, and JavaScript for the front end. Viewers are instructed to ensure the required Python libraries are installed before running the code. The process of executing the project is also detailed, from navigating to the source code directory to running the app and accessing the project through a browser.
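The pre-run check described above could be sketched as follows. This is not code from the video: the helper names `missing_modules` and `version_matches` are hypothetical, and the module list is an assumption (the video names only Python 3.10.9 and Flask; the contents of its requirements file are not shown).

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def version_matches(expected, actual=None):
    """Check that the (major, minor) Python version equals the expected pair."""
    actual = sys.version_info[:2] if actual is None else tuple(actual)[:2]
    return actual == tuple(expected)


if __name__ == "__main__":
    # Assumed list for illustration; only Flask is named in the video.
    required = ["flask"]
    print("Python 3.10 in use:", version_matches((3, 10)))
    print("Missing modules:", missing_modules(required))
```

If the script reports nothing missing, the next step shown in the video is to change into the source code directory and run `python app.py`.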

10:02

📊 Data Set, Model Training, and Fake Account Detection

This part explains the dataset used, which contains 576 records with 12 features such as profile picture, username, description, and followers. The dataset is used to train the models to detect fake accounts. Two models are tested: Random Forest and Decision Tree classifiers. Viewers are shown how to upload the dataset, train the models, and make predictions on whether a given Instagram account is fake or real based on several features. Both models predict similar results, with Random Forest being slightly more accurate.

15:05

📈 Performance Analysis and Chart Comparisons

The performance of both models (Random Forest and Decision Tree) is analyzed, showing metrics such as recall, precision, and F1 scores. Confusion matrices are also presented to demonstrate the performance of each model in classifying accounts as real or fake. The section concludes with a comparison of the accuracy of both models, with Random Forest achieving 93% accuracy and Decision Tree 92%. Additionally, a chart compares the percentage of fake and real accounts in the dataset, which is composed of 60% fake and 40% real accounts.
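The metrics named above can be computed from a list of true labels and a list of predicted labels. The sketch below uses plain Python rather than the project's own code, which is not shown in the video. The function names and the sample labels are made up for illustration; label 1 is assumed to mean fake and 0 real, matching the dataset's 0/1 labeling.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tn, fp, fn, tp), treating `positive` as the positive class."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth != positive and pred == positive:
            fp += 1
        elif truth == positive and pred != positive:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for the chosen positive class."""
    tn, fp, fn, tp = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Made-up labels, not results from the video.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
    print(confusion_counts(y_true, y_pred))
    print(precision_recall_f1(y_true, y_pred, positive=1))
    print(precision_recall_f1(y_true, y_pred, positive=0))
    print(accuracy(y_true, y_pred))
```

Calling `precision_recall_f1` once with `positive=1` and once with `positive=0` gives per-class values, similar to the per-case figures the video shows for 0 and 1.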

Keywords

💡Fake Profile Detection

Fake profile detection refers to identifying fraudulent or false accounts on social networking websites. In the video, the project focuses on detecting fake Instagram accounts using machine learning models. This is crucial in combating cyber threats like scams, bullying, and identity fraud.

💡Machine Learning

Machine learning involves algorithms and statistical models that allow computers to make predictions or decisions based on data. In the video, machine learning models like Random Forest and Decision Tree classifiers are used to detect fake Instagram accounts by analyzing various features of user profiles.

💡Random Forest Classifier

Random Forest is a machine learning algorithm that builds multiple decision trees and merges their results to improve accuracy. In the video, it is one of the models used for fake profile detection and has achieved a training score of 100% and a test score of 93%, making it the more effective of the two models.
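The "merges their results" step can be illustrated with a majority vote over several trees. This is a minimal sketch, not the project's implementation: each "tree" here is a hypothetical one-rule function with invented thresholds, and the training step (building each tree on a random sample of the data) is left out. Some library implementations average class probabilities instead of counting hard votes.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label that appears most often among the predictions."""
    return Counter(predictions).most_common(1)[0][0]


def forest_predict(trees, sample):
    """Ask every tree for a label and merge the answers by majority vote."""
    return majority_vote([tree(sample) for tree in trees])


# Hypothetical one-rule "trees" (1 = fake, 0 = real); the rules are invented.
def tree_no_picture(sample):
    return 1 if sample["profile_pic"] == 0 else 0


def tree_no_posts(sample):
    return 1 if sample["posts"] == 0 else 0


def tree_no_description(sample):
    return 1 if sample["description_length"] == 0 else 0


if __name__ == "__main__":
    trees = [tree_no_picture, tree_no_posts, tree_no_description]
    sample = {"profile_pic": 1, "posts": 0, "description_length": 0}
    # Votes are [0, 1, 1], so the merged prediction is 1 (fake).
    print(forest_predict(trees, sample))
```

Using an odd number of trees avoids ties when there are only two labels.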

💡Decision Tree Classifier

A Decision Tree Classifier is a model that splits data into branches to predict outcomes. The video describes using this model for fake account detection with a training and test score of 92%. While effective, it is slightly less accurate than the Random Forest model in this project.
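How a tree chooses where to split can be sketched for a single numeric feature. The example below assumes the Gini impurity criterion, which the video does not specify; the function names and the toy data are illustrative only.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means the group is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split `value <= threshold`."""
    best = (None, float("inf"))
    # The largest value is skipped because it would leave the right branch empty.
    for threshold in sorted(set(values))[:-1]:
        left = [lab for val, lab in zip(values, labels) if val <= threshold]
        right = [lab for val, lab in zip(values, labels) if val > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


if __name__ == "__main__":
    # Toy data: number of posts per account and a 0/1 label (1 = fake).
    posts = [0, 1, 2, 30, 45, 120]
    labels = [1, 1, 1, 0, 0, 0]
    print(best_threshold(posts, labels))
```

A full decision tree repeats this search over every feature and then recurses into each branch until the groups are pure or a stopping rule is reached.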

💡Instagram Fake Account Detection

Instagram Fake Account Detection is the focus of the project in the video. It refers to identifying fraudulent accounts specifically on the Instagram platform, using machine learning techniques to analyze user behavior and profile details like username, posts, and followers.

💡SGBoost Algorithm

SGBoost (Stochastic Gradient Boosting) is an algorithm that enhances the performance of machine learning models by focusing on difficult-to-predict cases. The video mentions that the base paper used SGBoost, but the project in the video opts for Random Forest and Decision Tree classifiers instead for improved performance.

💡Dataset

A dataset is a collection of data used for training machine learning models. The video mentions a dataset with 576 records and 12 features, including profile pictures, username length, and the number of followers. This data is used to train the model to predict whether an account is fake or real.

💡Profile Features

Profile features refer to characteristics of a social media profile used for machine learning analysis. In the video, these include attributes like the number of followers, the length of usernames, the presence of an external URL, and whether the account is private. These features help determine the likelihood that an account is fake.
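As an illustration of how such features might be derived from raw profile data, here is a small sketch. The field names, the `extract_features` helper, and the exact-match rule for "name equals username" are assumptions; the video only shows the finished CSV columns, not how they were computed.

```python
from dataclasses import dataclass


@dataclass
class ProfileFeatures:
    """Eleven input features; the twelfth dataset column is the fake/real label."""
    profile_pic: int
    nums_per_length_username: float
    fullname_words: int
    nums_per_length_fullname: float
    name_equals_username: int
    description_length: int
    external_url: int
    private: int
    posts: int
    followers: int
    follows: int


def digit_ratio(text):
    """Share of characters in `text` that are digits (0.0 for empty text)."""
    return sum(ch.isdigit() for ch in text) / len(text) if text else 0.0


def extract_features(username, full_name, description, has_pic,
                     external_url, private, posts, followers, follows):
    """Build a feature record from raw profile fields (hypothetical helper)."""
    return ProfileFeatures(
        profile_pic=int(has_pic),
        nums_per_length_username=digit_ratio(username),
        fullname_words=len(full_name.split()),
        nums_per_length_fullname=digit_ratio(full_name),
        name_equals_username=int(full_name == username),
        description_length=len(description),
        external_url=int(bool(external_url)),
        private=int(private),
        posts=posts,
        followers=followers,
        follows=follows,
    )


if __name__ == "__main__":
    # Invented example profile, not a record from the dataset.
    print(extract_features("jane_doe99", "Jane Doe", "", True, "", False, 3, 47, 98))
```

Yes/no fields are stored as 1/0 so the record can be passed to a classifier as a row of numbers.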

💡Accuracy

Accuracy in machine learning refers to the percentage of correct predictions made by the model. In the video, the accuracy scores of the Random Forest (93%) and Decision Tree (92%) classifiers are highlighted, indicating how well each model performs in detecting fake Instagram accounts.

💡Performance Analysis

Performance analysis evaluates how well a machine learning model performs by analyzing metrics like precision, recall, and F1 score. The video shows a comparison of performance for both the Random Forest and Decision Tree classifiers, demonstrating that Random Forest slightly outperforms the other model.

Highlights

Introduction to the Python project on fake profile detection using machine learning.

Discussion on the rising usage of social networking sites and issues like fake accounts.

Base paper titled 'Fake Profile Detection on Social Networking Websites using Machine Learning' utilizing SGBoost algorithm.

The enhanced project focuses on Instagram fake account detection using machine learning.

Two models implemented: Random Forest classifier and Decision Tree classifier.

Random Forest classifier achieved a training score of 100% and a test score of 93%.

Decision Tree classifier achieved both training and test scores of 92%.

Dataset contains 576 records with 12 distinct features like profile pic, username length, and followers.

Comparison of the enhanced system with the base paper, highlighting improvements.

Step-by-step instructions for executing the Python project in a command prompt.

A static login page is used, without any database integration.

Prediction model tests for Instagram accounts with both Random Forest and Decision Tree classifiers.

Performance analysis showing precision, recall, and F1 scores for both models.

A static chart comparing accuracy scores of Random Forest (93%) and Decision Tree (92%).

Final visualization of the dataset showing a 60% fake and 40% real account ratio.

Transcripts

play00:02

[Music]

play00:12

hi in this video we going to see about a

play00:16

python project which is entitled as fake

play00:19

profile detection on social networking

play00:22

websites using machine learning which is

play00:24

an IEEE 2023 conference paper before seeing

play00:28

the execution of the project let me

play00:30

brief about this project so as we know

play00:33

that the internet is growing everywhere

play00:36

all over the world the same way the

play00:38

usage of social networking is also

play00:41

happens all over the world and all over

play00:43

the people the people come to social

play00:46

networking sites to spend their time for

play00:49

various reasons but some of the fake

play00:52

users will be creating some fake

play00:54

accounts to sell

play00:57

some products or cheat some people for

play01:01

making money or they may threaten or

play01:04

make cyber bullying so there are various

play01:06

reasons for creating this kinds of fake

play01:08

profiles so here in the base paper the

play01:10

authors have proposed a system for

play01:14

detecting the fake profile detection on

play01:16

social networking website using machine

play01:17

learning and they have used SG

play01:21

boost algorithm for making the

play01:25

classification and here in the base

play01:27

paper we not going to implement the same

play01:29

as mentioned in the base paper so we are

play01:31

going to enhance some features other

play01:34

than the base paper so here the

play01:36

drawbacks that is mentioned in the base

play01:38

paper is like they have prescribed about

play01:40

the social networking website but they

play01:42

are not prescribed about any particular

play01:44

website so they are generally mention

play01:46

about the social networking websites

play01:49

only so now we are going to enhance the

play01:52

system so now let us see about other

play01:54

enhancement so here you can see the IEEE

play01:56

base paper title is fake profile

play01:57

detection on social networking website

play01:59

using machine learning or our proposed

play02:01

project title is Instagram fake account

play02:03

detection using machine learning so we

play02:04

are going to concern about the Instagram

play02:07

social networking website only and here

play02:10

you can see the IEEE base paper abstract

play02:12

let us see about the proposed abstract

play02:15

so here in the proposed system so we'll

play02:18

be implementing two different

play02:21

models so the we have used two different

play02:24

models for our proposed systems the

play02:27

first model using random Forest

play02:29

classifier and the second model using

play02:31

the decision tree classifier so the

play02:33

first model that is using random Forest

play02:36

classifier we have achieved the train

play02:38

score of 100% and test score of

play02:40

93% and the second model of decision tree

play02:44

classifier we have achieved train score

play02:46

of 92% and test score of 92% so from

play02:49

the two models we'll be proving that the

play02:52

random Forest classifier is performing

play02:54

well on it so now coming to the coming

play02:57

back to the abstract part so here you

play02:59

can see the

play03:01

abstract that we have mentioned about

play03:02

the Instagram fake account detection is

play03:04

in machine learning using python we have

play03:06

done that and we have used the two

play03:07

distinct algorithms like random forest

play03:10

classifier and decision tree so here you can

play03:11

see the accuracy of the models and

play03:15

coming to the data set part so we have

play03:17

used the data set which contains 576

play03:21

data set records and there will be 12

play03:23

distinct features on it so there are

play03:25

various features on it so I'll show you

play03:28

the data set part now so this is the

play03:31

data set that we have used so as

play03:34

mentioned you can see scroll down and

play03:36

you can see which contains around 576

play03:38

data set records and here you can see

play03:41

the two 12 uh distinct features like

play03:45

profile

play03:46

pic numbers by length of the username full

play03:49

name words numbers by length of full

play03:52

name name is equal to username

play03:55

description length external URL private

play03:58

number of post number of followers

play04:00

number of follows and fake that is

play04:02

labeled as zero or one so these all this is

play04:04

the data set that you're going to train

play04:05

up the system coming back to the

play04:08

abstract document so here as mentioned

play04:11

so we'll be using Python and we'll be uh

play04:15

finding out the account is fake or not

play04:17

this is the about the existing system so

play04:19

we are considering the base paper as

play04:21

existing so as mentioned the base paper

play04:23

they are used SG boost algorithm so we

play04:26

have described about the existing system

play04:27

part here and coming to the next next

play04:29

part that is the disadvantages of

play04:31

existing system so we have listed the

play04:33

disadvantages of the existing system

play04:35

that is using the SG boost and coming to

play04:38

the propose system so we have mentioned

play04:40

about the propose system what we have

play04:42

done in

play04:43

it and next part is the advantages of

play04:47

the propos system so these are the

play04:49

advantages that we have listed of about

play04:51

our propose system the system

play04:53

architecture you can find that input

play04:55

data set and we are going to make the

play04:57

preprocessing and feature selection

play04:58

and we have applied the random forest

play05:00

classifier and decision tree classifier the

play05:02

predicted result is it is a fake or real

play05:04

account and we'll be showing the

play05:06

performance analysis and the graph part

play05:08

of it in the system requirements you can

play05:10

see the hardware and software

play05:11

requirements as mentioned we have

play05:13

developed using python the version that

play05:14

we have used is Python

play05:16

3.10.9 and web framework is flask and

play05:19

the front end part we have done using

play05:20

HTML CSS and JavaScript and this is the

play05:23

reference of the project is the base

play05:25

paper of the

play05:28

project

play05:30

before execution make sure that you have

play05:32

fulfilled the requirement that is

play05:33

mentioned in the requirement file with the

play05:34

exact version of the Python and the

play05:35

library is installed in your system now

play05:38

let us see the execution of the project

play05:40

so just go to the source code location

play05:42

copy the source code location now go to

play05:44

the command

play05:48

[Music]

play05:50

prompt now go to the drive location

play05:52

where you have pasted the code in my

play05:54

case I have pasted my code in F drive

play05:55

I'll go to the F drive now now type CD

play05:58

and give space and paste the location

play06:00

that we are copied and click enter so

play06:01

now we are into the source code location

play06:04

now type Python app.py and click enter

play06:08

and kindly wait for a few

play06:11

seconds so now you can see the URL just

play06:14

copy this URL go to any of your browser

play06:17

I'm going to Google Chrome now paste the

play06:20

URL that we are copied and click enter

play06:22

so now you can see the home screen or

play06:24

welcome screen of the project with the

play06:26

project title Instagram fake account

play06:28

detection using machine learning

play06:29

so first click this login

play06:32

menu so once if you click this login

play06:34

menu it will be navigated to the login

play06:36

page kindly note that this is a static

play06:38

login page because we have not used any

play06:40

database in the project so just enter

play06:42

the default username and password as

play06:44

admin and admin and then click the login

play06:47

button now you can see the login success

play06:49

message and click okay so now it will be

play06:51

navigated to the upload part where you

play06:53

need to upload the data set just select

play06:56

this choose file now go to the source

play06:58

code location

play07:00

and select the upload.csv file and then

play07:04

click this upload

play07:06

button so once I've uploaded the page

play07:09

will be navigated to the preview page so

play07:11

where you can preview the data set that

play07:13

we have uploaded so as

play07:15

mentioned shown earlier you can see the

play07:17

features like ID profile pic number

play07:20

length user full name words so all the

play07:22

12 features you can see here and if you

play07:24

scroll down till the end you can find

play07:27

the complete data set has been loaded

play07:28

into the pre view part with the all the

play07:31

575 data set records now you can just

play07:34

click this click to train and test

play07:35

button and kindly wait for few

play07:45

seconds so now you can see the training

play07:47

finish message and click okay so now it

play07:49

will be navigated to the important part

play07:51

that is the prediction part so where you

play07:52

need to enter the details and check

play07:55

whether the prediction result with the

play07:59

account is a fake or not so here you can

play08:01

see the two models that is model with

play08:05

random forest classifier and decision tree

play08:07

classifier so you can check with the any

play08:10

any one model by selecting the model and

play08:12

I'll show you with the few cases now so

play08:14

now let me enter the profile pick as

play08:17

yes ratio of numbers by length username

play08:23

is uh

play08:25

0.55 full name words as one ratio of

play08:30

numbers full length full name is

play08:34

0.44 and username is equal to name is

play08:37

equal to username no description length

play08:41

is

play08:42

zero external URL is

play08:46

no account private is no total number of

play08:50

post is 33 total number of followers is

play08:55

166 and total follows is 59

play09:00

56 and here you can select the model as

play09:03

mentioned I I'll show you with the

play09:05

random forest classifier and click this

play09:07

predict

play09:08

button so now you can see the prediction

play09:11

result is the account is the Instagram

play09:13

account is it is a fake account so what

play09:16

are the details that is mentioned that

play09:17

is a fake account the model that is we

play09:19

have used is random Forest classifier so

play09:21

I'll show you with the as we are not

play09:24

using any database the values that we

play09:26

have entered has been reset now so I

play09:28

I'll just enter the same values again

play09:30

and I'll show you with the other model

play09:32

also

play09:33

[Music]

play09:44

[Music]

play09:54

quickly so now let me select the model

play09:57

as decision tree classifier and click

play09:58

the predict button so now you can see

play10:02

the decision tree model also is

play10:04

classified this account and predicted

play10:06

the result is all is a fake account so

play10:09

in this way you can just check with both

play10:11

the models uh

play10:13

whether random forest or decision tree maximum

play10:18

both will be showing the same result as

play10:19

we got only the few difference

play10:21

percentage in the accuracy so let me

play10:24

show you with the other case now with

play10:27

the profile picture

play10:30

yes now ratio of numbers

play10:34

zero full name words

play10:37

two ratio of numbers name Z

play10:42

zero user name is equal to username no

play10:45

description length is

play10:47

63

play10:49

external URL is

play10:51

[Music]

play10:53

no account private is no total number of

play10:58

posts

play11:00

is

play11:02

378 total number of followers is

play11:05

34

play11:07

670 total follows is

play11:11

1,878 so now let us check with the model

play11:14

random forest and click the predict

play11:17

button and now you can see the model

play11:19

predicted the Instagram account is real

play11:23

so I shown you with a a case of real now

play11:28

for example I'll take an account

play11:31

of ours so let me type JP infotech

play11:36

Instagram let me take this

play11:40

example so

play11:42

here if you wanted to check again just

play11:44

click this prediction menu again so now

play11:46

it is reseted now we are in the

play11:48

prediction menu so now the profile

play11:50

picture is available so I'll click

play11:51

profile picture yes so ratio of numbers

play11:54

I have not used any numbers in the

play11:56

username so it is zero full name words

play12:00

so full name words we have three words

play12:03

as full name words so here also we don't

play12:05

have a number so let me give zero

play12:08

username name is equal to username so I

play12:10

have the username and the name is same

play12:13

so I'll give it as yes description

play12:16

length so description length will be

play12:18

around

play12:19

150 characters maybe I'll just enter

play12:22

approximate of 150 characters and uh

play12:26

external URL yes I have given external

play12:28

URL JP info.org so I'll just enter that

play12:32

yes and account is private no this is

play12:34

not a private account this is a public

play12:35

account so I'll click yes total number

play12:38

of post I have posted is one1 post so

play12:41

I'll just enter1 and total followers is

play12:45

275 followers so I just enter 275

play12:49

followers and following zero I just

play12:52

enter zero so now let us check for this

play12:54

case and click the predict

play12:56

button and now you can see the model

play12:59

random Forest classified this Instagram

play13:01

account is a real one so in this

play13:04

scenario you can check with uh some

play13:07

other cases also so now let me come to

play13:09

the prediction menu again so now let me

play13:12

enter the case with profile picture

play13:15

yes ratio number

play13:19

0.07 full name words as one there are

play13:23

zero other one is no description length

play13:26

are

play13:27

zero external URL no private as yes

play13:32

total number of post are zero total

play13:35

number of followers is 47 and total

play13:38

follows is 98 and let me click the

play13:42

predict so now you can see this account

play13:45

is predicted as a fake account so in

play13:49

this way you can check with the other

play13:51

cases so this is not limited only to the

play13:54

things that I have mentioned there are

play13:56

as I mentioned you there are around uh

play13:59

500 data set records you can check with

play14:03

each and every one with both the models

play14:05

and you can just find the results of it

play14:08

so now to make the video shorter I move

play14:10

to the next part that is the performance

play14:12

analysis part so just click this

play14:13

performance

play14:14

analysis so now you can see the

play14:17

performance analysis of both the models

play14:19

That Is Random forest classifier and for

play14:21

the decision tree classifier so in the

play14:24

performance analysis of random Forest

play14:26

classifier you can see the recall

play14:28

precision and F1 score so the recall

play14:31

value Precision value and F1 score for

play14:33

the both the cases zero and one has been

play14:35

shown and here you can see the confusion

play14:37

Matrix of the random Forest classifier

play14:40

which contain true and predicted label of the

play14:44

both the cases 0 and one and coming to

play14:46

the next model that is decision tree

play14:47

classifier performance analysis you can

play14:49

see the recall precision and F1 score of this

play14:51

decision tree classifier for both cases 0 and

play14:54

one and you can see the confusion Matrix

play14:56

for the decision tree classifier which contains

play14:58

the true and predicted label for the both

play15:02

cases and finally is the chart part so

play15:04

just click this chart menu to be navigated to

play15:07

the chart but kindly note that this

play15:09

chart is also a static chart because we

play15:12

are not used any database in the project

play15:15

so first chart shows the comparison of

play15:17

the accuracy score so here you can see

play15:20

we have used two different model like

play15:21

random forest and decision tree classifier so random

play15:24

forest has achieved the accuracy of 93%

play15:27

and decision tree of about 92% so that has been

play15:30

compared here and we have proved that

play15:32

our random forest performs better

play15:35

than other model and coming to the next

play15:37

chart which contains

play15:39

the uh the fake and real percentage that

play15:43

is the data set the which we have

play15:45

trained up with contains the fake

play15:48

account of 60% and real account of 40%

play15:51

data set record so that is being

play15:53

depicted here manually so that is what I

play15:56

said this is a static chart so the chart

play15:58

part consists of the comparison of

play16:00

accuracy score and the data set

play16:04

comparison and now let me

play16:07

log out and this is all about the project

play16:10

Instagram fake account detection using

play16:12

machine learning using Python and thank

play16:14

you for watching

Related Tags
Machine Learning, Social Media, Fake Detection, Data Analysis, Python Project, Random Forest, Decision Tree, Cybersecurity, Account Validation, ML Algorithms