Fake Profile Detection on Social Networking Websites using Machine Learning | Python IEEE Project

JP INFOTECH PROJECTS

27 Oct 2023 16:15

Summary

TL;DR: The video presents a Python project that detects fake profiles on social networking sites, specifically Instagram, using machine learning. It introduces the 2023 conference paper the project is based on and describes the enhancements made to it, including the use of Random Forest and Decision Tree classifiers. Random Forest gives the better result, with a test score of 93% against 92% for Decision Tree. The dataset consists of 576 records with 12 features. The video walks through the project execution, from dataset upload to model training, prediction, and performance analysis, and ends by demonstrating fake and real account detection.
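
The training step described in the summary can be sketched roughly as follows. This is a minimal illustration, not the project's actual source code: it assumes scikit-learn is used, it replaces the real Instagram dataset with synthetic data of the same size (576 rows, here assumed to be 11 input columns plus one fake/real label), and the split ratio and hyperparameters are made up for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the dataset (assumption): 576 rows, 11 inputs, 1 binary label.
X, y = make_classification(
    n_samples=576, n_features=11, n_informative=6, random_state=42
)

# Hold out 20% of the rows for testing (the ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# The two classifiers named in the video; hyperparameters are illustrative only.
models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Decision Tree": DecisionTreeClassifier(max_depth=3, random_state=42),
}

# Fit each model and record (training accuracy, test accuracy).
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"{name}: train={scores[name][0]:.2f}, test={scores[name][1]:.2f}")
```

Because the data here is synthetic, the printed scores will not match the 100%/93% and 92%/92% figures reported in the video.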

Takeaways

🌐 The project focuses on detecting fake profiles on social networking websites, particularly Instagram, using machine learning techniques.
🔎 The system aims to identify fake accounts that may be used for fraudulent activities, cyberbullying, or other malicious purposes.
📈 The project uses two machine learning models: the Random Forest classifier and the Decision Tree classifier, with the former achieving higher accuracy.
📊 The dataset used for training and testing the models contains 576 records with 12 distinct features, such as profile picture, username, and number of followers.
🏁 The Random Forest classifier achieved a training score of 100% and a test score of 93%, slightly ahead of the Decision Tree classifier, which scored 92% on both training and test data.
📝 The project is implemented in Python, using Flask for the web framework, and HTML, CSS, and JavaScript for the front end.
💻 The system architecture includes data preprocessing, feature selection, model application, and performance analysis.
📋 The project's user interface allows users to upload a dataset, preview it, and then train the models to predict whether an account is fake or real.
📈 The performance analysis section provides detailed metrics like recall, precision, F1 score, and confusion matrices for both classifiers.
📊 The project includes static charts for visualizing the accuracy comparison between the two models and the distribution of fake and real accounts in the dataset.
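
The metrics named in the takeaways (confusion matrix, precision, recall, F1 score) can be computed from true and predicted labels as in the sketch below. The function names and the toy labels are hypothetical, and treating "fake" as the positive class (1) is an assumption; the project itself may rely on a library for this.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification result."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, f1), using 0.0 when a ratio is undefined."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels (not from the project): 1 = fake account, 0 = real account.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
metrics = precision_recall_f1(y_true, y_pred)
print("tp, fp, fn, tn =", counts)
print("precision, recall, f1 =", metrics)
```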

Q & A

What is the main focus of the project discussed in the video?
-The project focuses on detecting fake profiles on social networking websites, specifically Instagram, using machine learning algorithms such as Random Forest and Decision Tree classifiers.
Which machine learning algorithms are used in the proposed project?
-The proposed project uses two machine learning algorithms: Random Forest Classifier and Decision Tree Classifier.
How does the proposed project differ from the base paper?
-The base paper uses the XGBoost algorithm and does not focus on a specific platform. The proposed project targets Instagram specifically and uses Random Forest and Decision Tree classifiers instead.
What are the accuracy scores achieved by the Random Forest and Decision Tree models in the project?
-The Random Forest model achieved a training score of 100% and a test score of 93%, while the Decision Tree model achieved a training score of 92% and a test score of 92%.
What kind of dataset is used in the project?
-The dataset used in the project contains 576 records with 12 distinct features, such as profile picture status, length of the username, number of posts, number of followers, and whether the account is labeled as fake or real.
What are some of the key features of the dataset used for training the models?
-Key features of the dataset include profile picture status, ratio of numbers in the username, length of the full name, description length, external URL status, account privacy status, number of posts, number of followers, and number of followings.
What are the main advantages of the proposed system compared to the existing system?
-According to the video, the main advantages are the specific focus on Instagram, the use of Random Forest and Decision Tree classifiers, and higher accuracy in detecting fake accounts than the base system, which uses XGBoost.
What software and tools are used to develop the project?
-The project is developed using Python 3.10.9, the Flask web framework, and front-end technologies like HTML, CSS, and JavaScript.
What is the purpose of the performance analysis section in the project?
-The performance analysis section compares the precision, recall, F1 score, and confusion matrix of both the Random Forest and Decision Tree models, highlighting their effectiveness in detecting fake accounts.
How does the project execute the detection process after setting up the environment?
-After setting up the environment, the user runs the source code, uploads the dataset, trains the models, and then inputs specific account details to predict whether an account is fake or real using the trained machine learning models.
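
As a rough illustration of the prediction step, the sketch below turns raw account details into a numeric row covering the nine features listed in the Q&A above. Everything here is an assumption made for the example: the dictionary keys, the column order, and the choice to measure lengths in characters are not taken from the project's code.

```python
def digit_ratio(text):
    """Fraction of characters in `text` that are digits (0.0 for empty text)."""
    return sum(ch.isdigit() for ch in text) / len(text) if text else 0.0


def account_to_features(account):
    """Build one feature row from a dict of account details (hypothetical keys)."""
    return [
        int(bool(account.get("has_profile_pic"))),  # profile picture status
        digit_ratio(account.get("username", "")),   # ratio of numbers in the username
        len(account.get("full_name", "")),          # length of the full name (characters)
        len(account.get("bio", "")),                # description length (characters)
        int(bool(account.get("external_url"))),     # external URL status
        int(bool(account.get("is_private"))),       # account privacy status
        int(account.get("posts", 0)),               # number of posts
        int(account.get("followers", 0)),           # number of followers
        int(account.get("following", 0)),           # number of followings
    ]


# Invented example account, not a record from the dataset.
example_account = {
    "username": "user12345",
    "full_name": "",
    "bio": "",
    "external_url": "",
    "is_private": False,
    "has_profile_pic": False,
    "posts": 0,
    "followers": 12,
    "following": 950,
}

row = account_to_features(example_account)
print(row)
```

A row built this way could be passed to a trained classifier's `predict` method, provided the model was trained on columns in the same order.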