How to Crawl Data on the X/Twitter Platform

Fikri Maulana

24 Mar 2024 20:04

Summary

TLDR: In this tutorial, Pikri explains how to crawl data from Twitter using Python, demonstrating how to gather tweets based on specific keywords and filters. The video covers obtaining an authentication token, setting parameters for data retrieval, and managing Twitter's rate limits. Pikri emphasizes responsible data usage, especially for academic purposes, and shows how to analyze the collected data, including metrics like likes and retweets. The tutorial also covers combining data from multiple months for a more comprehensive analysis, with practical applications in sentiment analysis and recommendation systems.

Takeaways

😀 Data crawling involves automatically gathering data from platforms like Twitter using programming languages such as Python.
😀 A typical user can collect about 1,000 data points per day from Twitter, subject to limitations set by the platform.
😀 To extract data, users need an authentication token, which can be obtained through the Twitter interface.
😀 Users can filter the data they collect by keywords, dates, and specific Twitter accounts to narrow their search results.
😀 Data collected from Twitter is often stored in a CSV file format, allowing for easy analysis and manipulation using tools like Pandas.
😀 To avoid rate limits imposed by Twitter, it's advisable to spread data collection across multiple accounts or limit daily data requests.
😀 Users can aggregate data over time by collecting tweets from different months and merging the resulting CSV files for comprehensive analysis.
😀 The collected data can include various metrics such as tweet creation date, tweet text, retweets, favorites, and language.
😀 The tutorial emphasizes ethical considerations, advising against using crawled data for harmful purposes and promoting its use for research and analysis.
😀 Users are encouraged to analyze their collected data for insights, such as total likes and retweets, or to create visualizations to better understand trends.
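
The merging and totalling steps in the takeaways above can be sketched with Python's standard csv module. This is a minimal sketch under stated assumptions, not the code shown in the video: the function names are hypothetical, and the column names used in the usage note (favorite_count, retweet_count) are assumptions that should be matched to the header row of your own CSV files. The same steps can also be done with Pandas.

```python
import csv


def merge_csv_files(paths, output_path):
    """Concatenate CSV files that share a header into one file and return the rows."""
    if not paths:
        raise ValueError("At least one input file is required.")
    rows = []
    fieldnames = None
    for path in paths:
        with open(path, newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            if fieldnames is None:
                # The header of the first file defines the output columns.
                fieldnames = reader.fieldnames
            rows.extend(reader)
    with open(output_path, "w", newline="", encoding="utf-8") as f:
        # Columns that appear only in later files are ignored rather than raising.
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return rows


def total(rows, column):
    """Sum a numeric column, treating blank cells as zero."""
    return sum(int(row[column] or 0) for row in rows)
```

For example, rows = merge_csv_files(["jan.csv", "feb.csv"], "all.csv") followed by total(rows, "favorite_count") gives the combined like count across both months, assuming those files and that column exist.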

Q & A

What is data crawling?
-Data crawling is the process of automatically collecting data from websites or platforms, such as Twitter, using programming languages like Python.
How much data can be collected from Twitter in a day?
-According to the new regulations from Twitter, it is possible to collect up to 5,000 data points per day using an account.
What programming language is used in the example provided for data crawling?
-The example provided in the transcript uses Python for data crawling.
What kind of data can be extracted from Twitter?
-The data extracted includes the tweet creation date, tweet ID, tweet content, number of replies, retweets, likes, and the language of the tweet.
What is an authtoken, and how can it be obtained?
-An authtoken is a key needed for authentication to access Twitter's API. It can be obtained by logging into Twitter in a browser, opening the browser's developer tools, inspecting the site's cookies, and copying the value of the cookie named auth_token.
How can users filter the tweets they want to collect?
-Users can filter tweets by using specific keywords, account usernames, and date ranges in their queries.
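
As a rough illustration of combining these filters, the sketch below assembles a search string from a keyword, an account, and a date range. It uses the from:, since:, until:, and lang: operators from X/Twitter's advanced search. The function name build_search_query is hypothetical, and whether the crawling tool used in the video accepts exactly this string is an assumption.

```python
from datetime import date


def build_search_query(keyword, username=None, since=None, until=None, lang=None):
    """Build a search string from a keyword plus optional filters."""
    parts = [keyword]
    if username:
        # Accept the name with or without a leading "@".
        parts.append(f"from:{username.lstrip('@')}")
    if since:
        parts.append(f"since:{since.isoformat()}")
    if until:
        parts.append(f"until:{until.isoformat()}")
    if lang:
        parts.append(f"lang:{lang}")
    return " ".join(parts)


query = build_search_query(
    "python",
    username="@example_user",  # placeholder account name
    since=date(2024, 1, 1),
    until=date(2024, 2, 1),
    lang="en",
)
print(query)  # python from:example_user since:2024-01-01 until:2024-02-01 lang:en
```

Building one query per month this way pairs naturally with collecting each month into its own CSV file.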
What should users do if they reach the Twitter rate limit?
-If users reach the Twitter rate limit, they should wait 10 to 15 minutes before making additional requests to avoid being temporarily blocked.
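
The waiting strategy can be sketched as a small retry loop. This is only an illustration: RateLimitError and fetch_page are hypothetical stand-ins for whatever error and fetch function your crawler exposes, and the 15-minute default simply follows the upper end of the range mentioned above.

```python
import time


class RateLimitError(Exception):
    """Hypothetical error raised by the crawler when the platform throttles it."""


def fetch_with_wait(fetch_page, wait_seconds=15 * 60, max_attempts=3, sleep=time.sleep):
    """Call fetch_page, pausing and retrying whenever it reports a rate limit."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch_page()
        except RateLimitError:
            if attempt == max_attempts:
                # Give up after the last attempt so the caller can decide what to do.
                raise
            sleep(wait_seconds)
```

The sleep parameter is injectable so the loop can be tried out without actually waiting 15 minutes.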
How is the collected data saved in the example?
-The collected data is saved in a CSV file format, which can be easily managed and analyzed using data analysis tools.
What can be done with the collected Twitter data after crawling?
-The collected data can be used for various purposes, such as sentiment analysis, research, or creating data visualizations.
What is the importance of not sharing the authtoken?
-The authtoken should not be shared because it is a sensitive credential that grants access to the Twitter API and can be misused if exposed.
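
One common way to keep the token out of shared notebooks and scripts is to read it from an environment variable instead of pasting it into the code. The sketch below assumes a variable named TWITTER_AUTH_TOKEN. That name and both helper functions are hypothetical and not taken from the video.

```python
import os


def load_auth_token(env_var="TWITTER_AUTH_TOKEN"):
    """Read the token from the environment so it never appears in the source file."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable before crawling.")
    return token


def mask(token, visible=4):
    """Return a version of the token that is safe to print in logs."""
    if len(token) <= visible:
        return "*" * len(token)
    return token[:visible] + "*" * (len(token) - visible)
```

With this in place, a script can print mask(load_auth_token()) to confirm that a token was found without revealing it.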