How to Get (Crawl) Twitter X Data - 30 March 2024

Helmi Satria

30 Mar 2024 18:04

Summary

TLDR: This video provides an updated tutorial on how to crawl data from Twitter (now X) using the creator's custom-made tool, 'Tweet Harvest'. It explains how to set up the tool in Google Colab, obtain a Twitter auth token, and gather tweets that match specific keywords. The creator also addresses common data collection issues, including rate limits, and gives tips for working around them. The tool is intended for research purposes only, and viewers are advised not to use it for commercial activities. The video concludes by showing how to save and process the collected data.

Takeaways

🔍 To crawl Twitter/X data based on a specific search keyword, the tool Tweet Harvest can be used.
🚫 If the previous method for crawling data no longer works, updating to the latest version of Tweet Harvest (2.6.0 as of the video) should resolve the issue.
⏳ Due to Twitter's rate limits, the tool can retrieve only about 500 tweets every 10-15 minutes. The speaker reports collecting around 2,700 tweets in one day, though this figure is not fixed, and stopping the run manually is recommended to reduce the risk of the account being blocked.
🔑 To use Tweet Harvest, you need to obtain a Twitter auth token, which acts like a password. It can be copied from the browser's cookies for the Twitter web page after logging in.
⚙️ Tweet Harvest is built with Node.js, so running it on a platform like Google Colab requires installing Node.js, along with Pandas.
📝 The CSV output file will contain detailed tweet data, including timestamps, tweet content, likes, and usernames, which can be accessed and analyzed further in tools like Excel or Google Sheets.
📊 While the Tweet Harvest tool can gather data for research purposes, it should not be used for business or monetization purposes.
🗂 The keyword, language, and number of tweets to be crawled can be customized, allowing for various types of research (e.g., analyzing tweets related to specific events or people).
📅 Data can be filtered by time range, and you can search tweets from specific users with specific content (e.g., tweets mentioning COVID-19 from a particular public figure).
⛔ If the crawler hits a rate limit, it will pause for 10 minutes before continuing. Errors are expected, but they do not prevent the tool from working.

Q & A

What is Tweet Harvest, and how does it work?
-Tweet Harvest is a tool created by the speaker for crawling data from Twitter (now X). It collects tweets based on specific search keywords. It uses Node.js for its backend and scrolls through Twitter to gather data such as tweet text, likes, replies, and more. The tool outputs the data into a CSV file for further analysis.
Why can't some users get Tweet Harvest to work?
-Users may run into problems if they are using an outdated version of the tool. The speaker recommends the latest stable version (2.6.0 as of March 30, 2024). Other causes include an incorrect setup or the rate limits that Twitter imposes.
What are the rate limits imposed by Twitter for data collection using Tweet Harvest?
-Twitter imposes rate limits that restrict the number of tweets that can be collected within a given time frame. The speaker mentions a limit of about 500 tweets every 10-15 minutes. Users need to wait for these limits to reset before continuing to collect data.
How much data can be collected in a day using Tweet Harvest?
-The speaker states that they have been able to collect around 2,700 tweets in a day, but this number is not fixed. It largely depends on how long the user is willing to wait for rate limits to reset, as well as how long the tool is allowed to run.
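As a rough aid for planning a run, the sketch below computes an idealized upper bound on daily volume from the figures quoted above (about 500 tweets per 10-15 minute window). The function name `max_tweets` and its parameters are hypothetical, and the model assumes every window yields a full batch, which real runs rarely do: the speaker's own result of about 2,700 tweets in a day is far below this ceiling.

```python
def max_tweets(run_hours: float, batch_size: int = 500, window_minutes: float = 15) -> int:
    """Idealized ceiling: every rate-limit window yields one full batch.

    batch_size and window_minutes come from the figures quoted in the video
    summary (about 500 tweets per 10-15 minutes); they are not official limits.
    """
    if window_minutes <= 0:
        raise ValueError("window_minutes must be positive")
    # Only completed windows count toward the total.
    full_windows = int(run_hours * 60 // window_minutes)
    return full_windows * batch_size


if __name__ == "__main__":
    for window in (10, 15):
        print(f"{window}-minute windows over 24 h: at most {max_tweets(24, 500, window)} tweets")
```

Treat the result as a planning ceiling only. Actual yield depends on how many matching tweets exist, how long the tool is left running, and how often it is stopped manually.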
How do you set up Tweet Harvest for a new project?
-To set up Tweet Harvest, users first need to obtain their Twitter auth token, which works like a password for their logged-in account. With the token in hand, they enter the search keyword, the desired tweet language, and the other parameters into the tool, then run it to collect data.
What are some examples of keywords that can be used in Tweet Harvest?
-Users can input any keyword relevant to their research. For example, the speaker collects tweets about Indonesian President Jokowi by setting the keyword to 'Jokowi' and restricting the language to Indonesian ('lang:id'). Other keywords could cover topics such as elections, COVID, or political figures.
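A search keyword with language, user, and date filters can be assembled as a single query string. The sketch below is a minimal helper, assuming the standard X search operators `lang:`, `from:`, `since:`, and `until:`. The function `build_search_query` is a hypothetical name, not part of Tweet Harvest, and `example_user` is a placeholder account.

```python
from datetime import date
from typing import Optional


def build_search_query(
    keyword: str,
    lang: Optional[str] = None,
    from_user: Optional[str] = None,
    since: Optional[date] = None,
    until: Optional[date] = None,
) -> str:
    """Join a keyword and optional filters into one search string."""
    parts = [keyword.strip()]
    if from_user:
        # Accept "@name" or "name"; the operator itself takes no "@".
        parts.append(f"from:{from_user.lstrip('@')}")
    if lang:
        parts.append(f"lang:{lang}")
    if since:
        parts.append(f"since:{since.isoformat()}")
    if until:
        parts.append(f"until:{until.isoformat()}")
    return " ".join(parts)


if __name__ == "__main__":
    # Keyword plus language, as in the Jokowi example.
    print(build_search_query("Jokowi", lang="id"))
    # Keyword from one (placeholder) account within a date range.
    print(build_search_query("covid", from_user="@example_user",
                             since=date(2020, 3, 1), until=date(2020, 12, 31)))
```

The resulting string is what would be entered as the search keyword. Testing it in the X search box first is a quick way to confirm it returns the expected tweets.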
What kind of data does Tweet Harvest collect?
-Tweet Harvest collects a variety of data from Twitter, including the tweet text, number of likes, replies, retweets, the time of creation, whether an image is attached, the username of the person who tweeted, and the tweet's URL. All this data is stored in a CSV file for further processing.
How does the tool handle rate limits when collecting a large number of tweets?
-Tweet Harvest automatically pauses once it hits Twitter’s rate limit (around 500 tweets per 10-15 minutes). It then waits 10 minutes before it resumes scrolling and collecting more data.
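The pause-and-resume behaviour described here can be illustrated with a generic loop. This is a sketch of the general pattern, not Tweet Harvest's actual implementation: `RateLimited`, `collect_with_pauses`, and the `fetch_batch` callback are all hypothetical names, and the 600-second default mirrors the 10-minute wait mentioned above.

```python
import time
from typing import Callable, List


class RateLimited(Exception):
    """Raised by a fetcher when the service refuses further requests for now."""


def collect_with_pauses(
    fetch_batch: Callable[[], List[str]],
    target: int,
    pause_seconds: float = 600,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Call fetch_batch until `target` items are collected, pausing on rate limits."""
    collected: List[str] = []
    while len(collected) < target:
        try:
            batch = fetch_batch()
        except RateLimited:
            # Wait out the limit, then retry instead of aborting the run.
            sleep(pause_seconds)
            continue
        if not batch:
            # No more results: stop early rather than loop forever.
            break
        collected.extend(batch)
    return collected[:target]


if __name__ == "__main__":
    state = {"calls": 0}

    def demo_fetch() -> List[str]:
        # Simulated fetcher: the second call hits a rate limit.
        state["calls"] += 1
        if state["calls"] == 2:
            raise RateLimited()
        return [f"tweet-{state['calls']}-{i}" for i in range(3)]

    # A no-op sleep keeps the demo instant.
    print(collect_with_pauses(demo_fetch, target=5, sleep=lambda s: None))
```

Passing `sleep` as a parameter keeps the loop testable without real waiting. A rate-limit error is treated as a signal to wait, which matches the note that such errors are expected and do not stop the tool from working.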
Can the tool be used for purposes other than research?
-No. The speaker explicitly advises against using Tweet Harvest for commercial purposes or for anything beyond research, such as thesis or dissertation work. The tool was built specifically to support academic research and should not be used for business or profit-driven activities.
How can users further process the collected Twitter data?
-After collecting the data in a CSV file, users can process it further using tools like Excel, Google Sheets, or even programming languages like Python. They can perform sentiment analysis, social network analysis, or other forms of data analysis based on their needs.
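As one possible first step in Python, the sketch below reads a crawled CSV and counts tweets and likes per user using only the standard library. The column names `username`, `full_text`, and `favorite_count` are assumptions about the output file and should be checked against the actual CSV header. The rows in `SAMPLE_CSV` are made-up placeholders, not real tweets.

```python
import csv
import io
from collections import Counter
from typing import TextIO, Tuple

# Made-up placeholder rows standing in for a crawled file.
# The column names are assumptions; check them against the real CSV header.
SAMPLE_CSV = """username,full_text,favorite_count
user_a,first sample tweet,12
user_b,second sample tweet,3
user_a,third sample tweet,7
"""


def summarize(csv_file: TextIO) -> Tuple[Counter, Counter]:
    """Return (tweets per user, total likes per user) from a CSV file object."""
    tweets_per_user: Counter = Counter()
    likes_per_user: Counter = Counter()
    for row in csv.DictReader(csv_file):
        user = row["username"]
        tweets_per_user[user] += 1
        # Treat an empty like count as zero.
        likes_per_user[user] += int(row["favorite_count"] or 0)
    return tweets_per_user, likes_per_user


tweets, likes = summarize(io.StringIO(SAMPLE_CSV))

if __name__ == "__main__":
    for user, count in tweets.most_common():
        print(f"{user}: {count} tweets, {likes[user]} likes")
```

For a real file, replace `io.StringIO(SAMPLE_CSV)` with `open(path, newline="", encoding="utf-8")`. The same table can also be loaded with Pandas or opened directly in Excel or Google Sheets.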