How to Get & Crawl Twitter Data - July 2023
Summary
TLDR: The video explains how the Twitter rate limits and login restrictions introduced by Elon Musk have broken many existing Twitter scraping tools and scripts. The creator shares a new method for collecting Twitter data with a custom open-source tool called Tweet Harvest, run through Google Colab. He walks viewers through installing the tool, retrieving a Twitter auth token from browser cookies, and running searches that scrape tweets filtered by keyword, likes, language, and date. Throughout the tutorial, he stresses responsible use for educational purposes such as research and academic projects, and warns viewers not to misuse the tool for large-scale or unethical data scraping.
Takeaways
- Twitter has implemented rate limits that cap how many tweets a user can read per day, depending on account type.
- Under the limits as first announced, unverified accounts could read 600 tweets/day and new accounts 300 tweets/day, while verified accounts had higher limits.
- Searching Twitter without logging in is no longer possible, which breaks earlier data scraping methods.
- Most existing tools for crawling Twitter data no longer work because of these restrictions.
- The creator developed a new tool called 'Tweet Harvest', which is open source and hosted on GitHub.
- Users must log in and copy their Twitter auth token from the browser cookies to use the tool.
- Tweet Harvest saves collected tweets to a CSV file, capturing details such as username, tweet content, likes, retweets, quotes, URL, and language.
- The tool supports customization of keywords, language, date range, minimum likes, and output file name.
- Built-in safeguards, such as a limit per batch and short delays, reduce the chance of hitting Twitter’s rate limits.
- The tool is intended strictly for research, academic projects, or learning purposes and should not be used for mass scraping.
- Users are warned to keep their auth token confidential, since it works like a password.
- Instructions and scripts are available via Google Colab, with blog posts providing detailed guidance.
Q & A
What recent changes did Elon Musk implement on Twitter that affect how users access tweets?
- Elon Musk introduced daily read limits on Twitter. After being raised from their initial values, the limits were 10,000 tweets per day for verified accounts, 1,000 for unverified accounts, and 500 for new accounts. In addition, users must now be logged in to use Twitter's search.
Why are previous Twitter data scraping tools no longer fully functional?
- Previous tools fail because Twitter now requires a login to reach the search page, and the rate limits cap how many tweets an account can read per day. Both changes break scripts that relied on unlimited, unauthenticated access.
What is the purpose of the 'Tweet Harvest' script mentioned in the transcript?
- 'Tweet Harvest' is an open-source tool, run in the video from a Google Colab notebook, for crawling Twitter data for educational or research purposes. It lets users collect tweets by keyword, language, date range, and minimum likes while staying within Twitter's new rate limits.
How does a user obtain the Twitter auth token the script requires?
- Users log into Twitter, navigate to the search page for their keyword, open the browser's developer tools, go to the Application tab, locate the cookies for twitter.com, and copy the value of the 'auth_token' cookie. This token acts as the credential that authorizes the data collection.
What are the main components of the CSV output generated by the script?
- The CSV file includes tweet text, username, tweet URL, date posted, number of likes, number of retweets, and number of quotes. This allows for structured analysis of the collected Twitter data.
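A minimal sketch of loading such a CSV for analysis with Python's standard csv module. The column names (`username`, `full_text`, `favorite_count`, and so on) and the two sample rows are made up for illustration; check the header of the file the tool actually produces and adjust the names accordingly.

```python
import csv
import io

# Hypothetical header and made-up rows; the real CSV header may differ.
SAMPLE_CSV = """username,full_text,created_at,favorite_count,retweet_count,quote_count,tweet_url
alice,Belajar data science,2023-07-01,12,3,1,https://twitter.com/alice/status/1
bob,Hello world,2023-07-02,2,0,0,https://twitter.com/bob/status/2
"""

COUNT_COLUMNS = ("favorite_count", "retweet_count", "quote_count")


def load_tweets(handle):
    """Read tweet rows from an open CSV handle, converting count columns to int."""
    rows = []
    for row in csv.DictReader(handle):
        for key in COUNT_COLUMNS:
            row[key] = int(row[key])
        rows.append(row)
    return rows


def most_liked(rows, top_n=1):
    """Return the top_n rows ordered by like count, highest first."""
    return sorted(rows, key=lambda r: r["favorite_count"], reverse=True)[:top_n]


# With a real file, pass open("tweets.csv", newline="", encoding="utf-8") instead.
tweets = load_tweets(io.StringIO(SAMPLE_CSV))
top = most_liked(tweets)
print(top[0]["username"], top[0]["favorite_count"])
```

Converting the count columns to integers up front keeps later sorting and filtering numeric rather than alphabetical.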
What ethical considerations are emphasized for using the 'Tweet Harvest' tool?
- The tool is intended only for learning, research, or thesis purposes. Users are explicitly warned not to misuse it for mass scraping or other excessive data collection. The auth token should remain private to prevent unauthorized access to the account.
How does the script manage the rate limits when collecting tweets?
- The script controls how many tweets it collects per run and paces its requests. For example, after collecting a set number of tweets (e.g., 100), it pauses briefly (e.g., 10 seconds) before continuing, which lowers the risk of hitting Twitter's rate limits.
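The batch-and-pause pattern described here can be sketched in Python as follows. This is an illustration of the idea, not the tool's actual code: `collect_in_batches`, `fake_fetch`, and the default values of 100 items and 10 seconds are assumptions taken from the example figures above.

```python
import time


def collect_in_batches(fetch_batch, target, batch_size=100, pause_seconds=10,
                       sleep=time.sleep):
    """Collect up to `target` items, pausing between batches.

    `fetch_batch(n)` is a caller-supplied function that returns up to n items.
    `sleep` is injectable so the pacing can be checked without real waiting.
    """
    collected = []
    while len(collected) < target:
        wanted = min(batch_size, target - len(collected))
        batch = fetch_batch(wanted)
        if not batch:
            break  # nothing more to fetch
        collected.extend(batch[:wanted])
        if len(collected) < target:
            sleep(pause_seconds)  # pause before requesting the next batch
    return collected


def fake_fetch(n):
    """Stand-in fetcher that returns placeholder strings instead of tweets."""
    return [f"tweet-{i}" for i in range(n)]


pauses = []  # record requested pauses instead of actually sleeping
result = collect_in_batches(fake_fetch, target=250, sleep=pauses.append)
print(len(result), pauses)
```

With a target of 250 and batches of 100, the loop fetches three batches and pauses twice; no pause is taken after the final batch.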
Can the script collect tweets in languages other than English?
- Yes, users can specify the language parameter in the script. For instance, setting it to 'id' makes the script collect tweets in Indonesian.
What are the key steps to run the 'Tweet Harvest' script in Google Colab?
- The steps are: visit the blog for the script link, open the Google Colab notebook, enter the auth token, specify the keywords, language, date range, and minimum likes, then choose 'Runtime > Run all' to run the data collection and generate the CSV file.
How does the script allow users to customize their Twitter data collection?
- Users can set parameters such as the keyword to search for, language, date range for tweets, minimum number of likes, and output CSV filename. This flexibility enables tailored data collection for specific research or learning objectives.
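One way to combine these filters is as a single Twitter search string built from the advanced search operators `lang:`, `since:`, `until:`, and `min_faves:`. The sketch below assumes the filters are passed this way; the notebook's own variable names may differ, and `build_search_query` and the example keyword are hypothetical.

```python
from datetime import date


def build_search_query(keyword, lang=None, since=None, until=None, min_likes=None):
    """Join a keyword and optional filters into one Twitter search string."""
    parts = [keyword]
    if lang:
        parts.append(f"lang:{lang}")
    if since:
        parts.append(f"since:{since.isoformat()}")
    if until:
        parts.append(f"until:{until.isoformat()}")
    if min_likes is not None:
        parts.append(f"min_faves:{min_likes}")
    return " ".join(parts)


query = build_search_query(
    "data science",
    lang="id",
    since=date(2023, 1, 1),
    until=date(2023, 7, 1),
    min_likes=10,
)
print(query)
# data science lang:id since:2023-01-01 until:2023-07-01 min_faves:10
```

Pasting the resulting string into Twitter's own search box first is a quick way to check that the filters return what you expect before running a collection.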
Why is it important not to share the auth token obtained for the script?
- The auth token functions like a personal credential or password. Anyone who obtains it could act as the user on Twitter, compromising the account's privacy and security. It must therefore be kept confidential.
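A small sketch of one way to keep the token out of a shared notebook: read it from an environment variable rather than pasting it into a cell, and mask it whenever it is printed. The variable name `TWITTER_AUTH_TOKEN`, the helper names, and the sample value are all hypothetical.

```python
import os


def load_token(env_var="TWITTER_AUTH_TOKEN", environ=os.environ):
    """Read the token from the environment instead of hard-coding it."""
    token = environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return token


def mask(token, visible=4):
    """Hide all but the first few characters so logs never show the full token."""
    if len(token) <= visible:
        return "*" * len(token)
    return token[:visible] + "*" * (len(token) - visible)


# Made-up value passed in directly for demonstration; never commit a real token.
token = load_token(environ={"TWITTER_AUTH_TOKEN": "abcd1234efgh5678"})
print(mask(token))
```

Clearing cell outputs before sharing a notebook is also worth doing, since a printed token stays in the saved file.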
What is the main educational benefit of using the 'Tweet Harvest' tool?
- The tool helps students, researchers, and learners practice data collection, analysis, and research skills using real-world social media data, while adhering to ethical guidelines and platform limits.