System Design: How to design Twitter? Interview question at Facebook, Google, Microsoft

Success in Tech

24 Sept 2017 | 26:35

Summary

TLDR: This video script discusses a system design approach to creating a platform like Twitter. It emphasizes the importance of clarifying the problem statement and focusing on core features such as tweeting, timelines, and following. The speaker explains the limitations of a naive relational database solution and introduces Twitter's use of in-memory databases like Redis for fast read access and eventual consistency. The script explores the architecture's trade-offs, including the high memory usage for performance and the challenges of handling tweets from users with millions of followers. It concludes with potential follow-up topics such as search functionality, push notifications, and advertising integration.

Takeaways

🤔 When designing a system like Twitter, clarify the problem statement and focus on 2-3 core features for detailed design rather than attempting to cover everything.
📝 Core features of Twitter include tweeting, timelines (user timeline and home timeline), and the following mechanism.
🚫 A naive approach using relational databases like MySQL for tweets and users can lead to performance issues due to large SELECT statements required for home timeline generation.
💡 Twitter uses an in-memory database like Redis to store pre-computed timelines for fast read access, prioritizing availability over strict consistency (eventual consistency).
🔄 The 'fan-out' mechanism is employed by Twitter to distribute a new tweet to all followers' timelines, updating them in real-time in the Redis cluster.
🔢 Redis replication is used to ensure that each user's timeline is stored on multiple machines to enhance availability and fault tolerance.
👥 The architecture must handle the computational load of updating millions of timelines when a tweet is made by a celebrity with a large following.
📱 For user access, a load balancer directs requests to the fastest available Redis machine that has the user's pre-computed timeline in memory.
🔑 A hash lookup is used to quickly determine which Redis machines store a particular user's timeline.
🔍 Additional system features to consider include search functionality, push notifications, and advertisement placements based on user analytics.
🔗 The video provides a link to a talk by Twitter's VP of Engineering for further insights into Twitter's architecture and solutions to scaling challenges.

Q & A

What is the primary focus of the video script?
-The primary focus of the video script is to discuss the system design of Twitter, specifically focusing on core features such as tweeting, timelines, and the following mechanism.
Why is it important to clarify the problem statement when designing a system like Twitter?
-Clarifying the problem statement is important because the question is very broad. It identifies the core features the design will cover and keeps the designer from heading off in one direction without a clear focus.
What are the two types of timelines mentioned in the script?
-The two types of timelines mentioned are the 'user timeline', which contains a user's own tweets and retweets, and the 'home timeline', which contains tweets from people the user follows.
Why is a relational database like MySQL considered a naive solution for Twitter's system design?
-A relational database like MySQL is considered naive because it would require performing large SELECT statements to fetch and merge tweets from users a person follows, which is inefficient and not scalable as the database grows.
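The naive approach can be sketched with an in-memory SQLite database standing in for MySQL. The table and column names (`users`, `follows`, `tweets`, `created_at`) are assumptions made for illustration, not a schema taken from the video. The point is the shape of the home timeline query: every page load joins the follow graph against the full tweets table.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE follows (
            follower_id INTEGER NOT NULL,
            followee_id INTEGER NOT NULL
        );
        CREATE TABLE tweets (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL,
            body TEXT NOT NULL,
            created_at INTEGER NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [(1, "alice"), (2, "bob"), (3, "carol")],
    )
    # bob (2) follows alice (1) and carol (3)
    conn.executemany("INSERT INTO follows VALUES (?, ?)", [(2, 1), (2, 3)])
    conn.executemany(
        "INSERT INTO tweets VALUES (?, ?, ?, ?)",
        [
            (1, 1, "hello from alice", 100),
            (2, 3, "hello from carol", 200),
            (3, 2, "bob's own tweet", 300),
        ],
    )
    return conn


def home_timeline(conn: sqlite3.Connection, user_id: int, limit: int = 20) -> list[str]:
    """Build the home timeline at read time with one JOIN per request."""
    rows = conn.execute(
        """
        SELECT t.body
        FROM tweets AS t
        JOIN follows AS f ON f.followee_id = t.user_id
        WHERE f.follower_id = ?
        ORDER BY t.created_at DESC
        LIMIT ?
        """,
        (user_id, limit),
    ).fetchall()
    return [body for (body,) in rows]
```

Calling `home_timeline(build_demo_db(), 2)` returns carol's tweet followed by alice's. With three rows this is instant; the concern raised in the video is that the same join runs on every timeline read as the tweets table grows.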
What is the concept of 'fan-out' as mentioned in the script?
-'Fan-out' means that when a user tweets, Twitter writes the tweet into the pre-computed home timelines of all of that user's followers. These timelines are stored in an in-memory database for quick access.
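A minimal sketch of fan-out on write, using plain Python dictionaries as a stand-in for the Redis cluster. The class name `FanOutService`, its methods, and the timeline cap of 800 entries are all hypothetical choices for illustration; the video does not specify them.

```python
from collections import defaultdict, deque

MAX_TIMELINE_LEN = 800  # assumed cap per timeline, chosen for illustration


class FanOutService:
    """In-process stand-in for pre-computed home timelines kept in memory."""

    def __init__(self, max_len: int = MAX_TIMELINE_LEN) -> None:
        # followee -> set of followers
        self.followers: dict[str, set[str]] = defaultdict(set)
        # user -> bounded list of tweet ids, newest first
        self.home_timelines: dict[str, deque] = defaultdict(
            lambda: deque(maxlen=max_len)
        )

    def follow(self, follower: str, followee: str) -> None:
        self.followers[followee].add(follower)

    def post_tweet(self, author: str, tweet_id: int) -> None:
        """Write path: push the tweet id onto every follower's timeline."""
        for follower in self.followers[author]:
            # appendleft keeps newest first; a full deque drops its oldest entry
            self.home_timelines[follower].appendleft(tweet_id)

    def home_timeline(self, user: str) -> list[int]:
        """Read path: no joins, just return the pre-computed list."""
        return list(self.home_timelines.get(user, ()))
```

The cost moves from reads to writes: `post_tweet` does one insert per follower, which is why an author with millions of followers becomes a problem for this scheme.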
Why does Twitter use Redis as an in-memory database for storing timelines?
-Twitter uses Redis because it is a fast, in-memory data structure store, which allows for quick read access to the timelines, meeting the requirement for fast reads and high availability.
How does Twitter handle the issue of updating millions of followers' timelines when a celebrity tweets?
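-Twitter handles this with a hybrid approach. Tweets from very famous users with millions of followers are not fanned out into pre-computed timelines. Instead, they are fetched and merged into each follower's timeline at load time, which avoids the massive computational load of updating millions of timelines at once.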
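The load-time merge for celebrity tweets might look roughly like the sketch below. The `Tweet` tuple, the function name, and the idea of keeping one newest-first list per celebrity are assumptions for illustration; the summary only states that such tweets are merged when the timeline is loaded.

```python
import heapq
import itertools
from typing import NamedTuple


class Tweet(NamedTuple):
    created_at: int
    tweet_id: int
    author: str


def merged_home_timeline(
    precomputed: list[Tweet],
    celebrity_tweets_by_author: dict[str, list[Tweet]],
    followed_celebrities: list[str],
    limit: int = 20,
) -> list[Tweet]:
    """Merge the fanned-out timeline with celebrity tweets at read time.

    Every input list is assumed to be sorted newest first already.
    """
    streams = [precomputed] + [
        celebrity_tweets_by_author.get(author, [])
        for author in followed_celebrities
    ]
    # heapq.merge lazily merges sorted inputs; reverse=True expects
    # each input to be ordered from largest to smallest key.
    merged = heapq.merge(*streams, key=lambda t: t.created_at, reverse=True)
    return list(itertools.islice(merged, limit))
```

Because the merge is lazy and stops after `limit` items, the read path only touches the newest few tweets from each source rather than rebuilding the whole timeline.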
What is the significance of eventual consistency in the context of Twitter's system design?
-Eventual consistency is significant because it prioritizes availability over strict consistency, meaning it's acceptable for some users to see a tweet slightly later than others, as long as the system remains accessible.
How does the architecture ensure that Bob's home timeline is quickly accessible when he accesses Twitter?
-The architecture ensures quick access by pre-computing and storing Bob's home timeline in multiple Redis machines in memory, with replication for availability, and using a load balancer to direct the request to the fastest responding Redis machine.
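The read path described here (hash lookup, then pick the fastest replica) can be sketched as follows. The modulo-based sharding, the replica names, and the latency table are assumptions for illustration; the summary does not say how Twitter maps users to machines or how the load balancer measures speed.

```python
import hashlib


def shard_for(user_id: str, num_shards: int) -> int:
    """Map a user id to a shard index with a stable hash (assumed scheme)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def replicas_for(user_id: str, shards: list[list[str]]) -> list[str]:
    """Return the machines that hold this user's pre-computed timeline."""
    return shards[shard_for(user_id, len(shards))]


def pick_fastest(replicas: list[str], latency_ms: dict[str, float]) -> str:
    """Choose the replica with the lowest observed latency.

    Replicas with no measurement are treated as slowest.
    """
    return min(replicas, key=lambda r: latency_ms.get(r, float("inf")))
```

A usage sketch: with `shards = [["r0a", "r0b", "r0c"], ["r1a", "r1b", "r1c"]]`, `replicas_for("bob", shards)` always returns the same group of three hypothetical machines, and `pick_fastest` selects whichever of them currently reports the lowest latency.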
What are some additional features or considerations that could be discussed in the context of Twitter's system design?
-Additional features or considerations include search functionality, push notifications, and advertisement placement, which are all important aspects of the Twitter platform that would require their own system design considerations.