30: LinkedIn Mutual Connection Search | Systems Design Interview Questions With Ex-Google SWE
Summary
TLDR: This video explores optimizing LinkedIn's mutual connection search feature. The host discusses the problem of searching mutual connections based on criteria like education or employer, introduces graph databases and their limitations, and proposes a solution involving caching mutual connections for each user. The video delves into database sharding, denormalization, and batch updates for efficiency, highlighting the trade-offs between read and write speeds.
Takeaways
- 🎥 The video discusses strategies for searching mutual connections on LinkedIn.
- 🌐 The presenter was inspired by another channel, 'Systems Design Fight Club', and aims to provide additional details on the subject.
- 🔍 The focus is on finding mutual connections based on specific criteria such as shared schools or current jobs.
- 📈 The video provides capacity estimates assuming a billion users on LinkedIn, with an average of 500 connections per user.
- 💾 It's estimated that each user has about 25,000 mutual connections on average.
- 📊 Discusses the concept of using graph databases for this type of problem, comparing non-native and native graph databases.
- 📚 The video introduces the idea of 'caching every single result for all users' to optimize read speed.
- 🗄️ Explains the process of updating mutual connections in the database when a new connection is made.
- 🔧 Talks about the challenges of partitioning the LinkedIn social graph due to its highly interconnected nature.
- 🛠️ Concludes with a strategy that involves batching profile updates to mutual connections databases to reduce the number of writes.
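The update step mentioned in the takeaways (a new connection changing other users' cached mutual connections) can be sketched roughly as follows. This is a minimal in-memory illustration under stated assumptions, not the video's implementation: the names `connections`, `mutuals`, and `add_connection` are hypothetical, and a real system would write to sharded database tables rather than Python dictionaries.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the connections table and the
# precomputed cache of mutual (second-degree) connections.
connections = defaultdict(set)   # user -> direct connections
mutuals = defaultdict(set)       # user -> cached second-degree connections


def add_connection(a, b):
    """Record a new a<->b connection and fan out the cache updates."""
    # Snapshot the friend lists before the new edge is added.
    friends_a = set(connections[a])
    friends_b = set(connections[b])

    connections[a].add(b)
    connections[b].add(a)

    # a and b are now directly connected, so they are no longer
    # second-degree connections of each other.
    mutuals[a].discard(b)
    mutuals[b].discard(a)

    # Every existing friend of a can now reach b through a (and b can
    # reach that friend), unless the two are already directly connected.
    for friend in friends_a:
        if friend != b and b not in connections[friend]:
            mutuals[friend].add(b)
            mutuals[b].add(friend)

    # Symmetric case for the existing friends of b.
    for friend in friends_b:
        if friend != a and a not in connections[friend]:
            mutuals[friend].add(a)
            mutuals[a].add(friend)


add_connection("alice", "bob")
add_connection("bob", "carol")
print(mutuals["alice"])  # carol is reachable through bob
```

Each new connection touches the cache entries of both users' existing friends, which is the write fan-out that the read-optimized design accepts in exchange for fast lookups.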
Q & A
What is the main topic of the video?
-The main topic of the video is searching for mutual connections on LinkedIn, along with the technical challenges and solutions involved in implementing such a feature.
What is a 'buzzer beater' in the context of the video?
-A 'buzzer beater' in the context of the video refers to the host's hope to finish the video recording before it gets dark, like a last-second successful play in a game right before the buzzer ends the match.
What inspired the creation of this video?
-The video was inspired by another channel called 'Systems Design Fight Club', where the host felt that a problem discussed was missing some details and wanted to expand upon it.
What is the problem requirement that the video aims to address?
-The requirement is to enable searching for mutual connections on LinkedIn: people who are not directly connected to the user but share a common connection with them.
Why does the host decide not to implement a full text search?
-The host decides not to implement a full text search to keep the problem simple. Instead, the focus is on searching by hardcoded strings related to education or current employer.
What capacity estimates does the host make for LinkedIn?
-The host estimates that LinkedIn has a billion users, with each user having an average of 500 connections, leading to approximately 25,000 mutual connections per user on average.
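To get a feel for the scale these estimates imply, the arithmetic can be worked through in a few lines. The user count and the mutual connections per user come from the estimates above; the bytes-per-row figure is purely an assumption for illustration and is not taken from the video.

```python
USERS = 1_000_000_000        # from the estimate above
MUTUALS_PER_USER = 25_000    # from the estimate above
BYTES_PER_ROW = 100          # assumption for illustration only


def cache_size_estimate(users=USERS, mutuals=MUTUALS_PER_USER,
                        row_bytes=BYTES_PER_ROW):
    """Return (row count, total bytes) for a cache with one row per
    (user, mutual connection) pair."""
    rows = users * mutuals
    total_bytes = rows * row_bytes
    return rows, total_bytes


rows, total_bytes = cache_size_estimate()
print(f"{rows:.1e} rows, {total_bytes / 1e15:.1f} PB")
```

With these inputs the cache holds 2.5 × 10^13 rows, which is far too much for a single machine under any realistic row size and is why the design leans on sharding.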
Why is the concept of graph databases relevant to the problem?
-Graph databases are relevant because the problem involves identifying second-degree connections (mutual connections) which naturally aligns with graph traversal methods like depth-first search.
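As a rough illustration of the traversal involved, the sketch below expands exactly two hops from a user over an adjacency map. It is a plain depth-limited expansion rather than a full depth-first search, and the `graph` structure and the `second_degree` name are assumptions made for this example.

```python
def second_degree(graph, user):
    """Return users reachable in exactly two hops from `user`,
    excluding the user and their direct connections."""
    direct = graph.get(user, set())
    result = set()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            # Skip the user themselves and anyone already directly connected.
            if candidate != user and candidate not in direct:
                result.add(candidate)
    return result


# Hypothetical toy graph: user -> set of direct connections.
graph = {
    "alice": {"bob", "dave"},
    "bob": {"alice", "carol", "dave"},
    "carol": {"bob"},
    "dave": {"alice", "bob", "erin"},
    "erin": {"dave"},
}
print(sorted(second_degree(graph, "alice")))
```

At the estimated 500 connections per user, this loop touches up to 500 × 500 adjacency entries per query, which motivates precomputing the result instead of traversing on every read.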
What are the challenges associated with using graph databases for this problem?
-Challenges include slow traversal of large datasets in non-native graph databases, where following each edge requires an index lookup (a binary search), the difficulty of partitioning a highly interconnected social graph, and the overhead of maintaining consistency across partitions.
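The partitioning difficulty can be made concrete by counting how many edges end up spanning two shards. The sketch below assumes a simple modulo partitioning of integer user ids, which is a hypothetical scheme chosen for illustration and not one the video specifies.

```python
from itertools import combinations


def cross_shard_fraction(edges, num_shards):
    """Fraction of edges whose endpoints land on different shards
    when users are assigned by `user_id % num_shards`."""
    if not edges:
        return 0.0
    crossing = sum(1 for u, v in edges if u % num_shards != v % num_shards)
    return crossing / len(edges)


# Toy example: 8 users who are all connected to each other, 4 shards.
edges = list(combinations(range(8), 2))
print(cross_shard_fraction(edges, 4))
```

In this toy graph 24 of the 28 edges cross a shard boundary, so most traversals would need to contact more than one partition.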
What alternative solution is proposed instead of using a graph database?
-The alternative solution proposed is to precompute and cache the mutual connections of every user, storing them in a database whose schema is optimized for read speed.
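One way to picture such a read-optimized cache is a denormalized table keyed by user, with the searchable profile fields copied onto each row. The sketch below uses an in-memory SQLite database purely as a stand-in; the table name, column names, indexes, and sample rows are all assumptions made for illustration and are not taken from the video.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per (user, mutual connection), with the mutual connection's
# school and employer denormalized onto the row so reads need no join.
conn.execute("""
    CREATE TABLE mutual_connections (
        user_id   INTEGER NOT NULL,
        mutual_id INTEGER NOT NULL,
        school    TEXT,
        employer  TEXT,
        PRIMARY KEY (user_id, mutual_id)
    )
""")
conn.execute("CREATE INDEX idx_user_school ON mutual_connections (user_id, school)")
conn.execute("CREATE INDEX idx_user_employer ON mutual_connections (user_id, employer)")

# Hypothetical sample data.
conn.executemany(
    "INSERT INTO mutual_connections VALUES (?, ?, ?, ?)",
    [
        (1, 2, "MIT", "Acme"),
        (1, 3, "Stanford", "Acme"),
        (1, 4, "MIT", "Globex"),
        (2, 1, "MIT", "Initech"),
    ],
)


def search_mutuals(user_id, school=None, employer=None):
    """Return the ids of a user's cached mutual connections that match
    the given exact-string filters."""
    query = "SELECT mutual_id FROM mutual_connections WHERE user_id = ?"
    params = [user_id]
    if school is not None:
        query += " AND school = ?"
        params.append(school)
    if employer is not None:
        query += " AND employer = ?"
        params.append(employer)
    query += " ORDER BY mutual_id"
    return [row[0] for row in conn.execute(query, params)]


print(search_mutuals(1, school="MIT"))
```

Because every query starts with `user_id`, that column is also a natural candidate for a shard key, so a single user's search could be served from one partition.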
How does the host suggest handling profile updates to optimize for performance?
-The host suggests batching profile updates and caching them in memory, then using a daily batch job to update the mutual connection databases, rather than fanning out updates in real-time which would lead to excessive writes.
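The batching idea can be sketched as a small in-memory buffer that keeps only the latest value per user and field, then applies everything in one pass. The class name, method names, and the row layout below are hypothetical; in practice the flush would be a scheduled batch job writing to the sharded cache.

```python
class ProfileUpdateBuffer:
    """Collects profile changes in memory and applies them in one batch."""

    def __init__(self):
        # user_id -> {field: latest value}; later updates overwrite earlier ones.
        self._pending = {}

    def record(self, user_id, field, value):
        self._pending.setdefault(user_id, {})[field] = value

    def flush(self, cache_rows):
        """Apply buffered changes to every cache row that describes an
        updated user. Returns the number of rows written."""
        written = 0
        for row in cache_rows:
            changes = self._pending.get(row["mutual_id"])
            if changes:
                row.update(changes)
                written += 1
        self._pending.clear()
        return written


# Hypothetical cache rows: user 2 appears as a mutual connection of users 1 and 3.
cache_rows = [
    {"user_id": 1, "mutual_id": 2, "employer": "Initech"},
    {"user_id": 3, "mutual_id": 2, "employer": "Initech"},
    {"user_id": 1, "mutual_id": 4, "employer": "Hooli"},
]

buffer = ProfileUpdateBuffer()
buffer.record(2, "employer", "Acme")
buffer.record(2, "employer", "Globex")  # replaces the earlier change
rows_written = buffer.flush(cache_rows)
print(rows_written)
```

Two profile edits by the same user collapse into a single write per affected row, which is where the savings over real-time fan-out come from.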
What is the role of Kafka in the proposed system?
-Kafka is used as a message broker to handle new connection messages, ensuring they are replayable and can be asynchronously processed without requiring immediate database writes, thus enhancing fault tolerance and scalability.
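The replayability property can be illustrated with a tiny append-only log that consumers read by offset. This is a simplified stand-in written for this example and does not use or imitate the Kafka client API; `ReplayableLog` and `consume` are hypothetical names.

```python
class ReplayableLog:
    """Append-only message log that can be re-read from any offset."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        """Store a message and return its offset."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read_from(self, offset):
        """Yield (offset, message) pairs starting at `offset`."""
        for i in range(offset, len(self._messages)):
            yield i, self._messages[i]


def consume(log, start_offset, handler):
    """Process messages from `start_offset` and return the next offset
    to resume from. Messages stay in the log, so a consumer that fails
    can restart from its last committed offset."""
    next_offset = start_offset
    for offset, message in log.read_from(start_offset):
        handler(message)
        next_offset = offset + 1
    return next_offset


log = ReplayableLog()
log.append(("alice", "bob"))    # new connection events
log.append(("bob", "carol"))

processed = []
committed = consume(log, 0, processed.append)

# Replay from offset 1, for example after a consumer crash.
replayed = []
consume(log, 1, replayed.append)
print(committed, replayed)
```

Producers only append, so accepting a new connection does not have to wait for the mutual connection cache writes to finish.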