How Notion Scaled to 100 Million Users Without Their Database Exploding
Summary
TL;DR: In 2021, Notion faced severe database performance issues as their monolithic Postgres setup reached its limits. They resolved this by sharding their data on workspace ID, distributing load across multiple databases, and adding PgBouncer for connection pooling. Later, they expanded the sharding system to 96 databases to rebalance load and handle future growth. Notion's migration process included dark reads for testing, logical replication for syncing data, and a meticulously planned failover to the new setup, resulting in improved performance, increased capacity, and a future-proofed architecture.
Takeaways
- 🚀 Notion's popularity surged in 2021, leading to severe performance issues as their Postgres database struggled with massive data volume.
- 🧱 Notion’s core data model represents all content as 'blocks' that nest into complex tree-like structures; every block is a row in Postgres, so the tables grew to an enormous number of rows and carried a heavy load.
- 🐢 The monolithic Postgres setup slowed down as the VACUUM process fell behind, leaving bloated tables and degraded performance.
- 💥 A major threat was transaction ID wraparound, which could put the database into a read-only state, risking data integrity and user access.
- 🔨 To address this, Notion implemented horizontal scaling, sharding tables by workspace ID to better distribute load across multiple databases.
- 📊 Notion’s initial setup involved 32 physical database instances, each containing 15 logical shards, for a total of 480 shards, with PgBouncer for connection pooling.
- ⚙️ During data migration, Notion used audit logs and a catch-up script to sync data to the new sharded setup, carefully managing the process to avoid data inconsistencies.
- 💡 As their user base grew, Notion faced new bottlenecks with the 32-database setup and expanded further to 96 databases, redistributing load and reducing hardware strain.
- 🔗 They implemented logical replication to keep data in sync between the old and new setups, optimizing for a faster transition by delaying index creation.
- 👍 The final architecture increased capacity, reduced CPU and IOPS usage, and positioned Notion to handle future growth more effectively, with utilization dropping to around 20% during peak traffic.
Q & A
What caused Notion's performance issues in 2021?
-Notion's Postgres database became unbearably slow due to its massive size, reaching terabytes of data volume. The sheer number of blocks stored in Postgres, combined with their unique block-based data model, created performance bottlenecks, especially with the VACUUM process.
What is a 'block' in Notion's data model, and why is it important?
-In Notion, everything is a 'block,' which can represent text, images, or entire pages. Each block is stored as a row in Postgres with a unique ID, and blocks can be nested to create complex tree-like structures. This flexibility allows Notion to handle various types of content, but it also leads to massive data generation, impacting performance.
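A minimal sketch of what a block row and its tree structure might look like, based only on the description above; the field names are hypothetical rather than Notion's actual schema:

```python
from dataclasses import dataclass
from typing import Optional
from uuid import UUID

@dataclass
class Block:
    """One row in the (hypothetical) block table described above."""
    id: UUID                   # every block has a unique ID
    workspace_id: UUID         # later used as the sharding partition key
    block_type: str            # e.g. "text", "image", "page"
    parent_id: Optional[UUID]  # nesting: blocks form a tree under a page
    content: dict              # type-specific payload (text, image URL, ...)

def collect_subtree(root: Block, children_by_parent: dict) -> list:
    """Walk the tree-like structure: rendering a page means fetching the root
    block plus every block reachable through parent/child links, which is why
    a single page can touch many rows."""
    result = [root]
    for child in children_by_parent.get(root.id, []):
        result.extend(collect_subtree(child, children_by_parent))
    return result
```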
What issue did Notion encounter with Postgres VACUUM, and why is it important?
-The VACUUM process in Postgres reclaims storage from deleted or outdated data (dead tuples). Notion's VACUUM process was stalling, which led to bloated tables and degraded database performance. Without effective VACUUMing, the database struggled to handle data efficiently.
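Dead-tuple bloat of this kind shows up in Postgres's built-in statistics; a sketch of the kind of check an operator might run against the standard pg_stat_user_tables view (the connection string and database name are illustrative):

```python
import psycopg2  # the connection string below is illustrative

# n_dead_tup counts dead tuples that VACUUM has not yet reclaimed; when VACUUM
# stalls, this number keeps growing relative to n_live_tup and the table bloats.
QUERY = """
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
"""

with psycopg2.connect("dbname=example") as conn:
    with conn.cursor() as cur:
        cur.execute(QUERY)
        for relname, live, dead, last_autovacuum in cur.fetchall():
            print(f"{relname}: {dead} dead vs {live} live tuples "
                  f"(last autovacuum: {last_autovacuum})")
```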
What is transaction ID wraparound in Postgres, and why was it a threat to Notion?
-Postgres assigns each transaction a 32-bit ID, so the ID space is finite. VACUUM normally 'freezes' old transaction IDs to keep that space from running out; because Notion's VACUUM was stalling, the database was drifting toward the wraparound limit, at which point Postgres stops accepting writes to prevent data loss. For Notion, this would have been disastrous: users would no longer be able to edit or delete content.
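The remaining headroom before wraparound can be watched with the standard age(datfrozenxid) query; a sketch, with an illustrative connection string:

```python
import psycopg2  # the connection string below is illustrative

# age(datfrozenxid) counts how many transaction IDs have been consumed since the
# database's oldest unfrozen transaction. The hard limit is roughly 2 billion
# (2^31); if autovacuum cannot freeze old tuples before then, Postgres stops
# accepting writes to protect existing data.
WRAPAROUND_LIMIT = 2_000_000_000

with psycopg2.connect("dbname=example") as conn:
    with conn.cursor() as cur:
        cur.execute("""
            SELECT datname, age(datfrozenxid) AS xid_age
            FROM pg_database
            ORDER BY xid_age DESC;
        """)
        for datname, xid_age in cur.fetchall():
            print(f"{datname}: xid age {xid_age:,}, headroom ~{WRAPAROUND_LIMIT - xid_age:,}")
```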
What solution did Notion first consider to address their performance issues, and why didn't it work?
-Notion initially considered scaling up their Postgres instance, i.e., vertical scaling by adding more CPU, memory, and storage to a single machine. However, there are physical limits to how far one machine can scale, and costs rise sharply beyond a certain point, making this approach unsustainable.
How did Notion implement horizontal scaling to solve their performance issues?
-Notion decided to shard their data horizontally. They partitioned their tables (such as blocks, workspaces, and comments) across multiple databases using the workspace ID as the partition key. This distributed the data load across 32 physical database instances, each with 15 logical shards, for a total of 480 shards.
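Notion's exact routing logic isn't described here, but a simple hash-based sketch shows how a workspace ID could be mapped to one of the 480 logical shards and its physical host (the function names and hashing scheme are assumptions):

```python
import hashlib
import uuid

NUM_LOGICAL_SHARDS = 480   # 32 physical instances x 15 logical shards each
NUM_PHYSICAL_HOSTS = 32
SHARDS_PER_HOST = NUM_LOGICAL_SHARDS // NUM_PHYSICAL_HOSTS  # 15

def logical_shard_for(workspace_id: uuid.UUID) -> int:
    """Hash the workspace ID to a stable, roughly uniform logical shard."""
    digest = hashlib.sha256(workspace_id.bytes).digest()
    return int.from_bytes(digest[:8], "big") % NUM_LOGICAL_SHARDS

def physical_host_for(shard: int) -> int:
    """Pack logical shards 15-per-host onto the 32 physical instances."""
    return shard // SHARDS_PER_HOST

ws = uuid.uuid4()
shard = logical_shard_for(ws)
print(f"workspace {ws} -> logical shard {shard} on host db-{physical_host_for(shard):02d}")
```

Because the partitioned tables (blocks, workspaces, comments) all share the workspace ID as their partition key, a workspace's rows land on the same shard, keeping related queries on a single database.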
What role did PgBouncer play in Notion's database architecture?
-PgBouncer acted as a lightweight connection pooler between Notion's application and the database. It helped reduce the overhead of creating and tearing down database connections by maintaining a pool of active connections. This optimized performance by reusing existing connections.
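From the application's point of view, putting PgBouncer in front of Postgres is mostly a change of connection target; a sketch of that difference, with illustrative hostnames, ports, and a hypothetical block table:

```python
import psycopg2  # DSNs and hostnames below are illustrative

# Without a pooler: every client opens its own Postgres connection (port 5432),
# paying the connection setup and teardown cost each time.
DIRECT_DSN = "host=db-shard-01 port=5432 dbname=example"

# With PgBouncer: clients connect to the pooler (conventionally port 6432), which
# hands out connections from a small pool of long-lived server connections, so
# many client connections share a few backend ones.
POOLED_DSN = "host=pgbouncer-01 port=6432 dbname=example"

def fetch_block_count(dsn: str) -> int:
    # The query and table name are hypothetical; the point is that application
    # code is identical either way, since PgBouncer speaks the Postgres protocol.
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT count(*) FROM block;")
            return cur.fetchone()[0]
```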
How did Notion handle the migration of old data to their new sharded database architecture?
-Notion considered several options for migrating old data, including logical replication and audit logs. They eventually chose to create an audit log when saving to the old database and used a catch-up script to sync data with the new sharded system. The data migration process took three days and involved comparing records to avoid overwriting more recent data.
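The catch-up step hinges on never letting an older audit-log entry overwrite newer data already in the shard; a minimal sketch of that comparison, assuming each record carries a last-edited timestamp (table and column names are hypothetical, and entry values are assumed to be already serialized):

```python
from datetime import datetime, timezone

def apply_audit_entry(shard_conn, entry: dict) -> None:
    """Replay one audit-log entry into the new sharded database, but only if it
    is newer than whatever the shard already holds for that block."""
    with shard_conn.cursor() as cur:
        cur.execute("SELECT last_edited_at FROM block WHERE id = %s",
                    (entry["block_id"],))
        row = cur.fetchone()
        existing_ts = row[0] if row else datetime.min.replace(tzinfo=timezone.utc)

        # Skip stale entries: the shard already has a more recent version of this
        # record, e.g. one written directly to the new setup after the log was taken.
        if entry["last_edited_at"] <= existing_ts:
            return

        cur.execute(
            """
            INSERT INTO block (id, workspace_id, content, last_edited_at)
            VALUES (%(block_id)s, %(workspace_id)s, %(content)s, %(last_edited_at)s)
            ON CONFLICT (id) DO UPDATE
              SET content = EXCLUDED.content,
                  last_edited_at = EXCLUDED.last_edited_at
            """,
            entry,
        )
    shard_conn.commit()
```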
What challenges did Notion face with their sharding setup by late 2022?
-By late 2022, some of Notion's database shards were experiencing high CPU utilization (over 90%) and nearing the limits of their provisioned disk bandwidth (IOPS). Additionally, they were reaching connection limits in their PgBouncer setup, and they anticipated even higher traffic with the upcoming New Year.
How did Notion resolve the issues with their 32-database setup?
-To address the performance bottlenecks, Notion expanded from 32 to 96 databases, each managing fewer logical shards. They also optimized their PgBouncer setup by creating multiple clusters to prevent overloading the system, ensuring more manageable connection counts and load distribution.
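As the summary and takeaways note, the move to 96 databases used Postgres logical replication to keep old and new databases in sync, with index creation deferred until after the initial copy to speed it up. A sketch of the standard Postgres commands involved, with illustrative connection strings and publication, subscription, and index names:

```python
import psycopg2  # connection strings and object names below are illustrative

# On the old (source) database: publish changes to the block table.
src = psycopg2.connect("dbname=old_shard")
src.autocommit = True
with src.cursor() as cur:
    cur.execute("CREATE PUBLICATION block_pub FOR TABLE block;")
src.close()

# On a new (destination) database: subscribe. The subscription first copies the
# existing rows, then streams ongoing changes so the two sides stay in sync.
dst = psycopg2.connect("dbname=new_shard")
dst.autocommit = True  # CREATE SUBSCRIPTION and CREATE INDEX CONCURRENTLY
                       # cannot run inside a transaction block
with dst.cursor() as cur:
    cur.execute("""
        CREATE SUBSCRIPTION block_sub
        CONNECTION 'host=old-db dbname=old_shard'
        PUBLICATION block_pub;
    """)

    # Deferred index creation: the destination table starts without secondary
    # indexes so the initial copy writes faster. In practice this runs only after
    # the copy has caught up; CONCURRENTLY avoids blocking replicated writes.
    cur.execute("CREATE INDEX CONCURRENTLY block_workspace_idx ON block (workspace_id);")
dst.close()
```

Once the new databases had caught up, Notion validated them with dark reads before the carefully planned final failover described in the summary above.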