Microservices Gone Wrong at DoorDash
Summary
TLDR: The video discusses DoorDash's transition from a monolithic architecture to microservices, highlighting the scalability issues and challenges they faced. It covers problems like cascading failures, retry storms, and death spirals, and how they were mitigated using strategies like load shedding, circuit breakers, and predictive autoscaling.
Takeaways
- 🚀 DoorDash transitioned from a monolithic architecture to a microservices architecture to handle increased traffic and business growth.
- 🌐 The shift to microservices allowed DoorDash to use different programming languages for different services, enhancing performance and scalability.
- 👨‍💻 Microservices enable separate teams to own and deploy services independently, improving the manageability of the codebase.
- 📈 In 2020, DoorDash experienced a significant traffic spike due to the pandemic, highlighting the need for scalable infrastructure.
- 🔍 The company faced challenges such as cascading failures, retry storms, and death spirals, which are common issues in microservices architecture.
- 🛑 Cascading failures occur when a single component failure propagates through the system, causing widespread issues.
- 🔁 Retry storms happen when the callers of an overloaded service retry their failed requests, unintentionally multiplying the load and exacerbating the problem.
- 🔄 Death spirals are autoscaling issues where failed nodes are replaced, but the new nodes take time to become operational, so the remaining nodes absorb the extra load and fail in turn, creating a self-reinforcing loop of failure.
- 🛠️ DoorDash implemented countermeasures like load shedding, circuit breakers, and predictive autoscaling to mitigate the issues faced.
- ⏲️ Predictive autoscaling anticipates traffic patterns to scale services proactively, reducing the risk of overload and system failure.
Q & A
Why did DoorDash switch from a monolith architecture to a microservices architecture?
-DoorDash switched to a microservices architecture to handle increased traffic and company growth, especially after experiencing a significant traffic spike at the beginning of the pandemic in 2020.
What was DoorDash's initial backend language, and why was it considered interesting to have reached their scale with it?
-DoorDash initially used Python as their backend language. It was considered interesting because Python is often dismissed as a backend language, but DoorDash managed to scale their service significantly with it.
What benefits did DoorDash expect to gain by adopting microservices?
-By adopting microservices, DoorDash expected to gain benefits such as the ability to write different services in different languages, better scalability of the codebase due to features like static typing, and the ability for separate teams to own and deploy services independently.
What is a cascading failure in the context of microservices?
-A cascading failure occurs when an issue in one component of a microservices architecture propagates back through the system, causing a chain reaction of failures. For example, if a database layer experiences high latency, it can cause timeouts in the services that depend on it, and then in the callers of those services.
How does a retry storm occur in a microservices architecture?
-A retry storm occurs when a service is overloaded and begins to fail requests. If its callers implement retry logic, each failed request is resent multiple times, unintentionally increasing the load on the already struggling service and exacerbating the issue.
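The amplification can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from the video: it assumes each attempt fails independently with the same probability, and the function names and figures are hypothetical.

```python
def expected_attempts(failure_rate: float, max_retries: int) -> float:
    """Expected number of attempts made for one original request.

    Attempt k (0-based) is only made if the previous k attempts all failed,
    which has probability failure_rate ** k under the independence assumption.
    """
    return sum(failure_rate ** k for k in range(max_retries + 1))


def offered_load(base_rps: float, failure_rate: float, max_retries: int) -> float:
    """Requests per second the service actually receives once retries are counted."""
    return base_rps * expected_attempts(failure_rate, max_retries)


if __name__ == "__main__":
    # Hypothetical numbers: 1000 requests/s of real demand, up to 3 retries.
    for rate in (0.0, 0.5, 1.0):
        print(f"failure rate {rate:.0%}: {offered_load(1000, rate, 3):.0f} req/s")
```

Under these assumptions, a service that fails every request sees four times its real demand, which is why retries can keep an overloaded service from recovering.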
What is a death spiral in the context of microservices?
-A death spiral is a self-reinforcing feedback loop that occurs when nodes in an autoscaled service fail and are replaced, but the replacements take time to become operational. In the meantime the load on the remaining nodes increases until they too fail, leading to a continuous cycle of failure.
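A toy simulation can show why the loop sustains itself. This is a deliberately simplified sketch, not a model of DoorDash's systems: it assumes load is spread evenly, that at most one overloaded node fails per tick, and that every failed node is replaced by one that needs `warmup_ticks` ticks before serving traffic. All names and numbers are hypothetical.

```python
def simulate(total_load: float, node_capacity: float, nodes: int,
             warmup_ticks: int, ticks: int) -> list[int]:
    """Return the number of healthy nodes at the end of each tick."""
    healthy = nodes
    warming: list[int] = []  # remaining warm-up ticks for each replacement
    history: list[int] = []
    for _ in range(ticks):
        # Replacements that have finished warming up join the healthy pool.
        warming = [t - 1 for t in warming]
        healthy += sum(1 for t in warming if t <= 0)
        warming = [t for t in warming if t > 0]
        # If the evenly spread load exceeds capacity, one node fails and a
        # replacement is started, but it cannot take traffic yet.
        if healthy > 0 and total_load / healthy > node_capacity:
            healthy -= 1
            warming.append(warmup_ticks)
        history.append(healthy)
    return history


if __name__ == "__main__":
    # Ten nodes exactly cover the load; losing one tips the system over.
    print(simulate(1000, 100, nodes=10, warmup_ticks=3, ticks=5))
    print(simulate(1000, 100, nodes=9, warmup_ticks=3, ticks=5))
```

In this toy model ten nodes are stable, but starting one node short never recovers: each replacement arrives into an overloaded pool and another node fails in the same tick.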
What is a metastable failure and how can it occur?
-A metastable failure is a state where a system has become unstable and continues to fail even after the initial cause of the failure has been resolved, because a feedback loop such as a retry storm keeps it overloaded. Restoring the system to a stable state typically requires manual intervention.
What is load shedding, the countermeasure DoorDash implemented to deal with overload?
-Load shedding is a countermeasure where a service experiencing increased load will intentionally drop certain less important requests to maintain performance for more critical ones, based on metrics like CPU utilization.
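As a minimal sketch of that idea, assuming each request carries a priority and the service can read its own CPU utilization: the thresholds, priorities, and paths below are made up for illustration and do not describe DoorDash's actual configuration.

```python
from dataclasses import dataclass

# Hypothetical thresholds: CPU utilization (0.0 to 1.0) at or above which
# requests of a given priority are dropped. Priority 0 is never shed.
SHED_THRESHOLDS = {2: 0.70, 1: 0.85}


@dataclass(frozen=True)
class Request:
    path: str
    priority: int  # 0 = critical, higher numbers = less important


def should_shed(request: Request, cpu_utilization: float) -> bool:
    """Decide whether to reject a request before doing any work for it."""
    threshold = SHED_THRESHOLDS.get(request.priority)
    return threshold is not None and cpu_utilization >= threshold


if __name__ == "__main__":
    cpu = 0.75  # pretend reading from a metrics source
    for req in (Request("/checkout", 0), Request("/search", 1),
                Request("/recommendations", 2)):
        action = "shed" if should_shed(req, cpu) else "serve"
        print(f"{req.path}: {action}")
```

The check runs before any real work is done, so rejecting a low-priority request costs almost nothing and leaves capacity for the critical ones.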
How do circuit breakers function as a countermeasure in microservices?
-Circuit breakers function by stopping certain requests to a downstream service when errors are detected, similar to load shedding but based on error rates. This prevents further strain on the system and allows it to recover.
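The pattern can be sketched in a few lines. This is a generic illustration rather than DoorDash's implementation, and it makes one simplification: it opens after a number of consecutive failures instead of tracking an error rate. The class name, parameters, and defaults are hypothetical.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock          # injectable so the breaker is testable
        self._failures = 0
        self._opened_at = None       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                # Open: fail fast without touching the downstream service.
                raise CircuitOpenError("downstream call skipped")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            half_open = self._opened_at is not None
            if half_open or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open)
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would wrap each downstream request in `breaker.call(...)` and treat `CircuitOpenError` as a signal to return a fallback or an error immediately, giving the downstream service room to recover.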
What is predictive autoscaling and how did DoorDash use it?
-Predictive autoscaling is a method of scaling services based on anticipated traffic patterns rather than reactive increases in load. DoorDash used it by programming their system to scale services up or down based on predictable patterns like daytime and nighttime usage.
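One simple way to picture this is a schedule-driven replica count. The sketch below assumes an hourly traffic forecast is already available and that each replica handles a known request rate; the function, its parameters, and the numbers are hypothetical and not taken from the video.

```python
import math


def desired_replicas(hour: int, forecast_rps_by_hour: list[float],
                     rps_per_replica: float, lead_hours: int = 1,
                     headroom: float = 1.25, min_replicas: int = 2) -> int:
    """Replica count to run at `hour`, sized for the busiest forecast hour
    between now and `lead_hours` ahead so capacity is ready before the peak."""
    window = [forecast_rps_by_hour[(hour + h) % 24]
              for h in range(lead_hours + 1)]
    needed = math.ceil(max(window) * headroom / rps_per_replica)
    return max(min_replicas, needed)


if __name__ == "__main__":
    # Hypothetical forecast: quiet all day except a lunch peak at 12:00.
    forecast = [100.0] * 24
    forecast[12] = 1000.0
    for hour in (3, 11, 12, 13):
        print(hour, desired_replicas(hour, forecast, rps_per_replica=100.0))
```

Because the window looks ahead, the service is already scaled up at 11:00 for the 12:00 peak instead of reacting after the load arrives.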
What challenges does a microservices architecture present in terms of debugging?
-A microservices architecture presents challenges in debugging due to the complexity of tracing requests through multiple services, which can involve significant latency and interdependencies. Tools like distributed tracing are necessary to effectively debug these systems.