2024 - Joyce Lin - So You Think You Can Deploy: Antipatterns in Continuous Deployment

DevOps Days Rockies
30 Sept 2024, 05:07

Summary

TLDR: In this talk, the speaker explores the challenges of continuous deployment to hardware, drawing lessons from notable incidents at Fisker, Google Nest, Facebook, Rivian, and CrowdStrike. The key takeaways are to account for network constraints, maintain effective logging and monitoring, and test thoroughly. The speaker also examines automation in hardware deployments: it can improve efficiency and reduce errors, but it introduces risks of its own. Ultimately, the speaker stresses that change is inevitable in both hardware and software and advocates a balanced approach to deployment strategy.

Takeaways

  • 🚀 Transitioning from software to robotics highlights the unique challenges of continuous deployment in physical devices.
  • 🔌 Continuous deployment in hardware is complicated by factors such as low power, connectivity issues, and bandwidth limitations.
  • 🛠️ Fisker's delay in OTA updates reflects the difficulties car manufacturers face in deploying updates to physical devices.
  • ❄️ Google Nest's deployment disaster emphasizes the importance of considering network constraints during updates.
  • 🌍 Facebook's global outage reveals the necessity for robust logging and monitoring systems in distributed networks.
  • 🔑 Human error can significantly affect deployment success, as Rivian's incorrect security certificate issue showed.
  • 📋 Comprehensive testing is crucial; basics such as invalid-input cases and regression tests should not be overlooked.
  • 🛡️ Redundancy and fallback mechanisms are essential for maintaining functionality when hardware communication fails.
  • 🦺 Canary testing is a practical way to catch issues on a small set of devices before an update rolls out more widely.
  • ⚖️ A balance must be maintained between automation and control, as automation can introduce complexity and risks.
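The canary approach in the takeaways above can be sketched as a staged rollout: update a small slice of the fleet, check device health, and widen the rollout only while the failure rate stays under a threshold. This is a minimal sketch under stated assumptions; the stage fractions, the 2% threshold, and the `push_update` and `is_healthy` callbacks are hypothetical and not from the talk.

```python
from typing import Callable, Sequence

# Hypothetical cumulative stage sizes, as fractions of the fleet.
STAGES = (0.01, 0.10, 0.50, 1.0)
# Hypothetical threshold: halt if more than 2% of updated devices are unhealthy.
MAX_FAILURE_RATE = 0.02


def staged_rollout(
    devices: Sequence[str],
    push_update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    stages: Sequence[float] = STAGES,
    max_failure_rate: float = MAX_FAILURE_RATE,
) -> list:
    """Update devices in widening stages; stop after the first stage whose failure rate is too high.

    Returns the devices that received the update.
    """
    if not devices:
        return []
    updated = []
    for fraction in stages:
        # Cumulative target for this stage; at least one device so the canary is never empty.
        target = max(1, int(len(devices) * fraction))
        batch = list(devices[len(updated):target])
        for device in batch:
            push_update(device)
        updated.extend(batch)
        failures = sum(1 for d in updated if not is_healthy(d))
        if failures / len(updated) > max_failure_rate:
            # Halt: the rest of the fleet stays on the known-good version.
            break
    return updated
```

With a 200-device fleet in which the first two canaries crash, the rollout stops after those two devices instead of reaching the whole fleet. Checking health on every stage, not just the first, trades rollout speed for control, which echoes the automation trade-off in the last takeaway.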
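The hardware constraints listed above (low power, connectivity issues, limited bandwidth), together with the Fisker example of a car that is in the owner's hands, suggest checking preconditions on the device before an update starts. A minimal sketch follows; the `DeviceState` fields, the `update_blockers` helper, and every threshold are illustrative assumptions, not values from the talk.

```python
from dataclasses import dataclass


@dataclass
class DeviceState:
    battery_pct: float        # remaining battery, 0-100
    on_external_power: bool   # plugged in or charging
    link_up: bool             # network currently reachable
    bandwidth_kbps: float     # measured downlink bandwidth in kilobits/s
    in_use: bool              # e.g. a vehicle being driven


# Hypothetical thresholds for illustration only.
MIN_BATTERY_PCT = 50.0
MIN_BANDWIDTH_KBPS = 256.0


def update_blockers(state: DeviceState, update_size_kb: float, max_minutes: float = 30.0) -> list:
    """Return the reasons to defer the update; an empty list means it is safe to start."""
    reasons = []
    if state.in_use:
        reasons.append("device in use")
    if not state.on_external_power and state.battery_pct < MIN_BATTERY_PCT:
        reasons.append("battery too low")
    if not state.link_up:
        reasons.append("no connectivity")
    elif state.bandwidth_kbps < MIN_BANDWIDTH_KBPS:
        reasons.append("bandwidth too low")
    # Estimated download time: kilobytes * 8 = kilobits, / kbps = seconds, / 60 = minutes.
    elif update_size_kb * 8 / state.bandwidth_kbps / 60 > max_minutes:
        reasons.append("download would take too long")
    return reasons
```

Returning every reason rather than a single boolean lets the fleet dashboard show why a device was skipped, which matters when the device itself cannot be inspected directly.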

Q & A

  • What prompted the speaker to reflect on continuous deployment in robotics?

    -The speaker transitioned from software to robotics six months ago and began considering how continuous deployment applies to physical devices in the real world.

  • What issue did MKBHD face when reviewing the Fisker electric car?

    -Fisker asked MKBHD to delay his review until a field technician could apply an update, but he chose to review the car as it was, which resulted in a poor review.

  • What key lesson was learned from the Fisker situation?

    -The experience showed that hardware updates are hard because of factors such as whether the device is in use, power availability, and connectivity.

  • What disaster occurred with Google Nest updates?

    -Google Nest deployed an update that caused many devices to crash, which was especially damaging in winter; the update process had not accounted for connectivity drops.

  • What were the implications of Facebook's global infrastructure outage?

    -The outage affected communication between data centers and rendered their logging and monitoring ineffective, highlighting the need for robust systems in distributed environments.

  • What mistake led to Rivian's infotainment system lockup?

    -A fat-finger error: the wrong security certificate was selected for an OTA update, and despite extensive pre-deployment testing, the mistake locked up the infotainment system.

  • What does the speaker emphasize about the role of human factors in technology deployment?

    -The speaker notes that the human element is crucial: mistakes made while deploying technology can significantly damage a company's reputation and a product's effectiveness.

  • What is the main takeaway from the CrowdStrike incident discussed?

    -The CrowdStrike incident illustrated the importance of thorough testing: skipped basic tests led to widespread failures and significant financial damage.

  • How does the speaker suggest addressing the challenges of continuous deployment?

    -The speaker recommends small, resumable updates; comprehensive logging and monitoring; and dual boot systems for firmware updates.

  • What trade-off does the speaker mention regarding automation in deployment?

    -The speaker highlights the trade-off between the efficiency of automation and the control it relinquishes, especially in unpredictable environments.
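The recommendations on small, resumable updates and dual boot firmware can be sketched together: download the image in chunks while keeping progress so a dropped connection resumes instead of restarting, write the image to the inactive slot, switch slots only after a hash check, and fall back to the other slot if the new one fails to boot. This is an in-memory simulation under assumptions; the `Device` class, `fetch_chunk` callback, chunk size, and SHA-256 check are hypothetical stand-ins for a real transport and bootloader.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable

CHUNK_SIZE = 4096  # hypothetical chunk size in bytes; small chunks limit rework after a drop


@dataclass
class Device:
    # Two firmware slots (A/B); the device boots from `active`.
    slots: dict = field(default_factory=lambda: {"A": b"v1", "B": b""})
    active: str = "A"
    # Download progress, assumed to survive link drops and reboots.
    partial: bytearray = field(default_factory=bytearray)

    @property
    def inactive(self) -> str:
        return "B" if self.active == "A" else "A"


def resume_download(dev: Device, fetch_chunk: Callable[[int, int], bytes], total_size: int) -> bool:
    """Fetch the remaining chunks starting at the saved offset.

    Returns True once the whole image is present, or False if the link dropped,
    in which case the bytes received so far are kept for the next attempt.
    """
    while len(dev.partial) < total_size:
        offset = len(dev.partial)
        try:
            chunk = fetch_chunk(offset, min(CHUNK_SIZE, total_size - offset))
        except ConnectionError:
            return False  # resume from `offset` next time instead of restarting
        if not chunk:
            return False  # nothing received: treat as a drop rather than loop forever
        dev.partial.extend(chunk)
    return True


def install(dev: Device, expected_sha256: str) -> bool:
    """Write the image to the inactive slot and switch to it only if the hash matches."""
    image = bytes(dev.partial)
    dev.partial.clear()
    if hashlib.sha256(image).hexdigest() != expected_sha256:
        return False  # corrupt or incomplete image: keep booting the current slot
    dev.slots[dev.inactive] = image
    dev.active = dev.inactive
    return True


def boot(dev: Device, boots_ok: Callable[[bytes], bool]) -> str:
    """Boot the active slot; if it fails its health check, fall back to the other slot."""
    if not boots_ok(dev.slots[dev.active]):
        dev.active = dev.inactive
    return dev.active
```

The old image stays untouched in its slot until the new one has been both verified and booted, so a lost connection or a bad build leaves the device running the previous version rather than bricked.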
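The CrowdStrike answer stresses basic tests such as invalid inputs. A minimal sketch of that idea is a validator that rejects malformed content before it reaches the component that consumes it, paired with tests that feed it bad input. The record format (a fixed number of comma-separated fields), the `parse_record` and `validate_update` helpers, and the field count of 21 are hypothetical illustrations, not the actual CrowdStrike content format.

```python
EXPECTED_FIELDS = 21  # hypothetical: how many fields the consumer will index into


class InvalidContent(ValueError):
    """Raised when a content update fails validation."""


def parse_record(line: str, expected_fields: int = EXPECTED_FIELDS) -> list:
    """Parse one comma-separated record, refusing anything the consumer cannot safely index."""
    if not line.strip():
        raise InvalidContent("empty record")
    fields = line.rstrip("\n").split(",")
    if len(fields) != expected_fields:
        raise InvalidContent(f"expected {expected_fields} fields, got {len(fields)}")
    if any(f == "" for f in fields):
        raise InvalidContent("empty field")
    return fields


def validate_update(lines: list) -> list:
    """Validate every record up front so a single bad line rejects the whole update."""
    return [parse_record(line) for line in lines]
```

Validating the whole update before shipping it, rather than letting the consumer fail on the first bad record in the field, is what turns a fleet-wide outage into a rejected build.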


Related Tags
Continuous Deployment, Tech Lessons, Hardware Challenges, Deployment Disasters, Software Updates, Incident Management, Error Prevention, Tech Industry, Automation Control, Distributed Systems