Migrating a Cloud to OVN without outage
Summary
TLDR
Jake from Arc, an Australian research cloud, shares lessons from migrating their cloud's network driver. He focuses on the complexity of switching network drivers within a federated cloud environment. He discusses challenges such as database synchronization, managing multiple drivers, and troubleshooting migration failures. With custom solutions such as a shim driver and automation scripts, Arc minimized downtime and streamlined the process. There were unforeseen issues, such as users hardcoding IP addresses, but the migration ultimately succeeded. The talk offers lessons on scalability, testing, and post-migration monitoring that shape how Arc approaches future infrastructure changes.
Takeaways
- 😀 Arc is a research cloud in Australia, founded in 2012, that provides cloud-based services for research institutions.
- 😀 The club operates a federated cloud, where each research institution manages its own computing and storage resources, while centralizing APIs and other services.
- 😀 The cloud uses Neutron for networking, which distinguishes between network types and mechanism drivers. The talk notes that multiple network types cannot be mixed within one instance.
- 😀 Migrating from older technologies, such as the original SDN (software-defined networking) solution, to newer drivers has been a key challenge. It involved database synchronization and careful management of resources.
- 😀 The migration requires putting Neutron into 'read-only' mode during the transfer so that no new resources are created mid-migration.
- 😀 A key challenge in migration is ensuring that all virtual machine (VM) instances are bound correctly to the new network driver without affecting service continuity.
- 😀 The migration process involved developing custom scripts and tools to help users transition their projects and networks smoothly, minimizing downtime.
- 😀 One significant issue encountered during migration was the hardcoding of IP addresses by some users, which caused problems after the network transition.
- 😀 A shim driver was introduced to choose between network drivers for each project, based on project tags, so that projects kept working while they were being switched over.
- 😀 The migration process took longer than expected, with a planned 3-month timeline stretching to 6 months, highlighting the complexity of handling large-scale cloud migrations in a federated environment.
- 😀 Future plans include migrating to a more unified driver solution as older network technologies (like SDN) phase out, to simplify management and enhance scalability.
- 😀 The presentation highlighted lessons learned, such as the importance of proper testing and avoiding manual errors, with a suggestion that big-bang migrations might not always be feasible due to unforeseen issues.
Q & A
What is the primary purpose of Arc, the research cloud from Australia?
-Arc is a research cloud that provides compute and storage resources to research institutions. Its goal is to support collaborative research by offering a federated cloud infrastructure.
What significant milestone did Arc reach in 2014?
-In 2014, Arc began using a software-defined networking (SDN) solution, though it later had to migrate to a new driver after the initial provider was acquired and discontinued.
What is the federated cloud model used by Arc?
-Arc's cloud is federated, meaning each research institution controls its own compute and storage resources, while a central API handles resource requests and directs them to the appropriate site.
What were the challenges faced during the migration process?
-Challenges included difficulties syncing databases, dealing with multiple network drivers, and the inability to test production-like conditions fully. Additionally, user issues such as hardcoded IP addresses caused disruptions.
Why was a 'Big Bang' migration approach initially considered, and what was the result?
-The 'Big Bang' approach was considered as a way to migrate everything at once to minimize effort. However, it ended up taking longer than expected, with the process stretching over six months instead of the anticipated three months.
What role did Neutron play in the migration process?
-Neutron, a core component in OpenStack, was used to manage the network configuration. During migration, Neutron was temporarily set to 'read-only' mode to prevent new resource creation while transitioning to the new driver.
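The talk does not describe how the read-only mode was implemented. The sketch below illustrates only the general idea and does not reflect Neutron's actual code. It uses a hypothetical `NetworkAPI` class in which reads keep working and any create request is rejected until the freeze is lifted.

```python
from dataclasses import dataclass, field


class ReadOnlyError(Exception):
    """Raised when a mutating request arrives during the migration window."""


@dataclass
class NetworkAPI:
    """Hypothetical stand-in for a network API that can be frozen (not Neutron's API)."""
    read_only: bool = False
    networks: dict = field(default_factory=dict)

    def list_networks(self):
        # Reads are always allowed, so existing workloads can still query state.
        return sorted(self.networks)

    def create_network(self, name):
        # Writes are rejected while frozen, so nothing new appears mid-migration.
        if self.read_only:
            raise ReadOnlyError(f"network API is read-only; cannot create {name!r}")
        self.networks[name] = {"name": name}
        return self.networks[name]
```

Because only writes are rejected, running instances stay untouched. Meanwhile, the set of resources that must be copied to the new driver is guaranteed not to grow during the copy.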
What problem did the users face with hardcoded IP addresses after the migration?
-Some users had hardcoded IP addresses in their configurations. Their services became unreachable after the migration because they still expected the old network configuration.
How did Arc handle the multiple driver issue during the migration?
-Arc developed a custom shim driver that allowed for conditional use of either the legacy or new driver depending on the project tag, enabling a smoother transition without disrupting users' workflows.
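The following is a minimal Python sketch of the tag-based dispatch idea. It does not use Neutron's ML2 interface: `ShimDriver`, `LegacyDriver`, `NewDriver`, `bind_port`, and the `migrated` tag are hypothetical names chosen for illustration.

```python
from typing import Dict, Protocol, Set


class Driver(Protocol):
    def bind_port(self, port_id: str) -> str: ...


class LegacyDriver:
    """Hypothetical old driver; returns a label instead of doing real work."""
    def bind_port(self, port_id: str) -> str:
        return f"legacy:{port_id}"


class NewDriver:
    """Hypothetical new driver; returns a label instead of doing real work."""
    def bind_port(self, port_id: str) -> str:
        return f"new:{port_id}"


class ShimDriver:
    """Route each call to the legacy or new driver based on a project tag.

    Assumption: the tag name 'migrated' and the in-memory tag lookup are
    illustrative only, not the talk's actual implementation.
    """

    def __init__(self, project_tags: Dict[str, Set[str]],
                 legacy: Driver, new: Driver, tag: str = "migrated"):
        self.project_tags = project_tags  # project_id -> set of tags
        self.legacy = legacy
        self.new = new
        self.tag = tag

    def _driver_for(self, project_id: str) -> Driver:
        # Tagged projects have been moved; untagged or unknown ones stay on legacy.
        if self.tag in self.project_tags.get(project_id, set()):
            return self.new
        return self.legacy

    def bind_port(self, project_id: str, port_id: str) -> str:
        return self._driver_for(project_id).bind_port(port_id)
```

Defaulting unknown projects to the legacy driver means that a project is only handled by the new driver once it has been explicitly tagged. Switching a project over then comes down to adding one tag.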
What was the strategy used to prevent migration failures during the transition?
-Arc implemented a monitoring system that allowed them to pause the migration process if issues were detected, providing an opportunity to troubleshoot before proceeding further.
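The talk summary does not detail the monitoring system. The sketch below shows the general pattern: migrate projects one at a time and stop at the first failed health check so the problem can be investigated before continuing. It assumes hypothetical `migrate` and `healthy` callables supplied by the operator.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional


@dataclass
class MigrationRun:
    migrate: Callable[[str], None]   # hypothetical per-project migration step
    healthy: Callable[[str], bool]   # hypothetical post-migration health check
    done: List[str] = field(default_factory=list)
    paused_on: Optional[str] = None

    def run(self, projects: Iterable[str]) -> bool:
        """Migrate projects in order; return False and pause on the first failure."""
        for project in projects:
            if project in self.done:
                continue  # resume support: skip projects already migrated
            self.migrate(project)
            if not self.healthy(project):
                self.paused_on = project
                return False  # stop here so an operator can troubleshoot
            self.done.append(project)
        self.paused_on = None
        return True
```

Re-running the migration after a fix skips projects already recorded as done and retries the one that failed. This assumes the per-project migration step is safe to repeat.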
What is the future direction for Arc regarding migration and cloud infrastructure?
-Arc plans to continue migrating to a new driver infrastructure, with a focus on scaling the system effectively. They are also considering future migrations to ensure long-term support and stability of their cloud network.