Apache Kafka 101: Schema Registry (2023)
Summary
TL;DR: In this video, Tim Berglund from Confluent introduces the Confluent Schema Registry, a tool designed to manage the evolution of message formats in Kafka topics. As new applications and consumers emerge, and business requirements change, the Schema Registry ensures compatibility and agreement on message schemas. It operates as a standalone server, maintaining a database of schemas and providing APIs for producers and consumers to check compatibility. The tool supports the JSON Schema, Avro, and Protobuf formats, facilitating schema evolution with minimal runtime failures and promoting collaboration through Interface Description Languages.
Takeaways
- 📚 The Confluent Schema Registry is a standalone server process that helps manage the evolution of message formats in Kafka topics.
- 🌐 It operates independently of Kafka brokers, appearing as a producer or consumer within the Kafka cluster.
- 🗂️ The Schema Registry maintains a database of all schemas written into topics, which is stored in an internal Kafka topic and cached for quick access.
- 🔄 It supports schema evolution, ensuring compatibility between new and existing message formats as business requirements change.
- 🛡️ The Registry enforces compatibility rules, preventing the production of messages that would violate these rules and cause runtime failures.
- 🔧 Producers and consumers interact with the Schema Registry via a REST API to check schema compatibility before producing or consuming messages.
- 🚫 If a consumer encounters a message with an incompatible schema, the Registry instructs it not to consume the message, avoiding potential errors.
- 💾 Schemas are assigned immutable IDs, allowing for caching and reducing the need for repeated REST calls, which improves performance.
- 📈 The Schema Registry currently supports three serialization formats: JSON Schema, Avro, and Protobuf, catering to different serialization needs.
- 🛠️ It provides tooling and an Interface Description Language (IDL) for developers to define and manage schema changes in a source-controllable manner.
- 🔄 The process of schema change collaboration is streamlined, often involving a pull request mechanism, ensuring all stakeholders are aware and can discuss changes.
- 📈 For non-trivial systems, using the Schema Registry is considered essential to manage schema evolution and ensure system reliability.
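The produce-time gate described in the takeaways above can be modeled as a small in-memory sketch. This is a toy stand-in, not the real Confluent API: the real registry is a standalone server reached over REST, and its compatibility rules operate on Avro/JSON Schema/Protobuf semantics rather than simple dict comparison.

```python
# Toy model of the Schema Registry's produce-time compatibility gate.
# Schemas here are plain dicts mapping field name -> field spec; all
# names and rules are illustrative, not the real registry's behavior.

class ToyRegistry:
    def __init__(self, compatible):
        self.versions = {}            # subject -> list of schema versions
        self.compatible = compatible  # rule: (old, new) -> bool

    def register(self, subject, schema):
        history = self.versions.setdefault(subject, [])
        if history and history[-1] != schema:
            # A changed schema must pass the topic's compatibility rule,
            # otherwise the "produce" fails in a detectable way.
            if not self.compatible(history[-1], schema):
                raise ValueError("schema violates compatibility rules")
        if not history or history[-1] != schema:
            history.append(schema)
        return len(history)  # version number

# Simplified "backward"-style rule: any field added to the new schema
# must carry a default so readers can fill it in for old records.
def backward(old, new):
    added = set(new) - set(old)
    return all("default" in new[f] for f in added)
```

A producer re-sending the same schema is a no-op; a schema change either passes the rule and becomes a new version, or raises before any incompatible data is written.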
Q & A
What is Confluent Schema Registry?
-Confluent Schema Registry is a standalone server process that maintains a database of all schemas written into Kafka topics, ensuring compatibility and evolution of message formats.
How does the Schema Registry help with evolving message formats in Kafka?
-The Schema Registry allows producers and consumers to check compatibility of message schemas with previous versions, ensuring that changes adhere to defined compatibility rules and preventing runtime failures due to schema incompatibilities.
What is the role of the Schema Registry in Kafka's ecosystem?
-The Schema Registry acts as an application within the Kafka ecosystem, providing a REST API for producers and consumers to validate schema compatibility and maintain a database of schemas in an internal Kafka topic.
How does the Schema Registry handle schema evolution?
-It provides a mechanism for producers to submit new schemas for validation against compatibility rules and for consumers to reject messages with incompatible schemas, thus managing schema evolution and preventing data incompatibility issues.
What are the benefits of using the Schema Registry for producers?
-Producers can ensure that their messages adhere to the expected schema versions and compatibility rules, preventing the production of incompatible data and potential runtime errors.
How does the Schema Registry assist consumers in processing messages?
-Consumers can use the Schema Registry to verify that the message schema they are about to consume is compatible with the version they expect, avoiding the consumption of incompatible data.
What is the significance of caching in the Schema Registry's operation?
-Caching reduces the need for repeated REST API calls, improving performance by allowing producers and consumers to locally store and quickly access schema information after the initial validation.
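The caching behavior described here can be sketched as a thin client-side wrapper. Names are hypothetical; the real Confluent serializers do this internally, with `fetch_id` standing in for the REST round trip.

```python
# Sketch of the client-side schema cache: the registry lookup runs once
# per schema, after which the immutable schema ID is served locally.

class SchemaIdCache:
    def __init__(self, fetch_id):
        self._fetch_id = fetch_id  # callable: schema string -> registry ID
        self._cache = {}
        self.round_trips = 0       # counts simulated REST calls

    def id_for(self, schema):
        if schema not in self._cache:
            self.round_trips += 1
            self._cache[schema] = self._fetch_id(schema)
        return self._cache[schema]
```

Because schema IDs are immutable, the cache never needs invalidation: after the first lookup, producing or consuming with the same schema costs no further round trips, which is why the REST overhead is mostly a warm-up cost.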
What serialization formats does the Schema Registry currently support?
-The Schema Registry supports three serialization formats: JSON Schema, Avro, and Protobuf, catering to different serialization needs and preferences.
How does the Schema Registry facilitate collaboration around schema changes?
-By using an Interface Description Language (IDL) like Avro, the Schema Registry enables a centralized approach to schema definition and change management, allowing teams to collaborate through version control systems like pull requests.
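For example, a minimal Avro `.avsc` file for an order record might look like the following. The namespace and field names are illustrative, not from the video; note the `default` on the added `status` field, which is what keeps the change backward compatible.

```json
{
  "type": "record",
  "name": "Order",
  "namespace": "com.example.orders",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "customerId", "type": "string"},
    {"name": "status", "type": "string", "default": "NEW"}
  ]
}
```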
What is the importance of the Schema Registry in non-trivial systems?
-In complex systems, the Schema Registry is essential for managing schema evolution, ensuring compatibility across diverse applications and teams, and preventing data serialization issues.
How does the Schema Registry help in detecting breaking changes during the development process?
-It provides tooling that allows developers to check for breaking changes at build time, before deployment, ensuring that schema changes do not introduce runtime incompatibilities.
Outlines
📚 Introduction to Confluent Schema Registry
Tim Berglund introduces the Confluent Schema Registry, a standalone server process that complements the Kafka ecosystem by maintaining a database of schemas for Kafka topics. It ensures compatibility and evolution of message formats as new consumers emerge and business requirements change. The Schema Registry operates as an application within the Kafka cluster, with its database persisted in an internal Kafka topic and cached for low latency access. It also provides a REST API for producers and consumers to check schema compatibility before message production or consumption, thus preventing runtime failures due to schema incompatibilities.
🛠️ Tooling and Collaboration for Schema Evolution
The second paragraph delves into the practical aspects of using the Confluent Schema Registry for managing schema evolution. It discusses the importance of having a standard and automated way to learn about schema changes, especially in systems where consumers may be developed by different teams or unknown entities. The Schema Registry supports collaboration around schema changes by using an Interface Description Language (IDL), such as Avro's .avsc files, which can be version-controlled and collaboratively edited through pull requests. This process not only helps in managing schema evolutions but also integrates with build-time checks to prevent breaking changes before deployment. The paragraph concludes with a strong recommendation for using Schema Registry in any non-trivial system to facilitate schema management and evolution.
Keywords
💡Confluent Schema Registry
💡Kafka
💡Producer
💡Consumer
💡Schema Evolution
💡REST Endpoint
💡Compatibility Rules
💡Serialization Formats
💡Interface Description Language (IDL)
💡High Availability Configuration
💡Runtime Failures
Highlights
Confluent Schema Registry is introduced to manage message format evolution in Kafka.
New consumers of existing Kafka topics may emerge from different teams or unknown sources.
Message formats must be understood by new consumers to ensure proper message consumption.
Business evolution necessitates changes in message schemas, such as adding new fields or modifying existing ones.
Schema Registry acts as a standalone server process external to Kafka brokers, maintaining a database of schemas.
The schema database is persisted in an internal Kafka topic and cached for low latency access.
Schema Registry can be configured for high availability to ensure continuous operation.
It provides an API for producers and consumers to check message schema compatibility.
Producers must call the Schema Registry REST endpoint to present the schema of new messages.
Compatibility rules are defined to allow or reject schema changes based on predefined criteria.
Schema Registry can prevent the production of incompatible messages, avoiding runtime failures.
Consumers are instructed not to consume messages with incompatible schemas.
Schema Registry does not fully automate schema evolution but significantly eases the management of schema changes.
Schemas are cached locally in producers and consumers to minimize REST API round trips.
Schema Registry currently supports the JSON Schema, Avro, and Protobuf serialization formats.
Interface Description Languages (IDL) like Avro's avsc file facilitate schema definition and collaboration.
Tooling exists to convert IDL files into programming language objects, streamlining schema management.
Schema Registry promotes a standardized and automated approach to learning and managing schema changes.
The use of Schema Registry is considered essential in non-trivial systems for effective schema management.
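The producer-side REST interaction in the highlights above can be sketched as follows. The endpoint path and content type match Schema Registry's documented REST API (`POST /subjects/{subject}/versions`), but the base URL, subject name, and helper function are assumptions for illustration.

```python
import json
import urllib.request

REGISTRY_URL = "http://localhost:8081"  # assumed local registry address

def registration_request(subject, schema_str, base_url=REGISTRY_URL):
    """Build the HTTP request that registers a schema under a subject.

    Schema Registry's REST API expects the schema as a JSON-escaped
    string in the request body, posted to /subjects/{subject}/versions.
    """
    body = json.dumps({"schema": schema_str}).encode()
    return urllib.request.Request(
        url=f"{base_url}/subjects/{subject}/versions",
        data=body,
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
        method="POST",
    )

# Actually sending the request requires a running registry:
# with urllib.request.urlopen(registration_request("orders-value", '"string"')) as r:
#     print(json.load(r))  # the registry responds with the schema's ID
```

In practice the Confluent client serializers make this call for you at produce time; the sketch only shows the shape of the request they send.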
Transcripts
- Hey, Tim Berglund with Confluent
to talk to you about the Confluent Schema Registry.
(upbeat music)
Now once applications are busily producing messages to Kafka
and consuming messages from it,
two things are gonna happen.
First, new consumers of existing topics
are going to emerge.
These are brand new applications.
They might be written by the same team
that wrote the original producer of those messages,
maybe by another team, maybe by people you don't even know,
it just depends on how your organization works.
That's a perfectly normal thing for new consumers to emerge
written by new people.
And they're going to need to understand
the format of the messages in the topic.
Second, the format of those messages
is going to evolve as the business evolves.
For example, order objects,
that's an object that represents an order,
customer places an order,
and here's an object representing that order.
They might gain a new status field,
or usernames might be split into first
and last from full name, or the reverse.
And so on, things change.
There is no such thing as getting it all right up front,
the world changes.
And so the schema of our stuff
has to change with it.
The schema of our domain objects
is a constantly moving target.
And we have to have a way of agreeing on that schema,
the schema of those messages
in whatever topic we're thinking about at the moment.
The Confluent Schema Registry
exists to solve precisely this problem.
So, Schema Registry is a standalone server process
that runs on a machine external to the Kafka brokers.
So it looks like an application to the Kafka cluster,
it looks like a producer or a consumer.
And there's a little bit more to it than that,
but at minimum it is that.
Its job is to maintain a database of all of the schemas
that have been written into topics in the cluster
for which it is responsible.
Now that database is persisted in an internal Kafka topic,
this should come as no surprise to you,
and it's cached in the Schema Registry
for low latency access.
This is very typical, by the way,
for an element of the Kafka ecosystem
to be built out of Kafka,
you know, we needed a distributed fault tolerant data store,
well, here's Kafka presenting itself.
So we use it, we use a topic to store those schemas.
A Schema Registry can be run
in a redundant high availability configuration
if you like.
So it remains up if one instance fails.
Now, Schema Registry is also an API
that allows producers and consumers to predict
whether the message they're about to produce or consume
is compatible with previous versions
or compatible with the version that they're expecting.
When a producer is configured to use the Schema Registry,
it calls at produce time,
an API at the Schema Registry REST endpoint.
So Schema Registry is up there,
maintaining this database,
also has a REST interface.
Producer calls that REST endpoint
and presents the schema of the new message.
If it's the same as the last message produced,
then the produce may succeed.
If it's different from the last message,
but matches the compatibility rules defined for the topic,
the produce may still succeed.
If it's different in a way
that will violate the compatibility rules,
the produce will fail in a way
that the application code can detect.
There'll be a failure condition it can detect
and, you know,
dutifully of course produce that exception stacktrace
to the browser, no, don't do that,
you would responsibly handle that condition.
But you are made aware of that condition,
rather than producing data
that is gonna be incompatible down the line.
Likewise, on the consumer side,
if a consumer reads a message
that has an incompatible schema from the version
that the consumer code expects,
Schema Registry will tell it not to consume the message.
It doesn't fully automate the problem of schema evolution,
and frankly, nothing does.
That's always a challenge in any system
that serializes anything, regardless of the tooling.
But it does make a difficult problem much easier
by keeping the runtime failures
from happening when possible.
Also, if you're worried about all these REST round trips,
and that sounds really slow.
Of course, all this stuff gets cached in the producer
and the consumer when you're using Schema Registry.
So these schemas have immutable IDs,
and once I've checked once,
you know, that's gonna be cached locally,
and I don't need to keep doing those round trips.
That's usually just a warm up thing
in terms of performance.
Schema Registry currently supports
three serialization formats:
JSON Schema, Avro and Protobuf.
And depending on the format you may have available to you
an IDL, an Interface Description Language
where you can describe in a source controllable text file,
the schema of the objects in question.
And in some cases, there's also tooling
that will then take that IDL,
for example, in Avro you can write an avsc file.
That's this nice simple JSON format
where you're describing the schema of the object
and say if you're using Java,
there's a Maven and a Gradle plugin
where you can turn that into a Java object.
So then not only do you have
the ability to eliminate certain classes of runtime failures
due to schema evolution,
but you've got now a tooling pathway
that drives collaboration around schema change
to a single file.
So if you want to change what an order is,
and add a new status field to an order,
well, technically what that means is,
you change the IDL, you edit avsc file.
And the process that you now have
for collaborating around that schema change,
well, that's the same process you have
for collaborating around any schema change.
For most of us, that's a pull request, right?
You do that thing in a branch and you submit a PR
and people talk about it,
and then it gets done and everybody has that change,
and the tooling updates the object
and the Schema Registry at runtime
tells you whether that's gonna work,
there's even a way to do it at build time,
before you deploy the code to find out whether
this is gonna be a breaking change or not,
if it's not obvious, as in the case of complex domain objects.
So, all kinds of very, very helpful things.
I would go so far, and this is a slightly opinionated statement,
to say that in any non-trivial system,
using Schema Registry is non-negotiable.
Again, there are going to be people writing consumers
at some point, that maybe you haven't talked to them,
you haven't had a chance to fully mind meld with them
on what's going on with the schema in that topic.
They need a standard and automated way of learning about it.
Also, no matter how good of a job you do up front
defining schemas, the world out there changes,
your schemas are gonna change.
You need a way of managing those evolutions internally,
and Confluent Schema Registry helps you with these things.
(upbeat music)