Using Native OpenTelemetry Instrumentation to Make Client Libraries Better - Liudmila Molkova
Summary
TL;DR: Liudmila Molkova from Microsoft discusses the importance of observability in Azure SDKs for library owners, who often lack visibility into their libraries' post-release behavior. She emphasizes the need for detailed telemetry to diagnose issues efficiently. Molkova illustrates how OpenTelemetry can be leveraged during development, integration testing, and performance testing to optimize libraries and improve user experience. She concludes by highlighting the necessity of embracing network issues and the value of user feedback for refining library instrumentation.
Takeaways
- Liudmila Molkova is a new member of the OpenTelemetry technical committee and a maintainer of the OpenTelemetry semantic conventions.
- OpenTelemetry is used in Azure SDKs to improve observability, which is typically considered from the user's perspective, but library owners also need observability to understand what happens after their libraries are released.
- Library owners often lack visibility into their libraries' performance and usage post-release due to privacy concerns and the absence of self-collected telemetry.
- Developers can act as 'user zero' by collecting and analyzing telemetry during the development and testing of their libraries to gain insights into their performance and identify areas for improvement.
- Observability during development time is crucial, as developers have the context and control to make meaningful changes and optimizations based on the telemetry data.
- OpenTelemetry can help identify inefficiencies in library operations, such as unnecessary HTTP requests or authentication issues, by analyzing traces and logs.
- Integration testing can be improved with observability, as it helps pinpoint the causes of flakiness and bugs in retry policies and configurations.
- Performance testing benefits from OpenTelemetry by allowing developers to simulate realistic scenarios, including network issues, and to monitor the service under load.
- Telemetry data from performance and reliability testing can reveal insights such as excessive buffer allocation, thread pool size misconfiguration, and memory leaks.
- Observability helps in debugging and fixing issues that arise during testing, leading to better performance and reliability of the libraries.
- Library owners should be their own 'user zero' to understand their libraries deeply, but they also need feedback from actual users to refine the telemetry and ensure it is useful for end users.
Q & A
Who is Liudmila Molkova and what is her role at Microsoft?
-Liudmila Molkova works at Microsoft. She is a new member of the OpenTelemetry technical committee and a maintainer of the OpenTelemetry semantic conventions.
What is the primary focus of Liudmila Molkova's talk?
-The talk focuses on how OpenTelemetry is used in Azure SDKs to improve their observability, and on the importance of observability for library owners.
What does Liudmila Molkova suggest about the observability of libraries after they are released?
-She suggests that library owners often lack visibility into what happens to their libraries after release. They typically do not collect telemetry for themselves due to privacy concerns and the large volume of data involved.
Why is detailed telemetry important for library owners?
-Detailed telemetry is important for library owners because it helps them understand the issues users face, avoid back-and-forth communication, and collect comprehensive data to reproduce and fix issues efficiently.
How can library developers use telemetry during the development phase?
-Library developers can use telemetry during the development phase to collect feedback, analyze data, and optimize their libraries. They can be the 'users' who decide how to collect and analyze telemetry data.
What is an example of how telemetry can help in identifying issues during the development of a library?
-An example given is a complex operation that downloads multiple layers of an image from a container registry, where the trace showed repeated 401 (Unauthorized) responses on every chunk. This telemetry let the developers ask why the token was not being reused and optimize the authentication flow.
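A minimal sketch of how a library can natively produce such spans, using the real OpenTelemetry Java API; the chunked-download flow, class, and helper methods here are hypothetical illustrations, not the actual Azure SDK code:

```java
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.StatusCode;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;

public class ChunkedDownload {
    // Hypothetical instrumentation scope name.
    private static final Tracer TRACER =
        GlobalOpenTelemetry.getTracer("com.example.containerregistry");

    // Wraps each chunk request in its own span, so repeated 401s show up
    // in the trace as a group of error spans, one per chunk.
    byte[] downloadChunk(String blobUrl, long offset, long length) {
        Span span = TRACER.spanBuilder("downloadChunk").startSpan();
        try (Scope ignored = span.makeCurrent()) {
            HttpResult result = getRange(blobUrl, offset, length);
            span.setAttribute("http.response.status_code", result.status());
            if (result.status() == 401) {
                // A 401 on a later chunk suggests the token from the first
                // chunk was not reused -- the inefficiency seen in the talk.
                span.setStatus(StatusCode.ERROR, "unauthorized");
            }
            return result.body();
        } finally {
            span.end();
        }
    }

    // Hypothetical HTTP helper and result holder.
    record HttpResult(int status, byte[] body) {}

    private HttpResult getRange(String url, long offset, long length) {
        return new HttpResult(200, new byte[0]); // placeholder
    }
}
```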
What role does observability play in integration testing?
-In integration testing, observability helps in debugging tests and identifying bugs in retry policies and configurations. It is crucial for understanding the root cause of test flakiness.
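One concrete way to apply this, sketched under assumptions (the runner, scope name, and test body are placeholders, not an Azure SDK utility): wrap each integration test in a root span and print the trace ID on failure, so every log line and child span from that exact run can be pulled up with a single ID.

```java
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;

public class TracedTestRunner {
    // Hypothetical instrumentation scope name.
    private static final Tracer TRACER =
        GlobalOpenTelemetry.getTracer("com.example.integration-tests");

    // Runs a test inside a root span; on failure, the printed trace ID
    // groups every log line and child span from that exact run.
    static void runTraced(String testName, Runnable testBody) {
        Span root = TRACER.spanBuilder(testName).startSpan();
        try (Scope ignored = root.makeCurrent()) {
            testBody.run();
        } catch (Throwable t) {
            root.recordException(t);
            System.err.printf("FAILED %s, traceId=%s%n",
                testName, root.getSpanContext().getTraceId());
            throw t;
        } finally {
            root.end();
        }
    }
}
```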
How can performance testing benefit from OpenTelemetry?
-Performance testing can benefit from OpenTelemetry by providing detailed insights into network issues, resource utilization, and system behavior under load. It allows for more realistic testing scenarios and easier identification of performance bottlenecks.
What are some of the performance improvements identified through telemetry in the talk?
-Some performance improvements identified include right-sizing buffer allocations, matching the thread pool size to the configured concurrency, and fixing a message-prefetching bug that in one case reduced memory usage a thousandfold.
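The buffer finding reduces to a one-line difference; this illustration uses made-up sizes and names, not the actual SDK code:

```java
import java.nio.ByteBuffer;

public class BufferSizing {
    // Before: always allocating 1 MB regardless of payload size, which
    // under sustained load showed up as high CPU, high memory, and low
    // throughput in the long-running tests.
    static ByteBuffer allocateFixed() {
        return ByteBuffer.allocate(1024 * 1024);
    }

    // After: allocating only what the payload needs (knownContentLength
    // is a hypothetical parameter), removing the excessive allocations.
    static ByteBuffer allocateRightSized(int knownContentLength) {
        return ByteBuffer.allocate(knownContentLength);
    }
}
```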
What is the importance of being the 'user zero' for library developers?
-Being 'user zero' allows library developers to gain firsthand experience with their libraries, collect telemetry data, and understand user needs. However, it's also important to gather feedback from 'user one', 'user two', and beyond to correct initial mistakes and improve the library further.
How does OpenTelemetry help in long-term performance and reliability testing?
-OpenTelemetry helps in long-term performance and reliability testing by providing detailed telemetry data over an extended period. This data allows developers to pinpoint issues and understand system behavior under various conditions, including regular network issues.
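A sketch of how such long-running tests can report metrics through the real OpenTelemetry Java metrics API; the meter scope, metric names, and attributes are hypothetical:

```java
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.metrics.DoubleHistogram;
import io.opentelemetry.api.metrics.LongCounter;
import io.opentelemetry.api.metrics.Meter;

public class StressTestMetrics {
    private static final Meter METER =
        GlobalOpenTelemetry.getMeter("com.example.stress-test"); // hypothetical scope

    private static final DoubleHistogram LATENCY = METER
        .histogramBuilder("test.operation.duration") // hypothetical name
        .setUnit("s")
        .build();

    private static final LongCounter ERRORS = METER
        .counterBuilder("test.operation.errors") // hypothetical name
        .build();

    // Records one operation; over a multi-day run these series let you
    // pinpoint exactly when latency spiked or errors clustered.
    static void record(double seconds, boolean failed, String operation) {
        Attributes attrs = Attributes.builder()
            .put("operation.name", operation)
            .build();
        LATENCY.record(seconds, attrs);
        if (failed) {
            ERRORS.add(1, attrs);
        }
    }
}
```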
Outlines
Observability in Library Development
Liudmila Molkova, a member of the OpenTelemetry technical committee at Microsoft, discusses the importance of observability not just for application users but also for library owners. She emphasizes the lack of visibility into what happens to libraries post-release and the challenges of collecting telemetry data due to privacy concerns and data volume. Molkova suggests that library developers can act as users to collect and analyze telemetry, using development time as an optimal period for observability because the developer still has control over and context for the code. She illustrates this with examples of complex operations in Azure SDKs, showing how detailed telemetry can help identify and optimize issues like authentication flows and redirects.
Leveraging Observability for API Improvements
The speaker uses the example of an API designed for downloading content to show how observability can lead to performance improvements: a trailing HTTP request that only confirms the end of the stream could be avoided, cutting the operation time in half. Molkova stresses the value of library developers understanding the inner workings of their APIs and using this knowledge to guide users effectively. She also touches on the complexity hidden within libraries, such as retry policies, buffering, caching, and connection management, and how observability helps in integration testing to identify and fix flaky tests.
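A sketch of the shape of that inefficiency, with hypothetical helpers and response fields (the actual Azure API and headers differ): the loop keeps issuing ranged GETs until the server answers 416 (Range Not Satisfiable), so a fully downloaded blob still costs one extra round trip unless the total length from an earlier response is used to stop early.

```java
public class RangedDownload {
    // Hypothetical minimal response holder; totalLength < 0 means unknown.
    record Response(int status, byte[] body, long totalLength) {}

    static byte[] downloadAll(String url, int chunkSize) {
        java.io.ByteArrayOutputStream out = new java.io.ByteArrayOutputStream();
        long offset = 0;
        while (true) {
            Response r = getRange(url, offset, chunkSize); // hypothetical HTTP call
            if (r.status() == 416) {
                break; // the extra request: we already had all the bytes
            }
            out.writeBytes(r.body());
            offset += r.body().length;
            // If the response carried the total length (e.g. in a
            // Content-Range header), stop here and skip the 416 round trip.
            if (r.totalLength() >= 0 && offset >= r.totalLength()) {
                break;
            }
        }
        return out.toByteArray();
    }

    static Response getRange(String url, long offset, int chunkSize) {
        // Placeholder: a real version would send an HTTP GET with a Range
        // header and parse Content-Range for the total length.
        return new Response(416, new byte[0], -1);
    }
}
```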
Enhancing Performance and Reliability Through Observability
This section delves into how performance testing changes with OpenTelemetry. Traditional benchmarking is expanded by embracing real-world scenarios, including network issues, to test libraries more effectively. Molkova illustrates how detailed telemetry can uncover performance bottlenecks, such as excessive buffer allocation and improper thread pool sizing. She shares specific examples of performance improvements and memory usage optimizations discovered through long-term monitoring and analysis of telemetry data.
The Importance of Real-World Testing and User Feedback
In the final section, Molkova emphasizes the need for developers to test their libraries in real-world conditions to expose and address network issues that aren't apparent in controlled environments. She advocates for a level of observability high enough to debug and understand test flakiness and performance issues. She also discusses the iterative process of library instrumentation, suggesting that while developers can be 'user zero' and provide deep telemetry, feedback from actual users is crucial for refining and correcting initial implementations. Molkova concludes by highlighting the benefits of chaos engineering and long-term testing with OpenTelemetry for pinpointing issues over extended periods.
Keywords
Observability
OpenTelemetry
Semantic Conventions
Azure SDKs
Telemetry
Library Owners
Integration Testing
Performance Testing
Trace
Metrics
User Feedback
Highlights
Liudmila Molkova introduces herself as a new member of the OpenTelemetry technical committee and a maintainer of the OpenTelemetry semantic conventions.
She discusses the importance of observability in Azure SDKs and the challenges faced by library owners in understanding the post-release impact of their libraries.
Molkova notes that library owners' observability tools today amount to GitHub issues, bug tracker systems, live debugging sessions, and user-supplied logs.
The presentation highlights the difference between user observability and library owner observability, and the lack of detailed telemetry for library owners.
Molkova shares insights on how library developers can use their position as users of their own libraries to collect and analyze telemetry data for improvements.
Development time is identified as an optimal period for observability because the developer still has intimate knowledge of the code and control over the setup.
A complex trace example is presented, showing the many spans of a layered download operation and the potential for developers to identify and optimize inefficiencies.
The talk discusses the use of logs and traces to understand and improve library performance, including the identification of unnecessary HTTP requests.
Molkova explains how observability helps in integration testing by identifying flakiness and bugs in retry policies and configurations.
Performance testing is discussed, with a focus on how OpenTelemetry can provide far deeper insight than traditional benchmarking.
The presentation shares examples of performance issues discovered through OpenTelemetry, such as excessive buffer allocation and thread pool size misconfiguration.
Molkova describes a significant memory usage issue caused by improper prefetching in messaging libraries, which was resolved through OpenTelemetry insights.
The importance of being 'user zero' is emphasized, as it allows library developers to understand and improve their libraries from a user's perspective.
The need for user feedback beyond 'user zero' is discussed, to refine and correct initial telemetry implementations for broader user utility.
Molkova concludes by stressing the importance of embracing network issues and maintaining high observability for debugging and improving software development and testing.
Transcripts
So, I'm Liudmila Molkova, I work at Microsoft. I'm a new member of the OpenTelemetry technical committee and a maintainer of the OpenTelemetry semantic conventions. Today I'm going to share how we use OpenTelemetry in our Azure SDKs to make them better.

When we think about observability, we tend to think about it as something intended for users, for somebody who works on the application or for somebody who runs it. Effectively, they decide which backend to use, they decide how to configure it, they can add data, they can remove data; it's their application. But what about library owners? Do we have any observability? Do we know what happens to our libraries after we release them? We don't collect telemetry for ourselves; I mean, there are privacy concerns, we would need consent, and the volume of data is enormous. So no, we don't know. And do we know if it works at all? Does it do the intended thing? Maybe. Sometimes.

So we do have some observability, but our observability is quite different. Our observability tools are GitHub issues, or maybe some bug tracker system. We do live debugging sessions with our users, we have logs, we ask users for repros. And when an issue happens, we want the impossible, right? We want detailed telemetry, because we don't want to go back and forth; we want everything, and we want it to be on by default, because we don't want you to have to reproduce issues; things should work right away. So, okay: it's every piece of telemetry possible, it's always on, it costs you nothing, it does not affect performance, and the main thing is, we want to access it on your behalf. So I guess we're out of luck, there is no hope for us, right? We cannot get it. Well... yes or no.
One thing we can do: we are the users of our libraries, right? We develop them, we test them, we test them in all different ways. So we can be the users who collect this feedback, we can be the users who decide how to collect telemetry, and we can be the users who know how to analyze this data. Let me give you some examples. There is no better time for observability than development time, right? I'm still in the context, I still know what the code is supposed to do, I didn't forget it yet. I know the setup, I control it, I can change things.
So let's see. Well, you probably can't see it, but anyway: what you're looking at is a very complicated trace, there are about 90 spans. This is part of a complex operation: it downloads multiple layers of an image from the container registry, and there are a bunch of things going on at the same time. There is authentication, there are multiple layers, and there is chunking, and it kind of looks repetitive. I'm not sure if you see it, but what I see is groups of spans, and some of them return 401. If I'm a developer who works on this library, I really want to see... what do you see? Do you see the red things? Right, yeah, awesome. So, errors. These are 401s, there are about four of them, and they are on every chunk I'm downloading. So if I'm a developer working on this library, I'm like: why? If it wasn't part of the normal authentication flow, couldn't I reuse the token on the second chunk? It should have worked if it worked the first time, right? So I can go and optimize. And then there are actually groups of redirects, and they start raising questions: do I need to redirect on every chunk? Can I optimize it? Maybe yes, maybe no, but effectively I know that there is something in the library I don't really like. And somebody can tell you: okay, I can use logs instead. Right, there is the same information, the same information you saw on the trace, it's just in logs. And well, with this one or that one, you decide.
Okay, so another example. There is a much simpler API, it just downloads something, and it has two HTTP requests underneath. The first one downloads everything; the second one has an error, it returns 416, Range Not Satisfiable. So I downloaded everything, and then I made another request just to verify: okay, this is the end of the stream. Again, as a developer who works on this library, I'm like: why do I make this extra request? Can I avoid it? In this particular case it would cut the operation time in half, a twofold improvement. In this particular case, the API I'm using is intended for cases where somebody can keep uploading stuff, so when I've done the first request I might not know whether it's the end. But as a user looking at it, I can ask why it happens, go and read the documentation, and the documentation will tell me: oh, you should probably use a different API if you can, the simpler one. And as an owner of this library, I can go and document things. I can say: okay, this API is specific, don't use it for simple downloads.
So the point here is that even if you think about a library as a thin wrapper, in fact it does a bunch of interesting things under the hood, and they are under the hood even for the library developers; it's part of some core logic. You might configure your retry policy and authentication policy in different orders, but effectively the things that happen under the hood are retries, content buffering, chunking, what not, some caching, and connection management. So it is complicated.
And now we come to an interesting problem where observability really shines: integration testing. We tend to think about integration tests as something inherently flaky; okay, it failed again, let me restart the test. But why? (I hear something talking... oh, I see, sorry.) Okay, anyway: we tend to think about integration tests as something that is inherently flaky, but why? Yes, network issues happen, but we should have a retry policy in place. Did we retry? Did we have the proper configuration? Maybe we had a five-minute timeout. They shouldn't be flaky, and when you have flakiness in your integration tests, it's a good sign that you have a bug. Why don't we debug them? Why don't we fix them? Because it's hard. The volume of these logs, these beautiful logs I showed a few slides before, is enormous, and those were at least grouped by trace ID; our logs in the CI system, if you have them at all, can be terrible. So the time when you do integration testing is the best time to use observability: to debug these tests and to actually find the bugs in your retry policy. These are the worst bugs to have, because it's very hard to detect them. Effectively, by adding telemetry to the libraries themselves, we help both sides: we help ourselves understand what our libraries do and fix issues, and we help users at the same time.
Okay, so the next part is performance testing. How did our testing look before OpenTelemetry? Well, effectively, it's benchmarking. We get a little bit more data than that, but effectively we get a number: okay, this was your throughput. If there were network issues during the test, we would see a regression and we would spend days investigating why it happened, but effectively the test is not valid in the presence of normal cloud, or real-life, errors, so we tend to sanitize these tests as much as possible. What changes with OpenTelemetry? Well, of course we can still do benchmarking, but that's kind of boring; we can do much more. We can embrace these network issues, we can even simulate them. We can test our libraries in a realistic scenario, the way users use them, not in a perfect world. And in order to do this we need to apply some real load, we need to inject some failures, and we need to run it for a while. At this point it becomes a service: the stress test or reliability test is just a service that you monitor, similarly to anything else. You enable the same observability you would want your users to enable, and you can collect all the data you want.
And how might it look? I'm pretty sure you can't see it, but we have a beautiful dashboard for the test. It has all the boring stuff, the latency, error rate, throughput; we have even more boring stuff, some CPU and memory metrics, and so on. But we have much more; it's just OpenTelemetry, so you go ahead and look at traces, and if you have continuous profiling enabled it becomes even better. I want to share some examples of things we were able to find with these tests. Even though they rely on some basic metrics, finding them, detecting them, and solving them would not be possible without all the richness of different signals we get with OpenTelemetry.
So the first one: we allocated buffers of excessive size. We could have allocated the precise size, which is small, but we said, okay, we will always allocate one megabyte for this. What happens? High CPU, high memory, lower throughput than we expected. We take a memory dump, we see all the buffers, we fix it, and we get much higher throughput. It's all possible because we run the test for a long time and can compare easily.
Then the other story: the thread pool size. Our messaging libraries allow you to configure concurrency, and a user can come and say, okay, I want 500 messages processed in parallel. But what happens if you don't configure your thread pool size accordingly? Your concurrency is wasted; you don't have the threads to accommodate it. And you see low throughput, but also low resource utilization; you're underutilizing your resources. In this case you go and check the number of threads and, boom, it scales linearly.
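A minimal sketch of the mismatch described here, with made-up numbers (real pool sizing also depends on whether the work is CPU- or IO-bound):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ConcurrencyConfig {
    // If the user asks for 500 messages in parallel but the pool has only
    // a few threads, the configured concurrency is wasted: throughput
    // stays low while CPU and memory sit underutilized. Sizing the pool
    // to match the requested concurrency is what made throughput scale.
    static ExecutorService poolFor(int configuredConcurrency) {
        return Executors.newFixedThreadPool(configuredConcurrency);
    }
}
```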
And this one is my favorite of all time. It shows a fix that reduces memory usage about a thousandfold. Hard to imagine, but it's a great argument for people who say all the problems come from the network and your code just cannot do something that stupid. Well, it can. There are two bugs here. What happens: our messaging libraries allow you to prefetch stuff, so you process a batch of messages, and in parallel they go to the broker and prefetch a few more; then, when you come back and finish processing, you get the next batch right away, you don't need to wait for it. Okay, so we configure a thousand messages to be prefetched, we start the test, memory grows exponentially, boom, out of memory. We look at the memory dump and there are four million of these messages in there. So one bug is on us. The second bug, well, it's also on us, but I want to blame the framework. What we see here is Reactor, my favorite framework on Earth. What it does is prefetch on your behalf, by default, so there is this ', 0' thing on line 23 which disables the default prefetching.
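The talk doesn't show the exact fix; as an illustration of the idea in Reactor (assuming reactor-core on the classpath; the source and numbers are made up), the `limitRate` operator bounds how many items are requested ahead of processing, so a consumer can't silently pile up thousands of prefetched messages in memory:

```java
import reactor.core.publisher.Flux;

public class BoundedPrefetch {
    public static void main(String[] args) {
        // Stand-in for a stream of messages from a broker.
        Flux<Integer> messages = Flux.range(1, 1_000_000);

        messages
            // Request at most 32 messages ahead of processing instead of
            // letting a large default prefetch buffer them all in memory.
            .limitRate(32)
            .doOnNext(BoundedPrefetch::process)
            .blockLast();
    }

    static void process(int message) {
        // Placeholder for real message handling.
    }
}
```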
With this, I want to summarize. People who don't have observability think they know their code. They don't; they just don't know, and they don't have any evidence that they don't. To actually improve SDKs we need to embrace network issues. When we develop stuff we rarely have any network problems, and we don't have the scale of production, which would expose them, so we need to make an extra effort to actually run our stuff in a real environment, exposed to these network issues. And we need the level of observability that helps us debug these issues, to understand what happened: was this test flaky because it was just unlucky, or does our retry policy not work correctly? And when we instrument libraries, we use this telemetry ourselves, and we end up with the same telemetry our users would need, because the volume is the same; we have an enormous number of tests running, we have all this performance and reliability testing. If this telemetry doesn't answer the question, or if it's too verbose for us, it's most likely also too verbose for our users and doesn't answer their questions either. Okay, that's it, thank you for coming to my talk.

[Applause]
(Q&A) Yeah, a user you can work closely with, who can provide detailed feedback, is awesome. But what I'm trying to say is: you are the user, right? You can be your user zero. With library instrumentations, library owners tend to provide very deep telemetry focused on their specific thing, and they need user feedback to actually create something that is useful for end users. So I would say yes, you should be your user zero, but you need user one, two, and three to actually correct the mistakes you made at first.

(On how to simulate failures) Oh, that's a cool one. We tried to use Chaos Mesh. I wouldn't say it was a success, but it does let you create some chaos. It's hard to control, and it's hard to push in multiple directions, but mostly it's like this: you take something, you give it a very small CPU and memory quota, and you try to load it as much as you can. When you see a bottleneck, you try to fix it and understand where it comes from. Even with this, just by running at maximum capacity you're exposing it to a lot of stuff, and by running it for, let's say, days, you get just regular network issues. Where OpenTelemetry is helpful is that after you run it for days, you can actually pinpoint the time and the problem; without it, that wouldn't be possible.