Behind the Code: Design Choices in OpenTelemetry .NET Metrics API and SDK - Cijo Thomas, Microsoft

CNCF [Cloud Native Computing Foundation]

29 Jun 2024 19:40

Summary

TL;DR: Cijo Thomas from Microsoft's observability platform team discusses the design challenges and solutions in implementing the metrics SDK for OpenTelemetry .NET. The presentation covers the goals of trustworthiness, performance, and zero allocation in the API, and how the SDK optimizes for high throughput and scalability. Cijo also shares insights on avoiding attribute sorting costs and maintaining zero allocation when the API hands measurements over to the SDK, highlighting the ongoing journey of performance improvements and the need for community engagement.

Takeaways

📊 The speaker, Cijo Thomas from Microsoft's observability platform team, discusses the design decisions and challenges faced while implementing the metrics SDK in OpenTelemetry .NET.
🔍 The primary goal of the SDK is to aggregate raw measurements into summarized data, such as counting the number of red apples and yellow bananas sold, ensuring predictable output regardless of input size.
🛡️ Trustworthiness is a key principle for the metrics system, meaning metrics should be accurate and not misattribute data, like counting red apples as yellow bananas.
⚡️ Performance is critical for metrics, with expectations of high throughput and low latency, aiming for measurements in the nanosecond range and zero heap allocation in .NET.
🔒 The SDK aims for no contention and scalability with hardware, avoiding issues like thread contention that can degrade performance as CPU cores increase.
📈 The API side is designed to accept raw measurements with a variable number of attributes without allocation, using dedicated overloads for up to three attributes and the TagList struct for more.
🔑 The SDK's core operation involves two steps: identifying the correct metric point for a raw measurement and performing the aggregation, such as summing up counts for a counter.
🔄 An initial simplistic implementation using a dictionary for aggregation was found to be inefficient under high contention, leading to the need for more advanced optimization strategies.
🔒 To avoid contention, the SDK was optimized to use a read-optimized dictionary: the dictionary only serves reads, and updates are performed in a separate memory location that the dictionary points to.
🔄 The 'attribute sorting problem' was addressed by caching the first order of attributes used by the user and only sorting when the order changes, avoiding unnecessary sorting costs.
📈 The final performance numbers show significant improvement, with the ability to handle up to 50 million measurements per second on a 16-core machine with minimal latency and zero allocation.

Q & A

What is the primary role of the observability platform team at Microsoft?
-The observability platform team at Microsoft works on the development and implementation of the metrics SDK in OpenTelemetry .NET; the talk focuses on the design decisions and challenges faced during that process.
What is the simplified version of the problem statement for implementing OpenTelemetry metrics?
-The simplified problem statement involves taking raw measurements, such as counting the number of fruits sold with various attributes like color, and producing summarized data, such as the number of red apples or yellow bananas sold.
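The fruit example can be sketched in a few lines. This is a language-neutral illustration in Python, not the SDK's C# code; the names raw_measurements and aggregate_sum are hypothetical, and the sample values are made up for the sketch.

```python
from collections import defaultdict

# Hypothetical raw measurements: (value, attributes) pairs, as in the fruit example.
raw_measurements = [
    (1, {"name": "apple", "color": "red"}),
    (2, {"name": "banana", "color": "yellow"}),
    (3, {"color": "red", "name": "apple"}),  # same attributes, different order
]


def aggregate_sum(measurements):
    """Sum measurement values per unique attribute set (counter semantics)."""
    totals = defaultdict(int)
    for value, attributes in measurements:
        # Step 1: find the time series. Sorting by attribute name makes the
        # key independent of the order in which the caller listed attributes.
        key = tuple(sorted(attributes.items()))
        # Step 2: aggregate. For a counter this is a plain sum.
        totals[key] += value
    return dict(totals)


totals = aggregate_sum(raw_measurements)
for key, total in totals.items():
    print(dict(key), total)
```

The output has one entry per distinct attribute set, so its size depends on the number of time series and not on how many raw measurements were reported.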
What are the key goals and principles for the metrics system mentioned in the script?
-The key goals and principles include trustworthiness of the metrics, ensuring accurate counting without misclassification; performance, requiring extreme efficiency with minimal latency and no heap allocation; and resource control, maintaining predictable output regardless of input to prevent issues like memory leaks or crashes.
How does the API side of OpenTelemetry .NET handle raw measurements to achieve zero allocation?
-The API side provides dedicated overloads for up to three attributes to avoid allocation, and for more than three attributes, it uses the TagList struct, which is stack-allocated and avoids heap allocation.
What is the core operation of the SDK in terms of handling raw measurements?
-The SDK's core operation involves two steps: first, determining which metric point or time series the incoming measurement should accumulate into, and second, performing the actual aggregation, such as a mathematical sum for counters.
Why did the initial simplistic implementation using a dictionary for the SDK face performance issues?
-The initial implementation faced performance issues due to contention when multiple threads were updating the dictionary concurrently, leading to scalability problems and reduced throughput as the number of CPU cores increased.
How does the SDK optimize the dictionary to avoid contention and improve scalability?
-The SDK optimizes the dictionary by storing metric points in a separate memory location and having the dictionary point to it, turning the dictionary into a read-heavy structure with concurrent reads that do not require locks, thus avoiding contention.
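The shape of this design can be sketched as follows. It is a simplified Python illustration under stated assumptions, not the SDK's implementation: the class names, the max_points cap, and the choice to reject measurements once the cap is reached are all hypothetical, and the per-point lock stands in for the atomic add a real implementation would likely use. Python's global interpreter lock also hides the lock-free nature of the read path, so the sketch only shows the structure.

```python
import threading


class MetricPoint:
    """One time series. Updates touch only this object, never the dictionary."""

    def __init__(self):
        self._lock = threading.Lock()  # stand-in for an atomic add
        self.running_sum = 0

    def add(self, value):
        with self._lock:
            self.running_sum += value


class CounterAggregator:
    def __init__(self, max_points=2000):
        # Metric points live in separately preallocated storage (assumed cap).
        self._points = [MetricPoint() for _ in range(max_points)]
        self._index_by_key = {}  # read-mostly: key -> index into _points
        self._next_index = 0
        self._create_lock = threading.Lock()

    def _lookup(self, key):
        index = self._index_by_key.get(key)  # hot path: a plain read
        if index is None:
            with self._create_lock:  # rare path: first time this key is seen
                index = self._index_by_key.get(key)
                if index is None:
                    if self._next_index >= len(self._points):
                        return None  # cap reached (simplification)
                    index = self._next_index
                    self._next_index += 1
                    self._index_by_key[key] = index
        return self._points[index]

    def record(self, value, key):
        point = self._lookup(key)
        if point is None:
            return False
        point.add(value)
        return True

    def snapshot(self):
        return {k: self._points[i].running_sum for k, i in self._index_by_key.items()}


aggregator = CounterAggregator(max_points=4)
aggregator.record(1, ("apple", "red"))
aggregator.record(3, ("apple", "red"))
aggregator.record(2, ("banana", "yellow"))
print(aggregator.snapshot())
```

The dictionary is written only when a new attribute combination first appears; every later measurement reads the dictionary and updates the metric point it points to, so threads recording to different time series do not compete for the same data.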
What is the 'attribute sorting problem' and how is it addressed in the script?
-The 'attribute sorting problem' arises because the same attributes supplied in a different order must still aggregate into the same metric point, which naively requires sorting the attributes on every measurement. The talk addresses this by caching the first order used by the user and sorting only when a new order appears, so the common case pays no sorting cost.
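One way to picture the caching idea is the sketch below. It is an assumption-laden Python illustration, not the SDK's code: AttributeOrderCache, canonical_key, and sort_count are hypothetical names, and sort_count exists only to make the saved work visible.

```python
class AttributeOrderCache:
    """Maps each attribute order seen so far to one canonical (sorted) key."""

    def __init__(self):
        self._canonical_by_seen_order = {}
        self.sort_count = 0  # instrumentation for the sketch only

    def canonical_key(self, attributes):
        # attributes: sequence of (name, value) pairs in the caller's order.
        seen = tuple(attributes)
        cached = self._canonical_by_seen_order.get(seen)
        if cached is not None:
            return cached  # order seen before: no sorting needed
        # New order: sort once by attribute name, then remember the result
        # under both the caller's order and the sorted order.
        self.sort_count += 1
        canonical = tuple(sorted(seen, key=lambda pair: pair[0]))
        self._canonical_by_seen_order[seen] = canonical
        self._canonical_by_seen_order[canonical] = canonical
        return canonical


cache = AttributeOrderCache()
first = cache.canonical_key([("name", "apple"), ("color", "red")])
again = cache.canonical_key([("name", "apple"), ("color", "red")])
print(first == again, cache.sort_count)
```

A caller that always passes attributes in the same order pays for one sort on the first measurement and a lookup on every later one; a caller that changes the order pays for a sort only the first time each new order appears.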
How does the API ensure zero allocation when handing over measurements to the SDK?
-The API hands measurements over in a stack-only data structure (a span in .NET) that is always allocated on the stack, ensuring no heap allocation. The SDK also uses thread-local storage for temporary data structures to avoid heap allocation during the lookup process.
What are some of the ongoing challenges and future improvements mentioned for the .NET implementation?
-Ongoing challenges include memory reclamation, which is still an experimental feature in the .NET implementation, improving the performance of histograms, introducing the notion of bound instruments, and reducing contention by moving towards thread-local aggregation, with harvesting done on a background thread or during the collection cycle.