Time Matters: Choosing Between Delta and Cumulative Metric Temporalities for Effective... - Tom Braack

CNCF [Cloud Native Computing Foundation]
29 Jun 2024 · 10:56

Summary

TLDR: In this session, the speaker explores temporality in monitoring systems by comparing cumulative and delta metrics. They discuss what each choice means for memory usage, sample loss, and periodicity, and stress choosing the temporality that matches both the backend and the application's needs. The talk also covers the challenges of converting between the two and the tools for doing it, and argues that cumulative metrics are the more stable default because they are resilient to network issues.
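Both representations carry the same information: a delta series is the difference between consecutive cumulative values, and a cumulative series is the running sum of deltas. The plain-Python sketch below shows why converting in either direction is stateful. Each converter has to remember something per series, and the delta-to-cumulative side must also decide what to do with out-of-order points. The class and method names are hypothetical and are not the OpenTelemetry Collector's processors. Dropping out-of-order points and treating a drop in a cumulative value as a counter reset are simplifying assumptions; real pipelines may handle these cases differently.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DeltaToCumulative:
    """Hypothetical converter: one running total per series is its state."""

    totals: dict[str, float] = field(default_factory=dict)
    last_ts: dict[str, int] = field(default_factory=dict)

    def add(self, series: str, ts: int, delta: float) -> float | None:
        # A point not newer than the last one seen is out of order; this sketch drops it.
        if ts <= self.last_ts.get(series, -1):
            return None
        self.last_ts[series] = ts
        self.totals[series] = self.totals.get(series, 0.0) + delta
        return self.totals[series]


@dataclass
class CumulativeToDelta:
    """Hypothetical converter: remembers the previous cumulative value per series."""

    previous: dict[str, float] = field(default_factory=dict)

    def add(self, series: str, value: float) -> float | None:
        prev = self.previous.get(series)
        self.previous[series] = value
        if prev is None:
            return None  # first point: no baseline yet, so nothing to emit
        if value < prev:
            return value  # assumed counter reset (e.g. a restart): count from zero
        return value - prev


d2c = DeltaToCumulative()
print([d2c.add("requests", ts, d) for ts, d in [(1, 3), (2, 2), (2, 5), (3, 4)]])
# → [3.0, 5.0, None, 9.0]

c2d = CumulativeToDelta()
print([c2d.add("requests", v) for v in [3.0, 5.0, 9.0]])
# → [None, 2.0, 4.0]
```

The duplicate timestamp in the first example is silently discarded. That is exactly the kind of out-of-order handling decision a stateful converter forces on you.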

Takeaways

  • 📈 The speaker introduces the topic of 'temporality' in the context of time series data and metrics, discussing two methods of transmission: absolute values (cumulative) and changes (delta).
  • 🔍 A quick audience survey gauges familiarity with OpenTelemetry and with temporality decisions in metric data handling.
  • 📊 An example graph is presented to illustrate the difference between cumulative and delta temporality, emphasizing their mathematical equivalence and practical differences.
  • 💡 The memory usage of each method is discussed: delta needs less memory because the SDK only keeps state for the current export interval, which suits short-lived processes such as serverless functions.
  • 🔄 The impact of sample loss on each temporality is explored: cumulative data remains formally correct despite lost data points, whereas a lost delta point is never recovered, so totals come out too low.
  • ⚠️ The importance of periodicity in monitoring systems is highlighted: with delta temporality, no data point may be sent for an interval with no change, whereas cumulative keeps reporting the current value.
  • 🔧 The OpenTelemetry Collector's processors for converting between cumulative and delta are mentioned, with a note that they are stateful and add operational complexity.
  • 🛠️ The speaker advises against converting temporality unless necessary, due to the added complexity and potential issues with out-of-order data samples.
  • 🔧 Tip: the OpenTelemetry SDKs can be configured through an environment variable to emit data in either delta or cumulative format.
  • 📝 The speaker recommends defaulting to cumulative unless there is a specific reason to use delta, citing cumulative's resilience to network issues.
  • 🌐 The speaker announces that Graal Labs is adding support to the Datadog receiver in the OpenTelemetry Collector; because Datadog uses delta metrics, this may require converting between temporalities.
  • 🎉 The speaker concludes with an invitation to a Graal happy hour event in Seattle, providing a QR code for interested attendees.
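The talk mentions an environment variable without naming it. In the OpenTelemetry specification for the OTLP metric exporter, the variable is OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE, which accepts cumulative (the default), delta, or lowmemory. Whether a given SDK honors it depends on the language and version, so check that SDK's documentation. The plain-Python sketch below does not call any SDK. It models our reading of the spec's per-instrument table, and it shows that choosing delta does not make UpDownCounters delta. The helper name temporality_for is hypothetical.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

ENV_VAR = "OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE"
C, D = "cumulative", "delta"

# Our reading of the spec: preference value -> (instrument kind -> temporality).
PREFERENCES: dict[str, dict[str, str]] = {
    "cumulative": {
        "Counter": C, "AsynchronousCounter": C, "Histogram": C,
        "UpDownCounter": C, "AsynchronousUpDownCounter": C,
    },
    "delta": {
        "Counter": D, "AsynchronousCounter": D, "Histogram": D,
        "UpDownCounter": C, "AsynchronousUpDownCounter": C,
    },
    "lowmemory": {
        "Counter": D, "AsynchronousCounter": C, "Histogram": D,
        "UpDownCounter": C, "AsynchronousUpDownCounter": C,
    },
}


def temporality_for(kind: str, env: Mapping[str, str] | None = None) -> str:
    """Hypothetical helper: temporality an exporter would pick for an instrument kind."""
    env = os.environ if env is None else env
    preference = env.get(ENV_VAR, "cumulative").strip().lower()
    # Unset or unrecognized values fall back to cumulative, the default.
    table = PREFERENCES.get(preference, PREFERENCES["cumulative"])
    return table[kind]


settings = {ENV_VAR: "delta"}
print(temporality_for("Counter", settings))        # → delta
print(temporality_for("UpDownCounter", settings))  # → cumulative
```

In a real deployment, you would set the variable in the process environment before the SDK starts, for example to delta when the backend expects delta end to end.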

Q & A

  • What is the main topic of the presentation?

    -The main topic of the presentation is temporality in the context of monitoring and telemetry data, specifically discussing the differences between using delta and cumulative values for time series data.

  • What is the purpose of the initial survey conducted by the speaker?

    -The initial survey gauges the audience's familiarity with OpenTelemetry and Prometheus-compatible backends, and their experience with temporality decisions in telemetry data.

  • What is the difference between sending absolute values and changes for time series data over the Internet?

    -Sending absolute values means transmitting the counter values at specific intervals, while sending changes involves transmitting the increments or decrements that occur between those intervals. Both are mathematically equivalent but behave differently in edge cases.

  • What does the speaker mean by 'State' in the context of SDKs emitting metrics?

    -In this context, 'State' refers to the information that an SDK must keep in order to emit metrics. For delta, the SDK only needs to aggregate measurements since the last emission; for cumulative, it must keep a running total for every series since the process started.

  • Why might reduced memory usage be beneficial for certain applications?

    -Reduced memory usage is beneficial for applications with limited resources, such as short-lived serverless functions, where maintaining state for a long period is not feasible.

  • What is the impact of sample loss on cumulative and delta temporality?

    -Sample loss means that a data point does not make it over the network. For cumulative, the next point corrects the series, so it stays formally correct. For delta, the lost increment is never recovered: later changes are added on top without it, so the total ends up too low.

  • What is the issue with using delta temporality for monitoring systems that rely on periodic activity?

    -For delta, if there is no activity during a period, the SDK may not send a point at all (not even a zero), so the backend cannot tell an idle period apart from a lost sample. This causes problems for monitoring systems that expect consistent periodic data.

  • How can the OpenTelemetry Collector convert between cumulative and delta temporality?

    -The OpenTelemetry Collector has processors for converting from cumulative to delta and from delta to cumulative. However, these processors are stateful and require careful configuration to ensure accurate conversion.

  • What are the operational complexities introduced by converting between temporalities?

    -Converting between temporalities adds complexity to the monitoring pipeline, as it requires stateful processors, potentially additional layers of collectors for load balancing, and handling out-of-order samples, especially for delta to cumulative conversion.

  • What advice does the speaker give regarding the use of temporality in telemetry?

    -The speaker advises to avoid converting between temporalities if possible. If the backend expects delta, use delta throughout the telemetry pipeline; if it expects cumulative, use cumulative. The choice should be based on the backend requirements or specific application needs.

  • What is Graal Labs doing to support the OpenTelemetry ecosystem?

    -Graal Labs is adding initial support to the Datadog receiver in the OpenTelemetry Collector. This allows the OpenTelemetry ecosystem to be used with Datadog, which typically uses delta metrics. The feature is expected to be upstreamed in the coming weeks.
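The sample-loss argument can be checked with a small simulation. The plain-Python sketch below uses hypothetical helper names. It models one receiver that keeps the newest cumulative value it has seen and another that sums the deltas it receives, then drops the second export in both cases.

```python
from itertools import accumulate


def total_from_cumulative(points: list[int], lost: set[int]) -> int:
    # The receiver only needs the newest cumulative value that arrived.
    received = [v for i, v in enumerate(points) if i not in lost]
    return received[-1] if received else 0


def total_from_delta(points: list[int], lost: set[int]) -> int:
    # The receiver sums the deltas that arrived; a lost delta is never recovered.
    return sum(v for i, v in enumerate(points) if i not in lost)


increments = [3, 2, 4, 1]                  # true increment in each export interval
cumulative = list(accumulate(increments))  # [3, 5, 9, 10]
lost = {1}                                 # the second export never arrives

print(sum(increments))                          # → 10 (true total)
print(total_from_cumulative(cumulative, lost))  # → 10
print(total_from_delta(increments, lost))       # → 8
```

With cumulative, losing the last point would only leave the series stale until the next export arrives. With delta, the missing increment of 2 is gone for good.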
