Understanding Prometheus Metric Types | Meaning and Usage (Gauge, Counter, Summary, Histogram)

Prometheus Monitoring with Julius | PromLabs

13 Jan 202311:19

Summary

TLDRIn this video, Julius explains the four main metric types in Prometheus: gauges, counters, summaries, and histograms. He covers how each metric type is used in monitoring applications, detailing their unique properties and how to implement them in Prometheus with PromQL queries. Gauges measure current values, counters track cumulative events, summaries calculate percentiles, and histograms track value distributions across buckets. The video also emphasizes key PromQL functions like rate(), histogram_quantile(), and techniques for handling data aggregation across instances. A comprehensive guide for anyone looking to understand Prometheus metrics and improve observability in their systems.

Takeaways

😀 Gauges are metrics that can go up or down, such as memory usage or queue length, and are typically used for current measurements.
😀 Counters are cumulative metrics that only increase over time, like tracking the total number of HTTP requests handled or total duration spent handling requests.
😀 Summaries allow tracking distributions like request latencies and calculate percentiles, but should not be aggregated across multiple instances or dimensions.
😀 Histograms track distributions in defined buckets (e.g., request durations) and can be used to calculate approximate percentiles using the `histogram_quantile()` function.
😀 The `Set()` method for gauges allows setting a specific value, while `Inc()` and `Dec()` can increment or decrement the value by a given amount.
😀 Counters cannot be decreased or set to specific values; they only support incrementing via `Inc()` or `Add()`.
😀 In PromQL, counters should be queried using `rate()`, `increase()`, or `irate()` functions to calculate the rate of change over time, rather than looking at their absolute values.
😀 Histograms in Prometheus are cumulative, meaning each bucket contains the count of all lower-range buckets, and can be queried for distribution analysis.
😀 For summaries, the data is broken down into individual time series for each quantile (e.g., 90th percentile), and these should not be aggregated across dimensions.
😀 New native histograms are coming soon to Prometheus, offering a more efficient way to store an entire histogram in a single time series sample.
😀 When using histograms, be mindful of the trade-off between bucket resolution and resource usage, as more buckets increase resolution but also increase the load on your Prometheus server.

Q & A

What is a gauge metric in Prometheus?
-A gauge metric represents a current measurement or count that can go up or down. Examples include metrics like memory usage, queue length, or temperature. You can update its value using methods like Set(), or relative increments or decrements.
How does Prometheus handle the gauge metric type in queries?
-In Prometheus, the current value of a gauge is meaningful by itself and can often be graphed directly. However, you can apply functions like time() to compute how long ago an event happened based on the value of a timestamped gauge.
What is a counter metric and how is it different from a gauge?
-A counter metric tracks a cumulative count over time, such as the total number of HTTP requests handled. Unlike gauges, counters only increase and never decrease, with the exception of a counter reset when the process restarts.
Why shouldn't you directly query the absolute value of a counter metric in Prometheus?
-Since counters are cumulative and reset when the application restarts, the absolute value of a counter doesn't provide meaningful information. Instead, you should use functions like rate(), irate(), or increase() to calculate the rate of increase over a time window.
How do you use a summary metric in Prometheus?
-A summary metric tracks the distribution of values as quantiles (percentiles). You specify the quantiles and error margins when creating the summary and then use the Observe() method to record individual values. Summaries expose time series for each quantile and total counts and sums.
What is the limitation of using summaries when aggregating data across multiple dimensions?
-Summaries calculate quantiles based on the local data, and there's no valid way to aggregate quantiles across multiple service instances or label dimensions. For cross-instance aggregation, histograms should be used instead.
How does a histogram metric differ from a summary in Prometheus?
-A histogram tracks the distribution of values by counting them into pre-defined bucket ranges, whereas a summary directly computes quantiles. Histograms are cumulative, with each bucket containing counts of all lower ranges, allowing for easier aggregation across instances.
What is the main trade-off when configuring histogram buckets?
-The main trade-off when configuring histogram buckets is between cost and resolution. More fine-grained buckets provide better resolution but increase the cost (e.g., memory usage). Too many buckets may overwhelm the Prometheus server.
What PromQL function is used to compute percentiles from histogram data?
-The histogram_quantile() function is used to compute approximated percentiles from histogram data, considering the 'le' label for each bucket's upper boundary. To ensure accurate results, you should wrap the histogram buckets with functions like rate() or increase() before using histogram_quantile().
What is the upcoming feature related to histograms in Prometheus?
-Prometheus is introducing a native histogram type, which will store an entire histogram in a single sample, offering better efficiency. This new feature is still in an experimental phase and is expected to differ from the traditional histogram implementation.