Prometheus Monitoring System: Histogram and Summary

The business system exposed its latency metric as a Summary, and the average duration was calculated as request_duration_milliseconds_sum / request_duration_milliseconds_count.

On reviewing the data, one interface showed a very high average duration, and the time series chart showed the average jumping suddenly: a single slow request had pulled the overall average up. The goal was to pinpoint exactly when that request occurred, but because so few requests arrived in that period, the quantile series queried for it came back empty.
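Because _sum and _count are cumulative, the slow request can still be located by comparing consecutive scrapes instead of dividing the lifetime totals. The sketch below is only an illustration in plain Python; the sample values and the helper name interval_averages are hypothetical, not taken from the real system.

```python
# Hypothetical scrape samples: (timestamp_seconds, _sum in ms, _count).
samples = [
    (0,   1000.0, 10),
    (60,  2100.0, 20),
    (120, 2100.0, 20),    # no requests in this interval
    (180, 12200.0, 21),   # a single slow request (about 10 s)
    (240, 13250.0, 31),
]


def interval_averages(samples):
    """Return (interval_end_timestamp, average_ms or None) per pair of scrapes."""
    out = []
    for (t0, s0, c0), (t1, s1, c1) in zip(samples, samples[1:]):
        delta_count = c1 - c0
        # No new requests means the average is undefined (PromQL would yield NaN for 0/0).
        avg = (s1 - s0) / delta_count if delta_count > 0 else None
        out.append((t1, avg))
    return out


averages = interval_averages(samples)
# The interval with the highest average is where the slow request landed.
slowest = max((a for a in averages if a[1] is not None), key=lambda a: a[1])
print(averages)
print("slowest interval ends at t =", slowest[0], "with average", slowest[1], "ms")
```

With these made-up numbers the lifetime average is about 427 ms, while the per-interval view isolates the spike to the scrape ending at t = 180.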

Q&A

✅ Why do _sum and _count have data?

  • _sum and _count are the core metrics of the Summary type, and Prometheus always collects and records these values;
  • They are cumulative counters, suitable for use with rate() or increase();
  • Regardless of request latency changes, as long as there are requests, _sum and _count will always have data.

❌ Why {quantile="0.99"} might not display in a Time Series Chart

Even if the Summary is configured with quantile="0.99", the metric is definitely registered, and the data hasn’t expired, this time series may still show no data.

📉 The cause is low request volume combined with the sliding window mechanism. Quantiles (such as p99) are computed on the client from the observations that fall inside a sliding time window; once an observation is older than the window, it is no longer included in the statistical range.

  • If the request count over a certain period is too low (e.g., 1~2 requests), p99 is unstable and lacks representativeness: it is effectively the slowest of those few requests;
  • If no request arrives within the window at all, there is nothing to compute a quantile from. Client libraries such as the Go and Java clients then report NaN for that quantile, which charts render as a gap or "no data";
  • Therefore, you’ll see _sum and _count accumulating normally, but quantile="0.99" has no data.
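The sliding-window behaviour can be illustrated with a toy model. This is a simplified sketch, not how any real client library is implemented: real clients use streaming quantile estimators and rotating time buckets, and the class name SlidingSummary, the 600-second window, and the NaN result for an empty window are assumptions made for illustration.

```python
import math


class SlidingSummary:
    """Toy Summary: lifetime _sum/_count plus quantiles over a max-age window."""

    def __init__(self, max_age_seconds=600.0):
        self.max_age = max_age_seconds
        self.sum = 0.0      # cumulative, never decreases
        self.count = 0      # cumulative, never decreases
        self._window = []   # (timestamp, value) pairs still inside the window

    def observe(self, value, now):
        self.sum += value
        self.count += 1
        self._window.append((now, value))

    def quantile(self, q, now):
        # Drop observations older than max_age, then take a nearest-rank quantile.
        self._window = [(t, v) for t, v in self._window if now - t <= self.max_age]
        if not self._window:
            return math.nan  # nothing left in the window to compute from
        values = sorted(v for _, v in self._window)
        rank = max(1, math.ceil(q * len(values)))
        return values[rank - 1]


s = SlidingSummary(max_age_seconds=600)
s.observe(12000.0, now=0)              # one slow request, in ms
p99_soon = s.quantile(0.99, now=300)   # still inside the window
p99_later = s.quantile(0.99, now=900)  # the observation has aged out
print(p99_soon, p99_later, s.sum, s.count)
```

In this model the single slow request is visible in p99 only while it is inside the window, whereas sum and count keep it forever, which matches the symptom described above.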

Histogram and Summary Differences

Histogram

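  • How it Works:
    A histogram sorts observations into buckets and records the number of samples falling into each bucket. For example, if bucket upper bounds are defined as [10ms, 50ms, 100ms, 500ms, 1s], each request latency is counted in the bucket that covers it. Prometheus exposes these buckets cumulatively: the bucket with upper bound 100ms counts every request that took 100ms or less.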
  • Advantages:
    • Can aggregate data from multiple instances (e.g., multiple service node request latencies) in Prometheus.
    • Suitable for calculating percentiles (such as P50, P95, P99) and observing latency distributions.
    • Provides flexible querying capabilities, supporting dynamic percentile calculation through PromQL.
  • Disadvantages:
    • Requires predefining the bucket range; an inappropriate choice can lead to uneven data distribution (e.g., all requests falling into one bucket).
    • The more buckets you have, the greater the storage and computational overhead.
  • Suitable Scenarios:
    • Aggregating data from multiple instances.
    • Dynamically adjusting percentiles or analyzing latency distributions.
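The bucketing described above can be sketched in a few lines. This is a toy model under stated assumptions, not the client library's code: the class name ToyHistogram is hypothetical, and the bounds are the example bounds from above expressed in seconds.

```python
import bisect


class ToyHistogram:
    """Toy histogram with Prometheus-style cumulative buckets."""

    def __init__(self, upper_bounds):
        self.upper_bounds = sorted(upper_bounds)
        # One extra slot at the end stands for the implicit +Inf bucket.
        self.bucket_counts = [0] * (len(self.upper_bounds) + 1)
        self.sum = 0.0
        self.count = 0

    def observe(self, value):
        # bisect_left finds the first bound >= value, i.e. the smallest
        # bucket whose upper bound still covers the observation.
        idx = bisect.bisect_left(self.upper_bounds, value)
        self.bucket_counts[idx] += 1
        self.sum += value
        self.count += 1

    def cumulative(self):
        """Return [(upper_bound, cumulative_count)], the shape exposed as *_bucket series."""
        out, running = [], 0
        for bound, n in zip(self.upper_bounds + [float("inf")], self.bucket_counts):
            running += n
            out.append((bound, running))
        return out


h = ToyHistogram([0.01, 0.05, 0.1, 0.5, 1.0])
for latency in [0.004, 0.03, 0.03, 0.2, 3.0]:
    h.observe(latency)
print(h.cumulative())
```

Note that the 3-second request is only visible in the +Inf bucket: the histogram knows it was slower than 1s, but not by how much, which is why the choice of bounds matters.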

Summary

  • How it Works: A Summary calculates percentiles (such as P50, P95, P99) directly on the client, over a sliding time window, and reports the results to Prometheus. It also records the sample count and the sum of observed values, which are used to calculate averages.
  • Advantages:
    • Does not require predefined buckets, providing percentile results directly.
    • Suitable for precise percentile calculations in single instances.
  • Disadvantages:
    • Percentile calculation is performed on the client side, preventing aggregation of data from multiple instances in Prometheus.
    • Adjusting percentiles (e.g., changing from P95 to P99) requires modifying the code and redeploying.
  • Use Cases:
    • Single instance monitoring where precise percentile accuracy is a high priority.
    • When aggregation of data from multiple instances is not required.
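To see why per-instance percentiles cannot be aggregated, consider a small worked example. The latencies below are invented, and nearest_rank is a hypothetical helper using a simple nearest-rank definition of a quantile.

```python
import math


def nearest_rank(values, q):
    """Nearest-rank quantile of a list of values, for 0 < q <= 1."""
    ordered = sorted(values)
    return ordered[max(1, math.ceil(q * len(ordered))) - 1]


# Hypothetical latencies in ms for two instances of the same service.
instance_a = [10.0] * 100                 # consistently fast
instance_b = [10.0] * 90 + [2000.0] * 10  # has a slow tail

p99_a = nearest_rank(instance_a, 0.99)
p99_b = nearest_rank(instance_b, 0.99)

# What avg(...{quantile="0.99"}) over both instances would report:
averaged_p99 = (p99_a + p99_b) / 2
# What the p99 of all requests actually is:
true_p99 = nearest_rank(instance_a + instance_b, 0.99)

print(p99_a, p99_b, averaged_p99, true_p99)
```

The averaged value corresponds to no real request and understates the true tail latency, so averaging Summary quantiles in PromQL produces a number without statistical meaning.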

Key Differences Comparison

| Feature | Histogram | Summary |
|---|---|---|
| Quantile Calculation | Calculated dynamically within Prometheus | Calculated directly on the client side |
| Multi-Instance Aggregation | Supported | Not supported |
| Bucket Definition | Required in advance | Not required |
| Storage Overhead | Depends on the number of buckets | Fixed (one series per configured quantile, plus _sum and _count) |
| Flexibility | High (any quantile can be computed at query time) | Low (changing quantiles requires a code change) |

Summary

  • If you need to aggregate data from multiple instances or require flexible quantile adjustments, choose Histogram.
  • If you only need the precise quantiles for a single instance and the quantiles are fixed, choose Summary.
  • In your scenario, because the service is distributed, Histogram is the recommended choice: data from all instances can be aggregated in Prometheus, and latency quantiles and distributions can be calculated dynamically.

Sliding Window Concept and Its Relationship with Histograms and Summaries

Sliding Window Concept

A sliding window is a time-windowing mechanism used to analyze changes in data over a period. It dynamically reflects the system’s real-time state by continuously moving a temporal range. The key characteristics of a sliding window are:

  • Fixed Time Range: The length of the window is fixed, such as the last 1 minute or 5 minutes.
  • Real-Time Updates: As time passes, the window slides, old data is removed from the window, and new data is added to the window.
  • Common Uses: Used for calculating real-time metrics (such as request rates, averages, percentiles, etc.).

In Prometheus, sliding windows are typically implemented at query time by applying functions (like rate(), avg_over_time()) to a range selector such as [5m].
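As a rough illustration of what a windowed query does, the sketch below computes a per-second rate from counter samples inside a window. It is a deliberate simplification: the real rate() also handles counter resets and extrapolates to the window edges, and the function name simple_rate and the sample data are hypothetical.

```python
def simple_rate(samples, now, window_seconds):
    """Per-second increase of a counter over the window ending at `now`.

    samples: list of (timestamp_seconds, counter_value), sorted by time.
    Returns None when fewer than two samples fall inside the window.
    """
    in_window = [(t, v) for t, v in samples if now - window_seconds < t <= now]
    if len(in_window) < 2:
        return None
    (t_first, v_first), (t_last, v_last) = in_window[0], in_window[-1]
    return (v_last - v_first) / (t_last - t_first)


# Hypothetical counter scraped every 15 seconds.
samples = [(0, 0), (15, 30), (30, 60), (45, 90), (60, 120), (75, 300)]

steady = simple_rate(samples, now=60, window_seconds=30)  # window covers t=45 and t=60
burst = simple_rate(samples, now=75, window_seconds=30)   # window covers t=60 and t=75
print(steady, burst)
```

As the window slides from t=60 to t=75, old samples drop out and the burst at the end dominates the result, which is the "old data out, new data in" behaviour listed above.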

Sliding Window and Histogram Relationship

  • Histogram Data Structure: A histogram sorts samples into buckets and records a cumulative count for each bucket. Prometheus periodically scrapes these counts.
  • Sliding Window Implementation: In Prometheus, sliding windows can be applied to histogram data using query statements. For example:
    • rate(http_request_duration_seconds_bucket[5m]): Calculates the per-second rate of requests counted in each bucket over the past 5 minutes.
    • histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])): Estimates the 95th percentile of the request duration over the past 5 minutes. To aggregate across instances, sum the bucket rates by le first: histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))).
  • Advantages:
    • Sliding windows can dynamically reflect the recent request latency distribution.
    • The bucketing mechanism of histograms combined with sliding windows allows for efficient estimation of percentiles and distributions.
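The estimate that histogram_quantile produces can be approximated with linear interpolation inside the bucket that contains the target rank. The function below is a simplified sketch of that idea, not Prometheus's actual implementation; the bucket values are invented, and it assumes classic cumulative buckets with positive bounds ending in +Inf.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative buckets.

    buckets: [(upper_bound, cumulative_count_or_rate)] sorted by bound,
    with at least two entries, the last bound being +Inf.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in the window
    target = q * total
    prev_bound, prev_count = 0.0, 0.0  # the lowest bucket is assumed to start at 0
    for bound, count in buckets:
        if count >= target:
            if math.isinf(bound):
                # The rank falls in the +Inf bucket: only the highest
                # finite bound can be reported.
                return buckets[-2][0]
            # Linear interpolation inside the bucket holding the target rank.
            fraction = (target - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Hypothetical bucket rates (upper bounds in seconds) summed over all instances.
buckets = [(0.01, 10.0), (0.05, 60.0), (0.1, 90.0), (0.5, 98.0), (1.0, 100.0), (math.inf, 100.0)]
p95 = histogram_quantile(0.95, buckets)
p50 = histogram_quantile(0.50, buckets)

# A lone request slower than every finite bound is capped at the top bound.
capped = histogram_quantile(0.99, [(0.5, 1.0), (1.0, 1.0), (math.inf, 2.0)])
print(p95, p50, capped)
```

The last case is worth noting for the scenario in this post: if the slow request exceeds the largest finite bucket bound, the estimated P99 is capped at that bound, so the bounds should extend far enough to cover the latencies you want to see.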

Sliding Window and Summary Relationship

  • Summary Data Structure: Summary calculates percentiles directly on the client side and reports them to Prometheus. It also records the total sample count and sum.
  • Sliding Window Implementation: In Prometheus, sliding windows can be applied to Summary data using query statements. For example:
    • rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m]): Calculates the average request duration over the past 5 minutes.
  • Limitations:
    • Summary percentiles are calculated on the client side over a window fixed in the client configuration; Prometheus cannot recalculate them over a different window at query time.
    • Summary percentiles from multiple instances cannot be meaningfully combined: averaging or summing per-instance quantiles does not yield the overall quantile.

Sliding Window Applicability

  • Real-time Monitoring: Sliding windows are suitable for monitoring the system's real-time status, such as request rates and latency distributions over the last minute.
  • Anomaly Detection: By using a sliding window, it’s possible to quickly identify short-term anomalies (e.g., a sudden increase in request latency).
  • Dynamic Analysis: Sliding windows can dynamically reflect changes in system trends rather than static global statistics.

Summary

  • Histogram combined with a sliding window can dynamically calculate percentiles (such as P95, P99) and request latency distributions, suitable for monitoring distributed systems.
  • Summary combined with a sliding window can calculate simple metrics such as averages, but lacks flexibility regarding percentiles and does not support multi-instance aggregation.

In your scenario, because both extreme request latencies (such as P99) and the average latency of most requests need to be monitored, it is recommended to use Histogram together with sliding window queries to analyze system performance dynamically.
