On the HistogramMetric of codahale


Basic concepts

Mean (average)

The mean is computed from all of the data. It has good mathematical properties and is the most widely used measure of central tendency in practice. Its main disadvantage is that it is easily influenced by extreme values; for data with a skewed distribution, the mean is less representative. The harmonic mean and the geometric mean are variants of the mean that serve as representative values for particular kinds of data: the harmonic mean is mainly used when the arithmetic mean cannot be computed directly from the data, and the geometric mean is mainly used to average ratio data. Like the mean, both are influenced by extreme values.
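As a rough illustration of the three means described above, here is a minimal sketch in plain Java. The class name Means and the sample latencies are made up for this example and are not part of the Metrics library.

```java
import java.util.Arrays;

// Hypothetical helper class for illustration only.
final class Means {
    /** Arithmetic mean: sum divided by count. */
    static double arithmetic(double[] xs) {
        return Arrays.stream(xs).sum() / xs.length;
    }

    /** Harmonic mean: count divided by the sum of reciprocals (values must be positive). */
    static double harmonic(double[] xs) {
        return xs.length / Arrays.stream(xs).map(x -> 1.0 / x).sum();
    }

    /** Geometric mean: exp of the average of the logarithms (values must be positive). */
    static double geometric(double[] xs) {
        return Math.exp(Arrays.stream(xs).map(Math::log).average().orElse(Double.NaN));
    }

    public static void main(String[] args) {
        double[] typical = {10, 12, 11, 13};
        double[] withOutlier = {10, 12, 11, 13, 1000}; // one extreme value
        System.out.println(arithmetic(typical));     // 11.5
        System.out.println(arithmetic(withOutlier)); // 209.2
        System.out.println(harmonic(withOutlier));
        System.out.println(geometric(withOutlier));
    }
}
```

A single extreme value moves the arithmetic mean from 11.5 to 209.2, which is the sensitivity the paragraph above warns about.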

Median

The median is the value in the middle of a group of data. It is not affected by extreme values, so for data with a skewed distribution the median is more representative than the mean.
In a group of ordered data, if the number of items is odd, the median is the middle item. If the number of items is even, the median is the average of the two middle items.
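The odd/even rule can be sketched in a few lines of Java. The class name MedianSketch is hypothetical, and this is the textbook definition rather than a claim about how the library computes its median.

```java
import java.util.Arrays;

// Hypothetical helper class for illustration only.
final class MedianSketch {
    /** Textbook median of a non-empty array; the input is not modified. */
    static double median(long[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("values must not be empty");
        }
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        if (sorted.length % 2 == 1) {
            return sorted[mid]; // odd count: the middle item
        }
        // even count: average of the two middle items (halved first to avoid overflow)
        return sorted[mid - 1] / 2.0 + sorted[mid] / 2.0;
    }

    public static void main(String[] args) {
        System.out.println(median(new long[] {10, 12, 11, 13, 1000})); // 12.0
        System.out.println(median(new long[] {1, 2, 3, 4}));           // 2.5
    }
}
```

The extreme value 1000 leaves the median at 12, which is why the median suits skewed data better than the mean.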

Percentile

The p-th percentile is a value such that at least p% of data items are less than or equal to this value, and at least (100-p)% of data items are greater than or equal to this value.
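One simple way to obtain a value that satisfies this definition is the nearest-rank method, sketched below. The class name PercentileSketch is hypothetical, and the library's Snapshot may use a different rule (for example, interpolation between neighbouring values), so results can differ slightly.

```java
import java.util.Arrays;

// Hypothetical helper class for illustration only.
final class PercentileSketch {
    /** Nearest-rank p-th percentile (0 <= p <= 100) of a non-empty array. */
    static long percentile(long[] values, double p) {
        if (values.length == 0 || p < 0 || p > 100) {
            throw new IllegalArgumentException("need non-empty data and 0 <= p <= 100");
        }
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        // rank = ceil(p/100 * n), clamped to at least 1 so that p = 0 yields the minimum
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        long[] data = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        System.out.println(percentile(data, 50));  // 5
        System.out.println(percentile(data, 75));  // 8
        System.out.println(percentile(data, 100)); // 10
    }
}
```

For p = 75 and ten items, 8 items are less than or equal to 8 (at least 75%) and 3 items are greater than or equal to 8 (at least 25%), so the definition holds.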

The four types of Reservoir

ExponentiallyDecayingReservoir (exponentially decaying sampling)

An exponentially-decaying random reservoir of longs. Uses Cormode et al’s forward-decaying priority reservoir sampling method to produce a statistically representative sampling reservoir, exponentially biased towards newer entries.
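The idea behind forward-decaying priority sampling can be sketched as follows: each value gets a weight that grows exponentially with its timestamp, the weight is divided by a uniform random number to form a priority, and only the entries with the highest priorities are kept. This is a simplified illustration, not the library's code. The class name ForwardDecaySketch, the explicit timestamp parameter, and the fixed landmark at time zero are assumptions, and the periodic rescaling that a long-running implementation would need to avoid overflow is left out.

```java
import java.util.Random;
import java.util.TreeMap;

// Hypothetical sketch; times are passed in explicitly so the behaviour is easy to follow.
final class ForwardDecaySketch {
    private final int capacity;
    private final double alpha; // decay factor per second; larger favours newer data more
    private final Random random;
    private final TreeMap<Double, Long> byPriority = new TreeMap<>();

    ForwardDecaySketch(int capacity, double alpha, long seed) {
        this.capacity = capacity;
        this.alpha = alpha;
        this.random = new Random(seed);
    }

    /** Records a value observed tSeconds after the landmark (time zero). */
    void update(long value, double tSeconds) {
        double weight = Math.exp(alpha * tSeconds);  // newer values get larger weights
        double u = 1.0 - random.nextDouble();        // uniform in (0, 1]
        double priority = weight / u;
        if (byPriority.size() < capacity) {
            byPriority.put(priority, value);
        } else if (priority > byPriority.firstKey()) {
            // replace the lowest-priority entry, unless the key already exists
            if (byPriority.putIfAbsent(priority, value) == null) {
                byPriority.pollFirstEntry();
            }
        }
    }

    /** Returns the currently sampled values, ordered by ascending priority. */
    long[] values() {
        return byPriority.values().stream().mapToLong(Long::longValue).toArray();
    }
}
```

With a large enough gap in time, new values always outrank old ones and push them out, which is the bias towards newer entries that the description mentions.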

UniformReservoir (uniform random sampling)

A random sampling reservoir of a stream of longs. Uses Vitter’s Algorithm R to produce a statistically representative sample.
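Algorithm R itself is short enough to sketch. The version below is a single-threaded illustration under assumed names (AlgorithmRSketch, a seed parameter for repeatability); it is not the library's implementation, which also has to handle concurrent updates.

```java
import java.util.Arrays;
import java.util.SplittableRandom;

// Hypothetical single-threaded sketch of Vitter's Algorithm R.
final class AlgorithmRSketch {
    private final long[] sample;
    private final SplittableRandom random;
    private long count = 0; // number of values seen so far

    AlgorithmRSketch(int size, long seed) {
        this.sample = new long[size];
        this.random = new SplittableRandom(seed);
    }

    void update(long value) {
        count++;
        if (count <= sample.length) {
            // fill phase: keep the first `size` values unconditionally
            sample[(int) (count - 1)] = value;
        } else {
            // pick r uniformly in [0, count); the new value is kept with probability size/count
            long r = random.nextLong(count);
            if (r < sample.length) {
                sample[(int) r] = value;
            }
        }
    }

    /** Returns a copy of the values currently held in the sample. */
    long[] values() {
        return Arrays.copyOf(sample, (int) Math.min(count, sample.length));
    }
}
```

Because every value ever seen has the same chance of being in the sample, a burst of old data stays represented for as long as the reservoir lives.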

SlidingWindowReservoir (keeps only the latest N measurements)

A Reservoir implementation backed by a sliding window that stores the last N measurements.
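A last-N window is commonly built on a circular buffer. The following is a minimal single-threaded sketch under an assumed class name (SlidingWindowSketch), not the library's source.

```java
import java.util.Arrays;

// Hypothetical single-threaded sketch of a last-N sliding window.
final class SlidingWindowSketch {
    private final long[] window;
    private long count = 0; // total number of values recorded

    SlidingWindowSketch(int size) {
        this.window = new long[size];
    }

    void update(long value) {
        // overwrite the oldest slot once the buffer is full
        window[(int) (count++ % window.length)] = value;
    }

    /** Returns the retained values; the order is the buffer order, not chronological. */
    long[] values() {
        return Arrays.copyOf(window, (int) Math.min(count, window.length));
    }
}
```

An old value stays in the window until N newer measurements have arrived, however long that takes in wall-clock time.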

SlidingTimeWindowReservoir (keeps only the data within a specified time window)

A Reservoir implementation backed by a sliding window that stores only the measurements made in the last N seconds (or other time unit).
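A time-based window can be sketched with a queue of timestamped entries that are evicted once they fall outside the window. The class name SlidingTimeWindowSketch and the explicit nowNanos parameter are assumptions made to keep the example deterministic; a real reservoir would read a clock internally.

```java
import java.util.ArrayDeque;
import java.util.concurrent.TimeUnit;

// Hypothetical single-threaded sketch of a time-based sliding window.
final class SlidingTimeWindowSketch {
    private final long windowNanos;
    // each entry is {timestampNanos, value}; oldest entries sit at the head
    private final ArrayDeque<long[]> entries = new ArrayDeque<>();

    SlidingTimeWindowSketch(long window, TimeUnit unit) {
        this.windowNanos = unit.toNanos(window);
    }

    void update(long value, long nowNanos) {
        evict(nowNanos);
        entries.addLast(new long[] {nowNanos, value});
    }

    /** Returns the values recorded within the window ending at nowNanos, oldest first. */
    long[] values(long nowNanos) {
        evict(nowNanos);
        return entries.stream().mapToLong(e -> e[1]).toArray();
    }

    private void evict(long nowNanos) {
        // drop entries whose age has reached or exceeded the window length
        while (!entries.isEmpty() && nowNanos - entries.peekFirst()[0] >= windowNanos) {
            entries.removeFirst();
        }
    }
}
```

Unlike the count-based window, old values disappear as time passes even if no new measurements arrive.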

Summary

On Instantaneous Value

Apart from SlidingTimeWindowReservoir, none of the reservoirs directly reflects instantaneous values; their output is effectively averaged over older data. For example, if a non-zero value is recorded at the start and every later value is 0, they keep reflecting the initial value and do not promptly show the drop to 0. The change appears only as later values keep arriving, and then with a delay.

About snapshot

By default, Snapshot exposes the 75th, 95th, 98th, 99th and 99.9th percentiles (75thPercentile, 95thPercentile, 98thPercentile, 99thPercentile, 999thPercentile).

  • Percentiles of 95 and above clearly reflect changes in extreme values.

  • The 75thPercentile is relatively flat.

When extreme values change only slightly, SlidingTimeWindowReservoir tracks the actual situation more closely, provided the time window is set to the reporting interval. Even when extreme values change greatly, SlidingTimeWindowReservoir still stays relatively close to the actual data compared with the other reservoirs, and its curve changes visibly, whereas the curves of the others may remain smooth.
