A look at the FailureDetector in Apache Gossip


Preface

This article looks at the FailureDetector in Apache Gossip.

FailureDetector

incubator-retired-gossip/gossip-base/src/main/java/org/apache/gossip/accrual/FailureDetector.java

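import org.apache.commons.math.MathException;
import org.apache.commons.math.distribution.ExponentialDistributionImpl;
import org.apache.commons.math.distribution.NormalDistributionImpl;
import org.apache.commons.math.stat.descriptive.DescriptiveStatistics;
import org.apache.log4j.Logger;

public class FailureDetector {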

  public static final Logger LOGGER = Logger.getLogger(FailureDetector.class);
  private final DescriptiveStatistics descriptiveStatistics;
  private final long minimumSamples;
  private volatile long latestHeartbeatMs = -1;
  private final String distribution;

  public FailureDetector(long minimumSamples, int windowSize, String distribution) {
    descriptiveStatistics = new DescriptiveStatistics(windowSize);
    this.minimumSamples = minimumSamples;
    this.distribution = distribution;
  }

  /**
   * Updates the statistics based on the delta between the last
   * heartbeat and supplied time
   *
   * @param now the time of the heartbeat in milliseconds
   */
  public synchronized void recordHeartbeat(long now) {
    if (now <= latestHeartbeatMs) {
      return;
    }
    if (latestHeartbeatMs != -1) {
      descriptiveStatistics.addValue(now - latestHeartbeatMs);
    }
    latestHeartbeatMs = now;
  }

  public synchronized Double computePhiMeasure(long now) {
    if (latestHeartbeatMs == -1 || descriptiveStatistics.getN() < minimumSamples) {
      return null;
    }
    long delta = now - latestHeartbeatMs;
    try {
      double probability;
      if (distribution.equals("normal")) {
        double standardDeviation = descriptiveStatistics.getStandardDeviation();
        standardDeviation = standardDeviation < 0.1 ? 0.1 : standardDeviation;
        probability = new NormalDistributionImpl(descriptiveStatistics.getMean(), standardDeviation).cumulativeProbability(delta);
      } else {
        probability = new ExponentialDistributionImpl(descriptiveStatistics.getMean()).cumulativeProbability(delta);
      }
      final double eps = 1e-12;
      if (1 - probability < eps) {
        probability = 1.0;
      }
      return -1.0d * Math.log10(1.0d - probability);
    } catch (MathException | IllegalArgumentException e) {
      LOGGER.debug(e);
      return null;
    }
  }
}
  • The FailureDetector constructor takes three parameters: minimumSamples, windowSize, and distribution.
  • minimumSamples is the minimum number of recorded heartbeat intervals required before a phi value is computed; windowSize is the size of the statistics window; distribution selects which distribution to use: "normal" selects the normal distribution, and any other value selects the exponential distribution.
  • FailureDetector uses DescriptiveStatistics from Apache Commons Math as a sliding window over heartbeat intervals. NormalDistributionImpl and ExponentialDistributionImpl compute the cumulative probability under the normal and exponential distributions respectively, and the phi value is then calculated as -1.0d * Math.log10(1.0d - probability).
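
To make the exponential branch concrete, here is a minimal, dependency-free sketch of the same flow. It is not the Apache Gossip class: the name PhiSketch, the ring buffer that stands in for DescriptiveStatistics, and the hand-written exponential CDF (1 - e^(-delta/mean)) are all assumptions made for illustration, and windowSize is assumed to be greater than zero.

```java
/** Hypothetical sketch of the exponential branch; not the Apache Gossip class. */
final class PhiSketch {
    private final long[] window;          // ring buffer of heartbeat intervals (ms); assumes windowSize > 0
    private final long minimumSamples;
    private int count = 0;                // number of valid entries in the window
    private int next = 0;                 // slot that the next interval overwrites
    private long latestHeartbeatMs = -1;

    PhiSketch(long minimumSamples, int windowSize) {
        this.window = new long[windowSize];
        this.minimumSamples = minimumSamples;
    }

    synchronized void recordHeartbeat(long now) {
        if (now <= latestHeartbeatMs) {
            return; // ignore stale or duplicate heartbeats
        }
        if (latestHeartbeatMs != -1) {
            window[next] = now - latestHeartbeatMs;
            next = (next + 1) % window.length;
            count = Math.min(count + 1, window.length);
        }
        latestHeartbeatMs = now;
    }

    synchronized Double computePhi(long now) {
        if (latestHeartbeatMs == -1 || count == 0 || count < minimumSamples) {
            return null; // not enough samples yet
        }
        double sum = 0;
        for (int i = 0; i < count; i++) {
            sum += window[i];
        }
        double mean = sum / count;
        long delta = now - latestHeartbeatMs;
        // CDF of an exponential distribution with the observed mean interval
        double probability = delta <= 0 ? 0.0 : 1.0 - Math.exp(-delta / mean);
        if (1 - probability < 1e-12) {
            probability = 1.0; // same clamp as the source; phi becomes +Infinity
        }
        return -1.0d * Math.log10(1.0d - probability);
    }
}
```

With heartbeats recorded at 0, 1000, 2000, and 3000 ms, the mean interval is 1000 ms, so computePhi(4000) is 1 / ln(10), roughly 0.43. Under the exponential model phi grows linearly with the silence since the last heartbeat, at a rate of 1 / (mean * ln 10) per millisecond.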

Summary

  • The paper "The Phi Accrual Failure Detector" by Hayashibara et al. proposes an accrual failure detection method based on the phi value.
  • There are roughly two styles of implementation in the industry: one based on the normal distribution, as in Akka, and one based on the exponential distribution, as in Cassandra.
  • Apache Gossip's FailureDetector supports both the normal distribution and the exponential distribution.
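
To see how the two distributions differ for the same input, the sketch below computes phi both ways from a given mean, standard deviation, and elapsed time. Everything here is an assumption for illustration: PhiCompare is a hypothetical name, and the normal CDF uses the Abramowitz-Stegun polynomial approximation of erf (absolute error around 1.5e-7) rather than Commons Math, so it is only meaningful for moderate phi values.

```java
/** Hypothetical comparison helper; not part of Apache Gossip. */
final class PhiCompare {
    // Abramowitz-Stegun formula 7.1.26; an approximation chosen for this sketch
    static double erf(double x) {
        double sign = x < 0 ? -1.0 : 1.0;
        double ax = Math.abs(x);
        double t = 1.0 / (1.0 + 0.3275911 * ax);
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                + t * (-1.453152027 + t * 1.061405429))));
        return sign * (1.0 - poly * Math.exp(-ax * ax));
    }

    // Same final step as the source: clamp, then take -log10 of the tail probability
    static double phi(double probability) {
        if (1 - probability < 1e-12) {
            probability = 1.0;
        }
        return -1.0d * Math.log10(1.0d - probability);
    }

    static double phiNormal(double mean, double stdDev, double delta) {
        double sd = Math.max(stdDev, 0.1); // same 0.1 floor as the source
        double cdf = 0.5 * (1.0 + erf((delta - mean) / (sd * Math.sqrt(2.0))));
        return phi(cdf);
    }

    static double phiExponential(double mean, double delta) {
        double cdf = delta <= 0 ? 0.0 : 1.0 - Math.exp(-delta / mean);
        return phi(cdf);
    }
}
```

For a mean interval of 1000 ms, a standard deviation of 100 ms, and 1200 ms since the last heartbeat, phiNormal gives about 1.64 while phiExponential gives about 0.52. With regular heartbeats the normal model reacts much more sharply to a late heartbeat, whereas the exponential model ignores the variance and rises at a constant rate.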
