Eureka’s Thundering Herd Effect

  springcloud

Preface

Recently, several services were released in one go: about 9 instances were updated at the same time, out of 16 registered service instances in total. Eureka entered self-preservation mode and had still not left it several minutes later.

Analysis

eureka.instance.leaseRenewalIntervalInSeconds is set to 10 seconds. There are two Eureka server instances, eureka.server.renewalPercentThreshold is 0.85, and eureka.server.renewalThresholdUpdateIntervalMs is 900000 (15 minutes).

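By this calculation, the normal threshold is 27: in this version Eureka expects two renewals per instance per minute, so 16 * 2 * 0.85 = 27.2, truncated to 27. When the 9 services were restarted, the number of registered instances momentarily rose to 16 + 9 = 25, so the threshold was updated to 42 (25 * 2 * 0.85 = 42.5, truncated). During the rollout the 9 old instances were being shut down while the 9 new ones had not yet finished starting, so only 7 instances were sending heartbeats, and the number of renewals that could actually arrive per minute was 7 * 6 = 42 (one heartbeat every 10 seconds).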
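The arithmetic above can be reproduced with a small Java sketch. It assumes, as in the Eureka 1.x line this incident involved, that the expected renewal count is two per registered instance per minute (a 30-second heartbeat baked into the expectation) and that the threshold is that number multiplied by renewalPercentThreshold and truncated to an int. The class and method names are illustrative only, not Eureka's API.

```java
public class RenewalThresholdCalc {

    // Assumption (Eureka 1.x behaviour): every registered instance is expected
    // to renew twice per minute, regardless of leaseRenewalIntervalInSeconds.
    static final int EXPECTED_RENEWS_PER_INSTANCE_PER_MIN = 2;

    // Threshold = (int) (expected renewals per minute * renewalPercentThreshold).
    static int threshold(int registeredInstances, double renewalPercentThreshold) {
        int expectedPerMin = registeredInstances * EXPECTED_RENEWS_PER_INSTANCE_PER_MIN;
        return (int) (expectedPerMin * renewalPercentThreshold);
    }

    // Renewals that can actually arrive per minute from the instances still running.
    static int actualRenewsPerMin(int aliveInstances, int leaseRenewalIntervalInSeconds) {
        return aliveInstances * (60 / leaseRenewalIntervalInSeconds);
    }

    // Same condition as isLeaseExpirationEnabled(): renewals must be strictly greater.
    static boolean leaseExpirationEnabled(int renewsLastMin, int threshold) {
        return threshold > 0 && renewsLastMin > threshold;
    }

    public static void main(String[] args) {
        double renewalPercentThreshold = 0.85;
        int intervalSeconds = 10;

        int normal = threshold(16, renewalPercentThreshold);           // 32 * 0.85 = 27.2 -> 27
        int duringDeploy = threshold(16 + 9, renewalPercentThreshold); // 50 * 0.85 = 42.5 -> 42
        int actual = actualRenewsPerMin(16 - 9, intervalSeconds);      // 7 * 6 = 42

        System.out.println("normal threshold         = " + normal);
        System.out.println("threshold during deploy  = " + duringDeploy);
        System.out.println("actual renewals/min      = " + actual);
        // 42 > 42 is false, so expiration is disabled: self-preservation.
        System.out.println("lease expiration enabled = " + leaseExpirationEnabled(actual, duringDeploy));
    }
}
```

Running it prints thresholds of 27 and 42, 42 actual renewals per minute, and `lease expiration enabled = false`, matching the figures in the analysis.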

When self-preservation is triggered

eureka-core-1.4.12-sources.jar!/com/netflix/eureka/registry/PeerAwareInstanceRegistryImpl.java

@Override
public boolean isLeaseExpirationEnabled() {
    if (!isSelfPreservationModeEnabled()) {
        // The self preservation mode is disabled, hence allowing the instances to expire.
        return true;
    }
    // Leases may expire only if renewals in the last minute strictly exceed the threshold;
    // otherwise expiration is suspended (self-preservation).
    return numberOfRenewsPerMinThreshold > 0 && getNumOfRenewsInLastMin() > numberOfRenewsPerMinThreshold;
}

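If the number of renewals received in the last minute is not strictly greater than the threshold, lease expiration is disabled, which means self-preservation is in effect. Here 42 is not greater than 42, so self-preservation kicked in.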

Summary

This is somewhat similar to the thundering herd problem:

If a large number of other services all retry within the same window after a failed service comes back online, they can easily put heavy pressure on the system. This situation is known as the thundering herd, and it can be avoided fairly easily with a randomized retry window. If the infrastructure does not provide a circuit breaker, it is recommended to combine a randomized retry window with exponential backoff to spread the requests out further.
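A randomized retry window combined with exponential backoff can be sketched in Java as follows. This is a minimal illustration of the "full jitter" variant (a uniformly random delay between zero and an exponentially growing, capped ceiling); the class name, method names, and the base and maximum delays are hypothetical choices, not part of Eureka or any particular library.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

public class JitteredBackoff {

    // Exponential backoff: the retry window for attempt n is baseDelayMs * 2^n,
    // capped at maxDelayMs. The exponent is capped so realistic base delays cannot overflow.
    static long backoffCeilingMs(int attempt, long baseDelayMs, long maxDelayMs) {
        long grown = baseDelayMs * (1L << Math.min(attempt, 20));
        return Math.min(grown, maxDelayMs);
    }

    // Randomized retry window ("full jitter"): a uniformly random delay in
    // [0, ceiling], so callers that failed together do not retry together.
    static long jitteredDelayMs(int attempt, long baseDelayMs, long maxDelayMs) {
        long ceiling = backoffCeilingMs(attempt, baseDelayMs, maxDelayMs);
        return ThreadLocalRandom.current().nextLong(ceiling + 1);
    }

    // Calls task until it succeeds or maxAttempts is reached, sleeping a
    // jittered, exponentially growing delay between attempts.
    static <T> T retry(Callable<T> task, int maxAttempts, long baseDelayMs, long maxDelayMs)
            throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                if (attempt + 1 >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(jitteredDelayMs(attempt, baseDelayMs, maxDelayMs));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical task that fails twice before succeeding, standing in for
        // a call to a service that is just coming back online.
        int[] calls = {0};
        String result = retry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("service not ready");
            }
            return "ok after " + calls[0] + " attempts";
        }, 5, 10, 1_000);
        System.out.println(result);
    }
}
```

The randomness does the work of breaking up the herd, while the growing ceiling keeps the total retry load bounded as failures persist.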

Similarly, applications that use Eureka for service discovery need care when deploying to production: either do not roll out too many instances at once, or adjust the relevant Eureka parameters, such as renewalPercentThreshold or renewalThresholdUpdateIntervalMs.
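To pick a batch size, the incident's worst case can be turned into a quick check. The sketch below is a simplified model, not Eureka's actual bookkeeping: it assumes that while k instances are being replaced, all k replacements are already registered (n + k registered) while the k old ones send no heartbeats (n - k alive), that the threshold is not recomputed mid-rollout, and that Eureka expects two renewals per instance per minute. All names are hypothetical.

```java
public class SafeBatchSize {

    // Assumed Eureka 1.x threshold: 2 expected renewals per registered instance
    // per minute, times renewalPercentThreshold, truncated to an int.
    static int threshold(int registered, double renewalPercentThreshold) {
        return (int) (registered * 2 * renewalPercentThreshold);
    }

    // Worst case from the incident: k replacements are already registered
    // (n + k registered) while the k old instances are down (n - k alive).
    static boolean staysOutOfSelfPreservation(int n, int k, int intervalSeconds, double pct) {
        int actualPerMin = (n - k) * (60 / intervalSeconds);
        int thr = threshold(n + k, pct);
        return thr > 0 && actualPerMin > thr;
    }

    // Largest batch size k in [0, n] that keeps lease expiration enabled in the
    // worst case, or -1 if even k = 0 does not. The margin only shrinks as k grows
    // (fewer heartbeats, higher threshold), so the loop stops at the first failure.
    static int maxSafeBatch(int n, int intervalSeconds, double pct) {
        int best = -1;
        for (int k = 0; k <= n; k++) {
            if (!staysOutOfSelfPreservation(n, k, intervalSeconds, pct)) {
                break;
            }
            best = k;
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println("10s heartbeat: max batch = " + maxSafeBatch(16, 10, 0.85));
        System.out.println("30s heartbeat: max batch = " + maxSafeBatch(16, 30, 0.85));
    }
}
```

With the settings from this incident (16 instances, 10-second heartbeat, 0.85) the model allows at most 8 instances per batch, one fewer than the 9 that triggered self-preservation; with a 30-second heartbeat it would allow only 1.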

