Programmer Notes | Detailed Understanding of Eureka Cache Mechanism

  Cache, Microservices, spring-cloud

Introduction

Eureka is Netflix’s open source component for service registration and discovery. Spring Cloud Eureka wraps Eureka and adds a friendlier UI, which makes it more convenient to use. However, Eureka itself contains several layers of cache, so service status updates lag behind. The most common symptom is that the status is not updated promptly after a service goes offline, so service consumers keep calling the offline instance and their requests fail. Based on Spring Cloud Eureka 1.4.4.RELEASE, this article describes Eureka’s caching mechanism, assuming the default region and zone.

I. AP characteristics

According to the CAP theorem, Eureka is an AP system: it prioritizes availability (A) and partition tolerance (P), does not guarantee strong consistency (C), and offers only eventual consistency. This is why its architecture includes several caches.

II. Service Status

Eureka service status enum class: com.netflix.appinfo.InstanceInfo.InstanceStatus

| Status | Meaning |
|---|---|
| UP | Online |
| DOWN | Offline |
| STARTING | Starting |
| OUT_OF_SERVICE | Out of service |
| UNKNOWN | Unknown |

III. Eureka Server

In a highly available Eureka architecture, a Eureka Server can also register with other servers as a Client. Multiple nodes register with each other to form a Eureka cluster, and the nodes in the cluster treat each other as peers. When a Eureka Client registers, renews its lease, or updates its status with a Server, the node that receives the request updates its own service registration information and then replicates the change to its peer nodes one by one.

[Note] If server-A registers with server-B in one direction only, server-A regards server-B as a peer, so data received by server-A is replicated to server-B, but data received by server-B is not replicated to server-A.

3.1 Cache Mechanism

Eureka Server keeps service registration information in three variables: registry, readWriteCacheMap, and readOnlyCacheMap. By default, a scheduled task synchronizes readWriteCacheMap to readOnlyCacheMap every 30s, and another task runs every 60s to evict instances that have not renewed their lease for more than 90s. Eureka Client fetches service registration information from readOnlyCacheMap every 30s, while the UI reads service registration information directly from registry.

Cache

| Cache | Type | Description |
|---|---|---|
| registry | ConcurrentHashMap | Updated in real time. The UI reads service registration information from here. |
| readWriteCacheMap | Guava Cache/LoadingCache | Updated in real time. Member variable of class ResponseCacheImpl. Cache time is 180 seconds. |
| readOnlyCacheMap | ConcurrentHashMap | Updated periodically. Member variable of class ResponseCacheImpl. Synchronized from readWriteCacheMap every 30s by default. Eureka Client fetches service registration information from here by default; the Server can be configured so that clients read directly from readWriteCacheMap instead. |

Cache related configuration

| Configuration | Default | Description |
|---|---|---|
| eureka.server.useReadOnlyResponseCache | true | Clients fetch data from readOnlyCacheMap. If false, readOnlyCacheMap is skipped and clients fetch directly from readWriteCacheMap. |
| eureka.server.responseCacheUpdateIntervalMs | 30000 | Period for synchronizing readWriteCacheMap to readOnlyCacheMap, default 30s. |
| eureka.server.evictionIntervalTimerInMs | 60000 | Period of the task that evicts instances that have not renewed their lease, default 60s. |
| eureka.instance.leaseExpirationDurationInSeconds | 90 | Timeout after which an instance that has not renewed its lease is evicted, default 90s. |

Key classes

| Class name | Description |
|---|---|
| com.netflix.eureka.registry.AbstractInstanceRegistry | Saves service registration information and holds the registry and responseCache member variables. |
| com.netflix.eureka.registry.ResponseCacheImpl | Holds the readWriteCacheMap and readOnlyCacheMap member variables. |
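
The interplay of the three server-side caches can be sketched in a few lines of Java. This is a simplified model, not the real Eureka code: the class name ThreeLevelCacheSketch, the single ALL_APPS key, and the use of a plain ConcurrentHashMap in place of the Guava LoadingCache are all assumptions made for illustration, and the scheduled 30s task is replaced by an explicit syncReadOnly() call.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Simplified model of the three server-side caches; not the real Eureka classes.
final class ThreeLevelCacheSketch {
    private static final String ALL_APPS = "ALL_APPS"; // hypothetical single cache key

    // instanceId -> status, updated in real time
    private final Map<String, String> registry = new ConcurrentHashMap<>();
    // stand-in for the Guava LoadingCache: invalidated on every write, reloaded on read
    private final Map<String, List<String>> readWriteCacheMap = new ConcurrentHashMap<>();
    // only refreshed by syncReadOnly(), which models the 30s scheduled task
    private final Map<String, List<String>> readOnlyCacheMap = new ConcurrentHashMap<>();

    public void register(String instanceId, String status) {
        registry.put(instanceId, status);
        readWriteCacheMap.remove(ALL_APPS); // invalidate so the next read reloads
    }

    public void cancel(String instanceId) {
        registry.remove(instanceId);
        readWriteCacheMap.remove(ALL_APPS);
    }

    private List<String> loadReadWrite() {
        return readWriteCacheMap.computeIfAbsent(ALL_APPS,
                key -> registry.keySet().stream().sorted().collect(Collectors.toList()));
    }

    // Models the periodic readWriteCacheMap -> readOnlyCacheMap synchronization.
    public void syncReadOnly() {
        readOnlyCacheMap.put(ALL_APPS, loadReadWrite());
    }

    // Models a client fetch; the flag plays the role of useReadOnlyResponseCache.
    public List<String> fetch(boolean useReadOnlyResponseCache) {
        if (!useReadOnlyResponseCache) {
            return loadReadWrite();
        }
        List<String> cached = readOnlyCacheMap.get(ALL_APPS);
        if (cached == null) {
            cached = loadReadWrite();
            readOnlyCacheMap.put(ALL_APPS, cached);
        }
        return cached;
    }

    public static void main(String[] args) {
        ThreeLevelCacheSketch cache = new ThreeLevelCacheSketch();
        cache.register("order-1", "UP");
        cache.syncReadOnly();
        cache.cancel("order-1");
        System.out.println(cache.fetch(true));  // prints [order-1] (stale read-only view)
        System.out.println(cache.fetch(false)); // prints []
        cache.syncReadOnly();
        System.out.println(cache.fetch(true));  // prints []
    }
}
```

The stale result of the first fetch is the lag the article describes: until the next synchronization, clients reading through readOnlyCacheMap still see the cancelled instance.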

IV. Eureka Client

Eureka Client plays two roles: service provider and service consumer. As a service consumer, it is generally used together with Ribbon or Feign (Feign uses Ribbon internally). After Eureka Client starts, as a service provider it registers with the Server immediately and then renews its lease every 30s by default. As a service consumer, it immediately fetches the full service registration information from the Server and then, by default, fetches incremental updates every 30s. Ribbon obtains the service registration information it uses from the Client after a 1s delay, refreshes it every 30s by default, and keeps only the instances whose status is UP.

Two-level cache

| Cache | Type | Description |
|---|---|---|
| localRegionApps | AtomicReference | Updated periodically. Member variable of class DiscoveryClient. Holds the service registration information saved by Eureka Client. Fetched in full from the Server immediately after startup, then updated incrementally every 30s by default. |
| upServerListZoneMap | ConcurrentHashMap | Updated periodically. Member variable of class LoadBalancerStats. Holds the service registration information that Ribbon uses, limited to instances with status UP. Fetched from the Client 1s after startup, then refreshed every 30s by default. |

Cache related configuration

| Configuration | Default | Description |
|---|---|---|
| eureka.instance.leaseRenewalIntervalInSeconds | 30 | Eureka Client lease renewal period, default 30s. |
| eureka.client.registryFetchIntervalSeconds | 30 | Eureka Client incremental update period, default 30s. (Updates are incremental under normal circumstances; a full update is performed on timeout, on inconsistency with the Server, and in similar cases.) |
| ribbon.ServerListRefreshInterval | 30000 | Ribbon update period, default 30s. |

Key classes

| Class name | Description |
|---|---|
| com.netflix.discovery.DiscoveryClient | Eureka Client; responsible for registration, lease renewal, and updates. The method initScheduledTasks() initializes the scheduled renewal and update tasks. |
| com.netflix.loadbalancer.PollingServerListUpdater | Updates the service registration information that Ribbon uses. The start method initializes the scheduled update task. |
| com.netflix.loadbalancer.LoadBalancerStats | Holds the service registration information that Ribbon uses, limited to instances with status UP. |
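
The second cache level keeps only instances whose status is UP. A minimal sketch of that filtering step is shown below. The class UpServerFilterSketch, its Status enum (which only mirrors the status names listed in section II), and the instance ids are hypothetical; the real Ribbon code works on Server objects rather than a plain map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: models Ribbon keeping the UP instances from the client's local copy.
final class UpServerFilterSketch {
    // Mirrors the status names of InstanceInfo.InstanceStatus.
    enum Status { UP, DOWN, STARTING, OUT_OF_SERVICE, UNKNOWN }

    /** Returns the ids of instances whose status is UP, preserving input order. */
    public static List<String> upServers(Map<String, Status> localRegionApps) {
        List<String> up = new ArrayList<>();
        for (Map.Entry<String, Status> entry : localRegionApps.entrySet()) {
            if (entry.getValue() == Status.UP) {
                up.add(entry.getKey());
            }
        }
        return up;
    }

    public static void main(String[] args) {
        Map<String, Status> apps = new LinkedHashMap<>();
        apps.put("order-1", Status.UP);
        apps.put("order-2", Status.DOWN);
        apps.put("order-3", Status.OUT_OF_SERVICE);
        apps.put("order-4", Status.UP);
        System.out.println(upServers(apps)); // prints [order-1, order-4]
    }
}
```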

V. Maximum Perceived Time of Service Consumers under Default Configuration

| Eureka Client | Time | Description |
|---|---|---|
| Going online | 30 (readOnly) + 30 (Client) + 30 (Ribbon) = 90s | readWriteCacheMap -> readOnlyCacheMap -> Client -> Ribbon, 30s each. |
| Graceful shutdown | 30 (readOnly) + 30 (Client) + 30 (Ribbon) = 90s | If the service goes offline gracefully (the process is killed with kill or kill -15), the process gets a chance to clean up. DiscoveryClient.shutdown() updates its status to DOWN on the Server and then sends a DELETE request to deregister itself. registry and readWriteCacheMap are updated in real time, so the UI no longer displays the service instance. |
| Abnormal shutdown | 30 + 60 (evict) * 2 + 30 + 30 + 30 = 240s | If the service goes offline abnormally (the process is killed with kill -9 or crashes), the DiscoveryClient.shutdown() method is not triggered. Eureka Server relies on its 60s eviction task, which removes instances that have not renewed their lease for more than 90s, to delete the service instance from registry and readWriteCacheMap. |

Consider the following extreme case:

  • At 0s, immediately after renewing its lease, the service goes offline abruptly, so its Eureka Client never notifies the Server;
  • At 29s, the first eviction check finds that the lease has not yet gone unrenewed for more than 90s;
  • At 89s, the second eviction check still finds that the lease has not gone unrenewed for more than 90s;
  • At 149s, the third eviction check finds that the lease has gone unrenewed for more than 90s, so the service instance is deleted from registry and readWriteCacheMap;
  • At 179s, the scheduled task synchronizes readWriteCacheMap to readOnlyCacheMap;
  • At 209s, Eureka Client fetches the update from the readOnlyCacheMap of Eureka Server;
  • At 239s, Ribbon fetches the update from Eureka Client.

Therefore, under extreme conditions, the maximum time for service consumers to perceive the change can come arbitrarily close to 240s.
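
The timeline above can be reproduced with a small calculation. The sketch below assumes, as the example does, that the last lease renewal happens at 0s, that the first eviction check falls at 29s, and that each downstream timer fires a full period after the previous stage. The class and method names are hypothetical.

```java
// Illustrative worst-case timeline for an abnormal shutdown under default settings.
final class PerceivedDelaySketch {
    /** First eviction check at which the lease has gone unrenewed for more than leaseExpiration seconds. */
    public static int evictionTime(int lastRenewal, int firstEvictCheck,
                                   int evictInterval, int leaseExpiration) {
        int check = firstEvictCheck;
        while (check - lastRenewal <= leaseExpiration) {
            check += evictInterval; // lease not yet expired, wait for the next check
        }
        return check;
    }

    /** Time at which Ribbon drops the instance, assuming every later timer fires a full period afterwards. */
    public static int ribbonUpdatedAt(int evictedAt, int readOnlySync,
                                      int clientFetch, int ribbonRefresh) {
        return evictedAt + readOnlySync + clientFetch + ribbonRefresh;
    }

    public static void main(String[] args) {
        int evictedAt = evictionTime(0, 29, 60, 90);
        System.out.println("evicted at " + evictedAt + "s");                                // prints evicted at 149s
        System.out.println("Ribbon updated at " + ribbonUpdatedAt(evictedAt, 30, 30, 30) + "s"); // prints Ribbon updated at 239s
    }
}
```

Moving the first eviction check closer to 30s pushes the result closer to the 240s bound, and shortening any of the intervals passed in shortens the total accordingly.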

VI. Countermeasures

Choosing Eureka as the service registry means accepting its characteristics: it prioritizes availability (A) and partition tolerance (P) and does not guarantee strong consistency (C). If strong consistency (C) must come first, consider a CP system such as ZooKeeper as the service registry. Distributed systems generally run multiple nodes per service, so a delay in noticing that a single node has come online has no real impact. The countermeasures here therefore focus on the delayed status update after a service goes offline.

6.1 Eureka Server

  • 1. Shorten the readOnlyCacheMap update cycle. Shortening the scheduled task period reduces the lag.

    eureka.server.responseCacheUpdateIntervalMs: 10000 # Eureka Server readOnlyCacheMap update cycle
  • 2. Disable readOnlyCacheMap. Small and medium-sized systems can consider this option. Eureka Client then fetches service registration information directly from readWriteCacheMap.

    eureka.server.useReadOnlyResponseCache: false # whether readOnlyCacheMap is used

6.2 Eureka Client

  • 1. Service consumers use fault tolerance mechanisms, such as Spring Cloud Retry and Hystrix. Ribbon, Feign, and Zuul can all be configured to retry. When a service consumer calls a node that is already offline, it normally gets a ConnectTimeout, and the retry mechanism can then try the next node.
  • 2. Service consumers shorten their update cycles. The two cache levels in Eureka Client and Ribbon both delay status updates. Shortening these two scheduled task periods reduces the delay, for example:

    eureka.client.registryFetchIntervalSeconds: 5 # Eureka Client update cycle
    ribbon.ServerListRefreshInterval: 2000 # Ribbon update cycle
  • 3. The service provider makes sure the service goes offline gracefully. When taking a service offline, use the kill or kill -15 command and avoid kill -9. Killing the process with kill or kill -15 triggers the shutdown() method of Eureka Client, which actively deletes the registration information from the Server’s registry and readWriteCacheMap without relying on the Server’s eviction cleanup.
  • 4. The service provider delays going offline. Before shutting down, the service calls an interface to set its status in Eureka Server to DOWN or OUT_OF_SERVICE, and only then goes offline. The gap between the two steps depends on the caching mechanism and configuration. For example, with the default configuration, if the service waits 90 seconds after calling the interface before going offline, service consumers will no longer call the offline instance.
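
For the delayed shutdown in item 4, the waiting time and the status request can be sketched as follows. The article does not say which interface is called; this sketch assumes the status override operation of the Eureka REST API (a PUT to .../apps/{appName}/{instanceId}/status?value=...). The base URL, application name, instance id, and all class and method names are hypothetical, and the request is only built here, not sent.

```java
import java.net.URI;

// Illustrative only: computes how long to wait and where to send the status change.
final class DelayedOfflineSketch {
    /** Builds the URI of the assumed status override endpoint (to be called with HTTP PUT). */
    public static URI statusOverrideUri(String eurekaBaseUrl, String appName,
                                        String instanceId, String status) {
        return URI.create(eurekaBaseUrl + "/apps/" + appName + "/" + instanceId
                + "/status?value=" + status);
    }

    /** Worst-case seconds until consumers stop seeing the instance after the status change. */
    public static int waitSeconds(int readOnlySyncSeconds, int clientFetchSeconds,
                                  int ribbonRefreshSeconds) {
        return readOnlySyncSeconds + clientFetchSeconds + ribbonRefreshSeconds;
    }

    public static void main(String[] args) {
        URI uri = statusOverrideUri("http://localhost:8761/eureka", "ORDER-SERVICE",
                "host1:order-service:8080", "OUT_OF_SERVICE");
        System.out.println("PUT " + uri);
        System.out.println("then wait " + waitSeconds(30, 30, 30) + "s before stopping the process"); // 90s with defaults
    }
}
```

With the shortened intervals from 6.1 and item 2 (10s, 5s, and 2s), the same calculation gives a wait of 17s.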

VII. Gateway Achieves Real-time Awareness of Services Going Offline

In software engineering, there is no problem that cannot be solved by adding a middle layer, and the gateway is the middle layer between service providers and service consumers. Take the Spring Cloud Zuul gateway as an example. The gateway, acting as a Eureka Client, saves service registration information. Service consumers send their requests to service providers through the gateway, so when a service provider goes offline it only needs to notify the gateway to disable that instance in the gateway’s own service list. To keep the gateways independent, a separate service can receive the offline notifications and coordinate the gateway cluster. The next article will describe in detail how the gateway achieves real-time awareness of services going offline.

Author: Feng Yongbiao
Source of content: Yixin Institute of Technology