Eureka is Netflix’s open source component for service registration and discovery. Spring Cloud Eureka wraps Eureka, adding a friendlier UI and making it more convenient to use. However, Eureka itself contains several layers of cache, so service status updates lag behind. The most common symptom is that the status is not updated promptly after a service goes offline, so a service consumer calls an instance that is already offline and the request fails. Based on Spring Cloud Eureka 1.4.4.RELEASE, this article describes Eureka’s caching mechanism, assuming the default region and zone.
I. AP Characteristics
In terms of the CAP theorem, Eureka is an AP system: it gives priority to availability (A) and partition tolerance (P), does not guarantee strong consistency (C), and only guarantees eventual consistency. This is why its architecture includes several caches.
II. Service Status
Eureka represents service status with an enum class. The statuses referred to in this article include UP, DOWN and OUT_OF_SERVICE.
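For reference, the statuses can be sketched as a plain Java enum. The five values mirror those of com.netflix.appinfo.InstanceInfo.InstanceStatus in Netflix Eureka; the name ServiceStatus and the helpers isRoutable and parse are illustrative assumptions, not Eureka API.

```java
/** Simplified stand-in for Eureka's InstanceInfo.InstanceStatus (illustrative only). */
enum ServiceStatus {
    UP,              // ready to receive traffic
    DOWN,            // not serving, for example after a failed health check
    STARTING,        // initializing, not ready for traffic yet
    OUT_OF_SERVICE,  // intentionally taken out of rotation
    UNKNOWN;         // status could not be determined

    /** Hypothetical helper: only UP instances should be handed to a load balancer. */
    boolean isRoutable() {
        return this == UP;
    }

    /** Hypothetical helper: parses a name, falling back to UNKNOWN for bad input. */
    static ServiceStatus parse(String name) {
        if (name != null) {
            for (ServiceStatus s : values()) {
                if (s.name().equalsIgnoreCase(name)) {
                    return s;
                }
            }
        }
        return UNKNOWN;
    }
}
```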
III. Eureka Server
In a highly available Eureka architecture, a Eureka Server can also register with other servers as a client. Multiple nodes register with each other to form a Eureka cluster, and the nodes regard one another as peers. When a Eureka Client registers, renews, or updates its status with a Server, the node that receives the request updates its own service registration information and then synchronizes the change to its peer nodes one by one.
[Note] If server-A registers with server-B in one direction only, server-A regards server-B as a peer node: data accepted by server-A is synchronized to server-B, but data accepted by server-B is not synchronized to server-A.
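The one-directional case in the note can be illustrated with a toy model. This is only a sketch under simplifying assumptions: PeerNode, register and acceptReplication are hypothetical names, and registrations are reduced to plain strings rather than real Eureka lease data.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy Eureka Server node: stores registrations and forwards them to its peers. */
final class PeerNode {
    final String name;
    final Set<String> instances = new LinkedHashSet<>();
    final List<PeerNode> peers = new ArrayList<>(); // nodes this node has registered with

    PeerNode(String name) {
        this.name = name;
    }

    /** Called by a client: store locally, then replicate to every peer once. */
    void register(String instanceId) {
        instances.add(instanceId);
        for (PeerNode peer : peers) {
            peer.acceptReplication(instanceId);
        }
    }

    /** Called by a peer: store locally but do not forward again. */
    void acceptReplication(String instanceId) {
        instances.add(instanceId);
    }

    public static void main(String[] args) {
        PeerNode a = new PeerNode("server-A");
        PeerNode b = new PeerNode("server-B");
        a.peers.add(b); // one direction only: A registers with B, B does not register with A

        a.register("order-service:1"); // accepted by A, replicated to B
        b.register("stock-service:1"); // accepted by B, stays on B

        System.out.println(a.name + " -> " + a.instances);
        System.out.println(b.name + " -> " + b.instances);
    }
}
```

In this model server-B ends up with both instances while server-A only knows the one it accepted itself, which is why peers are normally configured to register with each other.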
3.1 Cache Mechanism
Eureka Server keeps service registration information in three variables: registry, readWriteCacheMap and readOnlyCacheMap. By default, one scheduled task synchronizes readWriteCacheMap to readOnlyCacheMap every 30s, and another runs every 60s to remove nodes that have not renewed for more than 90s. Eureka Client fetches service registration information from readOnlyCacheMap every 30s, while the UI reads it from registry.
| Cache | Type | Description |
|---|---|---|
| registry | ConcurrentHashMap | Updated in real time. The UI requests service registration information from here. |
| readWriteCacheMap | Guava Cache/LoadingCache | Updated in real time. Member variable of class ResponseCacheImpl; entries are cached for 180 seconds. |
| readOnlyCacheMap | ConcurrentHashMap | Updated periodically. Member variable of class ResponseCacheImpl; synchronized from readWriteCacheMap every 30s by default. Eureka Client fetches service registration information from here by default and can be configured to read directly from readWriteCacheMap. |
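The interaction of the three stores can be sketched with a toy model. It is deliberately simplified and is not the real ResponseCacheImpl: all three stores are plain maps keyed by a hypothetical application name, the Guava cache and its 180-second expiry are left out, and syncReadOnly stands in for the scheduled task. The class and method names are assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model of the three server-side stores; not the real Eureka classes. */
final class TieredRegistrySketch {
    private final Map<String, String> registry = new ConcurrentHashMap<>();
    private final Map<String, String> readWriteCacheMap = new ConcurrentHashMap<>();
    private final Map<String, String> readOnlyCacheMap = new ConcurrentHashMap<>();

    /** Registration or status change: both real-time stores change at once. */
    void update(String app, String status) {
        registry.put(app, status);
        readWriteCacheMap.put(app, status);
    }

    /** Deregistration or eviction: also applied to both real-time stores at once. */
    void remove(String app) {
        registry.remove(app);
        readWriteCacheMap.remove(app);
    }

    /** Stand-in for the scheduled task (every 30s by default): copy readWrite to readOnly. */
    void syncReadOnly() {
        readOnlyCacheMap.keySet().retainAll(readWriteCacheMap.keySet());
        readOnlyCacheMap.putAll(readWriteCacheMap);
    }

    /** The UI reads straight from registry. */
    String readForUi(String app) {
        return registry.get(app);
    }

    /** Clients read from readOnlyCacheMap unless that layer is switched off. */
    String readForClient(String app, boolean useReadOnlyCache) {
        return useReadOnlyCache ? readOnlyCacheMap.get(app) : readWriteCacheMap.get(app);
    }

    public static void main(String[] args) {
        TieredRegistrySketch server = new TieredRegistrySketch();
        server.update("order-service", "UP");
        System.out.println(server.readForUi("order-service"));           // UP
        System.out.println(server.readForClient("order-service", true)); // null until the next sync
        server.syncReadOnly();
        System.out.println(server.readForClient("order-service", true)); // UP
    }
}
```

The same lag applies in reverse: after remove, a client reading through the read-only layer keeps seeing the instance until the next syncReadOnly.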
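Cache-related configuration

| Configuration | Default | Description |
|---|---|---|
| eureka.server.useReadOnlyResponseCache | true | Clients fetch data from readOnlyCacheMap; false skips readOnlyCacheMap and reads directly from readWriteCacheMap. |
| eureka.server.responseCacheUpdateIntervalMs | 30000 | Period for synchronizing readWriteCacheMap to readOnlyCacheMap, default 30s. |
| | 60000 | Period of the task that evicts non-renewed nodes, default 60s. |
| | 90 | Timeout after which a non-renewed node is evicted, default 90s. |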
Related classes

| Class | Description |
|---|---|
| | Saves service registration information; holds the registry and responseCache member variables. |
| ResponseCacheImpl | Holds the readWriteCacheMap and readOnlyCacheMap member variables. |
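The eviction settings above (a check every 60s, a 90s timeout) can be illustrated with a small lease table. This is a sketch, not Eureka's implementation: LeaseTableSketch and its methods are hypothetical, time is passed in explicitly so the behavior is deterministic, and details such as self-preservation mode are ignored.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Toy lease table: evicts instances that have not renewed within the timeout. */
final class LeaseTableSketch {
    private final long expirationMs; // 90_000 with the default configuration
    private final Map<String, Long> lastRenewalMs = new ConcurrentHashMap<>();

    LeaseTableSketch(long expirationMs) {
        this.expirationMs = expirationMs;
    }

    /** Registration and every renewal record the time of the latest heartbeat. */
    void renew(String instanceId, long nowMs) {
        lastRenewalMs.put(instanceId, nowMs);
    }

    /** One run of the eviction task; returns the ids removed in this run. */
    List<String> evict(long nowMs) {
        List<String> evicted = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastRenewalMs.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> lease = it.next();
            if (nowMs - lease.getValue() > expirationMs) { // no renewal for more than the timeout
                evicted.add(lease.getKey());
                it.remove();
            }
        }
        return evicted;
    }

    public static void main(String[] args) {
        LeaseTableSketch leases = new LeaseTableSketch(90_000);
        leases.renew("order-service:1", 0); // last heartbeat, then the process crashes
        System.out.println(leases.evict(29_000));  // []
        System.out.println(leases.evict(89_000));  // []
        System.out.println(leases.evict(149_000)); // [order-service:1]
    }
}
```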
IV. Eureka Client
Eureka Client has two roles: service provider and service consumer. As a service consumer, it is generally used together with Ribbon or Feign (Feign uses Ribbon internally). After startup, Eureka Client, as a service provider, registers with the Server immediately and renews every 30s by default. As a service consumer, it immediately performs a full fetch of service registration information from the Server and then performs an incremental fetch every 30s by default. Ribbon obtains the service registration information it uses from the Client after a 1s delay and refreshes it every 30s by default, keeping only services whose status is UP.
Two-level cache

| Cache | Type | Description |
|---|---|---|
| localRegionApps | AtomicReference | Updated periodically. Member variable of class DiscoveryClient. Holds the service registration information saved by Eureka Client: fetched in full from the Server immediately after startup, then updated incrementally every 30s by default. |
| upServerListZoneMap | ConcurrentHashMap | Updated periodically. Member variable of class LoadBalancerStats. Holds the service registration information that Ribbon uses, limited to instances whose status is UP: fetched from the Client 1s after startup, then refreshed every 30s by default. |
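Cache-related configuration

| Configuration | Default | Description |
|---|---|---|
| | 30 | Eureka Client renewal period, default 30s. |
| eureka.client.registryFetchIntervalSeconds | 30 | Eureka Client incremental update period, default 30s (incremental under normal circumstances; a full update on timeout, inconsistency with the Server, and similar cases). |
| ribbon.ServerListRefreshInterval | 30000 | Ribbon update period, default 30s. |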
Related classes

| Class | Description |
|---|---|
| DiscoveryClient | The Eureka Client; responsible for registration, renewal and update. The method initScheduledTasks() initializes the renewal and update scheduled tasks. |
| | Updates the service registration information that Ribbon uses; its start method initializes the update scheduled task. |
| LoadBalancerStats | Holds the service registration information that Ribbon uses, limited to instances whose status is UP. |
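The "status is UP" restriction on the Ribbon side amounts to a filter applied on each refresh. The sketch below shows only that idea; UpServerFilterSketch and its map-based input are hypothetical and do not reflect Ribbon's actual types.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative filter: keep only instances whose status is UP. */
final class UpServerFilterSketch {

    /** Returns the instance ids with status "UP", in the order they were supplied. */
    static List<String> upServers(Map<String, String> statusByInstance) {
        List<String> up = new ArrayList<>();
        for (Map.Entry<String, String> entry : statusByInstance.entrySet()) {
            if ("UP".equals(entry.getValue())) {
                up.add(entry.getKey());
            }
        }
        return up;
    }

    public static void main(String[] args) {
        Map<String, String> fromClient = new LinkedHashMap<>();
        fromClient.put("10.0.0.1:8080", "UP");
        fromClient.put("10.0.0.2:8080", "DOWN");
        fromClient.put("10.0.0.3:8080", "UP");
        System.out.println(upServers(fromClient)); // [10.0.0.1:8080, 10.0.0.3:8080]
    }
}
```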
V. Maximum Time for Service Consumers to Perceive a Change under the Default Configuration

| Scenario | Maximum delay | Description |
|---|---|---|
| Going online | 30 (readOnly) + 30 (Client) + 30 (Ribbon) = 90s | readWrite -> readOnly -> Client -> Ribbon, 30s each. |
| Normal offline | 30 (readOnly) + 30 (Client) + 30 (Ribbon) = 90s | When a service goes offline normally (the process is stopped with kill or kill -15), the process gets a chance to clean up: DiscoveryClient.shutdown() updates its status to DOWN on the Server and then sends a DELETE request to deregister itself. registry and readWriteCacheMap are updated in real time, so the UI no longer displays the service instance. |
| Abnormal offline | 30 + 60 (evict) * 2 + 30 + 30 + 30 = 240s | When a service goes offline abnormally (kill -9 or a process crash), DiscoveryClient.shutdown() is not triggered. Eureka Server relies on the task that runs every 60s and evicts services that have not renewed for more than 90s to delete the instance from registry and readWriteCacheMap. |
Consider the following worst case:
- At 0s, the service goes offline directly without notifying Eureka Client;
- At 29s, the first eviction check finds that the lease has not yet gone more than 90s without renewal;
- At 89s, the second eviction check still finds less than 90s without renewal;
- At 149s, the third eviction check finds no renewal for more than 90s, so the service instance is deleted from registry and readWriteCacheMap;
- At 179s, the scheduled task synchronizes readWriteCacheMap to readOnlyCacheMap;
- At 209s, Eureka Client fetches from the readOnlyCacheMap of Eureka Server;
- At 239s, Ribbon refreshes from Eureka Client.
Therefore, in the extreme case, the time it takes service consumers to perceive the change approaches 240s.
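The timeline above can be reproduced with a few lines of arithmetic. The sketch assumes, as the timeline does, that the last renewal happened at 0s and that the first eviction check falls at 29s; the class and method names are hypothetical.

```java
/** Reproduces the worst-case offline timeline with the default periods (in seconds). */
final class OfflineTimelineSketch {

    /**
     * Time of the first eviction check at which the lease has gone more than
     * leaseExpiration seconds without renewal, assuming the last renewal was at 0s.
     * evictInterval must be positive.
     */
    static int evictionTime(int firstCheck, int evictInterval, int leaseExpiration) {
        int t = firstCheck;
        while (t <= leaseExpiration) { // not overdue yet, wait for the next check
            t += evictInterval;
        }
        return t;
    }

    /** Adds the three cache hops: readOnlyCacheMap sync, Client fetch, Ribbon refresh. */
    static int ribbonSeesChange(int changedAt, int readOnlySync, int clientFetch, int ribbonRefresh) {
        return changedAt + readOnlySync + clientFetch + ribbonRefresh;
    }

    public static void main(String[] args) {
        int evictedAt = evictionTime(29, 60, 90);
        System.out.println(evictedAt);                               // 149
        System.out.println(ribbonSeesChange(evictedAt, 30, 30, 30)); // 239
        System.out.println(ribbonSeesChange(0, 30, 30, 30));         // 90 (normal offline)
    }
}
```

Plugging in shorter periods shows how each setting contributes to the total.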
VI. Countermeasures
Choosing Eureka as the service registry means accepting its characteristics: it prioritizes availability (A) and partition tolerance (P) and does not guarantee strong consistency (C). If strong consistency (C) must come first, consider a CP system such as ZooKeeper as the service registry. Distributed systems generally run multiple nodes, so a delay in propagating the status of a single node coming online has no impact. The countermeasures here therefore focus on the delay in status updates after a service goes offline.
6.1 Eureka Server
1. Shorten the readOnlyCacheMap update period. Shortening the scheduled task period reduces the lag.
eureka.server.responseCacheUpdateIntervalMs: 10000 # Eureka Server readOnlyCacheMap update period
2. Disable readOnlyCacheMap. Small and medium-sized systems can consider this option: Eureka Client then fetches service registration information directly from readWriteCacheMap.
eureka.server.useReadOnlyResponseCache: false # whether readOnlyCacheMap is used
6.2 Eureka Client
1. Service consumers use a fault-tolerance mechanism, for example Spring Cloud Retry or Hystrix; Ribbon, Feign and Zuul can all be configured to retry. When a service consumer calls a node that has already gone offline, it normally gets a ConnectTimeout, and the retry mechanism can then try the next node.
2. Service consumers shorten the update period. Both the Eureka Client cache and the Ribbon cache delay status updates; shortening these two scheduled task periods reduces the delay, for example:
eureka.client.registryFetchIntervalSeconds: 5 # Eureka Client update period
ribbon.ServerListRefreshInterval: 2000 # Ribbon update period
3. The service provider ensures that the service goes offline normally. Stop the service with kill or kill -15 and avoid kill -9. Stopping the process with kill or kill -15 triggers the shutdown() method of Eureka Client, which actively deletes the registration information from the Server’s registry and readWriteCacheMap without relying on the Server’s eviction task.
4. The service provider delays going offline. Before the service goes offline, call an interface to set the service status stored in Eureka Server to DOWN or OUT_OF_SERVICE, and only then take the service offline. The interval between the two steps depends on the caching mechanism and configuration. For example, with the default configuration, if the service goes offline 90 seconds after the interface is called, service consumers will not call the offline instance.
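The "retry the next node" idea from the first item can be sketched without any framework. This is not how Spring Retry or Ribbon implement retries; RetryNextServerSketch and callWithRetry are hypothetical names, and the remote call is passed in as a function so the example stays self-contained.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Illustrative failover: try each server in turn until one call succeeds. */
final class RetryNextServerSketch {

    /** Returns the first successful response, or fails once every server has failed. */
    static <R> R callWithRetry(List<String> servers, Function<String, R> call) {
        RuntimeException last = null;
        for (String server : servers) {
            try {
                return call.apply(server);
            } catch (RuntimeException e) {
                last = e; // for example a connect timeout because the instance is already offline
            }
        }
        throw new IllegalStateException("all servers failed", last);
    }

    public static void main(String[] args) {
        // The first entry is stale: the instance is offline but still in the local cache.
        List<String> cached = Arrays.asList("10.0.0.1:8080", "10.0.0.2:8080");
        String response = callWithRetry(cached, server -> {
            if (server.startsWith("10.0.0.1")) {
                throw new RuntimeException("connect timeout: " + server);
            }
            return "200 OK from " + server;
        });
        System.out.println(response); // 200 OK from 10.0.0.2:8080
    }
}
```

Retrying is only safe for requests that can be repeated without side effects, so in practice it is usually limited to idempotent calls or to failures that occur before the request was sent.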
VII. Gateway: Real-time Awareness of Services Going Offline
In software engineering, there is no problem that a middle layer cannot solve, and the gateway is the middle layer between service providers and service consumers. Take the Spring Cloud Zuul gateway as an example. As a Eureka Client, the gateway holds service registration information. Service consumers send requests to service providers through the gateway, so when a service provider goes offline it only needs to notify the gateway to disable that instance in the gateway’s own service list. To keep the gateways independent, a separate service can receive offline notifications and coordinate the gateway cluster. The next article will describe in detail how the gateway achieves real-time awareness of services going offline. Stay tuned!
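Until then, the basic idea can be sketched as a gateway-local list of disabled instances layered over the possibly stale list from Eureka. This is an assumption about one possible design, not the implementation the follow-up article describes; every name below is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative gateway-side server list with an immediate "disable" override. */
final class GatewayServerListSketch {
    private volatile List<String> fromEureka = Collections.emptyList(); // may be stale
    private final Set<String> disabled = ConcurrentHashMap.newKeySet();

    /** Periodic refresh from the gateway's Eureka Client cache. */
    void refreshFromEureka(List<String> instances) {
        fromEureka = new ArrayList<>(instances);
        disabled.retainAll(instances); // forget overrides for instances Eureka no longer lists
    }

    /** Offline notification: takes effect immediately, without waiting for any cache. */
    void notifyOffline(String instance) {
        disabled.add(instance);
    }

    /** Instances the gateway may still route to. */
    List<String> routable() {
        List<String> result = new ArrayList<>();
        for (String instance : fromEureka) {
            if (!disabled.contains(instance)) {
                result.add(instance);
            }
        }
        return result;
    }
}
```

A gateway would call notifyOffline when a provider announces that it is going offline; clearing the override once Eureka stops listing the instance lets a restarted instance be routed to again after it re-registers.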
Author: Feng Yongbiao
Source: Yixin Institute of Technology