How an API Gateway Achieves Real-Time Awareness of Services Going Offline

  api, Cache, Microservices

The previous article, "Eureka Cache Mechanism", introduced Eureka's caching mechanism, so readers should now have a better understanding of Eureka. This article explains in detail how the API gateway achieves real-time awareness of services going offline.

I. Introduction

In cloud-based microservice applications, the network locations of service instances are allocated dynamically. In addition, service instances change frequently because of automatic scaling, failures, and upgrades. Client code therefore needs a more sophisticated service discovery mechanism.

At present, there are two main modes of service discovery: client-side discovery and server-side discovery.

  • Server-side discovery: the client sends its request through a load balancer; the load balancer queries the service registry and routes each request to an available service instance.
  • Client-side discovery: the client is responsible for determining the network addresses of the available service instances and for load balancing requests across the cluster. The client queries the service registry, i.e. a database of available services, then uses a load-balancing algorithm to select an available service instance and sends the request.

The biggest difference between client-side discovery and server-side discovery is that, with client-side discovery, the client knows (caches) the registry information of the available services. If the client-side cache is not refreshed from the server in time, the cached data on the client and on the server may become inconsistent.
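The stale-cache problem can be illustrated with a minimal sketch of client-side discovery. The class name CachedDiscoveryClient and its methods are hypothetical and are not part of Eureka or Ribbon; the sketch only assumes that the client keeps a local copy of the registry and picks instances from it in round-robin order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical client-side discovery: a locally cached instance list plus round-robin selection.
class CachedDiscoveryClient {
    private volatile List<String> cachedInstances = new ArrayList<>();
    private final AtomicInteger counter = new AtomicInteger();

    // Called by a periodic refresh task; replaces the local copy of the registry.
    void refresh(List<String> registrySnapshot) {
        cachedInstances = new ArrayList<>(registrySnapshot);
    }

    // Round-robin over the cached list; the registry itself is not consulted per request.
    String choose() {
        List<String> snapshot = cachedInstances;
        if (snapshot.isEmpty()) {
            throw new IllegalStateException("no instances cached");
        }
        int index = Math.floorMod(counter.getAndIncrement(), snapshot.size());
        return snapshot.get(index);
    }
}
```

Until refresh is called again, choose keeps returning instances that may already have been removed from the registry, which is exactly the inconsistency described above.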

II. Using the Gateway in Combination with Eureka

Netflix OSS provides a good example of client-side service discovery. Eureka Server is the registry, and Zuul is a Eureka Client relative to Eureka Server. Zuul caches Eureka Server's service list locally and refreshes it with a scheduled task. Zuul also discovers other services through this local list and uses Ribbon for client-side load balancing.

Under normal circumstances, a caller gets an immediate response to a request sent through the gateway. However, when a producer is scaled in, taken offline, or upgraded, the multi-level cache design and the periodic refresh mechanism mean that the service list on the load balancer side is not updated in time (the previous article, "Eureka Cache Mechanism", shows that the longest time a service consumer needs to perceive the change approaches 240s). If a consumer sends a request to the gateway during this window, the load balancer forwards it to an instance that no longer exists, and the request times out.
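One way to arrive at a figure of 240s is to add up the stock Eureka and Ribbon intervals. The breakdown below is an assumption based on the default settings as I understand them (90s lease expiration, 60s eviction task interval, and 30s each for the read-only cache sync, the client registry fetch, and the Ribbon server list refresh); the previous article remains the authoritative source for the exact reasoning. The class PerceptionDelay is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that sums the delay stages between a producer disappearing
// and the load balancer dropping it, using assumed default intervals in seconds.
class PerceptionDelay {
    static Map<String, Integer> defaultStages() {
        Map<String, Integer> stages = new LinkedHashMap<>();
        stages.put("lease expiration without heartbeat", 90);
        stages.put("eviction task interval", 60);
        stages.put("readWriteCacheMap to readOnlyCacheMap sync", 30);
        stages.put("Eureka Client registry fetch", 30);
        stages.put("Ribbon server list refresh", 30);
        return stages;
    }

    // Worst case: every stage has just run when the change happens, so each waits a full interval.
    static int worstCaseSeconds(Map<String, Integer> stages) {
        int total = 0;
        for (int seconds : stages.values()) {
            total += seconds;
        }
        return total;
    }
}
```

With these assumed values, worstCaseSeconds(defaultStages()) returns 240. If any of the intervals are tuned in a given deployment, the bound changes accordingly.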

III. Solutions

3.1 Implementation Ideas

After a producer goes offline, the first component to perceive it is readWriteCacheMap in Eureka Server, and the last is the load balancer in the gateway core. The load balancer, however, discovers producers from a list that it maintains locally.

Therefore, to let the gateway perceive a producer going offline in real time, the following can be done: first, the producer or the deployment platform actively notifies Eureka Server; then, bypassing the refresh intervals between Eureka's multi-level caches, the Eureka Client in Zuul is notified directly; finally, the service list in the Eureka Client is pushed to Ribbon.

However, placing the offline notification logic in the producer would pollute the producer's code and run into problems such as language differences.

To borrow a famous saying:

“Any problem in computer science can be solved by adding another layer of indirection.”

Gateway-SynchSpeed acts as a proxy service. It exposes a REST API that accepts offline requests from callers, and it synchronizes the producer's state to Eureka Server and to the gateway core, so it serves both as a state synchronizer and as a "soft transaction".

Approach: before a producer is scaled in, taken offline, or upgraded, the spider platform (spider is the container management platform) actively notifies Gateway-SynchSpeed that an instance of the producer is going offline. Gateway-SynchSpeed then notifies Eureka Server that the producer instance is going offline. If Eureka Server takes the instance offline successfully, Gateway-SynchSpeed notifies the gateway core directly.
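The ordering described here (registry first, gateways only on success) can be sketched as follows. This is not the author's implementation: RegistryClient, GatewayNotifier, and OfflineCoordinator are hypothetical names, and the two interfaces stand in for whatever HTTP calls Gateway-SynchSpeed really makes. In a real deployment the registry step would plausibly be Eureka's status override REST operation (a PUT to the instance's status resource with value=OUT_OF_SERVICE), but that is an assumption on my part.

```java
// Hypothetical abstraction of the call that marks an instance offline in Eureka Server.
interface RegistryClient {
    boolean markOffline(String instanceIp);
}

// Hypothetical abstraction of the call that tells the gateway cores about the offline instance.
interface GatewayNotifier {
    void notifyOffline(String instanceIp);
}

// Hypothetical coordinator that mirrors the described order of operations.
class OfflineCoordinator {
    private final RegistryClient registry;
    private final GatewayNotifier gateways;

    OfflineCoordinator(RegistryClient registry, GatewayNotifier gateways) {
        this.registry = registry;
        this.gateways = gateways;
    }

    // Returns true only if the registry accepted the change and the gateways were then notified.
    boolean takeOffline(String instanceIp) {
        if (!registry.markOffline(instanceIp)) {
            return false; // stop here so the gateways never diverge from the registry
        }
        gateways.notifyOffline(instanceIp);
        return true;
    }
}
```

Stopping after a failed registry update keeps the gateways from marking an instance offline that Eureka Server still reports as available.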

Design features

  • Non-invasive and easy to use. Whatever language the caller is written in, it only needs to send an HTTP REST request to Gateway-SynchSpeed. The actual logic does not intrude into the caller; it is delegated to the proxy.
  • Atomicity. Taking the instance offline in Eureka Server first, and then in all related gateway cores, is treated as one minimal unit of work. Gateway-SynchSpeed acts as a "soft transaction" that guarantees the atomicity of taking a service offline to some extent.

3.2 Implementation Steps

Step description

  • Step 1: Before a producer is scaled in, taken offline, or upgraded, the spider platform notifies the Gateway-SynchSpeed service with an HTTP request. The granularity of the notification is the IP of the container in which the service instance runs.
  • Step 2: After Gateway-SynchSpeed receives the request, it first validates the IP and then notifies Eureka Server.
  • Step 3: Eureka Server sets the Producer to an invalid state and returns the result. (Eureka supports two ways of going offline: the first removes the instance directly from the service registry; the second sets the Producer's status to OUT_OF_SERVICE. With the first, the Spider platform cannot guarantee that the Producer process is killed immediately after the offline request is issued. If the Producer still sends a heartbeat to Eureka Server during this period, the service re-registers with Eureka Server.)
  • Step 4: Gateway-SynchSpeed obtains the result of the previous step. If it succeeded, the next step is executed; otherwise, processing stops.
  • Step 5: Gateway-SynchSpeed is itself a Eureka Client. It looks up the Producer's Application-Name by IP in its local service registry list.
  • Step 6: Gateway-SynchSpeed queries the gateway core database by Application-Name for the names of all gateway groups related to the service going offline.
  • Step 7: Gateway-SynchSpeed uses the gateway group names to look up, in its local service list, all service addresses ipAddress (ip:port) under each gateway group.
  • Step 8: Gateway-SynchSpeed asynchronously notifies all related gateway nodes.
  • Step 9: After receiving the notification, Gateway-Core marks the Producer's status as offline and records the information of every instance whose status was changed successfully in the cache DownServiceCache.
  • Step 10: Gateway-Core updates the local Ribbon service list.
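Steps 5 to 8 amount to a chain of lookups followed by an asynchronous fan-out. The sketch below models that chain with in-memory maps; the class OfflineFanOut, its field names, and the sender callback are all hypothetical, and the maps stand in for the local Eureka service list and the gateway core database mentioned in the steps.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

// Hypothetical model of steps 5-8: resolve the affected gateway nodes, then notify them asynchronously.
class OfflineFanOut {
    private final Map<String, String> appNameByIp;            // step 5: producer IP -> Application-Name
    private final Map<String, List<String>> groupsByAppName;  // step 6: Application-Name -> gateway group names
    private final Map<String, List<String>> nodesByGroup;     // step 7: gateway group name -> ip:port addresses

    OfflineFanOut(Map<String, String> appNameByIp,
                  Map<String, List<String>> groupsByAppName,
                  Map<String, List<String>> nodesByGroup) {
        this.appNameByIp = appNameByIp;
        this.groupsByAppName = groupsByAppName;
        this.nodesByGroup = nodesByGroup;
    }

    // Steps 5-7: returns each gateway node once, even if it belongs to several groups.
    List<String> resolveGatewayNodes(String producerIp) {
        List<String> nodes = new ArrayList<>();
        String appName = appNameByIp.get(producerIp);
        if (appName == null) {
            return nodes;
        }
        for (String group : groupsByAppName.getOrDefault(appName, Collections.<String>emptyList())) {
            for (String node : nodesByGroup.getOrDefault(group, Collections.<String>emptyList())) {
                if (!nodes.contains(node)) {
                    nodes.add(node);
                }
            }
        }
        return nodes;
    }

    // Step 8: one asynchronous call per node; the returned future completes when all calls finish.
    CompletableFuture<Void> notifyGateways(String producerIp, Consumer<String> sender) {
        List<CompletableFuture<Void>> calls = new ArrayList<>();
        for (String node : resolveGatewayNodes(producerIp)) {
            calls.add(CompletableFuture.runAsync(() -> sender.accept(node)));
        }
        return CompletableFuture.allOf(calls.toArray(new CompletableFuture[0]));
    }
}
```

Here sender would wrap the HTTP call to a single gateway node. Running the calls on the common pool is only one option; a dedicated executor with timeouts may suit production use better.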

IV. Compensation Mechanism

Eureka provides a safeguard mechanism. Before the Eureka Client updates its service list from Eureka Server, it checks whether the relevant hash value has changed (modifying the client's service list changes the hash value). If it has changed, the update switches from an incremental update to a full update (from "Eureka Cache Mechanism" we know that the data in readOnlyCacheMap and readWriteCacheMap may differ for up to 30s). If the client-side cached list is overwritten with data from readOnlyCacheMap, the service list on the Ribbon side ends up inconsistent with the data in readWriteCacheMap.

To counter this behavior of Eureka, the listener EurekaEventListener is introduced as a compensation mechanism. It listens for the Eureka Client's full-pull events and, for services that have been in the cache for no more than 30s, resets their status to OUT_OF_SERVICE.
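The cache side of this compensation can be sketched without any Eureka classes. The class below reuses the name DownServiceCache from the steps above, but its methods, the plain string statuses, and the explicit timestamps are assumptions made for illustration; the author's listener would call something similar from its full-pull event handler.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the compensation cache: remembers recently offlined instances
// and re-applies OUT_OF_SERVICE after a full registry pull overwrites the local list.
class DownServiceCache {
    static final long TTL_MILLIS = 30_000L; // the 30s window in which the Eureka caches may disagree

    private final Map<String, Long> markedAt = new ConcurrentHashMap<>();

    void recordOffline(String instanceId, long nowMillis) {
        markedAt.put(instanceId, nowMillis);
    }

    // refreshedStatuses maps instance id to status as delivered by the full pull; it is updated in place.
    void reapply(Map<String, String> refreshedStatuses, long nowMillis) {
        Iterator<Map.Entry<String, Long>> it = markedAt.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() > TTL_MILLIS) {
                it.remove(); // older than 30s: the registry caches should have converged by now
            } else if (refreshedStatuses.containsKey(entry.getKey())) {
                refreshedStatuses.put(entry.getKey(), "OUT_OF_SERVICE");
            }
        }
    }
}
```

Entries older than 30s are dropped rather than re-applied, on the assumption that readOnlyCacheMap has caught up with readWriteCacheMap by then.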

V. API Security Design

System security must also be considered: if the API is accessed maliciously, a producer could be taken offline in Eureka Server for no reason, leaving consumers unable to find the producer through Eureka Server.

Security filtering is done with blacklists and whitelists. The basic process is as follows:

  • Configure a whitelist network segment (IP range) in Gateway-SynchSpeed.
  • Add a filter to Gateway-SynchSpeed that checks the IP of the requester of the offline operation. If the requester's IP is within the whitelisted network segment, the request is let through; otherwise, it is rejected.
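The whitelist check itself can be sketched as a CIDR match. The class IpWhitelist is hypothetical, handles IPv4 only, and assumes the whitelisted network segment is configured in CIDR notation such as 10.20.0.0/16; how the author's filter obtains the requester's IP and the configured segment is not shown in the source.

```java
// Hypothetical IPv4 whitelist check against a single CIDR network segment.
class IpWhitelist {
    private final int network;
    private final int mask;

    // cidr is expected in the form "a.b.c.d/prefix", for example "10.20.0.0/16".
    IpWhitelist(String cidr) {
        String[] parts = cidr.split("/");
        if (parts.length != 2) {
            throw new IllegalArgumentException("not a CIDR block: " + cidr);
        }
        int prefix = Integer.parseInt(parts[1]);
        if (prefix < 0 || prefix > 32) {
            throw new IllegalArgumentException("invalid prefix length: " + prefix);
        }
        // A shift by 32 is a no-op for int in Java, so prefix 0 is handled separately.
        this.mask = prefix == 0 ? 0 : -1 << (32 - prefix);
        this.network = toInt(parts[0]) & mask;
    }

    // True if the requester's IP falls inside the whitelisted segment.
    boolean allows(String ip) {
        return (toInt(ip) & mask) == network;
    }

    private static int toInt(String ip) {
        String[] octets = ip.split("\\.");
        if (octets.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + ip);
        }
        int value = 0;
        for (String octet : octets) {
            int part = Integer.parseInt(octet);
            if (part < 0 || part > 255) {
                throw new IllegalArgumentException("not an IPv4 address: " + ip);
            }
            value = (value << 8) | part;
        }
        return value;
    }
}
```

A filter would call allows with the remote address of the offline request and reject the request when it returns false.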

VI. Log Collection

Gateway-SynchSpeed and Gateway-Core are deployed in Docker containers, so all log files are lost if a container restarts. The relevant logs of Gateway-SynchSpeed and Gateway-Core therefore need to be written to Elasticsearch, with Kibana responsible for querying the data in Elasticsearch and displaying it visually.

VII. Code Snippets

Gateway-SynchSpeed performs state synchronization.

EurekaEventListener processes cached data

VIII. Supplementary Notes

The gateway's real-time awareness of services going offline is currently implemented with Spring Cloud Zuul 1.3.6.RELEASE and Spring Cloud Eureka 1.4.4.RELEASE.

For the gateway to perceive its downstream services going offline in real time, the following conditions must currently be met:

  • Producers must be deployed on the Kubernetes container management platform.
  • Producers go offline through a normal offline, upgrade, or scale-in operation. Abnormal termination, such as a service crash caused by insufficient container resources, is not supported.

Real-time awareness of services going offline is an optional solution that the gateway offers to business teams. It is not enabled by default in the spider platform; each business team decides whether to enable it according to its own system requirements. For configuration details, see the section "gateway real-time perception on spider configuration document description" in the API gateway access guide.

Author: Xie Guohui

Source: Yixin Institute of Technology