I. LAIN, a PaaS Platform Based on Docker
LAIN is a cloud platform for financial scenarios, designed to free up the productivity of teams and business lines. It has been in production for about two years and is largely mature. It gives every team in the Big Data Innovation Center a unified test and production environment, which simplifies service deployment and release and reduces the complexity of system management for operations staff.
LAIN standardizes the workflow for developing, testing and releasing an application, and provides an integrated solution to DevOps concerns such as container orchestration, permission control, SDN, traffic management, monitoring and alerting, backup, and logging. (Further reading: Yixin Open Source | Explain the Function and Architecture of LAIN, PaaS Platform)
On LAIN, the application is the basic concept. The developer only needs to write one lain.yaml file that defines how the application is built and run, which makes LAIN minimally intrusive to the application code. Because LAIN is built on container technology, it supports diverse technology stacks and naturally isolates system and application dependencies.
When a LAIN user creates an application (service), they register it with LAIN. The registering user automatically becomes the maintainer of the application and gains the right to operate it further. Building the application requires docker and the lain command line tool; for convenience, we provide a vagrant box, lain-box. In addition to the project code, a Docker image is needed as the base image, that is, the build environment. For a project that compiles to a binary, such as a Go project, you can switch to a different base image for the runtime; otherwise the build image is used as the release image. Once the images and the build/run scripts are ready, you can edit lain.yaml.
Specifically, lain.yaml does four main things:
1. It sets the application name, which marks the boundary of an application.
2. It specifies the application’s basic technology stack, that is, the images used for building and running.
3. It describes the build process (how to compile).
4. It splits the application into micro-services and configures each internal service (how to run, operate and maintain it).
Regarding point 4, LAIN has the concept of a Proc. Each application has one or more Procs, and each Proc has a unique name and a type within the application. A Proc corresponds to a group of containers at the bottom layer, and the containers of all Procs within one application can reach each other over the network. An application is therefore a set of mutually trusting Procs that together present a particular function to the outside world. Proc types are built into LAIN: worker is the simplest type, and LAIN does some extra work when handling the other types.
At the application level, besides using lain.yaml to pin down an application’s dependencies and behavior, LAIN has the following highlights:
1. SDN network security isolation
- The SDN network is built with the calico project
- Efficient network connectivity within an application
- Applications are network-isolated from each other by default
- Access to services across applications must be declared explicitly
2. Application permission control
- SSO single sign-on with unified authentication
- Using SSO group management, console manages the permissions of application maintainers, including permissions on registry images and on application maintenance
Next, we take the simplest web service as an example to illustrate LAIN’s working principle.
II. LAIN’s Nine Major Functions, by Example
First, write a simple web service in Go, hello.go.
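The hello.go listing itself is not reproduced in this text. As a stand-in, here is a minimal sketch of what such a service might look like. The greeting text, the greeting helper and port 8000 are assumptions made for illustration, not details from the original.

```go
// hello.go: a minimal sketch of a web service, not the original listing.
package main

import (
	"fmt"
	"log"
	"net/http"
)

// greeting builds the response body for a request path.
// The wording is an illustrative assumption.
func greeting(path string) string {
	return fmt.Sprintf("Hello, LAIN! You requested %s\n", path)
}

// helloHandler writes the greeting for every incoming request.
func helloHandler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprint(w, greeting(r.URL.Path))
}

func main() {
	http.HandleFunc("/", helloHandler)
	// Port 8000 is an assumed placeholder; it has to match whatever
	// port the application declares to the platform.
	log.Fatal(http.ListenAndServe(":8000", nil))
}
```

Keeping the response text in a small pure function makes the handler easy to check without starting a server.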
Next, edit the lain.yaml file:
As you can see, lain.yaml defines how to build, release, and test the application. Note that the hello application has only one Proc. The web section is a shorthand in which the Proc’s type and name are both web. For each Proc, LAIN provides several key features:
1. Dynamic scaling
You can define the number of instances of a Proc (num_instances) and the memory used by each instance in lain.yaml, and you can adjust both dynamically from the command line or the console UI. When scaling, LAIN automatically injects swarm filters so that instances of the same Proc are scheduled onto different nodes.
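To illustrate the effect of spreading instances of one Proc across nodes, here is a minimal sketch. It is not LAIN’s or swarm’s actual scheduling code: spreadInstances is a hypothetical helper, the node names are made up, and the least-loaded-first policy is only an assumed stand-in for what the injected filters achieve.

```go
package main

import (
	"fmt"
	"sort"
)

// spreadInstances places numInstances instances of one Proc on the given
// nodes. Each instance goes to the node that currently holds the fewest
// instances of that Proc, so instances land on different nodes whenever
// there are enough nodes. It returns the instance count per node.
func spreadInstances(nodes []string, numInstances int) map[string]int {
	placement := make(map[string]int, len(nodes))
	if len(nodes) == 0 {
		return placement
	}
	// Sort a copy so that ties are broken deterministically by node name.
	sorted := append([]string(nil), nodes...)
	sort.Strings(sorted)
	for _, n := range sorted {
		placement[n] = 0
	}
	for i := 0; i < numInstances; i++ {
		best := sorted[0]
		for _, n := range sorted[1:] {
			if placement[n] < placement[best] {
				best = n
			}
		}
		placement[best]++
	}
	return placement
}

func main() {
	// Hypothetical three-node cluster, Proc scaled to two instances.
	fmt.Println(spreadInstances([]string{"node1", "node2", "node3"}, 2))
}
```

With more instances than nodes, some node necessarily hosts more than one instance; the sketch then keeps the counts as even as possible.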
2. volume
This is the docker volume. If this field is configured, each instance gets a file directory on the node it runs on. Adding a volume usually means adding state, which works against HA, but volumes are still necessary in some special cases such as containerized databases. The highlight of LAIN’s volume support is that a backup strategy can be configured: LAIN has a component that supports custom backups and custom scripts that run before and after the backup. The schedule for these scripts is configured much like a crontab entry, which is equivalent to defining cron jobs in the container.
3. cloud_volume
Some data needs to be highly available, and different instances of a Proc may need to share a volume. For this, LAIN integrates distributed file systems such as ceph and MooseFS. cloud_volume offers one more usage than an ordinary volume: all instances share the same directory on the distributed file system.
4. logs
In essence this is still a docker volume, but the files contained in the directory defined under the logs field are collected by LAIN’s log collection system, with the same effect as writing to standard output. This way, the logs of all applications can be managed and queried in a unified way.
5. Secret files
The same build has to run on different clusters (such as test and production clusters), which raises the question of how to load configuration such as database user names and passwords. Writing such configuration into the code repository creates obvious security problems and makes automated integration and deployment across clusters inconvenient. LAIN separates code from configuration through its built-in lvault component. Each LAIN cluster has its own configuration center, lvault, which encrypts and stores the configuration files of all applications in the cluster. Only an application’s maintainers have the right to manage its configuration files. Users can therefore write each cluster’s configuration into the corresponding lvault, then push the same image to different clusters and deploy it.
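From the application’s point of view, separating code from configuration means reading settings at run time instead of compiling them in. The sketch below shows that pattern only. How lvault delivers files to a container is not described here, so the file path, the KEY=VALUE format and the DB_USER key are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseConfig parses KEY=VALUE lines, skipping blank lines and lines
// that start with '#'. Only the first '=' separates key from value.
func parseConfig(content string) map[string]string {
	cfg := make(map[string]string)
	for _, line := range strings.Split(content, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		parts := strings.SplitN(line, "=", 2)
		if len(parts) != 2 {
			continue // ignore malformed lines
		}
		cfg[strings.TrimSpace(parts[0])] = strings.TrimSpace(parts[1])
	}
	return cfg
}

// loadConfig reads and parses the configuration file at path.
func loadConfig(path string) (map[string]string, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	return parseConfig(string(data)), nil
}

func main() {
	// Hypothetical path: the same image reads whatever file the
	// cluster it runs on has provided at this location.
	cfg, err := loadConfig("/etc/hello/db.conf")
	if err != nil {
		fmt.Fprintln(os.Stderr, "configuration not available:", err)
		return
	}
	fmt.Println("connecting as", cfg["DB_USER"])
}
```

Because the image contains no credentials, the same build can be deployed to a test cluster and a production cluster and pick up different settings in each.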
6. Web-type Proc
- Automated nginx configuration: traffic to a web-type Proc is load balanced by webrouter, a component based on nginx. Each Proc gets a default mount point from the cluster, or you can define a custom mount point, namely a servername or baseurl. You can also configure practical features such as health checks.
- watcher automatically refreshes the nginx configuration
- The logging system automatically collects the nginx logs
7. Virtual IP
Virtual IP is a set of mechanisms designed to ensure the high availability of a Proc.
- A Proc can register one or more virtual IPs, and applications can serve external traffic through them. For example, webrouter can use the virtual IP mechanism to eliminate nginx as a single point of failure.
- networkd maintains virtual IPs dynamically: after a virtual IP is configured in etcd, networkd on every node is notified. If an instance of the corresponding Proc is scheduled onto a node, networkd on that node configures the VIP and the iptables rules so that traffic can reach the instance container. If a Proc registers more than one virtual IP, networkd tries to allocate the different virtual IPs to different nodes. Since containers are spread across different nodes by default, this ensures strict high availability.
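The idea of giving different virtual IPs to different nodes can be sketched as a simple round-robin assignment. This is not networkd’s actual algorithm: assignVIPs is a hypothetical helper, and the addresses (taken from the 192.0.2.0/24 documentation range) and node names are made up.

```go
package main

import "fmt"

// assignVIPs maps each virtual IP to one of the nodes that host an
// instance of the Proc. It walks the node list round-robin, so the
// virtual IPs end up on distinct nodes whenever there are at least as
// many nodes as virtual IPs.
func assignVIPs(vips, nodesWithInstance []string) map[string]string {
	assignment := make(map[string]string, len(vips))
	if len(nodesWithInstance) == 0 {
		return assignment // no instance is running, nothing to activate
	}
	for i, vip := range vips {
		assignment[vip] = nodesWithInstance[i%len(nodesWithInstance)]
	}
	return assignment
}

func main() {
	vips := []string{"192.0.2.10", "192.0.2.11"}
	nodes := []string{"node1", "node2", "node3"}
	for vip, node := range assignVIPs(vips, nodes) {
		fmt.Printf("%s -> %s\n", vip, node)
	}
}
```

If one node fails, only the virtual IPs assigned to it need to move, which is why spreading them over several nodes helps availability.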
8. Container scheduling
LAIN supports container scheduling parameters such as swarm’s constraint and affinity, which makes container placement more sensible. For example, by default, containers of the same Proc are scheduled onto different nodes as far as possible. The latest LAIN also lets users customize a Proc’s labels and filters.
9. Container monitoring and alerting
LAIN collects basic container runtime data with a collectd plug-in developed in-house and integrates open source components such as Carbon, Whisper, Graphite-Web, Grafana, and icinga2. Together with the hedwig and hagrid components developed by the team, this provides a complete monitoring and alerting system.
Within a cluster, deploying an application goes through roughly these stages:
1) console parses lain.yaml through lain-sdk, creates the network for the application, creates the calico profile, and creates the SSO group of application maintainers.
2) console calls the deployd API. deployd orchestrates the containers, provides APIs such as online scale-up and scale-down, takes care of automatic maintenance and disaster recovery, and writes some important data into etcd.
3) lainlet watches etcd. networkd and the watcher in webrouter obtain the latest cluster configuration through lainlet, which drives the automatic drift of virtual IPs and the automatic update of the nginx configuration.
In summary, LAIN’s initial design placed particular emphasis on security, including the SDN network, application permission management, and the secret file configuration system. In the implementation, with support for diverse technology stacks and the advantages of containerization in mind, LAIN provides a package of solutions covering backup, logging, and monitoring and alerting, so that application developers can conveniently build applications with varied characteristics and improve productivity. Finally, for cluster maintainers, LAIN provides many operations tools, including adding and removing LAIN nodes, manually migrating application containers, and node maintenance mode, which cover most needs from daily operations to disaster recovery.
The original text was published in “High Availability Architecture”
Author: Wang Chaoyi, LAIN Team, Yixin Big Data Innovation Center