LAIN is an open source PaaS platform developed by the Big Data Innovation Center of Yixin Company. Built for financial-industry scenarios, LAIN is a cloud platform designed to free up the productivity of teams and business lines. It provides a unified testing and production environment for all teams in the Big Data Innovation Center, simplifies service deployment and release, and reduces the complexity of system management for operations staff.
I. Design Concept and Problems Solved
LAIN standardizes an application's development, testing and release workflow, and provides an overall solution to devops problems such as container orchestration, permission control, SDN, traffic management, monitoring and alerting, backup and logging.
On LAIN, the application is the basic concept. An application's developer only needs to write a single lain.yaml to define how the application is built and run, which is minimally intrusive to the application code. LAIN is based on container technology, targets diverse technology stacks, and naturally isolates system and application dependencies.
When a LAIN user creates an application (service), they register it with LAIN. That user automatically becomes the application's maintainer and has the right to operate on it further. Building the application environment requires docker and the lain command line tool. For convenience, we have built a vagrant box, lain-box. Besides the project code, a Docker image is needed as the base image, that is, the build environment. If the project compiles to a binary, as golang projects do, a different base can be substituted at runtime; otherwise the build image is also used as the release image. Once the images and the build/run scripts are ready, you can write lain.yaml.
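The rule for choosing the release image can be sketched in Python. This is an illustrative model only: the keys `build_base` and `release_base`, the image names, and the helper `release_image` are hypothetical and are not the actual lain.yaml schema.

```python
from typing import Optional


def release_image(build_base: str, release_base: Optional[str] = None) -> str:
    """Pick the base image for the released application.

    Hypothetical helper: a project that compiles to a standalone binary
    may name a separate runtime base; otherwise the build image is
    reused as the release image.
    """
    return release_base if release_base else build_base


# A compiled project that ships only its binary on a separate runtime base.
go_app = {"build_base": "golang", "release_base": "ubuntu"}
# An interpreted project that runs in the same image it was built in.
py_app = {"build_base": "python"}

print(release_image(go_app["build_base"], go_app.get("release_base")))
print(release_image(py_app["build_base"], py_app.get("release_base")))
```

The first call returns the separate runtime base, the second falls back to the build image.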
Specifically, LAIN solves the following four problems:
1. An overall solution to the devops problems beneath application development.
- For users, application-level development is only the tip of the iceberg. Beneath it lies a series of supporting work: data centers, networks, servers, system administration, operations management, monitoring, alerting, logging and so on, which can be more complicated than the application-level development itself.
- Adopting IaaS solves server procurement and racking, but a strong devops team is still needed to handle the work above; otherwise the infrastructure easily becomes the development bottleneck and grows harder and harder to fix.
- This work is largely the same for every product, yet with per-product customization, repeating it consumes a great deal of time.
How LAIN does it:
- A LAIN cluster can be built directly on nearly bare IaaS or servers, which makes low-level cluster resource operations such as online scale-out and scale-in convenient.
- It integrates proven operations practices accumulated across the industry and provides an overall solution for the large body of work beneath the iceberg.
- It encapsulates complicated system administration and operations tasks into a simpler, easy-to-use toolkit, greatly simplifying most system work and lowering the technical threshold and staffing required for daily maintenance.
- It consolidates homogeneous work to avoid duplicated effort.
- Out of the box, its management components cover deployment, scaling, monitoring, alerting, logging and more. Bundled applications are also included, such as mysql and redis cluster services.
2. A standardized application development workflow, with appropriate SCM support.
- Individual developers and startups rarely talk about good workflow. However, the technical debt accumulated during development increasingly hurts the efficiency and quality of development and deployment.
- Irregular design, development and deployment practices lead to all kinds of problems.
How LAIN does it:
- Provides a solution for the local development environment.
- Provides an SDK/CLI tool chain for the local development process, so that development and build steps are embedded in the solution.
- Implicitly provides SCM support and constrains how developers develop and release.
3. Higher overall resource utilization and an optimized redundant resource pool.
- When resource pools are planned per product line in the traditional way, each product reserves its own resource pool and its own redundancy for disaster recovery and traffic bursts.
- However, each product line has different kinds of resource requirements and redundancy that cannot be shared, which produces a lot of duplicated redundancy and relatively low resource utilization.
- Adding server redundancy, scaling out and in, and migrating resources are relatively complicated, time-consuming and risky operations.
How LAIN does it:
- The resource isolation and control offered by container technology allow applications on multiple technology stacks to be deployed side by side in one cluster without affecting one another, and redundancy is provided through a unified resource pool, which effectively improves resource utilization.
- Container technology gives the underlying resources a completely uniform form, so scaling out, scaling in and migration are cheap and simpler to operate.
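The utilization argument can be made concrete with a small worked example. The numbers below are invented for illustration, and the shared-pool model assumes that bursts in different product lines do not coincide, which is a simplification rather than a claim about LAIN.

```python
# Hypothetical demand per product line, in servers:
# steady-state load plus the headroom reserved for traffic bursts.
products = {
    "A": {"steady": 10, "burst": 6},
    "B": {"steady": 8, "burst": 4},
    "C": {"steady": 12, "burst": 5},
}


def separate_pools(demand):
    """Each product line reserves its own steady load plus its own burst headroom."""
    return sum(p["steady"] + p["burst"] for p in demand.values())


def shared_pool(demand):
    """One pool covers all steady load plus headroom for the largest single burst.

    Assumes bursts in different product lines do not happen at the same time.
    """
    steady = sum(p["steady"] for p in demand.values())
    return steady + max(p["burst"] for p in demand.values())


print(separate_pools(products), shared_pool(products))  # prints: 45 36
```

Under these assumptions the unified pool needs 36 servers instead of 45 for the same steady load of 30, so utilization rises from 30/45 to 30/36.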
4. TBD: The architecture makes service governance possible and provides a solution for it.
At the application level, LAIN also has the following characteristics:
1. Applications defined by a configuration file
- An existing application only needs one added configuration file, lain.yaml, to define how it is built and run in a LAIN cluster.
- Intrusion into the application code is very low.
2. SDN network security isolation
- Uses open source calico (https://github.com/projectcal …
- Efficient network connectivity within an application
- Networks are isolated between applications by default
- Service access between applications must be explicitly declared
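The isolation rules above amount to a default-deny policy with an explicit allow list. A minimal sketch of that decision, assuming a hypothetical `declared` mapping from each application to the applications it has declared it will use (this is not calico's policy API):

```python
def may_connect(src_app: str, dst_app: str, declared: dict) -> bool:
    """Return True if a container of src_app may reach a container of dst_app.

    declared maps an application name to the set of other applications
    whose services it has explicitly declared it will use.
    """
    if src_app == dst_app:
        return True  # traffic inside one application is always allowed
    # Cross-application traffic is denied unless explicitly declared.
    return dst_app in declared.get(src_app, set())


# Hypothetical declarations: only "web" has declared that it uses "mysql-service".
declared_access = {"web": {"mysql-service"}}

print(may_connect("web", "web", declared_access))
print(may_connect("web", "mysql-service", declared_access))
print(may_connect("batch", "mysql-service", declared_access))
```

Note that a declaration is one-directional in this sketch: "web" declaring "mysql-service" does not let "mysql-service" open connections to "web".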
3. Support for diverse technology stacks based on container technology
- Builds the container cloud with the open source docker project
- Extends and wraps Dockerfile, using a custom yaml format to define how an application runs in the cluster
- Any base image can be chosen freely as long as it satisfies the minimal lain cluster runtime interface
- Container technology naturally isolates system and application dependencies
- The lain SDK/CLI and the optional ci component maintain the correspondence between code versions and images
- Both compile-time and run-time images can be fully customized and isolated
4. Online scaling of applications
- Uses open source swarm to schedule application deployment
- Wraps the swarm docker API in depth, with a self-developed cluster controller (deployd) and application controller (console)
- Directly supports user API calls to increase or decrease the number of container instances
- Directly supports user API calls to increase or decrease the resources (CPU, MEM) of a single container instance
5. Online scaling of nodes
- Uses open source ansible (https://github.com/ansible/an …
- Cluster nodes can be physical servers, virtual machines or public cloud servers, as long as they are in the same class C network segment
- The cluster management toolkit supports add NODE and remove NODE commands to quickly expand and shrink the underlying resources
6. Automatic service maintenance and disaster recovery
- Self-developed cluster controller (deployd)
- Service inspection and maintenance at the container instance level, with automatic migration and service recovery
- High availability (HA) for the entry load balancer based on automatic virtual ip drift
- An advanced API supports user-defined migration of services
7. Internal service dependency and discovery mechanism
- The cluster supports the Service/Resource mechanism
- Service applications serve the cluster as a whole
- Resource applications are services private to a single application
- The cluster supports dedicated service application and resource application types
- The Services/Resources an application uses are explicitly declared in lain.yaml
- DNS-based service discovery mechanism
- Programmable service/resource load balancer
- A RoundRobin load balancer is available by default
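Round-robin balancing simply hands out backends in rotation. The class below is a generic sketch of the technique, not LAIN's load balancer; the class name and the backend addresses are made up for illustration.

```python
import itertools


class RoundRobinBalancer:
    """Hand out backends one after another, wrapping around at the end."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(backends)

    def pick(self):
        """Return the next backend in rotation."""
        return next(self._cycle)


# Hypothetical proc instances behind one service name.
lb = RoundRobinBalancer(["10.0.0.1:8000", "10.0.0.2:8000", "10.0.0.3:8000"])
picks = [lb.pick() for _ in range(4)]
print(picks)
```

The fourth pick wraps around to the first backend, so each instance receives an equal share of requests over time.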
8. Unified authentication
- The cluster ships its own self-developed unified authentication component (sso)
- Supports oauth2 authentication
9. Unified management of virtual ip and load balancer
- Supports registering virtual ips against application procs, so an application can register a virtual ip to serve external traffic
- Virtual ip drift built on an etcd lock mechanism gives the load balancer HA
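The idea of lock-based virtual ip drift is that whichever node holds an expiring lock owns the virtual ip, and another node takes over once the holder stops renewing. The sketch below uses an in-memory stand-in for the lock; `TTLLock` and `vip_owner` are hypothetical names and this is not the etcd API or LAIN's implementation.

```python
class TTLLock:
    """In-memory stand-in for a lock with a time-to-live (not the etcd API)."""

    def __init__(self, ttl: float):
        self.ttl = ttl
        self.holder = None
        self.expires_at = 0.0

    def acquire_or_renew(self, node: str, now: float) -> bool:
        """Take the lock if it is free or expired, or renew it if node already holds it."""
        if self.holder is None or now >= self.expires_at or self.holder == node:
            self.holder = node
            self.expires_at = now + self.ttl
            return True
        return False


def vip_owner(lock, heartbeats):
    """Replay (time, node) heartbeats and record who holds the virtual ip after each."""
    owners = []
    for now, node in heartbeats:
        lock.acquire_or_renew(node, now)
        owners.append(lock.holder)
    return owners


lock = TTLLock(ttl=5.0)
# node1 renews at t=4 and then goes silent; node2 takes over after the lock expires.
events = [(0, "node1"), (1, "node2"), (4, "node1"), (10, "node2")]
owners = vip_owner(lock, events)
print(owners)
```

In a real deployment the new holder would also bind the virtual ip to its own network interface; that step is omitted here.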
10. Automatic configuration of the web load balancer
- Uses open source nginx and tengine (https://github.com/alibaba/te …
- A self-developed watcher monitors the runtime data of all cluster applications and automatically generates configuration for web services
- Detects when runtime data changes and decides whether a configuration change is needed
- A configuration change event triggers rendering of the configuration
- Triggers a reload so the new configuration takes effect
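The detect, render and reload cycle described above can be sketched as follows. The `Watcher` class, the `render` function and the upstream template are assumptions made for illustration, not the code of LAIN's watcher; the reload itself is only counted, not executed.

```python
import hashlib
from string import Template

# Hypothetical nginx upstream template for one web application.
UPSTREAM = Template("upstream $app {\n$servers}\n")


def render(app: str, backends: list) -> str:
    """Render the upstream block; backends are sorted so ordering does not matter."""
    servers = "".join(f"    server {b};\n" for b in sorted(backends))
    return UPSTREAM.substitute(app=app, servers=servers)


class Watcher:
    """Re-render on every runtime update, but reload only when the output changes."""

    def __init__(self):
        self._digest = None
        self.reloads = 0

    def on_runtime_data(self, app: str, backends: list) -> bool:
        config = render(app, backends)
        digest = hashlib.sha256(config.encode()).hexdigest()
        if digest == self._digest:
            return False  # nothing changed, no reload needed
        self._digest = digest
        self.reloads += 1  # a real watcher would write the file and reload nginx here
        return True


watcher = Watcher()
print(watcher.on_runtime_data("web", ["10.0.0.2:80", "10.0.0.1:80"]))
print(watcher.on_runtime_data("web", ["10.0.0.1:80", "10.0.0.2:80"]))
```

Comparing a digest of the rendered output, rather than the raw runtime data, avoids reloads for changes that do not affect the generated configuration, such as a reordered backend list.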
11. Systematic cluster log collection
- Uses open source heka (https://github.com/mozilla-se …
- Collects application stdout/stderr logs by default
- Applications can explicitly declare log files on disk that need to be collected
- Applications can explicitly declare structured monitoring data logs
- Customized nginx log collection and statistics for monitoring the web service load balancer
12. Private docker registry and authentication mechanism
- Wraps the open source docker registry as a private registry application
- Integrates with the cluster's private unified authentication mechanism
- Optionally supports a customized moosefs or Ceph storage backend
13. Encrypted storage of application configuration
- An encrypted storage component for private application configuration, built on an open source library
- Integrates the sso component for user management and permission isolation
- Configuration is injected during the application's runtime phase
14. Local development environment
- Uses open source vagrant, free centos and virtualbox to set up a unified local development environment
- The same tool chain can even bootstrap a local LAIN cluster
15. Application deployment and operations API with a matching CLI client
- All cluster components provide APIs for building, releasing, deploying and operating applications
- The lain SDK/CLI wraps these APIs to give users a friendly interface
- Integrates the cluster's unified authentication for user management and permission isolation
16. Cluster management CLI
- A cluster management toolkit developed with open source ansible
- The ansible calls are wrapped again in a simple CLI for convenience, covering adding nodes, removing nodes, migrating applications, cluster health checks and so on
17. Standardized development workflow
- Built on the components above, SCM relies on the one-to-one correspondence between code and image, and images are what get published and managed
- Using the lain SDK/CLI and the optional ci component for local development, build and release naturally standardizes the development workflow
- The core unit of the workflow is the image. The lain cli wraps image generation, update, push, deployment and operations
18. Optional cluster-based backup and recovery (backupd+moosefs)
- Uses open source moosefs as the distributed storage backend
- Supports explicitly declaring volume backup requirements and policies in lain.yaml, as well as hooks for the backup policies
- Supports restoring from a specified backup
19. Optional cluster log query component (kafka+elasticsearch+kibana)
- Uses open source kafka, elasticsearch and kibana: the kafka and elasticsearch clusters are built as external dependencies, and kibana is packaged as the optional cluster component libana
- The cluster log collection component rebellion can send all logs to the external kafka cluster described above
- Supports conditional combined queries over cluster application logs and web load balancer logs in libana
20. Optional set of preset applications
- MySQL service (https://github.com/laincloud/ …
- MySQL resource
- Redis service -SM (https://github.com/laincloud/ …
III. System Architecture
1. Physical View
From a physical perspective, each LAIN cluster consists of one or more nodes that can reach one another over the network.
Each node can be assigned different labels, which are used for node selection during container scheduling.
In the current implementation, all nodes must sit behind the same router.
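Label-based node selection can be illustrated with a simple filter. The node names, label keys and the function `eligible_nodes` are hypothetical; the sketch only shows the general idea of matching required labels, not how LAIN or swarm evaluate constraints.

```python
def eligible_nodes(nodes: dict, required: dict) -> list:
    """Return the names of nodes whose labels include every required key/value pair."""
    return sorted(
        name
        for name, labels in nodes.items()
        if all(labels.get(key) == value for key, value in required.items())
    )


# Hypothetical cluster: each node carries a set of labels.
cluster_nodes = {
    "node1": {"disk": "ssd", "zone": "a"},
    "node2": {"disk": "hdd", "zone": "a"},
    "node3": {"disk": "ssd", "zone": "b"},
}

print(eligible_nodes(cluster_nodes, {"disk": "ssd"}))
print(eligible_nodes(cluster_nodes, {"disk": "ssd", "zone": "b"}))
```

An empty requirement matches every node, so containers without label constraints can be scheduled anywhere.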
2. Logical View
From a logical perspective, a LAIN cluster is composed of multiple applications, and the networks of different applications are isolated from one another through SDN technology.
Each application consists of multiple Docker containers, each of which may run on a different node.
Application developers can define multiple kinds of containers, called procs, in an application. Each proc can be configured to run multiple copies on the cluster; each copy is a container, called a proc instance. The LAIN cluster does its best to keep the specified number of containers running. If a container crashes or a node fails, the cluster tries to restart the containers or migrate them to other nodes.
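Keeping a specified number of proc instances running is a reconciliation problem: compare the desired count with what is actually running and correct the difference. The function below is a simplified sketch under assumed data structures (an instance-id to node mapping and a set of healthy nodes); it is not deployd's algorithm.

```python
def reconcile(desired: int, instances: dict, healthy_nodes: set) -> dict:
    """Return a new {instance_id: node} map with `desired` running instances.

    Instances on failed nodes are dropped, missing instances are started on
    the healthy node currently running the fewest, and surplus instances
    (highest ids first) are stopped.
    """
    if not healthy_nodes:
        raise ValueError("no healthy node available")
    running = {i: n for i, n in instances.items() if n in healthy_nodes}
    next_id = max(instances, default=0) + 1
    while len(running) < desired:
        load = {n: 0 for n in healthy_nodes}
        for node in running.values():
            load[node] += 1
        # Ties are broken by node name so the result is deterministic.
        target = min(sorted(load), key=load.get)
        running[next_id] = target
        next_id += 1
    while len(running) > desired:
        del running[max(running)]
    return running


# node2 has failed, taking instances 2 and 3 with it.
before = {1: "n1", 2: "n2", 3: "n2"}
after = reconcile(3, before, {"n1", "n3"})
print(after)
```

Running such a function periodically, or on every node and container event, converges the cluster back to the declared state after a crash or node failure.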
3. System Architecture Diagram
The goal is a layered architecture diagram that can be explored level by level in increasing depth.