Architecture Design and Operation Process of Yixin Open Source | Distributed Task Scheduling Platform SIA-TASK

I. Background of Distributed Task Scheduling

Both Internet applications and enterprise applications are full of batch processing tasks, and we often rely on a task scheduling system to run them. As systems evolved from a monolithic architecture to distributed and microservice architectures, many existing task scheduling platforms could no longer meet the needs of business systems, and distributed task scheduling platforms emerged.

1.1 Evolution of Distributed Task Scheduling

In real-world development we inevitably need scheduled (timed) tasks. There are several common solutions, such as Crontab or SpringCron, which work well when there are few machines and the tasks are few and simple. However, as application complexity grows, the number of scheduled tasks increases and dependencies appear between them, managing and configuring scheduled tasks with Crontab becomes chaotic and seriously hurts productivity. A series of problems then arises:

  • Task management is chaotic and the life cycle cannot be managed in a unified and coordinated way.
  • If there are dependencies between tasks, it is difficult to orchestrate them.

With the development of the Internet, distributed service architectures are becoming more and more popular. Correspondingly, a distributed task scheduling system is needed to manage the scheduled tasks in such an architecture.

1.2 Distributed Task Scheduling Architecture

Figure: distributed task scheduling design

When there are more and more vertical applications, the interaction between applications will become more and more complex. Usually we use distributed or micro-service architecture to extract core services and form separate services. An independent micro-service group gradually forms a stable service center, enabling business applications to respond more quickly to changing market demands.

At this point, a distributed service framework that improves service reuse and integration becomes the key. Because services are independent, each can generally have its own scheduled tasks, and changing a task has little impact on the overall system. In general, we separate tasks from scheduling (as shown in the figure above). The execution logic of a task does not need to care about scheduling or orchestration. This separation also makes it possible to keep both the executors and the scheduler highly available, and makes the system easy to develop and maintain.

1.3 Advantages of Distributed Task Scheduling

In a distributed service architecture the number of independent services may be large. If each service implements its own scheduled tasks, they become hard to manage, and every change to a scheduled task forces a restart of the business service. An independent distributed task scheduling system is therefore necessary to manage all scheduled tasks as a whole. Moreover, if task configuration is split out as a function of the scheduling system, changing a scheduled task does not affect any business service or the system as a whole:

  • Scheduling and tasks are separated and managed independently, which greatly reduces development and maintenance costs.
  • Distributed deployment provides high availability, scalability, load balancing and fault tolerance.
  • Scheduled tasks can be deployed and managed through a console, which is convenient, flexible and efficient.
  • Tasks can be persisted to a database, avoiding the risks of downtime and data loss. The system also provides a task failure retry mechanism as well as detailed task tracking and alerting strategies.

II. Selection of Distributed Task Scheduling Technology

2.1 Distributed Task Scheduling Considerations

Figure: SIA-TASK design diagram

  • Task orchestration: scheduled tasks across multiple services must run in a defined order.
  • Task sharding: a large task needs to be split into shards that execute in parallel.
  • Cross-platform: besides projects on the Java technology stack (SpringBoot, Spring, etc.), there are applications written in other languages.
  • Non-intrusive: the business service should not be tightly coupled to scheduling and should focus only on its own execution logic.
  • Failover: there are compensation measures for problems encountered during task execution, reducing manual intervention.
  • High availability: the scheduling system itself must be highly available.
  • Real-time monitoring: task execution status can be obtained in real time.
  • Visualization: task orchestration is operated through visual pages that are easy to use.
  • Dynamic editing: a task's clock (timing) parameters may change, and the change should not require stopping and redeploying the service.

2.2 Comparison between SIA-TASK and Other Distributed Task Scheduling Technologies

SIA (Simple is Awesome) is the basic development platform of Yixin, and SIA-TASK (microservice task scheduling platform) is one of its important products. SIA-TASK fits the current microservice architecture and features cross-platform support, orchestration, high availability, non-intrusiveness, consistency, asynchronous parallelism, dynamic scaling and real-time monitoring.

Open source address: https://github.com/siaorg/sia-task

We first compare the mainstream open source distributed task scheduling frameworks on the market and analyze their advantages and disadvantages, and then introduce our technology selection.

  • Quartz: Quartz is an open source task scheduling project from the OpenSymphony open source organization, implemented entirely in Java. The project was acquired by Terracotta in 2009 and is currently owned by Terracotta. Compared with the timers provided by the JDK or Spring, Quartz offers very fine-grained control over a single task, and its powerful features and flexibility have made it widely used in enterprise applications. However, Quartz does not support task orchestration (dependencies between tasks) or task sharding.
  • TBSchedule: TBSchedule is a distributed scheduling framework that dynamically allocates a batch of tasks, or a continuously changing set of tasks, to the JVMs of multiple hosts and executes them in parallel in different thread groups. It is a pure Java implementation based on ZooKeeper, open-sourced by Alibaba. TBSchedule focuses on task distribution and supports task sharding, but has no task orchestration and is not cross-platform.
  • Elastic-Job: Elastic-Job is a distributed scheduling solution open-sourced by Dangdang, consisting of two independent subprojects, Elastic-Job-Lite and Elastic-Job-Cloud. Elastic-Job supports task sharding (with shard consistency), but has no task orchestration and is not cross-platform.
  • Saturn: Saturn is a distributed, highly available scheduling service open-sourced by Vipshop. Saturn is built on top of Elastic-Job and supports monitoring, task sharding and cross-platform use, but has no task orchestration.
  • Antares: Antares is a Quartz-based distributed scheduler that supports sharding and tree-shaped task dependencies, but is not cross-platform.
  • Uncode-Schedule: Uncode-Schedule is a distributed task scheduling component based on ZooKeeper. It ensures that every task in the cluster is executed without duplication or omission, and supports adding and removing tasks dynamically. However, it does not support task sharding or task orchestration, and is not cross-platform.
  • XXL-JOB: XXL-JOB is a lightweight distributed task scheduling platform whose core design goals are rapid development, ease of learning, light weight and easy extension. XXL-JOB supports sharding, simple task dependencies and subtask dependencies, but is not cross-platform.

Let’s briefly compare SIA-TASK with these task scheduling frameworks:

Framework        Task orchestration  Task sharding  Cross-platform  High availability  Failover  Real-time monitoring
SIA-TASK         √                   √              √               √                  √         √
Quartz           ×                   ×              .NET            √                  ×         API monitoring
TBSchedule       ×                   √              ×               √                  √         √
Elastic-Job      ×                   √              ×               √                  √         √
Saturn           ×                   √              √               √                  √         √
Antares          √                   √              ×               √                  √         √
Uncode-Schedule  ×                   ×              ×               √                  √         √
XXL-JOB          Subtask dependency  √              ×               √                  √         √

These scheduling frameworks basically all support high availability, failover and real-time monitoring, but they differ in their support for task orchestration, task sharding and cross-platform use. SIA-TASK supports all of these.

III. Introduction to SIA-TASK

3.1 SIA-TASK Technology Selection

sia-task-technology

  • REST: a software architectural style. Executors are required to expose HTTP interfaces, which is what makes the platform cross-platform.
  • AOP: aspect-oriented programming. It is used in Hunter, the Spring project extension package, to ensure that a Task is invoked serially (single instance, single thread).
  • Quartz: powerful and flexible, with fine-grained control over a single task. It is used as the clock component of the dispatch center.
  • MySQL: used for metadata storage and (temporary) log access.
  • Elastic: Elasticsearch, a Lucene-based distributed, multi-tenant full-text search engine, used for log storage and query.
  • SpringCloud: a development framework with an active community and the unified framework designated by the company, used for rapid development and iteration.
  • MyBatis: a persistence layer framework that supports custom SQL, stored procedures and advanced mapping. It is used to simplify development of the persistence layer.
  • Zookeeper: a proven registration center, used to solve high availability and distributed consistency for the dispatch center.

3.2 SIA-TASK Design Ideas

SIA-TASK borrows from microservice design. Task metadata scattered across the executor nodes is collected and uploaded to the registration center. Online editing supports orchestrating tasks online and dynamically modifying a task's clock. HTTP is the transport protocol for all interaction, and JSON is the uniform data format. Users operate through the orchestration center (described below) to trigger events; the scheduler receives the events, and the dispatch center parses the clock, executes the task flow and sends task notifications.

3.3 Basic Concepts of SIA-TASK

SIA-TASK separates tasks from scheduling: the business logic that executes a task and the scheduling logic are completely decoupled. The system involves the following core concepts:

  • Task: the basic execution unit, an HTTP interface exposed by an executor.
  • Job: one or more Tasks with logical relationships between them (serial or parallel); the smallest unit scheduled by the dispatch center.
  • Plan: several Jobs executed in sequence. Each Job has its own execution cycle; the Plan itself has none.
  • Task scheduler (dispatch center): schedules each Job according to its execution cycle, that is, it issues HTTP requests according to the logic of Plans, Jobs and Tasks.
  • Task orchestration center (Config): uses Tasks to create Plans and Jobs.
  • Task executor: receives HTTP requests and executes the business logic.
  • Hunter: a Spring project extension package responsible for capturing the Tasks in an executor and uploading them to the registration center. Business code can depend on this component to write Tasks.
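
The relationship between Tasks and a Job can be sketched as a small data model. This is only an illustration, not SIA-TASK's actual classes: the name `JobSketch`, the `task` and `stages` methods and the "runs after" representation of dependencies are all assumptions made for this example. It groups the Tasks of a Job into stages, where stages run serially and the Tasks inside one stage could run in parallel.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model: a Job is a set of Tasks plus "runs after" dependencies. */
final class JobSketch {
    // taskKey -> keys of the Tasks that must finish first (empty list = no predecessor)
    private final Map<String, List<String>> predecessors = new LinkedHashMap<>();

    JobSketch task(String taskKey, String... after) {
        predecessors.put(taskKey, Arrays.asList(after));
        return this;
    }

    /** Groups Tasks into stages: stages run serially, Tasks inside one stage may run in parallel. */
    List<List<String>> stages() {
        List<List<String>> stages = new ArrayList<>();
        List<String> done = new ArrayList<>();
        while (done.size() < predecessors.size()) {
            List<String> stage = new ArrayList<>();
            for (Map.Entry<String, List<String>> entry : predecessors.entrySet()) {
                // A Task is ready once all of its predecessors finished in an earlier stage.
                if (!done.contains(entry.getKey()) && done.containsAll(entry.getValue())) {
                    stage.add(entry.getKey());
                }
            }
            if (stage.isEmpty()) {
                throw new IllegalStateException("cyclic or unknown dependency");
            }
            done.addAll(stage);
            stages.add(stage);
        }
        return stages;
    }

    public static void main(String[] args) {
        JobSketch job = new JobSketch()
                .task("extract")
                .task("cleanA", "extract")
                .task("cleanB", "extract")
                .task("report", "cleanA", "cleanB");
        System.out.println(job.stages()); // prints [[extract], [cleanA, cleanB], [report]]
    }
}
```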

3.4 SIA-TASK System Architecture

SIA-TASK consists of three modules (dispatch center, orchestration center and executor) and two components (persistent storage and registration center). Their functions are as follows:

  • Task dispatch center: responsible for preempting Jobs, scheduling tasks, migrating tasks, etc. It is the core functional module of SIA-TASK.
  • Task orchestration center: responsible for the logical orchestration of online tasks, and provides log viewing and real-time monitoring.
  • Task executor: responsible for receiving scheduling requests and executing the task logic.
  • Task registration center (ZK): coordinates the workflow of Jobs, Tasks, schedulers, etc.
  • Persistent storage (DB): records the project's Job and Task data and provides log storage.

SIA-TASK is built on the SpringBoot ecosystem and extends Quartz and Zookeeper to support its features. The logical architecture of SIA-TASK is shown in the following figure:

Figure: SIA-TASK logical architecture

3.5 SIA-TASK Module Description

3.5.1 Task Dispatch Center

The task dispatch center is responsible for task scheduling: it manages scheduling information and issues scheduling requests according to the scheduling configuration, and it contains no business code. Decoupling the scheduling system from the tasks improves the availability and stability of the system, and the performance of the scheduling system is no longer limited by the task modules. Scheduling information can be managed visually, simply and dynamically, including creating, updating and deleting tasks and configuring task alarms, and all of these operations take effect in real time. The dispatch center also supports monitoring of scheduling results and execution logs, and supports executor failure recovery.

3.5.2 Task Orchestration Center

The task orchestration center is the component of the distributed scheduling platform that supports online orchestration of task models. Through its UI, tasks can be orchestrated on the web.

By combining the basic serial and parallel models, more complex orchestration models can be built, such as:

Figure: scheduling models

The orchestration UI of SIA-TASK:

Figure: UI orchestration interface

After orchestration, the orchestration information of a Task can be viewed as shown in the following figure:

Figure: orchestration information

The orchestration center also provides home page statistics, scheduling monitoring, Job management, Task management and log management.

3.5.3 Task Executor

The task executor is responsible for receiving scheduling requests and executing the task logic. Because the task module focuses only on executing tasks, development and maintenance are simpler and more efficient.

Two types of executors are supported:

(1) With sia-task-hunter: for SpringBoot and Spring projects, introduce sia-task-hunter as the capturing client. HTTP interfaces that comply with the conventions (called Tasks) are automatically captured and uploaded to the registration center.

(2) Without sia-task-hunter: the business only needs to provide an HTTP interface that can be called as a Task. In this case the Task must be entered manually, and the business itself is responsible for controlling concurrent calls to the Task.
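
For the second type, any process that exposes an HTTP interface can act as an executor. The following is a minimal sketch using only the HTTP server bundled with the JDK. The class name `PlainHttpExecutor`, the `/example` path and the `status`/`result` fields of the reply are illustrative assumptions, not a contract defined by SIA-TASK.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

/** Hypothetical executor that exposes a Task as a plain HTTP endpoint, without sia-task-hunter. */
final class PlainHttpExecutor {

    /** Business logic of the Task: takes the JSON sent by the caller and returns a JSON result. */
    static String execute(String requestJson) {
        // Real business logic would go here; the field names are illustrative only.
        return "{\"status\":\"success\",\"result\":\"as you need\"}";
    }

    /** Starts an HTTP server with POST /example; passing port 0 lets the OS choose a free port. */
    static HttpServer start(int port) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
            server.createContext("/example", exchange -> {
                try {
                    if (!"POST".equals(exchange.getRequestMethod())) {
                        exchange.sendResponseHeaders(405, -1); // method not allowed, no response body
                        return;
                    }
                    String requestJson =
                            new String(exchange.getRequestBody().readAllBytes(), StandardCharsets.UTF_8);
                    byte[] body = execute(requestJson).getBytes(StandardCharsets.UTF_8);
                    exchange.getResponseHeaders().set("Content-Type", "application/json;charset=UTF-8");
                    exchange.sendResponseHeaders(200, body.length);
                    try (OutputStream out = exchange.getResponseBody()) {
                        out.write(body);
                    }
                } finally {
                    exchange.close();
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `PlainHttpExecutor.start(8080)` would make the Task reachable at `http://127.0.0.1:8080/example` (the port is an arbitrary choice). That URL would then have to be entered manually, as described in (2).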

3.5.4 Zookeeper

SIA-TASK uses Zookeeper as its registration center.

Figure: registration center

(1) Task registration

Both the dispatch center and the executor cluster use Zookeeper as the registration center. All data is registered as nodes and node contents, and each host keeps its state alive on Zookeeper through periodic reporting.

(2) Metadata storage

The registration center not only provides registration services, but also stores information about each executor (including executor instance information, Task metadata uploaded by the executor, and some temporary state data while tasks are running).

(3) Event publishing

Jobs are published through the Zookeeper event push mechanism, and a balancing algorithm ensures that Jobs are preempted evenly across schedulers.

(4) Load balancing

The number of Jobs executed by each scheduler is kept balanced so that no single node is overloaded.
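
The article does not specify the balancing algorithm. One simple possibility, shown here purely as an assumption, is a quota rule: a scheduler may preempt another Job only while it holds fewer than the ceiling of (total Jobs / alive schedulers). The class and method names below are hypothetical.

```java
/** Hypothetical quota rule for balanced Job preemption; not SIA-TASK's actual algorithm. */
final class PreemptionQuota {

    /** Upper bound of Jobs one scheduler should hold: ceil(totalJobs / aliveSchedulers). */
    static int maxJobsPerScheduler(int totalJobs, int aliveSchedulers) {
        if (aliveSchedulers <= 0) {
            throw new IllegalArgumentException("at least one scheduler must be alive");
        }
        return (totalJobs + aliveSchedulers - 1) / aliveSchedulers; // ceiling division
    }

    /** A scheduler may try to preempt another Job only while it is below its quota. */
    static boolean mayPreempt(int ownedJobs, int totalJobs, int aliveSchedulers) {
        return ownedJobs < maxJobsPerScheduler(totalJobs, aliveSchedulers);
    }
}
```

With 10 Jobs and 3 schedulers the quota is 4, so a scheduler that already owns 4 Jobs would leave newly published Jobs to the others.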

3.5.5 Persistent Storage (DB)

MySQL is used here as a data persistence solution.

Apart from the dynamic Task metadata stored in the registration center, all other metadata is stored in MySQL, including but not limited to: manually entered Tasks, configured Job information, orchestrated Task dependency information, scheduling logs, operation logs of business users, and Task execution logs.

3.6 SIA-TASK Key Operation Processes

3.6.1 Task Release Process

Figure: task release process

(1) Users create a Job through the UI, where they can select the Job type and set the alert mailbox and the Job description, and then orchestrate Tasks for the Job.

(2) After the Job is created and the Task orchestration is set, the Job can be published, and it can then be operated on (activated, executed once, stopped or deleted) through the UI.

(3) A user's Tasks can either be captured by Hunter or created manually in the UI.

3.6.2 Execution Process

Figure: execution process

(1) After the Job is created, it can be activated to start its timed trigger.

(2) When the Job reaches its scheduled time, the dispatch center triggers the Job, notifies the Task executors over HTTP according to the orchestrated Task logic, and asynchronously monitors the Task execution results.

(3) If a Task succeeds, the dispatch center checks whether there is a subsequent Task. If so, it continues with the next one; if not, the Job execution is finished and the call ends. If a Task fails, the failure recovery strategy is triggered: stop immediately, ignore the failure, retry several times, or switch to another executor.
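
The flow in steps (2) and (3) can be sketched with `CompletableFuture`. This is a simplified illustration under stated assumptions, not the dispatch center's real code: Tasks are identified by plain strings, the asynchronous HTTP call is represented by a caller-supplied function, and only two of the four failure strategies (stop immediately, ignore the failure) are modelled. All names are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/** Hypothetical sketch of running the Tasks of a Job one after another, asynchronously. */
final class JobFlowSketch {
    enum OnFailure { STOP, IGNORE }

    /**
     * Chains the Tasks serially. Each call is asynchronous and completes with true on success.
     * The returned future completes with true if the Job ran through to its last Task.
     */
    static CompletableFuture<Boolean> run(List<String> tasks,
                                          Function<String, CompletableFuture<Boolean>> call,
                                          OnFailure policy) {
        CompletableFuture<Boolean> chain = CompletableFuture.completedFuture(true);
        for (String task : tasks) {
            chain = chain.thenCompose(okSoFar -> {
                if (!okSoFar) {
                    // An earlier Task failed under the STOP policy: skip the remaining Tasks.
                    return CompletableFuture.completedFuture(false);
                }
                // Under IGNORE, a failed Task does not prevent the next Task from running.
                return call.apply(task).thenApply(ok -> ok || policy == OnFailure.IGNORE);
            });
        }
        return chain;
    }
}
```

Because each step is composed onto the previous future, no dispatch thread blocks while a Task is running on the executor.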

3.6.3 State Transitions

A Job has four states over its life cycle: NULL, READY, RUNNING and STOP. The state transitions and their conditions are shown in the following figure.

Figure: state transitions
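
A state machine like this is often expressed as an enum plus a table of allowed transitions. The four state names come from the text, but the figure with the actual transitions is not reproduced here, so the transition table below is an assumption made for illustration (for example, that activating a Job moves it from NULL to READY). The class name is hypothetical.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical Job state machine; the allowed transitions are assumed, not taken from SIA-TASK. */
final class JobStateSketch {
    enum State { NULL, READY, RUNNING, STOP }

    private static final Map<State, Set<State>> ALLOWED = new EnumMap<>(State.class);
    static {
        ALLOWED.put(State.NULL, EnumSet.of(State.READY));                           // activate
        ALLOWED.put(State.READY, EnumSet.of(State.RUNNING, State.STOP, State.NULL)); // trigger, stop, deactivate
        ALLOWED.put(State.RUNNING, EnumSet.of(State.READY, State.STOP));            // finish, fail or stop
        ALLOWED.put(State.STOP, EnumSet.of(State.READY, State.NULL));               // re-activate, deactivate
    }

    static boolean canMove(State from, State to) {
        return ALLOWED.get(from).contains(to);
    }

    /** Returns the new state, or throws if the transition is not in the table. */
    static State move(State from, State to) {
        if (!canMove(from, to)) {
            throw new IllegalStateException(from + " -> " + to + " is not allowed");
        }
        return to;
    }
}
```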

3.7 SIA-TASK Module Design

The physical network topology of SIA-TASK is as follows:

Figure: network topology

The modules of SIA-TASK interact as follows:

(1) A Task is created through the orchestration center or captured automatically by Hunter, and the Task information is saved to the DB asynchronously. A Job is created and activated, which creates a JobKey in Zookeeper.

(2) The dispatch center listens for JobKey creation events in Zookeeper and preempts the newly created Job. After a successful preemption it adds the Job to Quartz as a timed task and triggers the Job when the time arrives. The dispatch center asynchronously calls the executor service to run the Tasks in the Job (there may be several Tasks, and the Task failure policy is followed), and the results are returned to the dispatch center.

(3) The execution status of the Job is updated in Zookeeper in real time and can be queried through the query interface of the orchestration center.

(4) After the Job execution completes, the Job waits for its next run.
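
The preemption in step (2) relies on "first writer wins" semantics: only one dispatch center instance can claim a given JobKey. The sketch below imitates that idea with an in-memory map instead of Zookeeper, so it is not the Zookeeper API and not SIA-TASK's implementation. The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in for claiming JobKeys in the registration center. */
final class JobPreemptionSketch {
    // jobKey -> id of the scheduler that owns the Job
    private final Map<String, String> owners = new ConcurrentHashMap<>();

    /** Atomically claims the Job; returns true only for the first scheduler that asks. */
    boolean tryPreempt(String jobKey, String schedulerId) {
        return owners.putIfAbsent(jobKey, schedulerId) == null;
    }

    /** Frees every Job owned by a scheduler (for example after a crash); returns how many were freed. */
    int release(String schedulerId) {
        int before = owners.size();
        owners.values().removeIf(schedulerId::equals);
        return before - owners.size();
    }

    String ownerOf(String jobKey) {
        return owners.get(jobKey);
    }
}
```

Once a scheduler's Jobs are released, other schedulers can claim them again.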

3.7.1 Design of the Task Orchestration Center

The orchestration center interacts with the DB and Zookeeper. Its main functions fall into three areas:

  • Data persistence interface services;
  • Zookeeper metadata changes;
  • Data visualization: viewing the system's statistics, etc.

The monitoring view on the home page of the orchestration center is as follows:

Figure: home page monitoring

3.7.2 Design of the Task Dispatch Center

The dispatch center mainly interacts with the DB, ZK and the executors. Its main functions are:

  • Job execution logging
  • Job status changes in ZK
  • Calling the executor service to execute Jobs
  • High availability of the dispatch center
  • Job scheduling thread pool

3.7.3 Task Executor Design

The executor interacts with ZK and the dispatch center. Its main functions fall into two areas:

  • Accepting dispatch from the dispatch center, executing the scheduled Tasks, and returning the results to the dispatch center;
  • Automatically capturing the Tasks on the executor and submitting them to ZK.

Example of an executor Task:

@OnlineTask(description = "Online task example", enableSerial = true)
@RequestMapping(value = "/example", method = { RequestMethod.POST }, produces = "application/json;charset=UTF-8")
@CrossOrigin(methods = { RequestMethod.POST }, origins = "*")
@ResponseBody
public String example(@RequestBody String json) {
    // TODO: client-side business logic goes here
    Map<String, String> info = new HashMap<String, String>();
    info.put("status", "success");
    info.put("result", "as you need");
    return JSONHelper.toString(info);
}

As this shows, writing a Task is very simple.
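
The `enableSerial` attribute in the example relates to the serial invocation (single instance, single thread) that Hunter enforces through AOP. How Hunter implements this is not shown in this article. The sketch below illustrates one way such a guard could work, assuming that a second call arriving while the Task is still running is rejected rather than queued. The class name, the method name and the failure message are hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

/** Hypothetical guard that lets at most one invocation of a Task run at a time. */
final class SerialGuard {
    private final AtomicBoolean running = new AtomicBoolean(false);

    /** Runs the Task body unless another invocation is still in progress. */
    String invoke(Supplier<String> taskBody) {
        // compareAndSet succeeds for exactly one caller while the flag is false.
        if (!running.compareAndSet(false, true)) {
            return "{\"status\":\"failure\",\"result\":\"task is already running\"}";
        }
        try {
            return taskBody.get();
        } finally {
            running.set(false); // always release the flag, even if the Task body throws
        }
    }
}
```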

3.8 SIA-TASK High Availability Design

Distributed services generally need a high availability scheme. SIA-TASK likewise strengthens each service component in a different way to ensure high availability.

3.8.1 High Availability of the Task Orchestration Center

SIA-TASK achieves high availability of the orchestration center through separation of front end and back end, service splitting and other measures. When one instance in the cluster fails, the other instances are not affected, so the remaining orchestration center instances can be used without any special operation.

3.8.2 High Availability of the Task Dispatch Center

3.8.2.1 Failover

If an instance node in the dispatch center cluster goes down, all Jobs on that node are smoothly migrated to the available instances in the cluster, so no scheduled task execution is missed. When the crashed instance is repaired and rejoins the cluster, it resumes preempting Jobs and providing service.

3.8.2.2 Thread Pool Configuration

Scheduling runs on a thread pool to avoid the scheduling delays that a single blocked thread would cause. The default number of threads in the pool is 10. If multiple time-consuming tasks will execute concurrently, the pool size should be chosen according to the characteristics of the business.

org.quartz.threadPool.class = org.quartz.simpl.SimpleThreadPool
org.quartz.threadPool.threadCount = 60
org.quartz.threadPool.threadPriority = 5
org.quartz.threadPool.threadsInheritContextClassLoaderOfInitializingThread = true

On top of the thread pool provided by Quartz itself, SIA-TASK defines its own thread pools and allocates a dedicated pool to each Job. The size of a pool scales dynamically with the number of Tasks orchestrated in the Job, so the scheduling threads of each Job are completely independent and thread resources are not exhausted by a sharp increase in the number of orchestrated Tasks. There is also recycling logic that releases a Job's thread pool when the Job is permanently terminated.

public static ExecutorService getExecutorService(String jobKey) {
    ExecutorService exec = executorPool.get(jobKey);
    if (exec == null) {
        LOGGER.info(Constants.LOG_PREFIX + "Initialize thread pool for running Jobs,Job is {}", jobKey);
        exec = Executors.newCachedThreadPool();
        // If another thread registered a pool first, putIfAbsent keeps that one.
        executorPool.putIfAbsent(jobKey, exec);
        exec = executorPool.get(jobKey);
    }
    return exec;
}
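
The recycling logic mentioned above is not shown in the article. A minimal sketch of per-Job pools with explicit release could look like this, assuming the pools are kept in a `ConcurrentMap`. The class `JobExecutorPools` and its `release` method are hypothetical names, and `computeIfAbsent` is used here as a compact alternative to the get/putIfAbsent sequence.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical registry that gives every Job its own thread pool and recycles it on termination. */
final class JobExecutorPools {
    private static final ConcurrentMap<String, ExecutorService> POOLS = new ConcurrentHashMap<>();

    /** Returns the pool of the Job, creating it atomically on first use. */
    static ExecutorService get(String jobKey) {
        return POOLS.computeIfAbsent(jobKey, key -> Executors.newCachedThreadPool());
    }

    /** To be called when a Job is permanently terminated; returns true if a pool was released. */
    static boolean release(String jobKey) {
        ExecutorService pool = POOLS.remove(jobKey);
        if (pool == null) {
            return false;
        }
        pool.shutdown(); // already submitted work finishes, then the threads are reclaimed
        return true;
    }
}
```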

3.8.2.3 Full Log Tracking

SIA-TASK tracks the entire scheduling life cycle of a Job and uses AOP to enhance logging. The dispatch center writes a log entry every time a Job is triggered, and the execution of each Task orchestrated in the Job is recorded in the Task log.

The logs are divided into Job logs and Task logs:

  • Job Log: Contains scheduler information, scheduling time, scheduling status, and other additional attributes.
  • Task Log: Contains executor information, execution time, execution status, return information, and other additional attributes.

3.8.2.4 Asynchronous Encapsulation

  • From the outset, SIA-TASK took into account the cost of concurrent threads in the dispatch center when Tasks are called remotely. Remote calls to the Tasks of a Job are asynchronous, so the logic of each Task request is very lightweight, amounting to a single HTTP request.
  • A Task can be given a user-defined timeout. Two kinds of timeout are supported, connecttimeout and readtimeout, and users can set them according to the actual execution time of the service.

public interface RestTemplate {

    /**
     * Asynchronous POST method.
     *
     * @param request
     * @param responseType
     * @param uriVariables
     * @param <T>
     * @return
     */
    <T> ListenableFuture<ResponseEntity<T>> postAsyncForEntity(Request request, Class<T> responseType, Object... uriVariables);
}
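
To illustrate the combination of asynchronous calls and the two timeouts, here is a sketch based on the JDK's own `java.net.http.HttpClient` (Java 11 or later) rather than the `RestTemplate` interface above. It is an assumption-laden analogy: `connectTimeout` maps directly to the connect timeout, while the per-request `timeout` is only the closest JDK counterpart of a read timeout, since it limits the wait for the response. The class and method names are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.concurrent.CompletableFuture;

/** Hypothetical asynchronous Task call with user-defined timeouts, using the JDK HTTP client. */
final class AsyncTaskCall {

    /** Client-wide limit on how long establishing the connection may take. */
    static HttpClient client(Duration connectTimeout) {
        return HttpClient.newBuilder().connectTimeout(connectTimeout).build();
    }

    /** POST request carrying the JSON payload, with a per-request limit on waiting for the response. */
    static HttpRequest request(String url, String json, Duration readTimeout) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(readTimeout)
                .header("Content-Type", "application/json;charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    /** Sends the request without blocking the calling thread; the future completes with the body. */
    static CompletableFuture<String> postAsync(HttpClient client, HttpRequest request) {
        return client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                .thenApply(HttpResponse::body);
    }
}
```

A caller would combine the three methods, for example `postAsync(client(Duration.ofSeconds(1)), request(url, json, Duration.ofSeconds(30)))`, choosing the timeout values according to how long the Task normally runs.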

3.8.2.5 Custom Scheduler Resource Pools

Figure: scheduler resource pools

SIA-TASK designs the scheduler resource pools from the perspective of physical resources: to handle certain special situations, schedulers are pooled. Different operations change a scheduler's state and therefore its capabilities.

  • Job scheduler resource pool: manages schedulers that are able to acquire tasks and are actually allowed to acquire them.
  • Downline scheduler resource pool: manages schedulers that are able to acquire tasks but are not currently allowed to acquire them.
  • Offline scheduler resource pool: manages schedulers from the downline scheduler resource pool that have gone down.

3.8.3 High Availability of Task Executors

  • SIA-TASK is designed with network instability in mind. Connectivity tests for nodes and health pre-checks for Task instance nodes allow the health of Task instance nodes to be sensed in advance, keeping Task scheduling highly available.
  • An executor instance can also recover and retry after losing its connection because of network problems. SIA-TASK reworked the Zookeeper reconnection mechanism so that a Task instance node keeps retrying until, once back to normal, it rejoins the executor pool and receives scheduled Tasks again.
  • Executors are generally deployed as clusters too. If a Task fails on one machine in the executor cluster, the dispatch center fails over according to the failure strategy. Two failover strategies are provided: polling failover and maximum-compensation failover. Polling failover polls the list of available executors: if any executor succeeds, the Task succeeds; if all executors fail, the Task fails. Maximum-compensation failover first retries several times on the current executor: if a retry succeeds, no failover takes place; if the Task still fails, the polling failover strategy is applied.
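
The two failover strategies can be sketched as follows. This is an illustration, not the dispatch center's code: executors are represented by plain strings, the remote call is a caller-supplied predicate that returns true on success, and the sketch assumes that after the retries are used up, polling continues with the remaining executors only. All names are hypothetical.

```java
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch of polling failover and maximum-compensation failover. */
final class FailoverSketch {

    /** Polling failover: try each available executor in turn until one succeeds. */
    static boolean pollingFailover(List<String> executors, Predicate<String> callTask) {
        for (String executor : executors) {
            if (callTask.test(executor)) {
                return true;
            }
        }
        return false; // every executor failed, so the Task fails
    }

    /** Maximum-compensation failover: retry on the first executor, then poll the others. */
    static boolean maxCompensationFailover(List<String> executors, int maxAttempts,
                                           Predicate<String> callTask) {
        if (executors.isEmpty()) {
            return false;
        }
        String current = executors.get(0);
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (callTask.test(current)) {
                return true; // succeeded locally, no failover needed
            }
        }
        // Assumption: the executor that just failed repeatedly is skipped during polling.
        return pollingFailover(executors.subList(1, executors.size()), callTask);
    }
}
```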

IV. Summary

This article has briefly introduced the microservice task scheduling platform SIA-TASK, including its design background, architecture, and the functions and features of its components. SIA-TASK largely meets current business requirements and provides simple, efficient scheduling services. It will continue to iterate and improve, and technical and usage documentation will be provided later.

Link guide

Open source address: https://github.com/siaorg/sia-task

Further reading: Yixin Open Source Microservice Task Scheduling Platform (SIA-TASK)

Author: Mao zhengwei/Allen lee/Liang Xin

First published in: SpringCloud community