UAVStack File Data Collection


This week we introduce the file data collection feature of UAVStack. It addresses time-consuming log searches and lost log files, helps engineers locate and resolve problems quickly, and avoids the security risks that come with granting log access permissions.

Preface

In a distributed microservice architecture, a single application often runs as many instances. In daily work we frequently need to check the logs an application produces at run time, but logging in to servers to find the target log among a large number of instances takes time and effort. Log files are also sometimes lost when an application restarts.

The file data collection feature of UAVStack centralizes the logs of all businesses, so engineers can find the relevant logs more easily and locate and resolve problems quickly. It also removes the step of granting server permissions, which avoids the security risks that log access permissions may bring.

Architecture

The collected file data includes application logs, call chain traces, browser traces and thread analysis data. Any other file can also be collected by configuring its path in the startup parameters. Collection and control of UAVStack’s own log files is supported as well. The overall architecture is as follows:

[Figure: overall architecture of file data collection]

● Source: data files written to disk, including application log files, call chain data files, browser data files and thread analysis data files.

● Log collector: the file data collection agent, responsible for reading, filtering and uploading data.

● Channel: the data consumption queue. The message queue used by UAVStack is RocketMQ.

● Sink: pulls file data from the Channel and distributes it elsewhere. By default UAVStack writes it to Elasticsearch, which stores the collected file data.

Collection modules

File data collection in UAVStack is a Feature of MonitorAgent. MonitorAgent is deployed as a daemon process: if it dies, it is restarted immediately and resumes collecting file data. The main modules are as follows:

[Figure: main modules of file data collection]

● Collection Task Controller: controls the distribution of collection tasks, both running and existing ones, and polls the collection task list at regular intervals to build executable collection tasks.

● Collection Task Scheduler: schedules collection tasks at regular intervals.

● Task: performs a file data collection task.

● Data Reader: reads and filters file data.

● Data Publisher: publishes the file data that has been read.
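
How the controller and the tasks might fit together can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class CollectTaskController, its methods and the in-memory task list are hypothetical names, not UAVStack's actual classes, and the "collection" step merely records the file path. ForkJoinPool is used because the collection process described later submits tasks through ForkJoin.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;

/** Hypothetical controller; names are illustrative, not UAVStack's API. */
final class CollectTaskController {

    /** Task list: file path mapped to whether collection is enabled for it. */
    private final Map<String, Boolean> taskList = new ConcurrentHashMap<>();
    private final ForkJoinPool pool = new ForkJoinPool();
    /** Stand-in for real output: the paths that were "collected". */
    private final List<String> collected = new CopyOnWriteArrayList<>();

    void register(String filePath, boolean enabled) {
        taskList.put(filePath, enabled);
    }

    /**
     * One polling round: build a task for every enabled file, submit it to
     * the ForkJoin pool and wait for the round to finish.
     * Returns the number of tasks that were run.
     */
    int pollOnce() {
        List<ForkJoinTask<?>> running = new ArrayList<>();
        for (Map.Entry<String, Boolean> entry : taskList.entrySet()) {
            if (entry.getValue()) {
                String path = entry.getKey();
                Runnable job = () -> collect(path);
                running.add(pool.submit(job));
            }
        }
        for (ForkJoinTask<?> task : running) {
            task.join();
        }
        return running.size();
    }

    /** Placeholder for the read, filter and publish steps of a real task. */
    private void collect(String path) {
        collected.add(path);
    }

    List<String> collected() {
        return new ArrayList<>(collected);
    }
}
```

In a real agent a scheduler would call pollOnce() at a fixed interval; here it is left as a plain method so that the polling step can be exercised on its own.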

Collection process

[Figure: collection process]

● Start collection: the file data collection module automatically discovers and processes the profile information of application logs and checks whether it has been updated. Users can switch collection on or off for application logs, call chain traces, browser traces and thread analysis. AppHub supports starting and stopping file data collection and dynamically selecting which files to read.

● Collection task distribution: when the user triggers collection, AppHub starts file data collection and distributes it automatically. A data collection task is generated, written to the task list and persisted to the local file task.cache.

● Collection task control: a scheduled task polls the task list, builds an executable collection task for each entry and submits it through ForkJoin.

● File data reading: every task submitted through ForkJoin is read. Log files are processed through dedicated classes, including RandomAccessFile, whose seek() method moves to an arbitrary position in the file and whose read() method reads the file data. In addition, a scheduled position update task writes the position of the data already read to the local file position.cache, for use by the next read.

● File data filtering: according to the configured log policy, the logs read are matched against regular expressions and filtered to decide which log data to keep.

● File data publishing: the collected log data can be published to a designated destination, with different publishing types handled by different publishing implementations. By default UAVStack publishes to RocketMQ, and custom publishing is supported.
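
The reading and filtering steps can be illustrated with a minimal sketch. It assumes a hypothetical helper class OffsetLogReader (not part of UAVStack), uses readLine() rather than read() for brevity, and returns the next offset to the caller instead of writing it to position.cache. The regular expression stands in for whatever the configured log policy specifies.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper; not UAVStack's actual reader class. */
final class OffsetLogReader {

    /** Result of one read pass: the lines kept and the offset to resume from. */
    static final class Batch {
        final List<String> lines;
        final long nextOffset;

        Batch(List<String> lines, long nextOffset) {
            this.lines = lines;
            this.nextOffset = nextOffset;
        }
    }

    /**
     * Reads lines starting at the given byte offset and keeps those that
     * contain a match for the filter pattern.
     */
    static Batch readFrom(String path, long offset, Pattern filter) throws IOException {
        List<String> kept = new ArrayList<>();
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            // Assumption: if the file is shorter than the saved offset
            // (for example after rotation), start again from the beginning.
            long start = offset > file.length() ? 0L : offset;
            file.seek(start);
            String raw;
            while ((raw = file.readLine()) != null) {
                // readLine() maps each byte to one char, so re-decode as UTF-8.
                String line = new String(
                        raw.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
                if (filter.matcher(line).find()) {
                    kept.add(line);
                }
            }
            // The caller would persist this value for the next read.
            return new Batch(kept, file.getFilePointer());
        }
    }
}
```

A caller would keep Batch.nextOffset between runs and pass it back on the next call, so that only newly appended log lines are read and filtered.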

In addition, the HM service pulls the log data from the specified RocketMQ topic, converts it into the corresponding format and stores it in Elasticsearch.

File data display

AppHub lets you view the collected application log, call chain trace, browser trace and thread analysis file data. AppHub calls the HTTP query interface of the backend service, which reads the database through the HM log data service, and then displays the file data returned by the query, as shown in the following figure:

[Figure: file data query flow]

The file data is displayed as follows:

[Figure: file data display in AppHub]

● View collected data by application cluster, application instance and log file.

● View file data for different times, by day, hour or minute.

● View file data in ascending or descending order.

● Search by keyword. Keywords separated by spaces are joined with OR, so a log line matches if it contains any of them. Keywords separated by “+” are joined with AND, so a log line matches only if it contains all of them. If both spaces and “+” are used, the OR connection takes precedence. Adding “*” at the beginning and end of a keyword enables fuzzy matching, which shows all matching results.

● Click a single log line to scroll backward and forward through the surrounding log information, as shown in the following figure:

[Figure: scrolling through the context of a single log line]
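
The keyword search rules above can be made concrete with a small sketch. It is one possible reading, not AppHub's implementation: the class KeywordQuery is hypothetical, "OR takes precedence" is assumed to mean that space-separated terms are grouped first and the groups are then joined by "+", and a keyword without asterisks is assumed to match only a whole whitespace-delimited word.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical matcher illustrating the keyword syntax; not AppHub's code. */
final class KeywordQuery {

    /** Returns true if the log line satisfies the query. */
    static boolean matches(String query, String logLine) {
        Set<String> words = new HashSet<>(Arrays.asList(logLine.trim().split("\\s+")));
        // Assumed precedence: "+" separates AND groups,
        // spaces separate OR terms inside each group.
        for (String group : query.trim().split("\\+")) {
            boolean groupMatched = false;
            for (String term : group.trim().split("\\s+")) {
                if (termMatches(term, words, logLine)) {
                    groupMatched = true;
                    break;
                }
            }
            if (!groupMatched) {
                return false;
            }
        }
        return true;
    }

    private static boolean termMatches(String term, Set<String> words, String logLine) {
        if (term.isEmpty()) {
            return false;
        }
        // "*keyword*" is treated as a substring (fuzzy) match.
        boolean fuzzy = term.length() > 2 && term.startsWith("*") && term.endsWith("*");
        if (fuzzy) {
            return logLine.contains(term.substring(1, term.length() - 1));
        }
        // Assumption: a plain keyword must equal a whole word in the line.
        return words.contains(term);
    }
}
```

Under these assumptions the query "timeout refused+host" is read as (timeout OR refused) AND host. If the real precedence or the exact-match rule differs, only the two split calls and termMatches would need to change.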

File data association

The collected application log, call chain trace and browser trace data can be associated with one another: browser trace data can be associated with call chain trace data, and call chain trace data with application log data, as shown in the following figure:

[Figure: association between browser trace, call chain trace and application log data]

File data alert

The collected file data supports alerting. An alert policy can be configured on whether keywords appear in the file data or on how many times they appear. Once an alert policy is triggered, users can be notified by email, SMS, WeChat and other means, and third-party application systems can be notified over HTTP.
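
A keyword-count policy of this kind can be sketched as follows. The class KeywordAlertPolicy and its fields are hypothetical and only illustrate the idea; how UAVStack actually stores and evaluates alert policies is not described here, and the notification step is left out.

```java
import java.util.List;

/** Hypothetical alert policy; not UAVStack's actual policy model. */
final class KeywordAlertPolicy {

    private final String keyword;
    private final int threshold;

    /**
     * A threshold of 1 models "alert if the keyword appears";
     * a larger value models "alert after this many occurrences".
     */
    KeywordAlertPolicy(String keyword, int threshold) {
        if (keyword == null || keyword.isEmpty() || threshold < 1) {
            throw new IllegalArgumentException("keyword must be non-empty and threshold >= 1");
        }
        this.keyword = keyword;
        this.threshold = threshold;
    }

    /** Counts non-overlapping occurrences of the keyword across all lines. */
    int countOccurrences(List<String> lines) {
        int count = 0;
        for (String line : lines) {
            int from = 0;
            while ((from = line.indexOf(keyword, from)) >= 0) {
                count++;
                from += keyword.length();
            }
        }
        return count;
    }

    /** True when the occurrence count reaches the configured threshold. */
    boolean shouldAlert(List<String> lines) {
        return countOccurrences(lines) >= threshold;
    }
}
```

A caller would evaluate shouldAlert over the file data collected in some time window and, when it returns true, hand off to whichever notification channel is configured.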

Summary

The file data collection feature of UAVStack is already widely used and is a distributed service with high availability and reliability. It also scales well: if the volume of logs to collect is large, only more instances of UAVStack's own HM service need to be added.

Official website: https://uavorg.github.io/main/

Open source address: https://github.com/uavorg

UAVStack is open source on GitHub and provides bilingual documentation covering installation and deployment, architecture and the user guide. You are welcome to visit, star and pull.

By Duan Dehua, Yixin Institute of Technology