Open Source | Performance Optimization Tool: Selection and Practice of Database Auditing Platform Themis

  Database, Open source software

Author: Feng Han

Source: DBAplus community share

Themis open-source address: https://github.com/CreditEaseDBA

Further reading: Rule Analysis and Deployment Strategy of Yixin's Open-Source Database Auditing Software Themis


I. Challenges Faced

1. Scale and type of operation and maintenance

I believe many companies and DBAs are facing, or will soon face, the same problems. It is these existing problems that prompted us to consider introducing a database auditing platform.

The first is the contradiction between operation scale and human resources. In our case, we operate and maintain four types of databases, namely Oracle, MySQL, MongoDB and Redis, amounting to dozens of databases, supporting more than 1,000 developers and hundreds of business systems. Some friends may say that, judged purely by operation scale, this is not very large.

Indeed, compared with many Internet companies, dozens of databases is not a large estate. But unlike Internet companies, financial companies such as Yixin depend far more heavily on their databases: a great many applications are database-centric, and their usage is much more complex than the typical Internet workload. Besides daily operation and maintenance (which we are also streamlining through a self-developed platform), DBAs must devote a great deal of energy to database design, development and optimization. Facing the large number of development teams that need our services, this contradiction becomes even more prominent.

2. Cases

Structure design

The second challenge is the uneven quality of database design and development. The figure above shows a structural design problem in one of the core tables of a core system: 28% of the SQL run by that system touches this object. When we analyzed its structure, we found many problems:

  • The table is very large. No split logic (e.g. sharding by database or table, partitioning) was considered from the design stage, and there was no data cleanup or archiving strategy.
  • The table has more than 100 fields. With so many fields, whose usage characteristics differ widely, the necessary table split was never considered.
  • The table has 13 indexes, which is too many; excessive indexing inevitably hurts DML efficiency.
  • One index was never used at all during continuous monitoring; it is clearly redundant.
  • Two fields carry duplicate indexes, which also suggests the indexing was rather haphazard from the start.
  • The defined length of a single record exceeds 5,800 bytes, yet the actual average stored length is under 400 bytes, and even the maximum is not long.
  • Analyzing the field contents also revealed three fields with abnormal type definitions, i.e. data not stored in the proper type, for example dates stored in a numeric column.

In short, this table has many design problems, yet it is very important: a large share of statement accesses depend on it.

SQL statement

The figure above shows a statement efficiency problem. On its face, the two tables are joined, but no join condition appears among the predicates. As the execution plan below shows, the database executes it as a Cartesian product. From the cost and estimated time you can see what a “huge” SQL this is; its impact when run online can be imagined.

Some people may say this is a human error that rarely happens. But first, human error can never be fully avoided, and no one can guarantee the running quality of every SQL. Second, developers understand databases to very different degrees, which makes it hard to ensure the SQL they write is efficient. Third, developers face a flood of business demands and are often rushed, leaving little energy for optimization. For these reasons, the execution quality of online statements has become one of the challenges DBAs routinely face.

3. Shift of Focus

This is a classic diagram describing the division of database-related work. Beyond the challenges above, DBAs also face a shift of focus driven by the evolution of database work and their own development needs. The traditional operation-and-maintenance duties at the bottom are gradually weakening: a wealth of tools and platforms has emerged and databases' self-managing abilities have improved, simplifying the DBA's work. The middle layers, database architecture, structural design and SQL quality optimization, are gradually becoming the focus. At the top, data governance and modeling are drawing more and more attention in some companies. The center of the DBA's future work is thus moving upward, and for the middle layer of logical structure work, tools and platforms are needed to support DBAs better.

Beyond the above, several imbalances exist in our company.

  • In the DBA's daily work, traditional operation and maintenance still takes a large share, while architecture and optimization work is relatively scarce. Introducing this platform helps DBAs carry out architecture and optimization work more conveniently.
  • The company relies mostly on commercial products and uses little open source, yet in its long-term strategy open-source products will be used more and more. Functionally, commercial products still lead open source, and building on open-source products demands more of the developers themselves. We hoped that introducing this platform would make the transition easier.
  • Before the platform existed, DBAs did database design and optimization by hand, which was very inefficient; facing many product lines and many development teams, they often felt overwhelmed.

Our own team is composed mainly of junior and mid-level staff, with relatively few senior people. How to quickly raise overall design and optimization capability, and guarantee a uniform standard of optimization, became the problem in front of us.

It is because of these imbalances that we considered introducing tools and platforms to solve database quality problems.

When I first joined the company, we also tried to solve these problems through systems and norms. From the very beginning we formulated many regulations and then went around the departments to train on and promote them. After running this way for a while, some problems emerged:

  • The overall improvement was not obvious. How well the rules landed depended on how much each department cared and on the ability of individual employees.
  • The effect of the standards could not be measured, making quantitative analysis difficult; often it could only be perceived intuitively from online running results.
  • There was no long-term, effective tracking mechanism; the running quality of a given system could not be tracked over time.
  • From the DBA's point of view, facing a large number of systems, it was impractical to examine every structural design and every SQL in detail against each specification.

Faced with these challenges and accumulated problems, how should we solve them?

After discussion, everyone finally agreed that introducing a database auditing platform could help solve the problems above.

II. Platform Selection

1. Industry practices

At the start of the project, I investigated how other enterprises in the industry audit their databases. The approaches fall roughly into three categories:

The first category is the Internet companies represented by BAT. Through self-developed SQL engines they achieve cost analysis, automatic auditing, access diversion, rate limiting and more, enabling both prior review and automated review. But the technical difficulty is high, and the company's existing technical capacity was clearly insufficient for this route.

The second category collects database activity with self-built tools, audits it against pre-defined rules, and completes the process with manual review. Such a scheme can only audit after the fact, but the technical difficulty is low and the flexibility high. Its core is the formulation of the rule set, which can be extended flexibly as circumstances require.

The third category is commercial products. Their implementation idea is similar to the second category, but they add some independent analysis capabilities and richer functions. They still require manual intervention, however, and demand no small capital investment. Moreover, none of the several commercial products we examined fully met our functional requirements.

Weighing these practices, we finally decided on a “tools + manual review” approach and built our own auditing platform.

2. Our Choice: Self-Development

When we launched the platform's development, we reached some consensus within the team.

  • DBAs need to break from the traditional operations mindset, and everyone participates in the platform's development.
  • What we had accumulated (such as the specifications formulated earlier) could be distilled into a knowledge base and standardized, paving the way for formulating the rules later.
  • In rolling the platform out, start with the simplest part, put each finished piece online to observe its effect, and keep revising the plan based on the results.
  • Set goals that fit our own situation; for the more complicated parts, be ready to postpone or even give up.
  • Study the design ideas of other companies and commercial products, and borrow from them boldly.

III. Audit Platform Practice

Let's look at the audit platform's basic functions, implementation principles and methods. This part is the focus of this share.

1. Platform Positioning

At the beginning of the project, we described the positioning of the platform:

  • The platform's core capability is to quickly discover database design and SQL quality problems.
  • The platform performs after-the-fact audits only; standalone optimization is planned for Phase II. Of course, introducing it during a project's design phase can also serve as a partial prior review.
  • The main users are DBAs and developers with some database background.
  • It can audit a given database user across several dimensions: data structure, SQL text, SQL execution characteristics and SQL execution plan.
  • Audit results are provided as web pages or exported files.
  • The platform must support the company's mainstream Oracle and MySQL; other databases are left to Phase II.
  • Provide as much customization flexibility as possible, to ease future extension.

2. Platform Users

As the platform's two main user groups, both developers and DBAs benefit from it.

  • For developers, the platform locates problems so they can be fixed promptly. Beyond that, by mastering the rules, developers can let them guide design and development work.
  • For DBAs, it quickly gives an overall picture across many systems, screens out inefficient SQL in batches, and supports quick preliminary diagnosis with the information it provides.

3. Implementation Principle

The basic implementation principle of the whole platform is simple: filter the audit objects (four kinds are currently supported) through the rule set. Whatever matches a rule is suspected of having a problem, and the platform presents these suspects, with related information, for manual screening. How powerful the platform is therefore depends on how rich the rule set is; the platform also provides extension capabilities to make the rule set easy to grow.
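To make that principle concrete, here is a minimal sketch of the filtering loop in Python. Everything here (the `check` predicate in particular) is a hypothetical illustration of the idea, not the actual Themis API.

```python
# A minimal sketch of the audit loop described above; names are illustrative.

def run_audit(rules, audit_objects):
    """Filter audit objects through the rule set; anything that matches
    a rule is flagged as a suspected problem for manual screening."""
    findings = []
    for rule in rules:
        if rule["rule_status"] != "ON":      # closed rules are skipped
            continue
        for obj in audit_objects:
            violation = rule["check"](obj)   # hypothetical per-rule predicate
            if violation:
                findings.append({
                    "rule_name": rule["rule_name"],
                    "object": obj,
                    "detail": violation,     # extra info for manual review
                })
    return findings
```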

4. Platform Design

Audit object

Before introducing the implementation, let's get familiar with the concept of the “audit object”. At present we support four levels of objects.

  • Object level. Objects here are database objects: tables, partitions, indexes, views, triggers and so on. A typical rule: a large table that is not partitioned.
  • Statement level. This refers to the SQL statement text itself. A typical rule: too many tables in one join.
  • Execution plan level. This refers to the SQL's execution plan in the database. A typical rule: a full scan of a large table.
  • Execution feature level. This refers to how the statement actually executes on the database. A typical rule: the ratio of blocks scanned to records returned is too low.

Note that of the four levels, the latter three can only be captured once the system is online, while the first needs only the data structure to run (a few rules also need data).

Also, except for the second level, whose rules are generic, the rules above are all tied to a specific database type; each kind of database has its own rules.

Architecture diagram

What is drawn here is a schematic of the system architecture; let me explain it briefly.

The blocks in the figure are the platform's main modules. Different background colors indicate different progress states; dashed lines represent data flow and solid lines control flow. The core modules are:

  • Data acquisition module. Responsible for capturing the base data needed for auditing from the data sources. Capture from Oracle and MySQL is currently supported.
  • OBJ/SQL repository. The platform's common storage: collected data, intermediate data and result data all live here. Its core data divides into object and SQL classes; physically, it is MongoDB.
  • Core management module. The dashed box on the right of the figure, containing the SQL management and OBJ management modules, is meant to manage the whole life cycle of objects. Only a simple object filtering function is done so far, hence the white background; the core functionality is not yet complete.
  • Audit rules and audit engine modules. The core components of the platform's first phase: the audit rule module defines and configures rules, and the audit engine executes them.
  • Optimization rules and optimization engine modules. The core components of the second phase; not yet developed, hence the white background.
  • System management module. Provides the platform's basic functions, such as task scheduling, space management, and audit report generation and export.

Flow chart

Let’s look at the overall processing of the platform from the perspective of processing flow.

1) The “Rule Management” section mainly performs the following functions:

  • Initialize rules. Many rules are built into the platform and are imported into the configuration library in this step.
  • Add new rules. The platform offers some extension capability, so new rules can be added according to the specification.
  • Modify rules. Rules can be switched on or off as circumstances require. Each rule also carries built-in parameters that can be modified here, and the deduction scheme for violations can be set (for example, points deducted per occurrence and the maximum total deduction).
  • The rules themselves, their parameters and their configuration information are all stored in the configuration library.

2) The “Task Management” section, part of background management, handles everything job-related. Most interactions in the system are completed asynchronously through jobs; the backend is implemented with Celery + Flower (see the sketch after this list).

3) The “Data Acquisition” section is mostly carried out by collection jobs started periodically by the task scheduler, with a small part done by querying the online databases in real time. Collected results are stored in the repository for later analysis.

4) The “Rule Analysis” section is triggered by the user through the interface; the task scheduling module starts a background asynchronous task to carry out the analysis. The audit runs asynchronously mainly because it can take quite a long time (especially with many audit types, many audit objects and many enabled rules). The results are saved to the repository.

5) In the “Task View and Export” section, after initiating an audit task the user can watch its progress (running or completed). When the audit has completed, the user can select the task and browse the results or export them; if export is chosen, an asynchronous background job generates the file and places it on the download server.
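As a concrete illustration of the asynchronous job design mentioned in step 2, here is a minimal Celery sketch. The broker URL, task name and task body are assumptions for illustration, not the actual Themis code.

```python
# Hypothetical sketch of a background audit job in the Celery style described.
from celery import Celery

app = Celery("audit", broker="redis://localhost:6379/0")

@app.task
def run_audit_task(db_type, schema, audit_type, date):
    """Run one audit in the background so the web request returns at once.
    Results would be written to the repository for later viewing/export."""
    # ... collect data, apply rules, store results ...
    return {"status": "success", "schema": schema, "audit_type": audit_type}

# The caller only enqueues the job and polls its state later, e.g.:
# result = run_audit_task.delay("mysql", "appdb", "plan", "2018-05-20")
```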

That is the general flow of an audit; the details of each part follow.

Module partition

To sum up, the platform consists of four main modules: data collection, rule analysis, system management and result display. The implementation of each module is described in detail below.

5. Data Acquisition

Capture content

Let's look at the data acquisition module first. As the table shows, the collected content differs between the two database types.

Oracle exposes abundant information, so essentially everything can be collected; MySQL exposes comparatively little.

A “checkmark + asterisk” in the table marks an item that is not fetched by the scheduled job but retrieved in real time. Briefly, the content of each part:

  • Object level: the objects' statistics, storage characteristics, structure information and access characteristics are collected.
  • SQL level: SQL text, execution plans, cached cursors, bind variables, execution characteristics and so on are collected.

This information will be used as the basis for subsequent audits.

Acquisition principle

Here is a brief introduction to the acquisition principle:

  • The Oracle part is AWR data, collected by scheduled jobs and dumped into a MongoDB collection. This differs from some similar products: instead of sampling memory directly, data is taken from offline sources, to minimize the impact on the running systems. Oracle's facilities are rich; through AWR and the data dictionary, essentially all the data can be obtained.
  • For MySQL the situation is more complicated because its facilities are not as rich, so different kinds of data come from different sources. SQL text and execution characteristics come from pt-tool analysis: the slow query log is loaded periodically into the Anemometer platform's library and from there into MongoDB. Other kinds of information (data dictionary, execution plans, etc.) are queried from the databases in real time when needed; to keep the master unaffected, such queries are usually routed to a slave.

6. Rule Analysis

Summary description

Now for the core of the whole system: the rule analysis module. Its job is to check the collected data against the defined rules and screen out the data that violate them. The screened-out data are scored and recorded for the subsequent audit report, and additional information is recorded to assist later judgment.

The core concept here is the “rule”. The definition of a built-in rule appears later, which will make things clearer. By classification, rules divide roughly into the following categories.

  • By database type, rules divide into Oracle and MySQL rules. Not all rules distinguish the database; text-class rules, for example, do not.
  • By complexity, rules divide into simple and complex. Simple and complex here refer to the implementation of the rule check: a simple rule can be expressed as a query statement against MongoDB or a relational database, while a complex rule must be implemented externally as a program body.
  • By audit object, rules divide into object, text, execution plan and execution feature classes. Each class is explained separately below.

Rule definition

This is the declared structure of a rule body. Let me explain each field, so that the rule becomes clear (an illustrative example follows the list).

  • Db_type: the rule's database category; Oracle and MySQL are supported.
  • Input_parms: input parameters. A rule may define several input parameters; this is a list of parameters, each of which is itself a dictionary describing the parameter's details.
  • Output_parms: output parameters. Like the input parameters, this is a list of dictionary objects; it describes the structure of the information the rule returns.
  • Rule_complexity: whether the rule is simple or complex. For a simple rule, the content of rule_cmd is taken directly as the audit implementation; for a complex rule, the implementation is obtained from an external script named after rule_name.
  • Rule_cmd: the implementation body of the rule, which may be a MongoDB query statement or a regular expression, depending on rule_type.
  • Rule_desc: the rule's description, for display only.
  • Rule_name: the rule's name, its unique and global identifier.
  • Rule_status: the rule's status, ON or OFF. A closed rule is ignored during the audit.
  • Rule_summary: a deprecated field with the same meaning as rule_desc.
  • Rule_type: the rule type, one of four classes: object, text, execution plan and execution feature. The example in the figure is a text-type rule, so its rule_cmd is a regular expression.
  • Solution: the optimization suggestions for when this rule is triggered.
  • Weight: the points deducted for a single violation of the rule.
  • Max_score: the upper limit on deductions for this rule, set so that violating one rule cannot weigh too heavily on the score.
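To tie the fields together, here is an illustrative rule document. The values are invented for this example (a hypothetical text rule flagging SELECT *), not taken verbatim from the shipped rule set.

```python
# An illustrative rule assembled from the fields above; values are invented.
rule = {
    "db_type": "mysql",
    "rule_name": "select_all",                 # globally unique identifier
    "rule_desc": "SQL text uses SELECT *",
    "rule_complexity": "simple",               # rule_cmd is used directly
    "rule_type": "TEXT",                       # object/text/plan/feature
    "rule_cmd": r"select\s+\*",                # regex for a text-type rule
    "rule_status": "ON",
    "input_parms": [],                         # no thresholds needed here
    "output_parms": [{"parm_desc": "offending SQL text"}],
    "solution": "List the required columns explicitly.",
    "weight": 2,                               # points deducted per hit
    "max_score": 10,                           # cap on total deduction
}
```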

Rule Definition (Object Level)

First, the object rules, a set of rules aimed at database objects. The table above shows some examples. Common objects such as tables, partitions, indexes, fields, functions, stored procedures, triggers, constraints and sequences are all audited. Taking the table as an example, many rules are built in.

For example, the first one, “too many large tables”, fires when the number of large tables in a database exceeds the threshold defined by the rule. What counts as “large” is set by the rule's input parameters: the table's record count and physical size. The full description of the rule is: “if the number of tables exceeding the specified size or the specified record count is itself above the specified threshold, the rule is triggered.” The rules for other objects are similar.

Rule Implementation (Object Level)

Object rules are relatively simple to implement: apart from a few exceptions, they query data dictionary information and judge it against the rule definition. The example above is a rule implementation for indexes that queries data dictionary information; a sketch of a similar check follows.
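As a hedged sketch of what such a check can look like against the collected data, the snippet below finds “large tables” with pymongo. The collection and field names (obj_tab_info, num_rows, size_mb, table_name) are assumptions about the repository layout, not the actual Themis schema.

```python
# Sketch of a simple object rule: find "large tables" in collected statistics.
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["themis"]

def find_large_tables(schema, min_rows=10_000_000, min_mb=10_240):
    """Tables exceeding either threshold (the rule's input parameters)."""
    return list(db["obj_tab_info"].find({
        "schema": schema,
        "$or": [
            {"num_rows": {"$gt": min_rows}},   # record-count threshold
            {"size_mb": {"$gt": min_mb}},      # physical-size threshold
        ],
    }))
```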

Rule Definition (Execution Plan Level)

The second class is the execution plan rules, which again divide into several categories: access path, table join, type conversion, bind variable and so on.

Take the most common category, access path rules, as an example. The most familiar rule is “large table scan”: during execution, the SQL accesses a large table via a full table scan. The rule's input parameters include the definition of a large table (physical size or record count); the output includes the table name, table size and additional information (the full execution plan, statistics of the flagged large tables, etc.).

The data for such rules are taken from the online databases: the Oracle part is extracted from AWR by time window, and the MySQL part is obtained by running the explain command against the database.

Information storage format

Execution plans are deliberately saved in MongoDB, exploiting its schemaless nature to absorb the differences between execution plans across databases and versions: everything fits into one collection, and the subsequent rule checks are expressed as Mongo query statements. That was the original motive for introducing Mongo; other information was moved in afterwards. At present the whole audit platform lives in MongoDB, apart from the MySQL part that the pt tools access. Conveniently, MySQL can output execution plans directly in JSON, which go straight into storage; Oracle plans are likewise converted to JSON before storage.
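A small sketch of that storage idea, under the assumption of a pymysql connection and a sql_plan collection (both names illustrative): MySQL emits the plan as JSON, which goes into MongoDB unchanged.

```python
# Sketch: store a MySQL JSON execution plan in MongoDB as-is.
import json
import pymysql
from pymongo import MongoClient

conn = pymysql.connect(host="127.0.0.1", user="audit", password="...",
                       db="appdb")
plans = MongoClient("mongodb://localhost:27017")["themis"]["sql_plan"]

with conn.cursor() as cur:
    # EXPLAIN FORMAT=JSON returns one row holding the plan as a JSON string.
    cur.execute("EXPLAIN FORMAT=JSON SELECT * FROM t1 JOIN t2 ON t1.id = t2.id")
    plan_json = json.loads(cur.fetchone()[0])
    plans.insert_one({"db_type": "mysql", "plan": plan_json})
```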

Rule Implementation (Execution Plan)

On the left is an Oracle execution plan saved in MongoDB; it is simply the sqlplan dictionary data inserted into Mongo. On the right is a sample rule implementation, a query statement over Mongo. A detailed example follows shortly.

7. Platform Implementation

Rule implementation

Here we take the “large table full table scan” rule as an example. Above is the execution plan as kept in Oracle's data dictionary; below, the same plan stored in Mongo. As you can see, it is copied over verbatim.

Given this structure, how is rule filtering implemented? Through Mongo's find statement. The execution steps of that statement are interpreted below; a pymongo-style sketch follows the list.

  • The top find() section filters execution plans, keeping those that match the specified user, time range and access path (“TABLE ACCESS” + “FULL”).
  • The filtered plans are then matched against the object data, keeping only those that touch a “large table”: one whose record count or physical size exceeds the specified parameter.
  • The result returns the sql_id, plan_hash_value and object_name for the period; these three values are used to pull the SQL statement, its execution plan and the related object information, respectively.
  • Every row of the result set is scored according to the previously configured deduction scheme.
  • The three kinds of extracted information, plus the deduction information, are returned as the result and displayed on the front end.
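Here is a hedged pymongo-style reconstruction of those steps. The field names mirror Oracle's V$SQL_PLAN columns as stored in Mongo, but the collection names, user, date and thresholds are assumptions for illustration.

```python
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["themis"]

# Step 1: plans whose access path is a full table scan for this user/period.
full_scans = db["sqlplan"].find(
    {
        "USERNAME": "APPUSER",
        "ETL_DATE": "2018-05-20",
        "OPERATION": "TABLE ACCESS",
        "OPTIONS": "FULL",
    },
    {"SQL_ID": 1, "PLAN_HASH_VALUE": 1, "OBJECT_NAME": 1, "_id": 0},
)

# Step 2: keep only plans touching a "large" table (the rule's thresholds).
large = {
    t["table_name"]
    for t in db["obj_tab_info"].find(
        {"$or": [{"num_rows": {"$gt": 10_000_000}},
                 {"size_mb": {"$gt": 10_240}}]}
    )
}
violations = [p for p in full_scans if p["OBJECT_NAME"] in large]
# Step 3: sql_id / plan_hash_value / object_name in each violation are then
# used to pull statement text, plan and object statistics, and to score.
```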

Rule Implementation (Execution Plan)

This part is an example of storing hierarchical results for MySQL.

The first figure shows the original execution plan.

The second figure summarizes the code implementation.

The third figure shows what it actually looks like in the repository; the core part is the generation of item_level (a sketch of the idea follows).
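The sketch below shows one way such an item_level can be produced: a depth-first walk over the nested EXPLAIN FORMAT=JSON document, recording each element's nesting depth. It is a simplified assumption of the approach, not the platform's actual code.

```python
# Flatten a nested MySQL JSON plan into rows carrying an item_level column.
def flatten_plan(node, level=0, rows=None):
    """Depth-first walk over an EXPLAIN FORMAT=JSON tree, recording the
    nesting depth of each element as item_level."""
    if rows is None:
        rows = []
    if isinstance(node, dict):
        rows.append({"item_level": level, "keys": list(node.keys())})
        for value in node.values():
            flatten_plan(value, level + 1, rows)
    elif isinstance(node, list):
        for item in node:
            flatten_plan(item, level, rows)
    return rows
```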

Rule Definition (Text Level)

The third class is the text rules: database-independent rules that describe the textual characteristics of SQL statements. They are implemented by regular-expression matching or by program processing. Their main purpose is to standardize developers' SQL writing and to head off overly complex, poorly performing or non-standard SQL.

Rule Implementation (Text Level)

This section shows the implementation of text rules. The first example, bad_join, is a simple rule implemented by regular-expression matching. The second, sub_query, detects subqueries (including multi-level subqueries) by programmatically tracking bracket nesting. Hedged re-creations of both appear below.
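These are re-creations assuming plain-Python implementations; the patterns are illustrative rather than the shipped rule bodies.

```python
import re

def bad_join(sql: str) -> bool:
    """Flag comma-style joins with no WHERE clause (Cartesian-product risk)."""
    return bool(re.search(r"from\s+\w+\s*,\s*\w+", sql, re.I)) \
        and not re.search(r"\bwhere\b", sql, re.I)

def sub_query_depth(sql: str) -> int:
    """Rough subquery depth: track parentheses that open a SELECT."""
    depth = max_depth = 0
    for i, ch in enumerate(sql):
        if ch == "(":
            depth += 1
            if re.match(r"\s*select\b", sql[i + 1:], re.I):
                max_depth = max(max_depth, depth)
        elif ch == ")":
            depth -= 1
    return max_depth
```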

Rule Definition (Execution Feature Level)

The last class is the execution feature rules. These are closely tied to the database and filter out statements matching certain execution characteristics. Such statements are not necessarily inefficient; they may simply be the focus of future optimization, or the statements whose optimization yields the greatest benefit, chiefly in terms of resource consumption.

8. System Management

Rule management

The following screenshots introduce the platform's functions.

The first is the rule management part of the system management module, where you can add rules of your own. Its core is the rule implementation part: a rule body can be defined as a SQL statement, a Mongo query statement or a custom Python file. Custom rules run against the data sources already being captured, so their authors must be familiar with the existing data structures and their meanings; custom capture sources are not yet supported.

Defined rules can also be modified here, mainly to configure the rule's status, thresholds and deduction settings.

Task management

Once the rules are configured, tasks can be published here.

Above is the interface for publishing audit tasks. After choosing a data source (IP, port, schema), select the audit type and date. The collection jobs against the data sources are currently scheduled by day, so the current day cannot be selected.

After a task is issued, its execution status can be watched in the task result view. The audit's duration varies with the audit type and the data source's object and statement counts, but it usually finishes within 5 minutes. When the audit job's status is “successful”, the audit is complete and its results can be viewed or exported.

9. Presentation of Results

Overview of Object Audit Results

The figure above is an example of an object audit report. The report opens with an overview page showing every rule and its deductions, with a pie chart of their proportions; this helps us concentrate on the core issues first.

At the top you can also see the total rule score, which is the rule deductions converted onto a 100-point scale. The higher the score, the fewer the violations and the higher the quality of the audited object. Introducing the “total rule score” was controversial at design time; I worried that such an indicator would dampen developers' enthusiasm and hinder the platform's adoption. A few points of explanation follow, along with a sketch of one plausible scoring computation after the list.

  • The total rule score was introduced to quantify the design, development and running quality of a database. In many past optimizations it was hard to quantify the before-and-after effect; this gives a basis for comparison. The method may not be very scientific, but it does provide a quantifiable measure.
  • Different business systems differ greatly, so horizontal comparison is not meaningful: system A scoring 60 and system B scoring 50 does not mean A's quality is higher than B's.
  • A single system lends itself to longitudinal comparison, i.e. comparing the total rule score before and after changes and optimization, which reflects the change in system quality to some extent.
  • The total rule score depends heavily on how the rules are configured: closing a rule or relaxing its violation threshold raises the score. This should be set according to each system's own situation, and the same rule may carry different thresholds for different systems. In a data warehouse application, for example, full scans of large tables are fairly normal behavior, so that rule could be closed, or its per-violation deduction and deduction cap reduced.
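For concreteness, here is one plausible reading of that scoring scheme as a sketch; the exact conversion formula is an assumption, not documented behavior. Each violation deducts the rule's weight, deductions are capped per rule by max_score, and the summed deduction is mapped onto a 100-point scale.

```python
# Sketch: convert per-rule deductions into a 100-point total score.
def total_score(findings, rules):
    deduction = 0.0
    for rule in rules:
        hits = sum(1 for f in findings if f["rule_name"] == rule["rule_name"])
        deduction += min(hits * rule["weight"], rule["max_score"])  # capped
    max_possible = sum(r["max_score"] for r in rules) or 1
    return round(100 * (1 - deduction / max_possible), 1)
```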

Details of Object Audit Results

This part shows the details of the object audit. For each rule, the link on the left leads to further object information; for reasons of space it is not shown here.

Overview of Execution Plan Audit Results

The overview of the execution plan audit is similar to the object overview: the deductions for each rule.

Details of Execution Plan Audit Results

This part shows the details of the execution plan audit.

When expanded, the details of each rule's violations appear. The figure above shows the detail for violations of the full table scan rule.

Above them are some common solutions, describing the situations that may trigger this rule and how to handle them. It amounts to a small knowledge base that helps developers optimize. In the platform's second phase, a more precise optimization engine will expand this part.

Below that comes each violating statement: its text, execution plan, associated information (such as the name of the large table for this rule) and so on. Clicking a statement expands further information.

That per-SQL information includes the statement text, execution plan, execution characteristics and statistics of the associated objects; from it a DBA can make a preliminary optimization judgment.

The platform also provides an export function; results can be exported as an Excel file for users to download and view, as shown here.

10. Pitfalls We Encountered

The actual development ran into many problems; two brief examples follow, both exposed by MySQL when parsing JSON-format execution plans.

[The session enters sleep state and plays dead]

Solution: set wait_timeout=3 before the session executes; adjust the value to the actual situation.

[Too much data, no result for a long time]

The session sits in query state, but because the data volume is very large, or because the database's support for format=json is imperfect, parsing stalls for a long time and other sessions are affected.

Solution: use the pt-kill tool to kill the session. To avoid killing the wrong sessions, write the keyword with distinctive casing, “eXplAin format=json”, and have pt-kill match on the “eXplAin” marker. A sketch combining both mitigations follows.
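A sketch of the two mitigations together, assuming a pymysql connection; the host, credentials and statement are placeholders.

```python
import pymysql

conn = pymysql.connect(host="127.0.0.1", user="audit", password="...",
                       db="appdb")
with conn.cursor() as cur:
    # Pitfall 1: shorten the idle timeout for this session only.
    cur.execute("SET SESSION wait_timeout = 3")
    # Pitfall 2: mixed-case marker so pt-kill can match it safely, e.g. with
    # something like: pt-kill --match-info "eXplAin" --busy-time 10 --kill
    cur.execute("eXplAin format=json SELECT * FROM t1 JOIN t2 ON t1.id = t2.id")
    plan = cur.fetchone()[0]
```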

11. Promotion Process

Since the platform went into operation at Yixin, it has produced audit reports for many systems, greatly accelerating database structure and SQL optimization and relieving DBAs' daily workload. In the course of the rollout we also worked out a set of implementation steps. Now that the platform is open-sourced, friends who adopt it may find these steps a useful reference.

Information gathering phase

Collect, at scale, how the company's database systems run, and gather first-hand information. Quickly understand the quality of each business system and choose pilot systems well.

Manual analysis phase

For key systems, intervene with manual analysis. Working outward from the core problems the rule audit exposes, “from point to surface”, deliver targeted analysis and optimization reports.

Exchange training phase

Visit the development teams proactively to walk through the reports. The report review is also an opportunity to give the team necessary training, combined with cases from their own systems, which is far more persuasive.

Feedback improvement phase

Follow up on the outcomes of those exchanges and push for improvement; raise quality through regular feedback from the audit platform. Teams with a certain foundation can be given the platform to use themselves, so that SQL quality stops being only the DBA's problem and becomes everyone's concern on the project.

Source of content: Yixin Institute of Technology