Optimizing SQL is one of the common tasks of DBA. How to optimize a statement efficiently and quickly is a problem that every DBA often faces. In the daily optimization work, I found many operations are essential steps in the optimization process. However, repeated execution of these steps will cost DBA a a lot of energy. So the idea of writing their own small tools to improve the efficiency of optimization emerged.
What language do you choose to develop tools?
It is very necessary for a DBA to master a language and cooperate with his work. Compared with shell’s simplicity and perl’s elegance, Python is a rigorous high-level language. It has many advantages, such as fast loading, simple grammar, rich expansion and cross-platform. Many people call it a “glue” language, through a large number of rich class libraries, modules, you can quickly build the tools you need.
As a result, this little tool became my first assignment to learn Python, and I called it “MySQL Statement Optimization Assistant”. And since then, I have fallen deeply in love with Python and developed many database-related gadgets. I will have the opportunity to introduce them to you later.
I. Optimization Methods and Steps
Before introducing the use of the tool, the following first explains the common methods, methods and problems needing attention for statement optimization in MySQL. This is also what everyone needs to know and master in daily manual optimization.
1. Execution Plan-Explain Command
Execution plan is the main entry point for statement optimization. The execution process of the statement can be understood through the interpretation of the execution plan. MySQL is obviously different from Oracle in execution plan generation. MySQL does not cache execution plans and performs “hard parsing” every time. The way to view the execution plan is to use the EXPLAIN command.
1) Basic usage
When the keyword EXPLAIN is used before a Select statement, MySQL explains how the Select statement will be run. It shows information such as how tables are joined and the order of joining.
EXPLAIN EXTENDED QUERY
When using the EXTENDED keyword, EXPLAIN generates additional information, which can be browsed by SHOW WARNINGS. This information shows how the optimizer defines the table and column names in the SELECT statement, rewrites and executes the optimization rules, and may also include other comments on the optimization process. It can be used in MySQL5.0 and later versions. In MySQL5.1, it has added an additional filtered column.
EXPLAIN PARTITIONS QUERY
What is displayed is the data fragment to be accessed by the query-if there are fragments. It can only be used in MySQL5.1 and later.
EXPLAIN FORMAT=JSON (5.6 new feature)
Another format displays the execution plan. You can see information such as the association between tables.
2) Output field
The following explains the meaning of the field of EXPLAIN output and learns how to judge an execution plan.
The serial number of the query in MySQL’s selected execution plan. If there are no subqueries in the statement, then there is only one SELECT in the entire output, thus each row will display a 1 on this column. If subqueries, set operations, temporary tables and so on are used in statements, it will bring great complexity to ID columns. In the example above, the WHERE section uses a subquery, and the row with id=2 represents an associated subquery.
The type of query used by the statement. Is it a simple SELECT or a complex SELECT (if it is the latter, show which complex type it belongs to). The following types of marks are commonly used.
- DEPENDENT SUBQUERY
The first SELECT in the inner layer of the subquery depends on the result set of the external query.
- DEPENDENT UNION
The UNION in the subquery, and all subsequent SELECTors in the UNION starting from the second select, also depends on the result set of the external query.
The outermost query in the subquery, note that it is not a primary key query.
Queries other than subqueries or UNION.
The first SELECT of the inner query of the subquery, the result does not depend on the external query result set.
- UNCACHEABLE SUBQUERY
Subqueries whose result sets cannot be cached.
The second SELECT in the UNION statement starts all subsequent SELECT, and the first SELECT is PRIMARY.
- UNION RESULT
Merge results in UNION. SELECT that gets the result from the UNION temporary table.
Derived table queries (subqueries in the FROM clause). MySQL recursively executes these subqueries and places the results in a temporary table. Internally, the server refers to it as a “derived table” because temporary tables are derived from subqueries.
The name of the table in the database accessed in this step or an alias table specified by the SQL statement. This value may be a table name, an alias of a table, or an identifier of a temporary table generated for a query, such as a derived table, subquery, or collection.
Table access method. The following is a list of different types of table joins, from the best to the worst.
The system table has only one row of records. This is a special case of const table join type.
Read constants, with at most one line of matching records. Since there is only one row of records, the field value of that row of records in the optimizer can be regarded as a constant value. Const is used when comparing with a fixed value in the PRIMARY KEY or UNIQUE index.
At most, there is only one matching result, which is usually accessed through the primary key or unique key index. A row of records will be read from the table to combine with the records read from the previous table. Different from const type, this is the best connection type. It is used to connect all parts of the index and the index is a PRIMARY KEY or UNIQUE type. Eq_ref can be used to retrieve fields when comparing “=”. The value to be compared can be a fixed value or an expression. Fields in the table can be used in the expression, and they are ready before reading the table.
Queries that drive table index references in JOIN statements. All records in this table that match the retrieved values will be retrieved and combined with the records retrieved from the previous table. Ref is used when the linker uses the leftmost prefix of the key or the key is not a PRIMARY KEY or UNIQUE index (in other words, the linker cannot obtain only one record based on the key value). This is a good connection type when only a few matching records are found according to key values. Ref can also be used when retrieving fields for comparison using the “=” operator.
The only difference with ref is that an empty query is added to the query referenced by the index. This connection type is similar to ref, except that MySQL will additionally search for records that contain NULL values during retrieval. The optimization of this connection type starts with MySQL 4.1.1, which is often used for subqueries.
The query uses two (or more) indexes at the same time, then merge the index results, and then reads the table data. This type of connection means that the Index Merge optimization method is used.
The combination of return result fields in a subquery is a primary key or a unique constraint.
The returned result field combination in the subquery is an index (or index combination), but not a primary key or unique index. This connection type is similar to unique_subquery. It replaces IN with a subquery, but it is used when there is no unique index in the subquery.
Index range scan. Only records in a given range will be retrieved and an index will be used to retrieve a record.
Full index scan. The connection type is the same as ALL, except that it only scans the index tree. It is usually faster than ALL because the index file is usually smaller than the data file. MySQL uses this connection type when the field knowledge of the query is part of a separate index.
Full-text index scanning.
Full table scan.
This field refers to which index MySQL may use when searching for table records. If no index is available, it will be displayed as null.
The index selected by the query optimizer from the possible_keys. The key field shows the index MySQL actually uses. When no index is used, the value of this field is NULL.
The length of the index key selected to use the index. The key_len field shows the length of the index MySQL uses. When the value of the key field is NULL, the length of the index is NULL.
Listing is filtered by constants or by a field in a table. The ref field shows which fields or constants are used in conjunction with key to query records from the table.
This field shows the number of result set records estimated by the query optimizer through statistical information collected by the system.
This field displays additional information about MySQL in the query.
This column is newly added in MySQL5.1 and will only appear when using EXPLAIN EXTENDED. It shows a pessimistic estimate of the percentage of records in the table that meet a certain condition (WHERE clause or join condition).
3) SQL rewrite
EXPLAIN can display SQL rewrite besides execution plan. The so-called SQL rewriting means that MySQL will rewrite the SQL statement based on some principles before optimizing the SQL statement, so as to facilitate the later optimizer to optimize and generate a better execution plan. This function is used in conjunction with EXPLAIN EXTENDED+SHOW WARNINGS. The following is an example.
From the above example, we can see that the IN subquery in the original statement has been rewritten into the way of inter-table association.
2. Statistical Information
Viewing statistics is also an essential step in optimizing statements. Statistical information can be used to quickly understand the storage characteristics of objects. The following are two main types of statistical information-tables and indexes.
1) table statistics-showtablestatus
- Name: table name
- Engine: the storage engine type of the table (ISAM, MyISAM, or InnoDB)
- Row_format: row storage format (Fixed- fixed, Dynamic- dynamic, or Compressed- compressed)
- Rows: number of rows. In some storage engines, such as MyISAM and ISAM, they store accurate records. However, in other storage engines, it may only be an approximation.
- Avg_row_length: average row length.
- Data_length: the length of the data file.
- Max_data_length: the maximum length of the data file.
- Index_length: the length of the index file.
- Data_free: number of bytes allocated but not used.
- Auto_increment: next autoincrement value.
- Create_time: the time when the table was created.
- Update_time: the time when the data file was last updated.
- Check_time: the last time a check was run on the table. Update after executing mysqlcheck command, valid only for MyISAM.
- Create_options: additional options for CREATE TABLE.
- Comment: The comment used when creating the table (or why MySQL cannot access some information of the table information).
- Version: the’. frm’ file version number of the data table.
- Collation: the character set and corrected character set of the table.
- Checksum: Real-time checksum value, if any.
3. index statistics-showindex
- Table: table name.
- Non _ unique: 0, if the index cannot contain duplicates.
- Key_name: index name
- Seq_in_index: the column sequence number in the index, starting from 1.
- Column_name: column name.
- Collation: How columns are sorted in an index. In MySQL, this can have a value of a (ascending) or NULL (not sorted).
- Cardinality: Number of unique values in the index.
- Sub_part: Number of index characters if the column is only partially indexed. When the entire field is indexed, its value is NULL.
- Packed: Indicates how the key value is compressed, NULL indicates no compression.
- Null: when the record whose field includes NULL is YES, its value is, otherwise it is”.
- Index_type: which index algorithm is used (BTREE, FULLTEXT, HASH, RTREE).
- Comment: remarks.
- System Parameters: System parameters also affect the execution efficiency of statements. To view system parameters, you can use the SHOW VARIABLES command.
1) Parameter description
There are many system parameters, which are described below.
Sort area size. Its size directly affects the algorithm used for sorting. If the sorting in the system is relatively large, the memory is sufficient and the concurrency is not very large, this parameter can be appropriately increased. This parameter is for a single Thead.
Join operation uses memory area size. Join Buffer is used only when join is ALL, index, range, or index_merge. If there are many join statements, the join_buffer_size can be appropriately increased. It should be note that this value is for a single Thread. Each Thread will create its own independent Buffer instead of the Buffer shared by the whole system. Do not set too large to cause insufficient system memory.
If the temporary table in memory exceeds this value, MySQL automatically converts it to MyISAM table on the hard disk. If you execute many advanced GROUP BY queries and have a lot of memory, you can increase the value of tmp_table_size.
The buffer size that can be used for read query operations. This parameter is for a single Thead.
4. Optimizer Switch
In MySQL, there are still some parameters that can be used to control optimizer behavior.
1) Parameter description
This parameter controls the limits of the optimizer when exhaustively executing the plan. If the query is in the “statistics” state for a long time, you can consider lowering this parameter.
The default is on, which allows the optimizer to decide whether to skip some execution plans based on the number of rows to scan.
This variable contains flags that turn on/off the optimizer feature.
Example-Intervention Optimizer Behavior (ICP Characteristics)
By default, the ICP feature is on. Check the optimizer behavior.
Filtering query based on secondary index uses ICP feature, which can be seen from “Using index condition” in Extra. What if the behavior of the optimizer is interfered by the optimizer switch?
As can be seen from Extra, ICP feature has been disabled.
5. system status (SHOW STATUS)
MySQL also has some built-in states. These state variables can also reflect some situations of statement execution, which is convenient to locate problems. If executed manually, the SHOW STATUS command can be executed before and after the execution of the statement to view the changes in status. Of course, because there are many state variables, it is not very convenient to compare them. The small tools I will introduce later can solve this problem.
1) state variables
There are many state variables, some of which are introduced here.
The number of merges that the sorting algorithm has performed. If the value of this variable is large, consider increasing the value of sort_buffer_size system variable.
The number of sorts performed within the range.
The number of rows that have been sorted.
The number of sorts completed by scanning the table.
The number of times the first item in the index has been read. The number of times the index header is read. If this value is high, it means that the full index is scanned a lot.
The number of requests to read a row based on the key. If higher, the index of the query and table is correct.
Number of requests to read the next row in key order. This value increases if you query index columns with range constraints or if you perform an index scan.
Number of requests to read the previous row in key order.
The number of requests to read a row based on a fixed location. This value is higher if a large number of queries are executed and the results need to be sorted. You may have used a large number of queries that require MySQL to scan the entire table or the connection did not use the key correctly.
The number of requests to read the next line in the data file. This value is higher if a large number of table scans are in progress. It usually indicates that the table index is incorrect or the written query does not utilize the index.
6. SQL Profiler
MySQL’s Query Profiler is a very convenient Query diagnostic analysis tool. Through this tool, we can obtain the consumption of various resources in the whole process of Query execution, such as CPU, IO, IPC, SWAP, etc., as well as the occurrence of PAGE FAULTS, CONTEXT SWITCHE, etc. At the same time, we can also obtain the positions of various functions called by MySQL in the source file during the query execution.
1) usage method
mysql> select @@profiling; mysql> set profiling=1;
By default, the value of profiling is 0, which means MySQL SQL Profiler is in the OFF state, and the value of profiling is 1 when the SQL Performance Analyzer is turned on.
Execute SQL statement
mysql> select count(*) from t1;
Get summary information
Use the “show profile” command to obtain profile information of multiple Query saved in the current system.
mysql> show profiles; +----------+------------+-----------------------+ | Query_ID | Duration | Query | +----------+------------+-----------------------+ | 1 | 0.00039300 | select count(*) from t1 | +----------+------------+-----------------------+
Get detailed profile information for a single Query
After obtaining the summary information, the detailed profile information during the execution of a Query can be obtained according to the Query_ID of the summary information.
mysql> show profile for query 1; mysql> show profile cpu,block io for query 1;
II. Description of Tools
The above mentioned various methods are helpful for tuning SQL statements. Through the following small tool, you can automatically call the command to push the above contents to DBA at one time, greatly accelerating the optimization process.
1. Preparation Conditions
Python version = 2.7.3 (version 2.6.x should be fine, version 3.x not tested)
2. Call the method
python mysql_tuning.py -p tuning_sql.ini -s 'select xxx'
1) Parameter description
-p specifies the profile name
-s specifies SQL statement
3. Configuration Files
There are two sections of information, namely [database] describes database connection information and [option] operation configuration information.
server_ip = 127.0.0.1 db_user = testuser db_pwd = testpwd db_name = test
sys_parm = ON //是否显示系统参数 sql_plan = ON //是否显示执行计划 obj_stat = ON //是否显示相关对象(表、索引)统计信息 ses_status = ON //是否显示运行前后状态信息(激活后会真实执行SQL) sql_profile = ON //是否显示PROFILE跟踪信息(激活后会真实执行SQL)
4. Output Description
1) Title section
Contains address information and data version information of the running database.
2) original SQL
The user executes the input SQL, which is mainly used for subsequent comparison of SQL rewriting. Statement is displayed using formatting.
3) system level parameters
Script selection shows some parameters related to SQL performance. This part is written to death in the code. If you need to extend it, you need to modify the script.
4) Optimizer Switch
The following are some parameters related to the optimizer. The behavior of the optimizer can be artificially interfered by adjusting these parameters.
5) Implementation Plan
Is the output of calling explain extended. If the result is too long, there may be a problem of displaying serial (temporarily unresolved).
6) SQL rewritten by optimizer
From here, it can be judged whether the optimizer has made some optimization on SQL (e.g. processing of subqueries).
7) statistical information
Statistics of all tables and their indexes involved in SQL statements will be displayed here.
8) Operation status information
At the session level, the SHOW STATUS before and after execution is compared, and the changed part is displayed. It should be noted that, because the state data is collected in a SELECT mode, errors of individual indicators (such as Com_select) will be caused.
9) PROFILE details
Details obtained by calling SHOW PROFILE.
10) PROFILE summary information
According to the PROFILE’s resource consumption, the TOP N of consumption in different stages is displayed, and the “bottleneck” is visually displayed.
Author: feng han
Source of content:Yixin Institute of Technology