Yixin Open-Source Database Auditing Software Themis: Deployment Guide


I. Introduction

Themis is a database auditing product developed by the DBA team at Yixin. It helps DBAs and developers quickly discover database quality problems and improve their work efficiency. Its name comes from the goddess of justice and law in Greek mythology, implying that the platform judges database quality fairly and transparently.

The platform audits Oracle and MySQL databases along multiple dimensions (object structure, SQL text, execution plan, and execution characteristics) to evaluate the design quality of object structures and the efficiency of SQL operations. It helps DBAs and developers quickly discover and locate problems, and provides some auxiliary diagnostic capability to improve the efficiency of optimization work. All operations can be carried out through the web interface, which is simple and convenient. In addition, to better meet individual needs, the platform provides extension points so that users can extend it as required.

Open-source repository: https://github.com/CreditEaseDBA

1.1 Functional Overview

  • The platform focuses on after-the-fact auditing; standalone optimization features are planned for Phase II. It can also be introduced during the design phase of a project to provide a degree of pre-auditing.
  • The target users are DBAs and R&D staff with some database background.
  • Audits can be run against a specific database user, covering data structure, SQL text, SQL execution characteristics, SQL execution plans, and other dimensions.
  • Audit results are delivered as web pages or exported files.
  • The platform supports the mainstream Oracle and MySQL databases; other databases are planned for Phase II.
  • It aims to provide flexible customization so that functionality can easily be extended later.

1.2 Supported Databases

  • MySQL (5.6 and above)
  • Oracle (10g and above)

1.3 Audit Dimensions

  • Database structure (objects): database objects such as tables, partitions, indexes, views, and triggers.
  • SQL text (statements): the text of the SQL statements themselves.
  • SQL execution plans: the execution plans of the SQL statements in the database.
  • SQL execution characteristics: the actual execution statistics of the statements on the database.

1.4 Implementation Principle

The basic principle of the platform is simple: it filters the audit targets (currently the four kinds above) through rule sets. Whatever matches a rule is flagged as a suspected problem, and the platform presents these findings, together with related information, for manual screening. How powerful the platform is therefore depends on how rich its rule sets are. The platform also provides extension points that make it easy to add new rules.

1.5 Platform Architecture

The blocks in the figure are the platform's main modules; different background colors indicate different progress states. Dashed lines represent data flow and solid lines represent control flow. The core modules are:

  • Data acquisition module. Responsible for capturing the base data needed for auditing from the data sources. Capturing from Oracle and MySQL is currently supported.
  • OBJ/SQL repository. The common storage layer of the system: the collected data, the intermediate data produced during processing, and the result data are all stored here. Its core data falls into object classes and SQL classes. Physically, it is MongoDB.
  • Core management module. The dashed box on the right of the figure contains two modules, SQL management and OBJ management, which handle the full life-cycle management of objects. At present only a simple object-filtering function has been implemented, so the box still has a white background; the core functionality is not yet complete.
  • Audit rules and audit engine modules. These are the core components of Phase I of the platform. The audit rule module defines and configures rules; the audit engine module executes the specific rules.
  • Optimization rules and optimization engine modules. These are the core components of Phase II. They have not yet been developed, hence the white background.
  • System management module. Provides the platform's basic functions, such as task scheduling, space management, and audit report generation and export.

1.6 Operation Flow

II. Environment Setup

This project uses MySQL, MongoDB, and Redis. Python 2.6 and 2.7 are supported; Python 3 is not currently supported.

MySQL stores the MySQL slow queries captured by pt-query-digest. MongoDB stores the rules, Oracle collection results, job execution records, and analyzed result sets. Redis serves as the broker queue for the Celery task scheduler.

For MySQL data collection we use the pt-query-digest tool.
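Before continuing, it is worth confirming that the three services are installed and reachable. A minimal check, assuming local instances on default ports:

mysql --version
mongo --version
redis-cli -h 127.0.0.1 -p 6379 ping    # should print PONG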

2.1 Installing Dependencies

New user

To minimize later changes to the supervisord.conf configuration file, we recommend performing the installation under a single dedicated user.

    adduser themis-test
    su - themis-test

Unless otherwise noted, the following steps are performed as the themis-test user; only the virtualenv installation requires switching to the root user.

Installing cx_Oracle Dependencies

Since the audit process needs to connect to Oracle databases, the cx_Oracle dependency must be installed first; refer to: http://www.jianshu.com/p/pKz5K7
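If that guide is unavailable, the installation typically looks like the following sketch. The Instant Client path here is an assumption; adjust it to wherever your Oracle client libraries live:

# assumes the Oracle Instant Client is unpacked under /opt/oracle/instantclient (hypothetical path)
export ORACLE_HOME=/opt/oracle/instantclient
export LD_LIBRARY_PATH=$ORACLE_HOME:$LD_LIBRARY_PATH

pip install cx_Oracle

# sanity check: print the installed cx_Oracle version
python -c "import cx_Oracle; print(cx_Oracle.version)"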

Installing Python Dependencies

First install virtualenv (see: https://pypi.python.org/simpl …); version 13.0.3 or later is recommended.

If network access is inconvenient, for example on a company intranet, you can download it from https://pan.baidu.com/s/1o7AI … (extraction code: 3sy3).

The compressed package includes all the dependency packages that need to be used.

Install virtualenv

tar -zxvf virtualenv-13.0.3.tar.gz
cd virtualenv-13.0.3
python setup.py install

For virtualenv usage, see: https://virtualenv.pypa.io/en …

Install other dependencies

First, initialize the virtual environment:

virtualenv python-project --python=python2.7
source /home/themis-test/python-project/bin/activate

A few notes on the commands above. python-project, the second argument to virtualenv, is the name of the virtual environment being created. The name is arbitrary, but it is referenced later in the Supervisor configuration, so we recommend keeping the default; if you are familiar with Python, name it as you like. The --python option specifies which Python version to use and can be omitted, in which case the system's default Python builds the virtual environment; it is useful when multiple Python versions are installed.

The source command then activates the virtual environment. From this point on, package dependencies are installed under /home/themis-test/python-project/lib/python2.7/site-packages.
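A quick way to confirm the environment is active before installing anything:

# both should resolve inside the virtual environment
which python    # expect /home/themis-test/python-project/bin/python
which pip       # expect /home/themis-test/python-project/bin/pip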

If you can connect to the network, enter the source code directory and use the following command

pip install -r requirement.txt

pyh is installed separately; download it from: https://github.com/hanxiaomax …

unzip pyh-master.zip
cd pyh-master
python setup.py install

If network access is inconvenient in your LAN environment, use the archive provided on the network disk mentioned above:

pip install --no-index -f file:///home/themis-test/software -r requirement.txt

file:///home/themis-test/software is the location where the archive was extracted.

2.2 Configuration File Overview

The following excerpt from the configuration file settings.py illustrates the required settings:

# set Oracle ip address, port, SID, account, password
# "ip address:port" is the key
ORACLE_ACCOUNT = {
    # oracle
    "127.0.0.1:1521": ["cedb", "system", "password"]
}

# set mysql ipaddress, port, account, password
MYSQL_ACCOUNT = {
    "127.0.0.1:3307": ["mysql", "user", "password"]
}

# pt-query save data for mysql account, password
PT_QUERY_USER = "user"
PT_QUERY_PORT = 3306
PT_QUERY_SERVER = "127.0.0.1"
PT_QUERY_PASSWD = "password"
PT_QUERY_DB = "slow_query_log"

# celery setting
REDIS_BROKER = 'redis://:password@127.0.0.1:6379/0'
# REDIS_BROKER = 'redis://:@127.0.0.1:6379/0'


REDIS_BACKEND = 'redis://:password@127.0.0.1:6379/0'
# REDIS_BACKEND = 'redis://:@127.0.0.1:6379/0'


CELERY_CONF = {
    "CELERYD_POOL_RESTARTS": True
}

# mongo server settings
MONGO_SERVER = "127.0.0.1"
MONGO_PORT = 27017
# MONGO_USER = "sqlreview"
MONGO_USER = "sqlreview"
# MONGO_PASSWORD = ""
MONGO_PASSWORD = "sqlreview"
MONGO_DB = "sqlreview"

# server port setting
SERVER_PORT = 7000

# capture time setting
CAPTURE_OBJ_HOUR = "18"
CAPTURE_OBJ_MINUTE = 15
CAPTURE_OTHER_HOUR = "18"
CAPTURE_OTHER_MINUTE = 30

ORACLE_ACCOUNT and MYSQL_ACCOUNT hold the accounts and passwords of the target machines to be audited. They are used mainly by the data-collection part, the object-class audits, and the MySQL execution-plan audits, so the accounts need fairly high privileges. To stay safe in a production environment, create dedicated accounts with restricted permissions, or add IP restrictions.

PT_QUERY_USER, PT_QUERY_PORT, PT_QUERY_SERVER, PT_QUERY_PASSWD, and PT_QUERY_DB describe the MySQL database in which pt-query-digest stores the slow SQL it parses from the target machines.

REDIS_BROKER, REDIS_BACKEND, and CELERY_CONF are configuration options for the Celery task scheduler.

MONGO_SERVER, MONGO_PORT, MONGO_USER, MONGO_PASSWORD, and MONGO_DB configure the MongoDB instance that stores the result sets.

SERVER_PORT is the port the web management console listens on. Do not use ports 9000 and 5555, which are reserved for the file download server and the Flower management tool.

CAPTURE_OBJ_HOUR, CAPTURE_OBJ_MINUTE, CAPTURE_OTHER_HOUR, and CAPTURE_OTHER_MINUTE set the collection times for Oracle's data-acquisition module. Choose times that fit your situation and avoid peak business hours.

Configure the file according to the notes above.
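Before moving on, a quick sanity check (not part of Themis itself) can confirm that the MongoDB and Redis instances referenced in settings.py are reachable; the hosts, ports, and passwords below mirror the sample settings and must be adapted:

# MongoDB connectivity and authentication (pymongo 2.x style)
python -c "import pymongo; c = pymongo.MongoClient('127.0.0.1', 27017); c['sqlreview'].authenticate('sqlreview', 'sqlreview'); print('mongo ok')"

# Redis connectivity (prints True on success)
python -c "import redis; r = redis.StrictRedis(host='127.0.0.1', port=6379, password='password'); print(r.ping())"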

2.3 Rule Import

Enter the source code directory and use the following command to initialize the rule set:

mongoimport -h 127.0.0.1 --port 27017 -u sqlreview -p password -d sqlreview -c rule --file script/rule.json
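To verify the import, count the documents in the rule collection (the expected number depends on the shipped rule.json):

mongo 127.0.0.1:27017/sqlreview -u sqlreview -p password --eval "db.rule.count()"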

III. Data Acquisition

Data collection is divided into an Oracle part and a MySQL part. The Oracle part uses custom-developed scripts; the MySQL part uses the pt-query-digest tool.

By default, data is collected once a day; the frequency can be adjusted to your needs.

The Oracle part relies on Celery's task scheduling and is managed by Supervisor; pt-query-digest can be added to crontab.

3.1 Oracle Section

Manual data acquisition

Manually collect Oracle OBJ information.

JSON file:

{
    "module": "capture",   
    "type": "OBJ",    
    "db_type": "O",    
    "db_server": "127.0.0.1",    
    "db_port": 1521,   
    "capture_date": "2017-02-28"
}

Only the db_server and db_port options need to be configured; the Oracle port is expected to be 1521. capture_date specifies the date for which data is collected; currently only day-level collection is supported.

Execute the command:

python command.py -m capture_obj -c data/capture_obj.json

Manually collect other Oracle information, including plan, stat, and text data.

JSON file:

{
    "module": "capture",   
    "type": "OTHER",    
    "db_type": "O",    
    "db_server": "127.0.0.1",    
    "db_port": 1521,    
    "capture_date": "2017-02-28"
}

The configuration is the same as for OBJ above.

Execute the command:

python command.py -m capture_other -c data/capture_other.json

Manual collection is generally used only for the initial load; subsequent collection is usually handled automatically.

Automatic data acquisition

Configure ORACLE_ACCOUNT in the settings.py file; the account must be able to query all tables, i.e. it needs the SELECT ANY TABLE privilege.

ORACLE_ACCOUNT = {    
    # oracle    
    "127.0.0.1:1521": ["cedb", "system", "password"]
}

Configure the collection schedule:

# capture time setting
CAPTURE_OBJ_HOUR = "18"
CAPTURE_OBJ_MINUTE = 15
CAPTURE_OTHER_HOUR = "18"
CAPTURE_OTHER_MINUTE = 30

If you are not auditing Oracle databases, these settings can be left unconfigured.

3.2 MySQL Section

pt-query-digest usage

  • Slow logs can be collected centrally in one place and then loaded into the database.
  • Alternatively, pt-query-digest can be installed on each MySQL machine and the analysis results pushed to the storage machine.

This platform adopts the second scheme.

Download and install pt-query-digest from https://www.percona.com/get/p …; any missing dependencies can be installed with yum.

Initialize the table structure with script/pt_query_digest.sql rather than pt-query-digest's default table structure.

Configure the script/pt-query-digest.sh script on each target machine:

pt-query-digest --user=root --password=password --review h=127.0.0.1,D=slow_query_log,t=global_query_review --history h=127.0.0.1,D=slow_query_log,t=global_query_review_history --no-report --limit=0% --filter=" \$event->{Bytes} = length(\$event->{arg}) and \$event->{hostname}='127.0.0.1:3306' and \$event->{client}=\$event->{ip}" slow.log

$event->{hostname}='127.0.0.1:3306' is the IP address and port of the machine whose slow logs are being collected.

The main things to configure are the account, password, IP, and port of the MySQL machine that stores the analysis results, along with the location of the slow log.

Run the pt-query-digest.sh script to start collecting MySQL slow-query data. The script can then be added as a scheduled task so that collection happens on a fixed interval, as in the example below.
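For instance, a crontab entry along these lines runs the collection once a day; the script path and schedule are placeholders to adapt to your environment:

# run the slow-log analysis daily at 01:00 (placeholder path)
0 1 * * * /bin/bash /path/to/themis/script/pt-query-digest.sh >> /tmp/pt-query-digest.log 2>&1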

IV. Task Export

4.1 Manual Task Export

JSON file:

{
    "module": "export",
    "type": "export",
    "task_uuid": "08d03ec6-f80a-11e6-adbc-005056a30561",
    "file_id": "08d03ec6-f80a-11e6-adbc-005056a30561"
}

Configure the task_uuid and file_id options, which uniquely identify the task; they can be looked up in the job collection of the sqlreview database in MongoDB (an example query is shown below). Then run:

python command.py -m export -c data/export.json

Manual task export generates an offline HTML archive under task_export/downloads. It can be extracted directly, after which the report can be opened in a browser.
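If you are unsure which task_uuid to use, recent job documents can be listed straight from MongoDB. This is only an illustrative query; the exact document fields depend on your Themis version:

mongo 127.0.0.1:27017/sqlreview -u sqlreview -p password --eval "db.job.find().sort({_id: -1}).limit(5).forEach(printjson)"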

4.2 Automatic Task Export

Automatic export is handled by Celery tasks managed through Supervisor. Refer to the Supervisor configuration below for details.

V. Web Management Console

5.1 Manually Starting the Web Console

Execute the following command

python command.py -m web -c data/web.json

Visit http://127.0.0.1:7000 to open the management console.

VI. Supervisor Configuration

6.1 Supervisor Configuration

;start the web management console
[program:themis-web]
command=/home/themis-test/python-project/bin/python command.py -m web -c data/web.json
autostart=true
redirect_stderr=true
stdout_logfile=tmp/themis_web.log
loglevel=info

;start the file download server
[program:themis-download]
command=/home/themis-test/python-project/bin/python task_export/file_download.py
autostart=true
redirect_stderr=true
stdout_logfile=tmp/themis_download.log
loglevel=info

;start the task export module
[program:themis-export]
command=/home/themis-test/python-project/bin/celery -A task_exports worker -E -l info
autostart=true
redirect_stderr=true
stdout_logfile=tmp/themis_export.log
loglevel=info
;start the rule analysis module
[program:themis-analysis]
command=/home/themis-test/python-project/bin/celery -A task_other worker -E -Q sqlreview_analysis -l info
autostart=true
redirect_stderr=true
stdout_logfile=tmp/themis_analysis.log
loglevel=info
;start the OBJ information capture module
[program:themis-capture-obj]
command=/home/themis-test/python-project/bin/celery -A task_capture worker -E -Q sqlreview_obj -l debug -B -n celery-capture-obj
autostart=true
redirect_stderr=true
stdout_logfile=tmp/themis_capture_obj.log
loglevel=info
;start the plan/stat/text information capture module
[program:themis-capture-other]
command=/home/themis-test/python-project/bin/celery -A task_capture worker -E -Q sqlreview_other -l info -B -n celery-capture-other
autostart=true
redirect_stderr=true
stdout_logfile=tmp/themis_capture_other.log
loglevel=info
;Celery task management module (Flower); remove the leading ";" to enable it, and configure the Redis connection
;[program:themis-flower]
;command=/home/themis-test/python-project/bin/celery flower --address=0.0.0.0 --broker=redis://:password@127.0.0.1:6379/0
;autostart=true
;redirect_stderr=true
;stdout_logfile=tmp/themis_flower.log
;loglevel=info

Note: if you created a different user or used a different directory earlier, replace /home/themis-test/python-project/ in this file with your own path.

Common Supervisor commands

Start Supervisor:
supervisord -c script/supervisord.conf
Reload Supervisor:
supervisorctl -u sqlreview -p sqlreview.themis reload
Enter the Supervisor management console; -u and -p are the supervisorctl username and password configured in supervisord.conf:
supervisorctl -u username -p password
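To confirm that all modules started correctly, check their status (assuming the same configuration file):

supervisorctl -c script/supervisord.conf status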

Reference: http://www.supervisord.org/

VII. Common Problems

  • Inconsistent host names cause cx_Oracle errors.
  • Mismatched Celery and Flower versions prevent Flower from starting; upgrade Flower to 0.8.1 or later.
  • MySQL 5.7 fails to initialize the tables because the datetime columns use DEFAULT '0000-00-00 00:00:00', which MySQL 5.7's default SQL mode rejects.
  • MongoDB's maximum document size (16 MB) can cause document inserts to fail when generating results.
  • When fetching Oracle users, some systems create application users under the USERS tablespace, so NOT IN ('USERS', 'SYSAUX') needs to be changed to NOT IN ('SYSAUX').

File locations: capture/sql.py and webui/utils/f_priv_db_user_list.py.

In some cases python-devel needs to be installed; on CentOS, run yum install python-devel.

For MySQLdb installation problems, see: http://blog.csdn.net/wklken/a …

VIII. Exception Handling

Errors during program execution can be investigated by opening Flower, or by inspecting the code manually.

Flower can be enabled in the Supervisor configuration:

;Celery task management module (Flower); remove the leading ";" to enable it, and configure the Redis connection
;[program:themis-flower]
;command=/home/themis-test/python-project/bin/celery flower --address=0.0.0.0 --broker=redis://:password@127.0.0.1:6379/0
;autostart=true
;redirect_stderr=true
;stdout_logfile=tmp/themis_flower.log
;loglevel=info

You can also open it manually:

celery flower --address=0.0.0.0 --broker=redis://:password@127.0.0.1:6379/0

Note that the Redis authentication options must be configured accordingly.

IX. Contributing

Problems can be raised directly at https://github.com/CreditEase …

This article is excerpted from the wiki: https://tuteng.gitbooks.io/th …
Due to its length, please follow the link for the full text.

Source: Yixin Institute of Technology