[case4] Talk about jdbc’s defense measures against read-write anomalies related to large data volumes

  jdbc

Order

This article mainly studies the defense measures against the anomalies related to reading and writing large amounts of data to jdbc.

Read operation

Select a large amount of data to memory at one time, and OOM exception is the most common. At this time, time and data size can be limited.

Limit the amount of data

1. Pagination query

For common functions, paging operation is necessary and the simplest way to solve this problem. When relevant functions are implemented, the amount of data produced should be estimated in advance to determine the corresponding amount of paging data.

2.maxRows

Jdbc can set maxRows of the statement to limit the maximum value of all data that the statement can pull, and discard if it exceeds. The jdbc driver implementation may be different for different data. For example, pg’s jdbc driver compares maxRows with fetchSize and takes the smallest value as the limit parameter value to query.

If this parameter is to be set universally for different sql, it may not be too good to set, it is a bit barbaric and violent, and it may be that some of the queried data do not have many columns or take up too much memory. Separate settings are required. However, jdbc is rarely used directly in actual function implementation, and jpa or mybatis is used instead. Therefore, it is necessary to see whether jpa or mybatis has exposed this parameter value to you. However, for common sql services, it is very necessary to set maxRows, such as no more than 2w, to carry out bottom-up prevention.

3.fetchSize

Jdbc provides the fetchSize parameter to set batch fetching by fetchSize for each query. Different databases have different jdbc driver implementations.

For example, mysql needs url setting useCursorFetch=true and fetchSize of statement to truly batch fetch, otherwise, it will pull data in full quantity. In fetch mode, the executeQuery method does not get the first batch of data, but is implemented in the resultSet’s next method.

For example, if pg is used, the executeQuery method will fetch the first batch of fetchSize data by default and return it, and then the resultSet’s next () method will fetch it as needed.

If fetchSize is used to avoid OOM, there is a restriction, that is, one needs to traverse data and process data while traversing resultSet. If you do not traverse while processing, or add the result set loop to the list to return, under the programming paradigm that is not reactive mode, this fetchSize will lose its effect, because in the end you still pile up all the data sets in memory before processing, so there will be OOM risk in the end.

Limit query time

If time is limited, there are multiple dimensions:

1. socketTimeout for 1.connection

This is the timeout parameter setting of the lowest connection socket in jdbc, which can be used to prevent the database from blocking the connection due to network reasons or restart of its own problems. This setting is very necessary and is usually set in the connection url.

Mysql, for example

jdbc:mysql://localhost:3306/ag_admin?useUnicode=true&characterEncoding=UTF8&connectTimeout=60000&socketTimeout=60000

For example, pg, the unit of pg is different from mysql, mysql is milliseconds, while pg is seconds

jdbc:postgresql://localhost/test?user=fred&password=secret&&connectTimeout=60&socketTimeout=60

However, database connection pool is commonly used now, so this is not set, and it is also possible to set connection pool related parameters.

2. queryTimeout of 2.statement

This is mainly to set the timeout time for the execution of the statement’s executeQuery, that is, the timeout time from the client side issuing the query instruction to receiving the first batch of data, which is usually realized by timer.

However, the implementation of jdbc driver in different databases is different. For example, mysql’s executeQuery will not get the first batch of data in fetch mode, while pg will pull the first batch of data and return it. This parameter is relatively semantic only when it is not in fetch mode, that is, all data is queried at once. If it is fetch mode, the timeout cannot limit the pull timeout of subsequent batches of data. They can only depend on the socketTimeout parameter of connection.

Mybatis can set this value through the defaultStatementTimeout parameter.
Jpa can be set via query hit.

@QueryHints(@QueryHint(name = org.hibernate.jpa.QueryHints.SPEC_HINT_TIMEOUT, value = "1000"/*ms**/))
    List<DemoUser> findAll();

Jdbc template can be set through parameters

    @Bean(name = "pgJdbcTemplate")
    public JdbcTemplate pgJdbcTemplate(
            @Qualifier("pgDataSource") DataSource dataSource) {
        JdbcTemplate jdbcTemplate = new JdbcTemplate(dataSource);
        jdbcTemplate.setQueryTimeout(10*1000);
        jdbcTemplate.setMaxRows(10*1000);
        return jdbcTemplate;
    }

3. timeout of 3.transaction

Implementing a business function in real-world programming may call many statement queries in a transaction, which can limit the timeout of these operations on a transaction-by-transaction basis.

You can set a global timeout

    @Bean
    @Qualifier("pgTransactionManager")
    PlatformTransactionManager pgTransactionManager() {
        JpaTransactionManager transactionManager = new JpaTransactionManager(pgEntityManagerFactory().getObject());
        transactionManager.setDefaultTimeout(60 /*seconds*/);
        return transactionManager;
    }

You can also set it separately in the transactional annotation, for example

@Transactional(timeout=5) /**5 seconds*/
public List<DemoUser> findAll();

4. Time taken by 4.connection

When using connection pool for database operation, the general connection pool will provide the function of connection detection, such as verifying whether the connection is ok when borrow

In addition, timeout suspect and abandon operations for connection occupation are also provided to detect connection leakage. if none of the above operations are s et or the (default) set value is too large and unreasonable, then this detection is a bottom-up operation except socketTimeout. If the connection is lent and not returned after the specified time, it is determined that the connection is leaked, and abandon, i.e. close, will be forced to close the connection, which is very violent but also very useful to prevent thread blocking from causing service 504 or 502 at the end of the database operation.

Write operation

Similar to fetchSize, jdbc provides batch method for inserting or updating large amounts of data for batch operations. Therefore, for large-scale data operations, attention should be paid to the amount of data accumulated in memory, and remember to release calls in batches. It is more suitable to use native jdbc to operate. jpa’s save method still has a large number of objects docked in memory, and batch and release are only performed when flush.

Summary

For a large number of data read and write operations of jdbc, extra attention should be paid to the accumulation of objects in memory to prevent OOM. In addition, extra attention should be paid to setting the timeout time for database operations to prevent the server thread from blocking and preventing the service from being provided.

Operation Category Parameter Remarks
Read Quantity pageSize Pagination query
Read Quantity maxRows Limit the maximum amount of all data for one or more fetch queries
Read Quantity fetchSize Limit the size of statement’s query and result’s next batch query.
Read Time connection socketTimeout Read Timeout for Bottom socket Connection
Read Time statement queryTimeout Query timeout to restrict statement
Read Time transaction timeout Limit timeout for transaction execution
Read Time connection remove abandon timeout Limit Connection Borrowing Timeout
Write Quantity batch execute Batch execution

doc