Key Points of Performance Testing


Several definitions

Performance Test

Broadly, "performance testing" is an umbrella term: performance data is collected and used by different people in different situations. A performance test measures how the software performs in the running system and the gap between the measured results and the predefined targets.

Focus: how much and how fast

Load Test

A load test is a kind of performance test that checks whether a program can keep working as the load grows. By gradually increasing the system load, it determines the maximum load the system can bear while still meeting its performance targets.

Focus: how much

Stress Test

A stress test is a load test under high load: the system is first put under load, then the pressure is increased further (for example, doubled) until the system crashes. Attention is then paid to how well the system recovers after the crash, and pressure is applied again to see whether the system has been permanently damaged.

There is a vivid saying: you may be able to walk while carrying 100 kg, but can you keep carrying 100 kg and walking for a month?

The external load is called pressure; seen from inside, it is called load. Load focuses on the internal state of the system, while pressure focuses more on what is observed from outside the system.

Performance test model


A performance test proceeds from light to heavy, gradually putting pressure on the system. The indicators users care about most are response time, throughput, resource utilization, and the maximum number of users. The typical performance curve can be divided into three areas: the light load area, the heavy load area, and the load failure area.

  • Light load area
    In this area, as the number of virtual users grows, resource utilization and throughput grow with it, while response time stays essentially flat.

  • Heavy load area
    Here, as virtual users keep increasing, resource utilization climbs slowly and throughput rises more and more slowly. Eventually utilization levels off (still within the resource-utilization targets) and throughput plateaus, then dips slightly, while response time lengthens noticeably.

  • Load failure area
    In this area, resource utilization rises to saturation, for example CPU utilization reaching 95% or even 100% and staying there, while throughput drops sharply and response time soars (the inflection point appears).

  • Two junction points
    The number of users at the boundary between the light load and heavy load areas is called the “optimal number of users”. The number of users at the boundary between the heavy load and load failure areas is called the “maximum number of users”.

When the system load equals the optimal number of users, overall efficiency is at its highest: resource utilization is moderate and user requests are answered quickly.

When the load sits between the optimal and maximum numbers of users, the system can keep working, but response times grow longer and resource utilization stays high. If this state persists, some users will eventually give up because the wait becomes unbearable.

When the load exceeds the maximum number of users, ever more users abandon the system because of the extreme waits, and the system may even crash and stop responding to requests altogether.
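
The three areas can be sketched with a toy closed-system model (a simplification of my own with assumed numbers, not something given in the text): throughput climbs with the number of virtual users until the server saturates, after which additional users only stretch the response time.

```python
# Toy closed-system sketch (assumed figures): CAPACITY is the most
# requests/sec the server can complete; BASE_RT is the unloaded
# response time. Below saturation, users generate load freely; above
# it, the queue grows and response time stretches.
CAPACITY = 100.0   # requests/sec the server can complete (assumed)
BASE_RT = 0.1      # seconds per request when lightly loaded (assumed)

def model(users):
    offered = users / BASE_RT            # rate the users would generate
    throughput = min(offered, CAPACITY)  # the server caps the rate
    # Past saturation, queued requests stretch the response time.
    response_time = BASE_RT if offered <= CAPACITY else users / CAPACITY
    return throughput, response_time

for users in (2, 5, 10, 50, 100):
    t, r = model(users)
    print(f"{users:>3} users: {t:6.1f} req/s, {r:.2f} s")
```

In this sketch the "optimal number of users" sits near 10, where throughput peaks while response time is still flat; pushing past it only inflates the response time, mirroring the heavy load and load failure areas described above.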

Number of concurrent users

Relative concurrent users (User perspective)

That is, the number of online users: the users who interact with the server and put pressure on it within some time period. The period can be a day or an hour.


The concurrency set in tools such as ab and wrk usually means this kind of concurrency. For example, if a JMeter thread group is configured with 100 threads, is the concurrency for this type of request 100? From a macro point of view, yes: it is like asking 100 people to complete a series of tasks independently, and indeed 100 people are working in parallel. However, the pressure the server actually feels may not be 100.

The 100-way “parallelism” here is not strictly parallel from the server’s standpoint, because each virtual user proceeds at its own pace. Suppose the operation takes 3 requests to complete: one virtual user may still be waiting for the response to its first request while another has already received that response and fired off its second. So at any given moment, neither request 1 nor request 2 reaches a parallelism of 100. This corresponds to the right-hand part of the figure, and understanding this model is essential to understanding parallelism.

The difference lies in macroscopic versus strict parallelism. Strict parallelism for a request can be observed, for example, through the number of TCP connections currently held: the count of established connections is obtained with “netstat -an | grep ESTABLISHED | wc -l”. That count is the absolute number of concurrent users described below.

Concurrency and parallelism are related concepts that differ in the details. Concurrency means two or more tasks are making progress even if they never execute at the same instant; this can be achieved with time slicing, where each task runs a small slice interleaved with slices of the other tasks (a concurrent garbage collector works this way). Parallelism means tasks genuinely execute at the same time.
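
The time-slicing idea can be seen in a few lines of Python (an illustration of my own; the original names no language): two tasks both make progress on a single thread without ever running at the same instant.

```python
import asyncio

# Concurrency without parallelism: two tasks interleave on one thread,
# each yielding after every step so the other can make progress.
order = []

async def task(name):
    for step in range(3):
        order.append(f"{name}{step}")
        await asyncio.sleep(0)  # yield control to the event loop

async def main():
    await asyncio.gather(task("A"), task("B"))

asyncio.run(main())
print(order)  # steps of A and B interleaved, not A finishing first
```

True parallelism would instead run the tasks on separate cores, for example with the multiprocessing module.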

Absolute concurrent users (Server perspective)

It mainly tests a single operation: multiple users issue the same request at the same moment.


In the figure, each colored line segment represents one operation. At any instant the server sees 10 transactions to process, generated by 10 users, but it does not know how many distinct users produced all the transactions over the whole period. At time 1 the 10 in-flight transactions are initiated by 10 users; at time 2 there are still 10 in-flight transactions, possibly from 10 completely different people. Throughout the period the server is handling 10 transactions at every moment, while the number of people involved in the interaction (putting pressure on the server) might be hundreds, or just the original 10. So what pressure does the server feel? Only the 10 simultaneous transactions at each moment.
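
This relationship can be written down with Little's law, a standard queueing result I am adding for illustration (it is not named in the original): the number of in-flight transactions equals the arrival rate times the average time each transaction spends in the system.

```python
# Little's law: L = lambda * W, where L is the number of transactions
# in progress, lambda the arrival rate (QPS), and W the average time
# each transaction spends in the system (seconds).
def in_flight(qps, response_time_s):
    return qps * response_time_s

# 100 req/s at 0.1 s each keeps ~10 transactions in flight at any
# instant -- the server-side ("absolute") concurrency -- regardless
# of how many distinct users produced those requests over the period.
print(in_flight(100, 0.1))
```

The same law explains the figure: 10 in-flight transactions at every moment is consistent with either 10 fast-cycling users or hundreds of occasional ones.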

Think Time

When measuring the number of relative concurrent users, the think time setting affects the results. Imagine a real user shopping on an e-commerce site. A simplified flow might be: 1) open the home page; 2) search for a product; 3) view the product details; 4) add it to the cart; 5) submit the order; 6) complete the payment. Following the discussion above, we can build this series of requests in JMeter and put them in one thread group to test how many such shoppers the system can support at the same time. Assuming the login accounts and purchased products are already parameterized, can we simply set the thread-group size to a large value and run it?

We can, but there is a big problem: the difference between real users and scripts. If the script is recorded as described, there is no pause between two consecutive requests; the interval depends only on the previous response time and the time the tool needs to issue the next request. Real users are not machines. They take time to think at every step above, which is where the term Think Time comes from.

Think time is a general term that also covers the user’s operation time. After opening the home page, for example, the user must type a keyword into the search box, which takes at least a few seconds with an input method. Once the results appear, the user browses, picks an interesting product, and clicks through to the details; the later steps are similar. Compare back-to-back request execution with inserting think time between steps to simulate real users: the number of online shoppers the same system can support will differ greatly, and adding think time is clearly closer to the real situation.
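
A rough back-of-the-envelope sketch (the step count and timings are my own assumptions, not figures from the text) shows how much think time changes the load a single scripted shopper offers the server:

```python
# Assumed figures: a 6-step shopping session (home page, search,
# details, cart, order, payment), 0.2 s average server response
# per step, and an optional per-step think time.
def requests_per_hour(avg_response_s, think_time_s, steps=6):
    session_s = steps * (avg_response_s + think_time_s)
    sessions_per_hour = 3600 / session_s
    return sessions_per_hour * steps

back_to_back = requests_per_hour(0.2, 0)   # recorded script, no pauses
with_think = requests_per_hour(0.2, 5)     # ~5 s of thinking per step
print(round(back_to_back), round(with_think))
```

A pause-free virtual user hammers the server with thousands of requests per hour, while a "realistic" one issues only a few hundred, so the same thread-group size represents very different server pressure.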

Preparatory work

Different machines, operating systems, web servers, and related parameters all affect performance test results, so configure them before testing.
See the articles “Do a correct performance test” and “High-load system network parameter adjustment” for how to tune them. Below are some important operating-system parameters; they are typically appended to /etc/sysctl.conf and applied with sysctl -p.

fs.file-max = 999999  
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_max_tw_buckets = 5000
net.ipv4.ip_local_port_range = 1024    61000
net.ipv4.tcp_rmem = 4096 32768 262142
net.ipv4.tcp_wmem = 4096 32768 262142
net.core.netdev_max_backlog = 8096
net.core.rmem_default = 262144
net.core.wmem_default = 262144
net.core.rmem_max = 2097152
net.core.wmem_max = 2097152
net.ipv4.tcp_syncookies = 1
  • File-max: This parameter indicates the maximum number of file handles the kernel will allocate system-wide (the per-process limit is set separately, e.g. with ulimit -n). It effectively caps the maximum number of concurrent connections and needs to be configured according to the actual situation.

  • Tcp_tw_reuse: Setting this parameter to 1 allows sockets in the TIME-WAIT state to be reused for new TCP connections. This matters for servers, which always accumulate a large number of TIME-WAIT connections.

  • Tcp_keepalive_time: This parameter indicates how often TCP sends keepalive messages when keepalive is enabled. The default is 2 hours; a smaller value lets dead connections be cleaned up faster.

  • Tcp_fin_timeout: This parameter indicates the maximum time socket will remain in FIN-WAIT-2 state when the server actively closes the connection.

  • Tcp_max_tw_buckets: This parameter indicates the maximum number of TIME_WAIT sockets allowed by the operating system. If this number is exceeded, TIME_WAIT sockets will be cleared immediately and a warning message will be printed. This parameter defaults to 180000, and too many TIME_WAIT sockets will slow down the Web server.

  • Tcp_max_syn_backlog: This parameter indicates the maximum length of the queue of received SYN requests during the TCP three-way handshake establishment phase, 1024 by default. Setting it larger prevents Linux from dropping connection requests from clients when Nginx is too busy to accept new connections.

  • Ip_local_port_range: This parameter defines the range of values for local (excluding remote) ports in UDP and TCP connections.

  • Net.ipv4.tcp_rmem: This parameter defines the minimum, default and maximum values of the TCP receive buffer (used for TCP receive sliding window).

  • Net.ipv4.tcp_wmem: This parameter defines the minimum, default, and maximum values of the TCP send buffer (used for the TCP send sliding window).

  • Netdev_max_backlog: when the network card receives packets faster than the kernel processes them, a queue will hold them. This parameter represents the maximum value of the queue.

  • Rmem_default: This parameter indicates the default size of the kernel socket receive buffer.

  • Wmem_default: This parameter indicates the default size of the kernel socket send buffer.

  • Rmem_max: This parameter represents the maximum size of the kernel socket receive buffer.

  • Wmem_max: This parameter represents the maximum size of the kernel socket send buffer.

  • Tcp_syncookies: This parameter is independent of performance and is used to solve SYN attacks of TCP.
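
As a quick sanity check before and after tuning, current values can be read back from /proc/sys, whose paths mirror the dotted sysctl names (a small helper of my own; Linux only):

```python
from pathlib import Path

def read_sysctl(name):
    """Return the current value of a sysctl parameter, or None if
    unavailable (non-Linux system or unknown parameter)."""
    path = Path("/proc/sys") / name.replace(".", "/")
    try:
        return path.read_text().strip()
    except OSError:
        return None

for name in ("fs.file-max", "net.ipv4.tcp_fin_timeout"):
    print(name, "=", read_sysctl(name))
```

Running this before and after `sysctl -p` confirms the new values actually took effect.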

Testing tools

Several Core Modules of Performance Test Tool

  • Virtual User Generator

  • Result Collector

  • Load Controller

  • System Resource Monitor

  • Result Analyzer


This shows that tools such as wrk and Apache Bench are not complete performance-testing tools: they lack a system resource monitor. Results from such bare-bones testing are not very accurate; they can only be called quick and rough.

Monitoring System Resources under Test

An important part of performance-test data collection is the resource usage of the system under test, because system performance and resource usage are closely related. The main purposes are:

  • to understand how system resources are used under the current pressure, and to enable horizontal comparison;

  • to judge, from the resource-usage analysis, whether the system's maximum performance has been reached;

  • to see whether some resource has hit its ceiling and become a bottleneck;

  • to see whether other modules of the system under test are occupying resources.

Usually testers collect the usage of server resources such as CPU, memory, and network. A single overall percentage is not enough, however; further breakdown is needed to yield more valuable data.

Therefore, if you test with wrk alone, remember to watch the CPU, I/O (network/file), memory, and other usage of the system under test. For Internet applications, pay special attention to the number of network connections.
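
When nothing better is at hand, a minimal stand-in is to sample the load average from a small sidecar script while the test runs (a sketch of my own; real monitoring would use tools such as top, sar, vmstat, or nmon on the server under test):

```python
import os
import time

def sample_load(n=3, interval=0.5):
    """Record the 1-minute load average n times (Unix only)."""
    samples = []
    for _ in range(n):
        samples.append(os.getloadavg()[0])
        time.sleep(interval)
    return samples

print(sample_load(n=2, interval=0.1))
```

Correlating these samples with the wrk timeline at least reveals whether the machine was CPU-saturated when throughput flattened.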

System resource bottleneck

  • Stabilizing System Resource State

  • System resource bottleneck

Peak flow estimation

From historical data such as the daily average and daily peak pressure, the average and peak pressure for the next few years can be estimated. Common rules of thumb, such as the 80/20 rule (80% of the work is done in 20% of the time, i.e., a day’s 8 hours of work compressed into about 2 hours), convert daily pressure into peak pressure.

Suppose the system’s average daily PV is 8000W (80 million). Counting a day as roughly 4W (40,000) effective seconds, 8000W / 4W = 2000, i.e., about 2000 QPS on average. Applying the 80/20 rule, the peak is 2000 × 4 = 8000 QPS.
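
The arithmetic above, spelled out (8000W means 80 million page views; a day is counted as about 40,000 effective seconds):

```python
daily_pv = 80_000_000       # 8000W page views per day
effective_seconds = 40_000  # ~a day's traffic window (4W seconds)

avg_qps = daily_pv / effective_seconds
# 80/20 rule: 80% of the traffic arrives in 20% of the time,
# so the peak rate is about 0.8 / 0.2 = 4x the average.
peak_qps = avg_qps * 4
print(avg_qps, peak_qps)
```

The resulting 8000 QPS peak is the figure the load test should target, with headroom on top.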