Hi all, I recently took over a platform built by someone else. It monitors the company's various services (mainly HTTP interface metrics such as QPS, 5xx rates, and latency) and generates reports. The metric data is stored in MongoDB — actually TokuMX (I'm not yet sure whether it counts as an optimized fork of MongoDB). Its deployment plan strikes me as odd, so I'd like to lay it out here for discussion and ask for optimization suggestions.
The current deployment: there are two IDCs in the north and three in the south. North and south each run an independent MongoDB replica set, each consisting of one primary, one secondary, and one arbiter. The Guangzhou machine room is the full data center (Beijing only stores the most recent data). What's incredible is that synchronization between the two sides does not rely on MongoDB's replication mechanism (since it isn't a cross-datacenter replica set); instead, after writing locally, Beijing's code explicitly calls an interface to write the data to Guangzhou as well. That alone isn't a huge problem. The big problem is that the Guangzhou replica set serves as the full data store — all business queries go there, and only the primary is queried (reads from the secondary are not enabled) — so its load is extremely high and disk write IO is maxed out. It feels like it could fall over at any moment.
The monitoring data must not lag by more than 1 minute, because metrics are reported minute by minute. My guess is that this is why they only query the primary, but I checked the replication lag and it's actually fine — almost no delay at all.
So: I want to overhaul this thing thoroughly. What should the plan be?
First of all, TokuMX is not a product of MongoDB Inc. It began as an independent open-source fork based on MongoDB 2.4, maintained by Tokutek (since acquired by Percona). TokuMX hasn't seen very wide adoption (no offense intended to anyone using it), and my own understanding of it is limited. From its initial release up to the most recent one (last year?), TokuMX didn't change much around the time of the Percona acquisition, though a new version based on MongoDB's storage engine API appeared afterward. So the first step is to find out which version you're running and upgrade if necessary, to rule out basic bugs.
Then, from MongoDB's point of view, the scenario above is a typical multi-center-write, single-center-read geo-distributed MongoDB cluster. There is too much related material to cover fully here, but the general principle is: place one shard in each data center, and let each shard's replica set replicate its data to the central data center, so that your Guangzhou machine room ends up with a full copy of the data to query. The whole process runs on MongoDB's own mechanisms, with no application code or interface calls involved. The details of this architecture are covered in MongoDB's white paper "MongoDB Multi Datacenter Deployments"; I suggest reading it.
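To make the idea concrete, here is a hypothetical mongo-shell sketch of the write-locality half of that architecture, using zone (tag-aware) sharding. The shard names, database/collection names, and shard key below are all made up for illustration:

```shell
# Hypothetical sketch: pin each data center's data to its local shard.
# Shard names (shardBJ/shardGZ), namespace, and shard key are assumptions.
mongo --host mongos.example.com <<'EOF'
// Shard the metrics collection by data center plus timestamp.
sh.enableSharding("metrics")
sh.shardCollection("metrics.samples", { dc: 1, ts: 1 })

// Tag each shard with the data center its primary lives in.
sh.addShardTag("shardBJ", "BJ")
sh.addShardTag("shardGZ", "GZ")

// Pin chunk ranges so each center's writes stay on its local shard.
sh.addTagRange("metrics.samples",
               { dc: "BJ", ts: MinKey }, { dc: "BJ", ts: MaxKey }, "BJ")
sh.addTagRange("metrics.samples",
               { dc: "GZ", ts: MinKey }, { dc: "GZ", ts: MaxKey }, "GZ")
EOF
```

The read half comes from replica set member placement: the Beijing shard's replica set would include a secondary hosted in Guangzhou, so the central room accumulates a full copy through ordinary replication rather than application-level double writes.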
Finally, on the high-IO problem. As described, data from multiple data centers is ultimately concentrated in Guangzhou, so its pressure is naturally the sum of Beijing's and Guangzhou's, and it is normal for it to be relatively high. Whether it is high enough to hurt performance cannot be judged, since the description above gives no actual metrics. The usual approach is to optimize queries and writes first; if the pressure still can't be handled, consider sharding. Although replicating data from the other data centers to Guangzhou in application code is inappropriate, it is not unacceptable — it is a fait accompli with no glaring defect, just not optimal. If you are not that sensitive to data freshness, you can consider enabling reads from the secondary, but watch the total pressure: if each node's load exceeds 66.67%, you lose high availability — after losing one node, the remaining nodes will be overloaded and go down with it.
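If slightly stale reads are acceptable, enabling secondary reads is just a driver-side read preference rather than a server change. A minimal sketch (host names and replica set name are made up):

```shell
# Hypothetical connection: prefer a secondary when one is available,
# fall back to the primary otherwise. Reads may lag slightly behind writes.
mongo "mongodb://gz-a.example.com,gz-b.example.com/metrics?replicaSet=rsGZ&readPreference=secondaryPreferred"
```

With only one primary, one secondary, and an arbiter, remember that the secondary must be able to absorb the primary's full load during a failover, which is exactly the capacity caveat above.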
Addendum: on the pressure question — I don't know how the IO numbers in your graph were obtained, so I'm not sure they mean what I think they mean. If it's MongoDB, I would use mongotop to look at the time spent reading versus writing each collection, to determine whether the pressure is on reads or writes. On a Linux system, look at iowait to judge how heavy the disk pressure really is before drawing any conclusions. For MongoDB, if writes take more time and iowait is high, the bottleneck is write IO, and there may not be much you can do. If it's CPU pressure caused by missing indexes, it should show up as read IO; in that case, optimizing your indexes is worthwhile. Of course, what you actually have is TokuMX, so I can't draw a firm conclusion about your situation.
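The diagnostics mentioned above could be run like this (host name and intervals are placeholders; iostat comes from the sysstat package on most Linux distributions):

```shell
# Report per-collection read/write time on the busy node every 5 seconds,
# to see whether reads or writes dominate.
mongotop --host gz-primary.example.com 5

# Watch %iowait and per-device utilization to see how saturated the
# disks are (extended stats, refreshed every 5 seconds).
iostat -x 5
```

If mongotop shows write time dominating while iostat shows high %iowait and device utilization near 100%, the bottleneck is write IO; heavy read time with high CPU instead points toward missing indexes.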
Assuming it really is pressure caused by missing indexes, you need to build them. The safest and fastest way, as mentioned in your previous question, is the rolling approach: take each node out of the replica set (secondaries first), build the index on it, and add it back, proceeding node by node. A 30 GB collection is not too big, so the index build itself is no problem. However, on a cluster under heavy pressure, I strongly recommend the rolling, one-node-at-a-time method above. An index build brings not only CPU pressure — more importantly, it traverses the entire collection, evicting the hot data from your memory, and at the same time greatly increases disk IO. Unless load is below roughly 70%, there is no guarantee that building the index in place won't affect production.
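The rolling index build described above, sketched as shell steps. Ports, paths, the replica set name, and the index definition are all hypothetical; repeat the procedure for each secondary, then step the primary down and repeat on it:

```shell
# Rolling index build on one member (all names/paths are assumptions).

# 1. Restart the member as a standalone on a non-standard port, so
#    neither the application nor the replica set can reach it.
mongod --dbpath /data/db --port 27117 &

# 2. Build the index while the node is out of the replica set.
mongo --port 27117 metrics --eval 'db.samples.createIndex({ service: 1, ts: 1 })'

# 3. Restart with the original replica set configuration; the node
#    rejoins and catches up from the oplog.
mongod --dbpath /data/db --port 27017 --replSet rsGZ &
```

The key point is step 1: while the node is standalone, the collection scan and IO of the index build are invisible to production traffic, and the oplog replay in step 3 brings it back in sync afterward.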