MongoDB index building


Be careful when declaring indexes

Because declaring an index is so easy, it is also easy to trigger an index build inadvertently. If the data set is large, the build will take a long time. In a production environment this is a nightmare, because there is no easy way to stop an index build once it has started. If this happens, you will have to fail over to a secondary node, if there is one. The wisest advice is to treat an index build as a kind of database migration and to make sure that application code does not declare indexes automatically.
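
One way to treat an index declaration as a migration is to keep it in a standalone script that an operator runs deliberately, rather than in application start-up code. The following is only a minimal sketch under stated assumptions: the helper name declareIndexOnce is hypothetical, coll is assumed to behave like a shell collection object exposing getIndexes() and ensureIndex(), and key patterns are compared with a simple JSON comparison in which key order matters.

```javascript
// Hypothetical migration helper: declare an index only if no index with the
// same key pattern exists yet. `coll` is assumed to behave like a mongo shell
// collection: getIndexes() returns an array of index documents, each with a
// `key` field, and ensureIndex(keys, options) declares an index.
function declareIndexOnce(coll, keys, options) {
  var wanted = JSON.stringify(keys);
  var existing = coll.getIndexes().filter(function (ix) {
    return JSON.stringify(ix.key) === wanted;
  });
  if (existing.length > 0) {
    return false; // index already declared; no build is triggered
  }
  coll.ensureIndex(keys, options || {});
  return true; // this call declared the index and started a build
}
```

In the shell this might be invoked as declareIndexOnce(db.values, {open: 1, close: 1}) from a script that is run on purpose, at a time the operator chooses.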

An index build proceeds in two steps.

The first step sorts the values to be indexed. A sorted data set can be inserted into the B-tree much more efficiently. Note that sort progress is reported as the ratio of the number of documents sorted to the total number of documents:

[conn1] building new index on { open: 1.0, close: 1.0 } for stocks.values
    1000000/4308303 23%
    2000000/4308303 46%
    3000000/4308303 69%
    4000000/4308303 92%
Tue Jan 4 09:59:13 [conn1] external sort used : 5 files in 55 secs
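
The percentages in these progress lines appear to be the processed count divided by the total, rounded down. The following minimal sketch reproduces that arithmetic. The function names progressPercent and parseProgressLine are hypothetical, and the rounding-down behaviour is inferred from the sample lines in this section rather than taken from the server's source.

```javascript
// Hypothetical helper: percentage of documents processed, rounded down.
// Rounding down is an assumption inferred from the log lines in this section.
function progressPercent(done, total) {
  if (total <= 0) {
    throw new Error("total must be positive");
  }
  return Math.floor((done / total) * 100);
}

// Parse a progress line such as "1000000/4308303 23%" into its two counts.
// Returns null for any line that does not have that shape.
function parseProgressLine(line) {
  var match = /^\s*(\d+)\/(\d+)\s+\d+%\s*$/.exec(line);
  if (!match) {
    return null;
  }
  return { done: Number(match[1]), total: Number(match[2]) };
}

var p = parseProgressLine("    1000000/4308303 23%");
console.log(progressPercent(p.done, p.total)); // prints 23
```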

In the second step, the sorted values are inserted into the index. Progress is reported in the same way as in the first step. When the build finishes, the total build time is reported as the time taken by the insert into system.indexes:

    1200300/4308303 27%
    2227900/4308303 51%
    2837100/4308303 65%
    3278100/4308303 76%
    3783300/4308303 87%
    4075500/4308303 94%
Tue Jan 4 10:00:16 [conn1] done building bottom layer, going to commit
Tue Jan 4 10:00:16 [conn1] done for 4308303 records 118.942secs

Also note the lockType field, which the shell's db.currentOp() reports for the running operation. It shows that the index build takes a write lock, so no other client can read from or write to the database while the build runs. In a production environment this is clearly bad, and it is the reason a long index build can be so maddening.
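
To see which running operations hold a write lock, the result of the shell's db.currentOp() can be filtered. The sketch below assumes an older server whose db.currentOp() result contains an inprog array with lockType, ns and msg fields on each entry; other server versions report locks differently. The helper name writeLockedOps is hypothetical.

```javascript
// Hypothetical helper: list in-progress operations that hold a write lock.
// Assumes `currentOpResult` has the shape
//   { inprog: [ { opid, lockType, ns, msg, ... } ] }
// as returned by db.currentOp() on older servers. The field names are an
// assumption for any other version.
function writeLockedOps(currentOpResult) {
  var ops = (currentOpResult && currentOpResult.inprog) || [];
  return ops
    .filter(function (op) { return op.lockType === "write"; })
    .map(function (op) {
      return { opid: op.opid, ns: op.ns, msg: op.msg };
    });
}
```

In the shell this might be called as writeLockedOps(db.currentOp()) while a build is running.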

Background indexing

If you are running in production and cannot afford to halt access to the database like this, you can specify that the index be built in the background. A background build still takes a write lock, but it yields so that other readers and writers can access the database. If the application uses MongoDB heavily, a background build will degrade performance, but this can be acceptable in some cases. For example, if you know the index can be built within the window of lowest application traffic, background indexing is a good choice.

To build the index in the background, you need to specify {background: true} when declaring the index. You can build the previous index in the background as follows:

db.values.ensureIndex({open: 1, close: 1}, {background: true})

Offline indexing

If the production data set is too large for the index to be built in a few hours, another approach is needed. This usually involves taking a replica node offline, building the index on that node, and then letting it catch up with the primary node. Once it has caught up, the node is promoted to primary, and another secondary node is taken offline to build its own copy of the index. This strategy assumes that the replication oplog is large enough to keep the offline node from becoming stale during the index build. The next chapter discusses replication in detail and should help you plan this kind of migration.
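
Whether the oplog is large enough can be estimated by comparing the time span it currently covers with the expected build time. The sketch below assumes an object like the one returned by the shell's db.getReplicationInfo(), which includes a timeDiffHours field. The helper name oplogCoversBuild, the build-time estimate, and the default safety factor of 2 are hypothetical choices made for illustration.

```javascript
// Hypothetical check: does the oplog window comfortably exceed the time the
// node is expected to be offline? `replInfo` is assumed to look like the
// result of db.getReplicationInfo(), which reports `timeDiffHours` (the span
// between the oldest and newest oplog entries).
function oplogCoversBuild(replInfo, estimatedBuildHours, safetyFactor) {
  var factor = safetyFactor === undefined ? 2 : safetyFactor; // assumed margin
  if (!replInfo || typeof replInfo.timeDiffHours !== "number") {
    throw new Error("replInfo.timeDiffHours is required");
  }
  return replInfo.timeDiffHours >= estimatedBuildHours * factor;
}
```

In the shell this might be called as oplogCoversBuild(db.getReplicationInfo(), 6) for a build expected to take about six hours. The window reflects past write volume only, so a burst of writes during the build can shrink it; the result is an estimate, not a guarantee.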